Wow, thanks for the link to Texereau. I had no idea a PDF was floating around, and I have wanted this book for some time. Your video looks interesting, especially the part around Ronchi and Foucault testing. I have 'Understanding Foucault' but have to admit that reading it doesn't give me confidence.
One question I always think about is how much time and effort a "one-time" mirror maker should plan to invest in order to exceed the quality of a generic 8" or 10" F/5-F/7 mirror available from the Chinese mirror makers.
Zambuto seems to imply that whatever magic happens for his mirrors might be in very long, machine-driven polishing to smooth out the final surface imperfections that cause scatter. With his retirement and with few mirror makers in the US, it seems like options for buying "high end" mirrors in the 6"-10" size are very limited. I have been debating an 8" F/7 and would love to just purchase a relatively high quality mirror, but most of the mirror makers seem more taken with significantly larger mirrors.
Watch your local craigslist or facebook marketplace. With a little patience, you will probably find a good 8" or 10" dobsonian at a great price. I picked up a lovely 8" dob for less than $200. Most of the generic 8" F/6 dobsonians seem pretty decent.
Or check your local library. It may have a smaller Starblast table-top dobsonian you can check out - I did that when traveling once.
Whatever you do, do NOT buy a small cheap refractor on some flimsy mount. They are mostly awful.
I've been running local models on an AMD 7800 XT with ollama-rocm. I've had zero technical issues. It's really just that the usefulness of a model that fits in 16GB of VRAM plus 64GB of main RAM is questionable, but that isn't an AMD-specific issue. It was a similar experience running locally with an Nvidia card.
All those choices seem to have very different trade-offs? I hate $5,000 as a budget - not enough to launch you into higher-VRAM RTX Pro cards, too much (for me personally) to just spend on a "learning/experimental" system.
I've personally decided to just rent systems with GPUs from a cloud provider and set up SSH tunnels to my local system. I mean, if I was doing some more HPC/numerical programming (say, similarity search on GPUs :-) ), I could see just taking the hit and spending $15,000 on a workstation with an RTX Pro 6000.
For grins:
Max t/s for this and smaller models? RTX 5090 system. Barely squeezing in for $5,000 today, and given RAM prices, maybe not actually possible tomorrow.
Max CUDA compatibility, slower t/s? DGX Spark.
OK with slower t/s, don't care so much about CUDA, and want to run larger models? Strix Halo system with 128GB unified memory; order a Framework Desktop.
Prefer Macs, might run larger models? M3 Ultra with memory maxed out. Better memory bandwidth, and Mac users seem to be quite happy running locally for just messing around.
I ran ollama first because it was easy, but now I download the source and build llama.cpp on the machine. I don't bother saving a file system between runs on the rented machine; I just build llama.cpp every time I start up.
I am usually just running gpt-oss-120b or one of the Qwen models. Sometimes Gemma. These are mostly "medium" sized in terms of memory requirements - I'm usually trying unquantized models that will easily run on a single 80-ish GB GPU because those are cheap.
I tend to spend $10-$20 a week. But I am almost always prototyping or testing an idea for a specific project that doesn't require me to run 8 hrs/day. I don't use the paid APIs for several reasons but cost-effectiveness is not one of those reasons.
I know you say you don't use the paid APIs, but renting a GPU is something I've been thinking about, and I'd be really interested in knowing how this compares with paying by the token. I think gpt-oss-120b is $0.10 input / $0.60 output per million tokens on Azure. In my head this could go a long way, but I haven't used gpt-oss agentically long enough to really understand usage. Just wondering if you know/would be willing to share your typical usage/token spend on that dedicated hardware?
For comparison, here's my own usage with various cloud models for development:
* Claude in December: 91 million tokens in, 750k out
* Codex in December: 43 million tokens in, 351k out
* Cerebras in December: 41 million tokens in, 301k out
* (obviously those figures above are so far in the month only)
* Claude in November: 196 million tokens in, 1.8 million out
* Codex in November: 214 million tokens in, 4 million out
* Cerebras in November: 131 million tokens in, 1.6 million out
* Claude in October: 5 million tokens in, 79k out
* Codex in October: 119 million tokens in, 3.1 million out
In general, I'd say that for the stuff I do my workloads are extremely read-heavy (referencing existing code, patterns, tests, build and check script output, implementation plans, docs, etc.), but it goes about like this:
* most fixed cloud subscriptions will run out really quickly and will be insufficient (Cerebras being an exception)
* if paying per token, you *really* want the provider to support proper caching, otherwise you'll go broke (see the rough numbers after this list)
* if you have local hardware, that's great, but it will *never* compete with the cloud models; your best bet is to run something good enough to cover all of your autocomplete needs, and with tools like KiloCode you can have an advanced cloud model do the planning, a simpler local model do the implementation, and the cloud model validate the output
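To put very rough numbers on that, using the gpt-oss-120b Azure pricing quoted upthread ($0.10/M input, $0.60/M output) rather than Claude's actual rates, my November Claude volume would have come to roughly:
196M input tokens x $0.10/M ≈ $19.60
1.8M output tokens x $0.60/M ≈ $1.08
i.e. about $21 for the month before any cache discount. Even at cheap open-weight prices the read-heavy input side dominates, which is why caching matters so much.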
Sorry, I don't really track or keep up with those specifics other than knowing I'm not spending much per week. My typical scenario is to spin up an instance that costs less than $2/hr for 2-4 hours. It's all just exploratory work really. Sometimes I'm running a script that makes calls to the LLM server's API, other times I'm just noodling around in the web chat interface.
I don't suppose you have (or would be interested in writing) a blog post about how you set that up? Or maybe a list of links/resources/prompts you used to learn how to get there?
No, I don't blog. But I just followed the docs for starting an instance on lambda.ai and the llama.cpp build instructions. Both are pretty good resources. I had already set up an SSH key with Lambda, and the Lambda OS images are Linux with the CUDA libraries pre-loaded on startup.
Here are my lazy notes + a snippet of the history file from the remote instance for a recent setup where I used the web chat interface built into llama.cpp.
I created a gpu_1x_gh200 instance (96 GB, ARM) at lambda.ai.
Connected from a terminal on my box at home and set up the SSH tunnel:
ssh -L 22434:127.0.0.1:11434 ubuntu@<ip address of rented machine - can see it on lambda.ai console or dashboard>
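(The -L part maps port 22434 on my local box to 127.0.0.1:11434 on the rented instance, which is where the llama-server needs to end up listening.)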
Started building llama.cpp from source, history:
21 git clone https://github.com/ggml-org/llama.cpp
22 cd llama.cpp
23 which cmake
24 sudo apt list | grep libcurl
25 sudo apt-get install libcurl4-openssl-dev
26 cmake -B build -DGGML_CUDA=ON
27 cmake --build build --config Release
MISTAKE on 27: single-threaded and slow to build; see -j 16 below for a faster build
28 cmake --build build --config Release -j 16
29 ls
30 ls build
31 find . -name "llama.server"
32 find . -name "llama"
33 ls build/bin/
34 cd build/bin/
35 ls
36 ./llama-server -hf ggml-org/gpt-oss-120b-GGUF -c 0 --jinja
MISTAKE, didn't specify the port number for the llama-server
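The fix would just be passing --port so the server matches the tunnel target from the ssh command above, something like:
./llama-server -hf ggml-org/gpt-oss-120b-GGUF -c 0 --jinja --port 11434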
I switched to Qwen3-VL because I needed a multimodal model for that day's experiment. Lines 38 and 39 of the history (not included in the snippet above) show me not using the right name for the model. I like how llama.cpp can download and run models directly off of Hugging Face.
Then I pointed my browser at http://localhost:22434 on my local box and got the normal llama.cpp web chat, where I could upload files and use the chat interface with the model. That also gives you an OpenAI API-compatible endpoint. It was all I needed for what I was doing that day. I spent a grand total of $4 that day doing the setup and running some NLP-oriented prompts for a few hours.
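For scripted calls instead of the web UI, the same tunnel works with anything that speaks the OpenAI chat API; a minimal sanity check from my local box looks something like this (the model name isn't critical, since the server only serves whatever it loaded):
curl http://localhost:22434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss-120b", "messages": [{"role": "user", "content": "Say hello"}]}'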
>whole point of the time compression is to spread the grades out
I suspect that is true for standardized tests like the SAT, ACT, or GRE.
I suspect that in classroom environments there isn't any intent at all behind test timing other than that most kids should be able to attempt most problems in the test time window. As far as I can tell, nobody cares much about spreading grades out at any level these days.
My kids use personal computing devices for school, but their primary platform (just like their friends) is locked-down phones. Combining that usage pattern with business incentives to lock users into walled gardens, I kind of worry we are backing into the destruction of personal computing.
How strong is the argument that a student who completes a test in 1 hour with the same score as a student who took 10 hours performed "better" or had a greater understanding of the material?
Sure, but that answer doesn't address the question of the value of time limits on assessment.
What if instead we are talking about a paper or project? Why isn't time-to-complete part of the grading rubric?
Do we penalize a student who takes 10 hours on a project vs the student who took 1 hour, even if the rubric gives a better grade to the student who took the 10 hours?
Or assume teacher time isn't a factor - put two kids in a room with no devices to take an SAT test on paper. Both kids make perfect scores. You have no information on which student took longer. How are the two test takers different?
Not arguing with any of that, just stating plainly that there are practical reasons for time limits and one of the many reasons is that tests are done supervised and thus must have _some_ sort of time limit. Everything else is you projecting an argument onto me that I didn't make.
I started my career as a software performance engineer. We measured everything across different code implementations, multiple OS, hardware systems, and in various network configurations.
It was amazing how often people wanted to optimize stuff that wasn't a bottleneck in overall performance. Real bottlenecks were often easy to see when you measured and usually simple to fix.
But it was also tough work in the org. It was tedious, time-consuming, and involved a lot of experimental comp sci work. Plus, it was a cost center (teams had to give up some of their budget for perf engineering support) and even though we had racks and racks of gear for building and testing end-to-end systems, what most dev teams wanted from us was to give them all our scripts and measurement tools to "do it themselves" so they didn't have to give up the budget.
That sounds like fascinating work, but also kind of a case study in how a manager's role is to "clear the road" and handle the lion's share of that internal advocacy and politicking so that ICs don't have to deal with it.
It's because patting yourself on the back for getting a 5x performance increase in a microbenchmark feels good and looks good on a yearly review.
> But it was also tough work in the org. It was tedious, time-consuming, and involved a lot of experimental comp sci work. Plus, it was a cost center (teams had to give up some of their budget for perf engineering support) and even though we had racks and racks of gear for building and testing end-to-end systems, what most dev teams wanted from us was to give them all our scripts and measurement tools to "do it themselves" so they didn't have to give up the budget.
Misaligned budgeting and goals are the bane of good engineering. I've seen some absolutely stupid stuff, like a client outsourcing the hosting of a simple site to us because they would rather hire a third party to buy a domain and put a simple site there (some advertising) than deal with their own security guys and host it on their own infrastructure.
"It's a cost center"
"So is fucking HR, why you don't fire them ?"
"Uh, I'll ignore that, pls just invoice anything you do to other teams"
...
"Hey, they bought cloud solution that doesn't work/they can't figure it out, can you help them"
"But we HAVE stuff doing that cheaper and easier, why they didn't come to us"
"Oh they thought cloud will be cheaper and just work after 5 min setup"
In an online services company, a perf team can be net profitable rather than a "cost center." The one at my work routinely finds quantifiable savings that more than justify their cost.
There will be huge mistakes occasionally, but mostly it is death by a thousand cuts -- it's easy to commit a 0.1% regression here or there, and there are hundreds of other engineers per performance engineer. Clawing back those 0.1% losses a couple times per week over a large deployed fleet is worthwhile.
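To put rough numbers on it: if just two 0.1% regressions slip in per week, that compounds to about (1.001)^104 ≈ 1.11 over a year, i.e. roughly 10% more fleet capacity burned than necessary, which is real money across a large deployment.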
I was playing around with Qwen3-VL to parse PDFs - meaning, do some OCR data extraction from a reasonably well-formatted PDF report. It failed miserably, although I was using the 30B-A3B model instead of the larger one.
I like the Qwen models and use them for other tasks successfully. It is so interesting how LLMs will do quite well in one situation and quite badly in another.
I never looked at short videos and couldn't understand how, given even a minute of quiet time, my friends and family would open their phones for these things. Then I viewed some YouTube Shorts (maybe the least effective short-video provider in terms of content and recommendation algorithm?) and was shocked at how easy it was to burn time looking at crap. The experience really opened my eyes to how a person can be pulled into endless viewing.
I think the crack house comparison is entirely appropriate. The brain is weird . . .