
> It was a real-time computer NOT designed for speed but for real-time operations.

More than anything, it was designed to be small and use little power.

But these little ARM Cortex-M4F parts that we're comparing it to are also designed for embedded, possibly hard-real-time operation. And the dominant factors in the experience of playback through earbuds are response time and jitter.

If the AGC could get a capsule to the moon doing hard-real-time tasks (and shedding low-priority tasks as necessary), a single STM32F405 with a Cortex-M4F could do it better.

Actually, my team is going to fly an STM32F030 for minimal power-management tasks-- but still hard real-time-- on a small satellite. Cortex-M0. It fits in 25 milliwatts vs the AGC's 55 W. We're clocked slow, but still exceed the throughput of the AGC by ~200-300x. Funnily enough, the amount of RAM is about the same as the AGC :D It's 70 cents in quantity, but we have to pay three whole dollars at quantity 1.

> NASA used a lot of supercomputers here on earth prior to mission start.

Fine, let's compare to the CDC 6600, the fastest computer of the late '60s. An M4F at 300 MHz does a couple hundred single-precision megaflops; the CDC 6600 did something like 3 not-quite-double-precision megaflops. The hacky "double-single" precision techniques reach comparable precision at maybe a 10x slowdown on average, so figure each M4F can do about 20 CDC 6600-equivalent megaflops, i.e. it's roughly 5-10x faster. The amount of RAM is about the same on this earbud.
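
For reference, the double-single trick builds one value out of two floats and uses error-compensated arithmetic. A minimal sketch (Knuth's two-sum; the dsfloat type is illustrative, not any particular library, and real implementations renormalize more carefully -- note each add becomes ~10 flops, hence the ~10x figure; compile without -ffast-math):

  #include <stdio.h>

  typedef struct { float hi, lo; } dsfloat;

  /* Knuth's two-sum: a + b as a rounded sum plus an exact error term. */
  static dsfloat two_sum(float a, float b) {
      float s   = a + b;
      float bb  = s - a;
      float err = (a - (s - bb)) + (b - bb);
      return (dsfloat){ s, err };
  }

  /* Simplified double-single addition. */
  static dsfloat ds_add(dsfloat x, dsfloat y) {
      dsfloat s = two_sum(x.hi, y.hi);
      s.lo += x.lo + y.lo;
      return two_sum(s.hi, s.lo);
  }

  int main(void) {
      dsfloat a = { 1.0f, 1e-9f };  /* 1 + 1e-9 doesn't fit in one float */
      dsfloat b = { -1.0f, 0.0f };
      dsfloat r = ds_add(a, b);
      /* Plain float arithmetic gives 0 here; double-single keeps ~1e-9. */
      printf("%g\n", (double)r.hi + (double)r.lo);
      return 0;
  }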

His 486-25 -- if a DX model with the FPU -- was probably roughly twice as fast as the 6600, probably had 4x the RAM, used two orders of magnitude less power, and massed three orders of magnitude less.

Control flow, integer math, etc., are much faster still.

Just a few more pennies gets you a microcontroller with a double-precision FPU, like a Cortex-M7 with the FPv5-D16 (the M4F's FPv4-SP-D16 is single-precision only), which at 300 MHz is good for maybe 60 double-precision megaflops -- 20x faster than the 6600, and with more precision.


I have thought about this a little more, and looked into things. Since NASA used the 360/91, and had a lot of 360's and 7090's... all of NASA's '60s computing couldn't quite fit into a single 486DX-25. You'd need more like the 486DX4-100 era to replace everything comfortably, and you'd want a lot of RAM-- like 16MB.

It looks like NASA had 5 360/75's plus a 360/91 by the end, plus a few other computers.

The biggest 360/75's (I don't know that NASA had the highest-spec model for all 5) were each probably roughly 1/10th of a 486-100, plus 1 megabyte of RAM. The 360/91 that they had at the end was maybe 1/3rd of a 486-100, with up to 6 megabytes of RAM.

Those computers alone (5 × 1/10 + 1/3 ≈ 0.83) would be about 85% of a 486-100. Everything else was comparatively small. And, of course, you need to include the benefit of getting results on individual jobs much faster, even if sustained max throughput is about the same. So all of NASA, by the late '60s, probably fits into one relatively large 486DX4-100.

Incidentally, one random bit of my family lore; my dad was an IBM man and knew a lot about 360's and OS/360. He received a call one evening from NASA during Apollo 13 asking for advice about how they could get a little bit more out of their machines. My mom was miffed about dinner being interrupted until she understood why :D


What's your project / cubesat name?

PS: Try the MSP430 'F' models for low power. These can be CRAZY efficient.

PS: Don't forget to short-circuit the solar panel directly to the system: then your satellite might still talk even 50 years from now, like some Cold War ham satellites (OSCAR 7, I think).


> What's your project / cubesat name?

NyanSat; I'm PI and mentor for a team of high school students who were selected by NASA CSLI.

> PS: Try the MSP430 'F' models for low power. These can be CRAZY efficient.

Yah, I've used the MSP430 in space. The STM32F0 fits what we're using it for. We designed the main flight computer ourselves; it's an RP2350 with MRAM. Some of the avionics details are here: https://github.com/OakwoodEngineering/ObiWanKomputer

> PS: Don't forget to short-circuit the solar panel directly to the system: then your satellite might still talk even 50 years from now, like some Cold War ham satellites (OSCAR 7, I think).

Current ITU guidelines make it clear this is something we're not supposed to do, to ensure that we can actually end transmissions by the satellite. We'll re-enter/burn up within


This is the "little part" of what fits into an earpiece. Each of those cores is maybe 0.04 square millimeters of die on, say, a 28nm process. RAM takes some area, but that's dwarfed by the analog and power components and the packaging. The marginal cost of the gates making up the processors is effectively zero.

So 1 mm² peppered with those cores at 300 MHz will give you 4 Tflops. And a whole 200 mm wafer: 100 petaflops, like 10 B200s, at less than $3K/wafer. Giving half the area to memory, we'd get 50 Pflops with 300 Gb of RAM. Power draw is like 10-20 kW. So, given these numbers, I'd guess Cerebras has tremendous margin and is just printing money :)

Yes, assuming you don't need to connect anything together and that RAM is tinier than it really is, sure. At 28nm, ~3 megabits per square millimeter is what you get of SRAM, so an entire wafer only gets you ~12 gigabytes of memory.
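
Back of the envelope, using the rough figures from this thread (and charitably treating the whole wafer as usable):

  #include <stdio.h>

  int main(void) {
      double wafer_area_mm2 = 3.14159 * 100.0 * 100.0; /* 200 mm wafer: ~31,400 mm^2 */
      double mbit_per_mm2   = 3.0;                     /* ballpark 28nm SRAM density */
      double gbytes = wafer_area_mm2 * mbit_per_mm2 / 8.0 / 1024.0;
      printf("~%.0f GB of SRAM per wafer\n", gbytes);  /* prints ~12 */
      return 0;
  }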

And, of course, most of Cerebras' costs are NRE and things like getting heat out of that wafer and power into it.


Why not DRAM?

Same reason Cerebras doesn't use DRAM. The whole point of putting memory close is to increase performance and bandwidth, and DRAM fundamentally has higher latency.

Also, a process that is good at making logic isn't necessarily good at making DRAM. Yes, eDRAM exists, but most designs don't put DRAM on the same die as logic; they stack it or put it off-chip instead.

Almost all single-die microcontrollers have flash+SRAM, and almost all microprocessor cache designs are SRAM (with some designs using off-die L3 DRAM), for these reasons.


CPU cache is understandably SRAM.

>The whole point of putting memory close is to increase performance and bandwidth, and DRAM fundamentally has higher latency.

When the access patterns are well established and understood, as in the case of transformers, you can mitigate latency by prefetching (we could even have a very beefed-up prefetch pipeline, knowing that we target transformers), while putting memory on the same chip gives you a huge number of data lines and thus huge bandwidth.
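
In software terms, the idea looks something like this -- a sketch only, not a claim about how any real accelerator does it; __builtin_prefetch is the GCC/Clang builtin, and the lookahead distance is invented:

  #define AHEAD 16  /* how far ahead to prefetch; needs tuning per memory system */

  float dot(const float *w, const float *x, int n) {
      float acc = 0.0f;
      for (int i = 0; i < n; i++) {
          /* Issue the load for w[i + AHEAD] now, so DRAM latency
             overlaps with the multiply-accumulates below. */
          if (i + AHEAD < n)
              __builtin_prefetch(&w[i + AHEAD], 0, 0);
          acc += w[i] * x[i];
      }
      return acc;
  }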


With embedded SRAM close by, you get startling amounts of bandwidth -- Cerebras claims to attain >2 bytes/FLOP in practice -- vs an H200 attaining more like 0.001-0.002 to its external DRAM. So we're talking about a 3-orders-of-magnitude difference.

Would it be a little better with on-wafer distributed DRAM and sophisticated prefetch? Sure, but it wouldn't match SRAM, and you'd end up with a lot more interconnect and associated logic. And, of course, there's no clear path to run on a leading logic process and embed DRAM cells.

In turn, you have to batch for inference on an H200, whereas Cerebras can get full performance with very small batch sizes.


The marginal cost of a small microprocessor in an ASIC is nothing.

The RAM costs a little bit, but if you want to do firmware updates in a friendly way, etc., you need some RAM to stage the updates.
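
A sketch of the staging idea (buffer the whole image in RAM, verify it, and only then commit to flash, so a dropped link can't brick the device; recv_chunk, crc32_ok, and the flash_* functions are hypothetical HAL stand-ins):

  #include <stdint.h>
  #include <stdbool.h>

  extern int  recv_chunk(uint8_t *dst, int max_len);  /* bytes received, 0 at end */
  extern bool crc32_ok(const uint8_t *img, int len);
  extern void flash_erase_app(void);
  extern void flash_write_app(const uint8_t *img, int len);

  #define MAX_IMAGE (48 * 1024)
  static uint8_t staging[MAX_IMAGE];  /* the RAM that "costs a little bit" */

  bool update_firmware(void) {
      int len = 0, n;
      while ((n = recv_chunk(staging + len, MAX_IMAGE - len)) > 0)
          len += n;
      if (len == 0 || !crc32_ok(staging, len))
          return false;             /* bad or partial image: flash untouched */
      flash_erase_app();            /* commit only after verification */
      flash_write_app(staging, len);
      return true;
  }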


The pay-per-use API sucks. If you end up on the $50/mo plan, it's better, with caveats:

1 million tokens per minute, 24 million tokens per day. BUT: cached tokens count at full price, so if you have 100,000 tokens of context you can burn a whole minute's allowance in ~10 requests.


It's wild that cached tokens count full -- what's in it for you to care about caching at all, then? Is the processing-speed gain significant?

Not really worth it, in general. It does reduce latency a little. In practice, you do have a continuing context, though, so you end up using it whether you care or not.

Try a nano-gpt subscription. It's not going to be as fast as Cerebras, obviously, but it's $8/mo for 60,000 requests.

Drones and 2D compositing could do a lot. They would excel in some of the areas shown in the video, require far more resources than this technique in others, and be completely infeasible for a few.

They would look much better, in a very "familiar" way. They would have much less of the glitchy, dynamic aesthetic that makes this so novel.


Yah. A lot of the complexity in data movement and processing is unneeded. But decent standardized orchestration, documentation, and change management isn't optional, even for the 20-line shell script. Thankfully, that stuff is a lot easier for a standard 20-line shell script.

Or Python. The python3 standard library is pretty capable, and it's ubiquitous. You can do a lot in 50-100 lines (counting documentation) with no dependencies. In turn, it's easy to plug into other stuff.


It's a page layout / word processing program. I see the icon and I think "maybe text editor, maybe drawing program".

#4 or #5 are best at conveying what it's for and at being distinct from other icons.


Everyone is going to be hurt, but if you're not the US, you need to hedge. Being firmly aligned with the US is too dangerous right now. Lots of costs and negative outcomes come with that hedging.

Not really sure who it's going to hurt most.


China is the only vertically integrated economy left. In a multipolar/bifurcated/low-trade world, they will be the strongest.

The NAFTA/EU trade blocs were extraordinarily strong; this Greenland business is exactly the kind of issue that can shatter an entire bloc. It benefits no one to give Greenland to the US, so they won't do it without a fight. And it provides no benefit to the US to take it.

The only thing that would really be settled by the US annexing another country on a president's whim is the formal end of the U.S. separation of powers.


Yup. They're damned if they do and damned if they don't. The question is: put your eggs in one basket and become subservient to one country, or diversify and try to play the others against each other.

I'm looking forward to the Telo -- if they get to market. It's absolutely all about utility. It will be interesting to see whether people only want pickups as a fashion statement or whether a weird, very practical vehicle can win.

(Same bed size as a Tacoma; a midgate that folds down to hold a full sheet of plywood; seats 4 people comfortably; same length as a Mini Cooper SE.)


I'd love it if Telos were cheaper, though. $40-50k is enough to keep me buying used cars.

Data rates are almost always multiples of powers of 10, because they're based on symbol/clock rates, which tend to be related to powers of 10. There are no address lines, etc., pushing us toward powers of 2 (though we may get a few factors of 2 from having a power-of-2 number of possible symbols).

So: telco rates that are multiples of 56,000 or 64,000; baud rates that are multiples of 300; Ethernet rates that are mostly just powers of 10; etc.

Of course, there's occasional weird stuff, but usually the numbers have a lot of factors of 5 in them and seem more "decimal-ish" than "binary-ish".
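
A quick illustration of how many factors of 5 the common rates carry:

  #include <stdio.h>

  /* Divide out factor f from *n, returning how many times it divided. */
  static int strip(long *n, long f) {
      int c = 0;
      while (*n % f == 0) { *n /= f; c++; }
      return c;
  }

  int main(void) {
      long rates[] = { 300, 9600, 56000, 64000, 1544000 };
      for (int i = 0; i < 5; i++) {
          long r = rates[i];
          int twos = strip(&r, 2), fives = strip(&r, 5);
          printf("%8ld = 2^%d * 5^%d * %ld\n", rates[i], twos, fives, r);
      }
      return 0;
  }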

