More

sigbottle · 2026-02-13T18:47:48 1771008468

In some sense, it's good to talk about what you aren't saying, to be more informative and precise.

But like, all of these statements are basically ampliative statements, to make it more grand and even more ambiguous.

sigbottle · 2026-02-13T15:14:37 1770995677

Unironically, this is great training data for humans.

No sane person would say this kind of stuff out loud; this often happens behind closed doors, if at all (because people don't or can't express their whole train of thought). Especially not on the internet, at least.

Having AI write like this is pretty illustrative of what a self-consistent, narcissistic narrative looks like. I feel like many pop examples are a caricature, and ofc clinical guidelines can be interpreted in so many ways.

sigbottle · 2026-02-09T13:45:00 1770644700

I feel like literally any linguistic tool can be repurposed in the right way to serve your ends. Demarcation problems in philosophy are nigh impossible.

So I've taken more of your kind of approach. Either both people are willing to understand each other at that time or are not - and heck you may be wrong in thinking the other person wants to engage / thinking they don't want to engage and make the wrong call. But in a controller-feedback system, you'll inevitably be wrong sometimes - the point is to course correct.

But then you even run into edge cases with this - sometimes people want to engage on a physical level, but not really engage and try to understand. But then how do you differentiate that from the possibility that you see that in others unfairly?

Ah, life is complicated.

catapart · 2026-02-09T14:15:09 1770646509

Personally, I just go for "plateaus of understanding". As a filthy socialist in a deep red area, I never had a chance at convincing people all in one go. So I give them just enough that they'll come back around for another bite at convincing me why I'm wrong. Where I'm from, that mostly looks like waiting on the news cycle to say something bad about a democrat and then letting them bring it up purely to dunk on me, rather than to actually engage with the conversation. So I take the opportunity to redirect the dunk. "Oh, that IS bad. Wow, they really fucked up. Isn't this like how So and So did something similar?" Now they are explaining to me the nuances of difference between the situations. That's what they'll remember later - the arguments THEY made. And when they start comparing that to other things, reason wears them down like it does anyone else.

In effect, you end up getting them to agree with some otherwise unthinkable positions, just one plateau at a time. There's only so much erosion that can take place before they fall back to their own lines of thought termination (like "all I care about is immigration", or whatever). So you end up with a kind of "anchor" that we can both agree on (ex: corporations are fucking us), which still has the hard edge of politics. At that point, all it takes is for the politics to do enough that the hard edges start to erode. But, there's no accounting for that. Just gotta assume the people who are doing wrong will keep proving it (as they historically have been unable to avoid, no matter how hard they try or how long they are successful at it prior).

As you say: life is complicated. I know people will roll their eyes at this answer as much as any other response I could give you. And I know that a political-focused answer isn't directly analogous to many other situations. But, my answer is as simple as I can think to make it. Just meet people where they're meeting you and don't worry about forcing a point.

vacuity · 2026-02-09T15:20:52 1770650452

Thank you for your comment. I will try to remember and apply it.

I have a similar concept, which is roughly described as "don't extend your own arm onto the chopping block, but make use of other people doing the same". Don't make presumptions, and you will never be wrong. Don't dig holes for yourself. Be kind, respectful, genuine. Don't throw the first punch, but don't let innocent people get hurt. Find common ground. When they extend out their arm, take their hand and guide them towards truth and goodness.

mapontosevenths · 2026-02-09T14:38:52 1770647932

> But, there's no accounting for that. Just gotta assume the people who are doing wrong will keep proving it (as they historically have been unable to avoid, no matter how hard they try or how long they are successful at it prior).

“Just go forward in all your beliefs and prove to me that I am not mistaken in mine.” ― William Hartnell, Doctor Who

sigbottle · 2026-02-06T22:49:51 1770418191

I tend to remember techniques more than specific algorithms, but:

One really fun algorithm involved optimizing an n^2 naive tree algorithm to O(n), ignoring logs.

For me, the way I reasoned about it was expanding the number of objects to consider (n^3), then there were observations you could apply to bring it down to O(n).

If you asked me what exactly the correspondence was between the final reduction and the original problem statement, I couldn't tell you. Maybe there was a more direct way to get to the solution.

But that style of thinking carries on with me to real tasks too. Sometimes it's easier to simplify, other times it might actually be easier to complexify, as long as you trust that you're not going down "arbitrary" complexity.

sigbottle · 2026-02-06T15:33:20 1770392000

I think one of the best ways to understand the "nice property" of compilers we like isn't necessarily determinacy, but "programming models".

There's this really good blog post about how autovectorization is not a programming model https://pharr.org/matt/blog/2018/04/18/ispc-origins

The point is that you want to reliably express semantics in the top level language, tool, API etc. because that's the only way you can build a stable mental model on top of that. Needing to worry about if something actually did something under the hood is awful.

Now of course, that depends on the level of granularity YOU want. When writing plain code, even if it's expressively rich in the logic and semantics (e.g. c++ template metaprogramming), sometimes I don't necessarily care about the specific linker and assembly details (but sometimes I do!)

The issue I think is that building a reliable mental model of an LLM is hard. Note that "reliable" is the key word - consistent. Be it consistently good or bad. The frustrating thing is that it can sometimes deliver great value and sometimes brick horribly and we don't have a good idea for the mental model yet.

To constrain said possibility space, we tether to absolute memes (LLMs are fully stupid or LLMs are a superset of humans).

Idk where I'm going with this

whattheheckheck · 2026-02-07T05:18:04 1770441484

Now you know how directors and executives feel

sigbottle · 2026-02-06T13:14:13 1770383653

Even with all the caveats:

- trained on all the GCC/clang source - pulled down a kernel branch, presumably with extensive tests in source - used GCC as an oracle

I certainly wouldn't be able to do this.

I flip flop man.

sigbottle · 2026-02-06T12:42:29 1770381749

Completely irrelevant but this exchange reminded me of two Greek philosophers saying "everything is change" versus "nothing ever changes" LOL

sigbottle · 2026-02-04T14:00:55 1770213655

Yup, yup! There's so many different ways of thinking hard.

For me, thinking about an extremely technical TCS problem, for example, is my version of actively, tirelessly thinking hard. I'm logging a ton of observations, trying new ideas and hypotheses, using a mix of computer simulation and math to try and arrive at a concrete framing and answer.

On the other end of the specturm, I have philosophy. It's definitely a different type of hard. Most of my "Aha!" moments come from when I realize I've been strawmanning some argument, and not actually understanding what the person is saying. Why is the person saying this, relative to what, why is this a new observation, etc. Things are so amorphus and you can tweak the problem parameters in so many ways, and it's really tempting to either be too fluid and pretend you understand the thinker (because it's a subset of some conception you already have), or be too rigid and dissolve the thinker as a category error / meaningless. I've never felt the same feeling as I did when doing TCS research, but the feeling was definitely hard thinking nonetheless.

In terms of extremely nitty-gritty technical things, like linker bullshit and linux kernel programming, I'm much more familiar with, and these things are more about reading documentation (because the tool won't behave like you want it to) and iteration / testing (because... the tool won't behave like you want it to, so you need to make sure it behaves like you want it to!). This is also a type of thinking - I would call it hard as in the physiological response I have is similar to that of research in the very bad moments, but in terms of my lofty ideals, I don't want to call this hard.... it's very "accidental" complexity, but it's what I get paid to do :/

At work, you have a huge idea space to consider, both problem and solution framings, mixing in "bullshit" constraints like business ones. You also throw in the real-time aspect of it, so I can't just either armchair on a problem for a month (unlike Philosophy) or deep dive on a problem for a month (unlike research). I'm technically doing the third type of programming right now, but we'll see how long that lasts and I get put on a new project.

I'm not even sure if there's a clean demarcation between any of these. These are certainly better than brainrotting youtube though.

sigbottle · 2026-02-03T23:27:50 1770161270

Oh wow there's still work being done on ampere?

I was wondering - I've been thinking about switching to AI systems programming (I know, easy task), but from what I understand, industry cloud GPUs are the main winners, right? Nobody's going to pay me (assuming I even had the skills) to optimize for consumer GPUs?

From what I understand, it's not just number + capacity + performance, it's literal core primitives. I don't think any of the "Blackwell" chips like the grace one or rtx 5090 have for example SM pairs in their ISA? And likewise similar fundamental differences between consumer and cloud hopper (where the majority of the perf is the cloud one's ISA?)

So I guess I'm wondering if I should buy a GPU myself or should I just rent on the cloud if I wanted to start getting some experience in this field. How do you even get experience in this normally anyways, do you get into really good schools and into their AI labs which have a lot of funding?

g947o · 2026-02-04T01:48:27 1770169707

Why does publishing papers require the latest and greatest GPUs? My understanding is that the paper talks about very general principles.

> So I guess I'm wondering if I should buy a GPU myself or should I just rent on the cloud if I wanted to start getting some experience in this field. How do you even get experience in this normally anyways, do you get into really good schools and into their AI labs which have a lot of funding?

Unless you have money to throw around, you'd better start working on something, write some code and get them running on a leased GPU, before deciding on a long term plan

nl · 2026-02-04T11:59:12 1770206352

> My understanding is that the paper talks about very general principles.

This isn't really true.

In this case it's specific to NVidia's tensor matrix multiply-add (MMA) instructions, which lets it use silicon that would otherwise be unusued at that point.

> Why does publishing papers require the latest and greatest GPUs?

You really do need to test these things on real hardware and across hardware. When you are doing unexpected things there are lots of unexpected interaction effects.

g947o · 2026-02-04T13:00:42 1770210042

It's supported on Ampere, so it's good enough.

As a reminder, the context is "require the latest and greatest GPUs", responding to the parent comment. "General" doesn't mean "you can do this on an Intel Arc GPU" level of general.

That said, my comment could have used a bit more clarity.

saagarjha · 2026-02-04T04:45:51 1770180351

> Nobody's going to pay me (assuming I even had the skills) to optimize for consumer GPUs?

People will but probably less, not many people are doing AI at the edge that can pay the mega millions

> And likewise similar fundamental differences between consumer and cloud hopper (where the majority of the perf is the cloud one's ISA?)

I think Hopper was the version where they did a clean split and it’s only for datacenter

> So I guess I'm wondering if I should buy a GPU myself or should I just rent on the cloud if I wanted to start getting some experience in this field. How do you even get experience in this normally anyways, do you get into really good schools and into their AI labs which have a lot of funding?

You can do performance work on any system you have really it’s just that the details change depending on what you’re targeting. You can definitely learn the basics on like a 3060 by following blog posts

mips_avatar · 2026-02-04T06:02:25 1770184945

You should check out nanochat. I would personally appreciate it if someone implemented hardware optimized flash attention for my 3090

coolsunglasses · 2026-02-04T00:44:07 1770165847

I do CUDA for a living (not inference) and for the life of me (and a couple of LLMs for that matter) I cannot figure out what you mean by "SM pairs".

Do you mean the coupled dies on stuff like the B200? An NVidia chip die has many SMs if so.

Do you mean TMEM MMA cooperative execution? I'm guessing that must be it given what the paper is about.

sigbottle · 2026-02-04T00:53:39 1770166419

https://hazyresearch.stanford.edu/blog/2025-03-15-tk-blackwe...

cooperative execution yeah

as you can tell I do not do CUDA for a living :D

storus · 2026-02-04T00:06:06 1770163566

I still have 2x NVLinked A6000 and they aren't that bad compared to a single RTX 6000 Pro.

Maxious · 2026-02-03T23:49:29 1770162569

yep, https://github.com/poad42/cuda-fp8-ampere recently another attempt at squeezing whatever's left from ampere

vlovich123 · 2026-02-04T00:16:54 1770164214

Look at am the email addresses. If you’ll recall there’s an embargo on China.

sigbottle · 2026-02-02T18:37:32 1770057452

GPT models definitely seem stronger when they "get it" and in the types of problems they "get", while claude seems more holistic but not "as smart" as some of the spikes GPT can get.