
Then I think you’ll like our project which aims to find the missing link between transformers and swarm simulations:

https://github.com/danielvarga/transformer-as-swarm

Basically a boid simulation where a swarm of birds can collectively solve MNIST. The goal is not some new SOTA architecture; it is to find the right trade-off where the system already exhibits complex emergent behavior while the swarming rules are still simple.
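For a flavor of the kind of rules involved, a textbook boid update looks something like the sketch below (the actual rules in the repo differ in the details; this is just the classic cohesion/separation/alignment step in NumPy, not the project's code):

    import numpy as np

    def boid_step(pos, vel, radius=0.5, dt=0.1):
        # pos, vel: (N, 2) arrays of positions and velocities
        diff = pos[None, :, :] - pos[:, None, :]        # diff[i, j] = offset from boid i to boid j
        dist = np.linalg.norm(diff, axis=-1) + 1e-9
        near = (dist < radius) & (dist > 1e-8)          # neighbors within the interaction radius, excluding self
        count = np.maximum(near.sum(1, keepdims=True), 1)

        cohesion = (diff * near[..., None]).sum(1) / count                              # steer toward local center of mass
        separation = -(diff / dist[..., None] * (dist < radius / 3)[..., None]).sum(1)  # push away from crowded neighbors
        alignment = (vel[None, :, :] * near[..., None]).sum(1) / count - vel            # match neighbors' average heading

        vel = vel + dt * (cohesion + separation + alignment)
        return pos + dt * vel, vel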

It is currently abandoned due to a serious lack of free time (*), but I would consider collaborating with anyone willing to put in some effort.

(*) In my defense, I’m not slacking off in the meantime: https://arxiv.org/abs/2510.26543 https://arxiv.org/abs/2510.16522 https://www.youtube.com/watch?v=U5p3VEOWza8


https://www.astralcodexten.com/p/in-search-of-ai-psychosis is very relevant, but the main reason I’m posting it here is that, unlike this paper, it takes the opportunity to build the cleverest pun out of the same ingredients:

Folie A Deux Ex Machina


I interpreted it loosely, as "be aware of the possibility, and stop looking at it at the first signs of issues".


That seems to me to be a VERY generous interpretation of:

> I would check to make sure this can't trigger migraines or seizures. Maybe it's just me, but also, please double check.


100% frontpage-worthy! Frankly I was already bored with all those pelicans, and a bit worried that the labs are overfitting on pelicans specifically. This test clearly demonstrates that they are not.


That's very cool, but it's not an apples to apples comparison. The reasoning model learned how to do long multiplication. (Either from the internet, or from generated examples of long multiplication that were used to sharpen its reasoning skills. In principle, it might have invented it on its own during RL, but no, I don't think so.)

In this paper, the task is to learn how to multiply, strictly from AxB=C examples, with 4-digit numbers. Their vanilla transformer can't learn it, but the one with (their variant of) chain-of-thought can. These are transformers that have never encountered written text, and are too small to understand any of it anyway.
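If it helps to picture the setup, the contrast in the training targets is roughly this (my paraphrase; the paper's actual intermediate-step format and tokenization are different):

    def cot_target(a: int, b: int) -> str:
        # spell out one partial product per digit of b, then the sum,
        # so every predicted token only needs a short local computation
        steps = []
        for i, d in enumerate(reversed(str(b))):
            steps.append(f"{a}*{int(d) * 10**i}={a * int(d) * 10**i}")
        return f"{a}*{b}= " + " | ".join(steps) + f" | sum={a * b}"

    print(f"3729*8461={3729 * 8461}")   # direct target: the whole product in one shot
    print(cot_target(3729, 8461))       # CoT-style target: a chain of easy steps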


If being probabilistic prevented learning deterministic functions, transformers couldn’t learn addition either. But they can, so that can't be the reason.


People are probabilistic, and I've been informed that people are able to perform multiplication.


Yes, and unlike the LLM they can iterate on a problem.

When I multiply, I take it in chunks.

Put the LLM into a loop, instruct it to keep track of where it is, and have it solve one digit at a time.

I bet it does just fine. See my other comment as to why I think that is.
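Concretely, the loop I have in mind is something like this (llm here is a placeholder for any prompt-in, text-out call, not a specific API, and the prompts are only illustrative):

    def multiply_in_a_loop(a: str, b: str, llm) -> str:
        # the scratchpad is the external state the model keeps coming back to
        scratchpad = f"We are computing {a} x {b} by long multiplication.\n"
        for i, digit in enumerate(reversed(b)):
            scratchpad = llm(
                scratchpad
                + f"Step {i + 1}: multiply {a} by the digit {digit} (place value 10^{i}), "
                + "append the partial product, and repeat the scratchpad so far."
            )
        return llm(scratchpad + "Final step: add the partial products and output only the result.")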


Are you sure? I bet that if you pull 10 people off the street and ask them to multiply 5-digit by 5-digit numbers by hand, you won't have a 100% success rate.


The pertinent fact is that there exist people who can reliably perform 5x5 multiplication, not that every single person on the planet can do it.


I bet that with a little training, practically anyone could multiply 5-digit numbers reliably.
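The procedure itself is completely mechanical, which is why I think a little training goes so far; all the pencil-and-paper steps fit in something like this:

    def long_multiply(a: str, b: str) -> str:
        # grade-school long multiplication: one shifted partial product per digit of b
        rows = []
        for i, d in enumerate(reversed(b)):
            carry, row = 0, []
            for e in reversed(a):
                carry, digit = divmod(int(d) * int(e) + carry, 10)
                row.append(str(digit))
            if carry:
                row.append(str(carry))
            rows.append(int("".join(reversed(row)) + "0" * i))
        return str(sum(rows))

    assert long_multiply("48613", "72905") == str(48613 * 72905)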


> But which contributes more, they ask? Who gives a shit, really?

Funding agencies? Should they prioritize established researchers or newcomers? Should they support many smaller grant proposals or fewer large ones?


My uninformed and perhaps overly charitable interpretation: he warned them they were going to be steamrolled, they built their product anyway, and now OpenAI is buying them because (1) OpenAI doesn't want the negative publicity of steamrolling them all, and (2) OpenAI has the money and is a bit too lazy to build a clone.


Amazing! I looked into your ADAM claim, and it checks out. Thanks! Now I'm curious. If you have the time, could you please follow up with the 'etc...'?


There's a related discussion of 'mathiness' in section 3.3 of the article "Troubling Trends in Machine Learning Scholarship": https://arxiv.org/abs/1807.03341. I would say the situation has only gotten worse since that paper was written (2018).

However, the discussion there is more about math that is unnecessary to a paper, not so much about the problem of math that is unintelligible or, if intelligible, then incorrect. I don't have other papers off the top of my head, although by now it's my default expectation when I see a math-centric AI paper. If you have any such papers in mind, I could tell you my thoughts on them.


You dismiss parent's example test because it's in the training data. I assume you also dismiss the Sally-Ann test, for the same reason. Could you please suggest a brand new test not in the training data?

FWIW, I tried to confuse 4o using the now-standard trick of changing the test to make it pattern-match and overthink it. It wasn't confused at all:

https://chatgpt.com/share/67b4c522-57d4-8003-93df-07fb49061e...


No, I can't suggest a new test; it is a hard problem, and identifying problems is usually easier than solving them.

I'm just trying to say that strong claims require strong evidence, and the claim that LLMs can have theory of mind and thus "understand that other people have different beliefs, desires, and intentions than you do" is a very strong claim.

It's like giving students the math problem 1+1=2 with loads of worked examples in front of them, then testing them on "you have 1 apple, and I give you another apple; how many do you have?", and, when they get it right, concluding that they can do all addition-based arithmetic.

This is why most benchmarks have many, many classes of examples. Looking at current theory-of-mind benchmarks [1], for example, we can see slightly more up-to-date models such as o1-preview still scoring substantially below human performance. More importantly, simply changing the perspective from first to third person drops LLM accuracy by 5-15 percentage points (not relative to their performance), whilst it doesn't change for human participants, which tells you that something different is going on there.

[1]: https://arxiv.org/html/2410.06195v1

