kadushka's comments | Hacker News

Can you imagine writing code for 100 years?

> the larger the trial size, the smaller the outcome

I find this a bit surprising. Could there be something else affecting the accuracy of larger trials? Perhaps they are not as careful, or cutting corners somewhere?


Maybe, those could play a role. But even ignoring all confounding factors, this phenomenon can be reproduced with numerical experiments alone. That's one of the meanings of "the Law of Small Numbers".

Basically, the small study may have been underpowered and just got lucky; the larger studies, with more power, are then closer to the truth. https://en.wikipedia.org/wiki/Faulty_generalization


Sure, could be just lucky. But if there are several successful small studies, and several unsuccessful large ones (no idea if this is the case here), we should probably look for a better explanation.

It does not require more explanation: publication bias means null results aren't in the literature; run enough small, low-quality trials and you'll find a big effect sooner or later.

Then the supposed big effect attracts attention and ultimately properly designed studies which show no effect.


Just my hypothesis, but I wonder if larger sample sizes provide a more diverse population.

A study with 1000 individuals is likely a poor representation of a species of 8.2 billion. I understand that studies try their best to use a diverse population, but I often question how successful many studies are at this endeavor.


> use a diverse population

If that's the case, we should question whether different homogeneous population groups respond differently to the substance under test. After all, we don't want to know the "average temperature of patients in a hospital", do we?


> If that's the case, we should question whether different homogeneous population groups respond differently to the substance under test.

In terms of psychological treatments, I am honestly in support of this. Many mental illnesses can have a cultural component to them.

> After all, we don't want to know the "average temperature of patients in a hospital", do we?

No, I don't think we do. Am I understanding you correctly?


No, the other way around. It's the combination of two well-known effects. Well, three if you're uncharitable.

1. Small studies are more likely to give anomalous results by chance. If I pick three people at random, it's not that surprising if I happened to get three women. It would be a lot different if I sampled 1,000 people.

2. Studies that show any positive result tend to get published, and ones that don't tend to get binned.

Put those together, and you see a lot of tiny studies with small positive results. When you do a proper study, the effect goes away. Exactly as you would expect.
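
If you want to see it play out numerically, here's a rough simulation (all numbers are made up for illustration, not taken from any real trial): a treatment with zero true effect, lots of simulated trials at two sample sizes, and only the "positive" results getting published.

    # Made-up illustration: zero true effect, many trials, only "positive" ones published.
    import random
    import statistics

    random.seed(0)
    NOISE_SD = 1.0       # spread of individual outcomes
    TRIALS = 200         # simulated trials per sample size
    THRESHOLD = 0.3      # observed difference needed to look "publishable"

    def run_trial(n):
        """One two-arm trial; returns the observed treatment-vs-control difference."""
        treated = [random.gauss(0.0, NOISE_SD) for _ in range(n)]  # true effect is zero
        control = [random.gauss(0.0, NOISE_SD) for _ in range(n)]
        return statistics.mean(treated) - statistics.mean(control)

    def published(n):
        """Keep only trials whose observed effect looks big enough to publish."""
        return [e for e in (run_trial(n) for _ in range(TRIALS)) if e > THRESHOLD]

    for n in (20, 2000):
        hits = published(n)
        mean_effect = statistics.mean(hits) if hits else 0.0
        print(f"n={n}: {len(hits)} of {TRIALS} trials look 'positive', "
              f"mean reported effect {mean_effect:.2f}")

The tiny trials fill the "published" record with apparent effects even though the true effect is zero; the large trials almost never do.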

The less charitable effect is "they made it up". It happens.


This will break down when >30% of people are unemployed


Maybe so, but this particular blog post was the first and is still the best explanation of how transformers work.


What kind of improvements do you expect when going from 5 straight to 6?


Would you rather not have LLMs?


Absolutely. They have dramatically worsened the world, with little to no net positive impact. Nearly every positive impact (if not all of them) has an associated negative that dwarfs it.

LLMs aren't going anywhere, but the world would be a better place if they hadn't been developed. Even if they had more positive impacts, those would not outweigh the massive environmental degradation they are causing or the massive disincentive they created against researching other, more useful forms of AI.


LLM's to me sound like a "boiling the ocean" kind of approach to solving a problem.


IMO LLMs have been a net negative on society, including my life. But I'm merely pointing out the stark contrast on this website, and the fact that we can choose to live differently.


Are you anti-AI in general, or are you unhappy about the current LLMs?


I am not anti-AI, nor unhappy about how any current LLM works. I'm unhappy about how AI is used and abused to collective detriment. LLM scraper spam leading to increased centralization and wider impacting failures is just one example.


Your position is similar to saying that medical drugs have been a net negative on society, because some drugs have been used and abused to collective detriment (and other negative effects, such as doctors prescribing pills instead of suggesting lifestyle changes). Does it mean that we would be better off without any medical drugs?


My position is that the negatives outweigh the positives, and I don't appreciate your straw man response. It's clear your question is not genuine and you're here to be contrarian.


I honestly wanted to understand your position, but after such a reaction, I'm not going to engage in any discussions with you.


Yes.

A solid secondary option is making LLM scraping for training opt-in, and/or compensating sites that were/are scraped for training data. Hell, maybe then you could avoid knocking websites over, which is what incentivizes them to use Cloudflare in the first place.

But that means LLM researchers have to respect other people's IP which hasn't been high on their todo lists as yet.

bUt ThAT dOeSn'T sCaLe - not my fuckin problem chief. If you as an LLM developer are finding your IP banned or you as a web user are sick of doing "prove you're human" challenges, it isn't the website's fault. They're trying to control costs being arbitrarily put onto them by a disinterested 3rd party who feels entitled to their content, which it costs them money to deliver. Blame the asshole scraping sites left and right.

Edit: and you wouldn't even need to go THAT far. I scrape a whole bunch of sites for some tools I built and a homemade news aggregator. My IP has never been flagged because I keep the number of requests down wherever possible, and rate-limit them so it's more in line with human-like browsing. Like, so much of this could be solved with basic fucking courtesy.
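
For the curious, the courteous version really is only a handful of lines. Here's a rough sketch of the pattern (the URL, user agent, and delay are placeholders, not anything I actually use): check robots.txt, identify yourself, and space the requests out.

    # Rough sketch of polite scraping: honor robots.txt, identify yourself, rate-limit.
    # example.com, the user agent, and the delay below are placeholders.
    import time
    import urllib.robotparser
    import requests

    USER_AGENT = "my-homemade-aggregator/0.1 (contact: me@example.com)"
    DELAY_SECONDS = 5  # pause between requests so the load resembles a person browsing

    robots = urllib.robotparser.RobotFileParser()
    robots.set_url("https://example.com/robots.txt")
    robots.read()

    def fetch(url):
        """Fetch one page only if robots.txt allows it, then wait before the next request."""
        if not robots.can_fetch(USER_AGENT, url):
            return None  # the site asked crawlers to stay away; respect that
        resp = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=30)
        time.sleep(DELAY_SECONDS)
        return resp.text if resp.ok else None

    page = fetch("https://example.com/some-article")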


Not to speak for the other poster, but... That's not a good-faith question.

Most of the problems on the internet in 2025 aren't because of one particular technology. They're because the modern web was based on gentleman's agreements and handshakes, and since those things have now gotten in the way of exponential profit increases on behalf of a few Stanford dropouts, they're being ignored writ large.

CF being down wouldn't be nearly as big of a deal if their service wasn't one of the main ways to protect against LLM crawlers that blatantly ignore robots.txt and other long-established means to control automated extraction of web content. But, well, it is one of the main ways.

Would it be one of the main ways to protect against LLM web scraping if we investigated one of the LLM startups for what is arguably a violation of the Computer Fraud and Abuse Act, arrested their C-suite, and sent each member to a medium-security federal prison (I don't know, maybe Leavenworth?) for multiple years after a fair trial?

Probably not.


I'm sure there will be an investigation... by the SEC, when the bubble pops and takes the S&P with it. No prison though, probably jobs at the next Ponzi scheme.


Well said.


Hard yes. All of the technical discussion aside, the constant advertising deluge of every company touting AI is mind-numbing.


It's helped me learn some things quicker, but I definitely prefer the old days.


Can I raise that to no LLMs or SEO?


Yes

LLMs have become a crucial compendium of knowledge that had become hidden behind SEO.


Absolutely. And while we're at it, let's do away with social media.


Good lord yes. No question.


Yes


Yes


Yes.


Yes.


Yes, they are terrible and more a negative force than a positive one in every way imaginable. I would take no LLMs all day every day.


I'd also take no war, no murder, and no disease, but that's not the world we live in.


Connections between LLM neurons also change during training.


a) "during training" is a huuuuge asterisk

b) Do you have a citation for that? My understanding is that while some weights can go to zero and effectively be removed, no (actually used in prod) network architecture or training method allows arbitrary connections.


devil is in the details


Mainly because the global video data corpus is >100,000x larger than the global text corpus, so you will need to train much larger models for much longer (than current LLMs).

