kadushka's comments | Hacker News

Can you imagine writing code for 100 years?

> the larger the trial size, the smaller the outcome

I find this a bit surprising. Could there be something else affecting the accuracy of larger trials? Perhaps they are not as careful, or cutting corners somewhere?


Maybe, those could play a role. But even ignoring all confounding factors, this phenomenon can be reproduced with numerical experiments alone. That's one of the meanings of "the Law of Small Numbers".

Basically, the small study may have been underpowered and just got lucky; the larger studies, with more power, are then closer to the truth. https://en.wikipedia.org/wiki/Faulty_generalization


Sure, could be just lucky. But if there are several successful small studies, and several unsuccessful large ones (no idea if this is the case here), we should probably look for a better explanation.

It does not require more explanation: publication bias means null results aren't in the literature; run enough small, low-quality trials and you'll find a big effect sooner or later.

Then the supposed big effect attracts attention and ultimately properly designed studies which show no effect.


Just my hypothesis, but I wonder if larger sample sizes provide a more diverse population.

A study with 1000 individuals is likely a poor representation of a species of 8.2 billion. I understand that studies try their best to use a diverse population, but I often question how successful many studies are at this endeavor.


> use a diverse population

If that's the case, we should question whether different homogeneous population groups respond differently to the substance under test. After all, we don't want to know the "average temperature of patients in a hospital", do we?


> If that's the case, we should question whether different homogeneous population groups respond differently to the substance under test.

In terms of psychological treatments, I am honestly in support of this. Many mental illnesses can have a cultural component to them.

> After all, we don't want to know the "average temperature of patients in a hospital", do we?

No, I don't think we do. Am I understanding you correctly?


No, the other way around. It's the combination of two well-known effects. Well, three if you're uncharitable.

1. Small studies are more likely to give anomalous results by chance. If I pick three people at random, it's not that surprising if I happened to get three women. It would be a lot different if I sampled 1,000 people.

2. Studies that show any positive result tend to get published, and ones that don't tend to get binned.

Put those together, and you see a lot of tiny studies with small positive results. When you do a proper study, the effect goes away. Exactly as you would expect.
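
If you want to see it play out numerically, here's a rough simulation (all numbers are made up for illustration, not taken from any real trial): a treatment with zero true effect, lots of simulated trials at two sample sizes, and only the "positive" results getting published.

    # Made-up illustration: zero true effect, many trials, only "positive" ones published.
    import random
    import statistics

    random.seed(0)
    NOISE_SD = 1.0       # spread of individual outcomes
    TRIALS = 200         # simulated trials per sample size
    THRESHOLD = 0.3      # observed difference needed to look "publishable"

    def run_trial(n):
        """One two-arm trial; returns the observed treatment-vs-control difference."""
        treated = [random.gauss(0.0, NOISE_SD) for _ in range(n)]  # true effect is zero
        control = [random.gauss(0.0, NOISE_SD) for _ in range(n)]
        return statistics.mean(treated) - statistics.mean(control)

    def published(n):
        """Keep only trials whose observed effect looks big enough to publish."""
        return [e for e in (run_trial(n) for _ in range(TRIALS)) if e > THRESHOLD]

    for n in (20, 2000):
        hits = published(n)
        mean_effect = statistics.mean(hits) if hits else 0.0
        print(f"n={n}: {len(hits)} of {TRIALS} trials look 'positive', "
              f"mean reported effect {mean_effect:.2f}")

The tiny trials fill the "published" record with apparent effects even though the true effect is zero; the large trials almost never do.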

The less charitable effect is "they made it up". It happens.


This will break down when >30% of people are unemployed


Maybe so, but this particular blog post was the first and is still the best explanation of how transformers work.


What kind of improvements do you expect when going from 5 straight to 6?


Would you rather not have LLMs?


Absolutely. They have dramatically worsened the world, with little to no net positive impact. Nearly every positive impact (if not all of them) has an associated negative that dwarfs it.

LLMs aren't going anywhere, but the world would be a better place if they hadn't been developed. Even if they had more positive impacts, those would not outweigh the massive environmental degradation they are causing or the massive disincentive they created against researching other, more useful forms of AI.


LLM's to me sound like a "boiling the ocean" kind of approach to solving a problem.


IMO LLMs have been a net negative on society, including my life. But I'm merely pointing out the stark contrast on this website, and the fact that we can choose to live differently.


Are you anti-AI in general, or are you unhappy about the current LLMs?


I am not anti-AI, nor unhappy about how any current LLM works. I'm unhappy about how AI is used and abused to collective detriment. LLM scraper spam leading to increased centralization and wider impacting failures is just one example.


Your position is similar to saying that medical drugs have been a net negative on society, because some drugs have been used and abused to collective detriment (and other negative effects, such as doctors prescribing pills instead of suggesting lifestyle changes). Does it mean that we would be better off without any medical drugs?


My position is that the negatives outweigh the positives, and I don't appreciate your straw man response. It's clear your question is not genuine and you're here to be contrarian.


I honestly wanted to understand your position, but after such a reaction, I'm not going to engage in any discussions with you.


Yes.

A solid secondary option is making LLM scraping for training opt-in, and/or compensating sites that were/are scraped for training data. Hell, maybe then you could avoid knocking websites over, which is what incentivizes them to use Cloudflare in the first place.

But that means LLM researchers have to respect other people's IP which hasn't been high on their todo lists as yet.

bUt ThAT dOeSn'T sCaLe - not my fuckin problem chief. If you as an LLM developer are finding your IP banned or you as a web user are sick of doing "prove you're human" challenges, it isn't the website's fault. They're trying to control costs being arbitrarily put onto them by a disinterested 3rd party who feels entitled to their content, which it costs them money to deliver. Blame the asshole scraping sites left and right.

Edit: and you wouldn't even need to go THAT far. I scrape a whole bunch of sites for some tools I built and a homemade news aggregator. My IP has never been flagged because I keep the number of requests down wherever possible, and rate-limit them so it's more in line with human-like browsing. Like, so much of this could be solved with basic fucking courtesy.
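
For the curious, the courteous version really is only a handful of lines. Here's a rough sketch of the pattern (the URL, user agent, and delay are placeholders, not anything I actually use): check robots.txt, identify yourself, and space the requests out.

    # Rough sketch of polite scraping: honor robots.txt, identify yourself, rate-limit.
    # example.com, the user agent, and the delay below are placeholders.
    import time
    import urllib.robotparser
    import requests

    USER_AGENT = "my-homemade-aggregator/0.1 (contact: me@example.com)"
    DELAY_SECONDS = 5  # pause between requests so the load resembles a person browsing

    robots = urllib.robotparser.RobotFileParser()
    robots.set_url("https://example.com/robots.txt")
    robots.read()

    def fetch(url):
        """Fetch one page only if robots.txt allows it, then wait before the next request."""
        if not robots.can_fetch(USER_AGENT, url):
            return None  # the site asked crawlers to stay away; respect that
        resp = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=30)
        time.sleep(DELAY_SECONDS)
        return resp.text if resp.ok else None

    page = fetch("https://example.com/some-article")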


Not to speak for the other poster, but... That's not a good-faith question.

Most of the problems on the internet in 2025 aren't because of one particular technology. They're because the modern web was based on gentleman's agreements and handshakes, and since those things have now gotten in the way of exponential profit increases on behalf of a few Stanford dropouts, they're being ignored writ large.

CF being down wouldn't be nearly as big of a deal if their service wasn't one of the main ways to protect against LLM crawlers that blatantly ignore robots.txt and other long-established means to control automated extraction of web content. But, well, it is one of the main ways.

Would it be one of the main ways to protect against LLM web scraping if we investigated one of the LLM startups for what is arguably a violation of the Computer Fraud and Abuse Act, arrested their C-suite, and sent each member to a medium-security federal prison (I don't know, maybe Leavenworth?) for multiple years after a fair trial?

Probably not.


I'm sure there will be an investigation... by the SEC, when the bubble pops and takes the S&P with it. No prison though, probably jobs at the next Ponzi scheme.


Well said.


Hard yes. All of the technical discussion aside, the constant advertising deluge of every company touting AI is mind-numbing.


It's helped me learn some things quicker, but I definitely prefer the old days.


Can I raise that to no LLMs or SEO?


Yes

LLMs have become a crucial compendium of knowledge that had become hidden behind SEO.


Absolutely. And while we're at it, let's do away with social media.


Good lord yes. No question.


Yes


Yes


Yes.


Yes.


Yes, they are terrible and more a negative force than a positive one in every way imaginable. I would take no LLMs all day every day.


I'd also take no war, no murder, and no disease, but that's not the world we live in.


Connections between LLM neurons also change during training.


a) "during training" is a huuuuge asterisk

b) Do you have a citation for that? My understanding is that while some weights can go to zero and effectively be removed, no (actually used in prod) network architecture or training method allows arbitrary connections.


devil is in the details


Mainly because the global video data corpus is >100,000x larger than the global text corpus, so you will need to train much larger models for much longer (than current LLMs).

