That presumes that languages didn't evolve independently across different communities. The fact that different ancient languages have completely different grammatical structures, for example, provides some evidence of this.
> The fact that different ancient languages have completely different grammatical structures, for example, provides some evidence of this
It really doesn't provide that evidence. Proto-Afroasiatic, the oldest widely agreed-upon hypothetical proto-language, probably only dates back about 18,000 years. The modern brain, vocal, and tongue structures linked to complex speech were in place 100,000 years ago, and it's thought that complex speech existed by the time Homo sapiens left Africa 50,000-70,000 years ago. That's a long time for grammar to diverge. Just within recorded history, plenty of languages have gained and lost very complex grammatical features. Old Chinese, for example, was not a tonal language, but evolved tones. Small isolated languages can change rapidly, and trade languages tend to simplify.
A simple counterexample here is instinctual behaviour. A sea turtle is born and, with little to no guidance, experimentation, or exploration, heads to the sea. That knowledge is embedded at birth.
I think the analogy of the brain as hardware devices ("neural processor", "I/O devices", etc.) is misleading. I think I understand the very strict mind-matter dualism you're alluding to here. But so far, attempts at using actual computer hardware to reproduce human-like cognition have gotten nowhere close, despite consuming orders of magnitude more energy and data.
> The idea is that if you can produce an accurate probability distribution over the next bit/byte/token...
But how can you get credible probability distributions from LLMs? My understanding is that the outputs specifically can't be interpreted as a probability distribution, even though superficially they resemble a PMF, because the softmax function tends to put close to 100% on the predicted token. You can still get an ordered list of the most probable tokens (which I think beam search exploits), but those specifically aren't good representations of the output probability distribution since they don't model the variance well.
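For what it's worth, here is a toy sketch (made-up logits, not tied to any particular model) of what the softmax output looks like and how temperature controls how peaked it is. The point is that the output always sums to 1, so it is formally a distribution over the vocabulary; whether it is well calibrated is the separate question you're raising.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Turn raw logits into a normalized distribution over tokens."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical logits for a 4-token vocabulary.
logits = [8.0, 3.0, 2.5, -1.0]

print(softmax(logits))                  # sharply peaked: ~99% on the top token
print(softmax(logits, temperature=2.0)) # flatter, but the ranking is unchanged
```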
My understanding is that minimizing perplexity (what LLMs are generally optimized for) is equivalent to finding a good probability distribution over the next token.
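A minimal sketch of that equivalence, with my own toy numbers: perplexity is just the exponentiated average negative log-probability the model assigns to the tokens that actually occurred, so driving it down means putting more probability mass on the observed data.

```python
import numpy as np

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-probability of the observed tokens."""
    token_probs = np.asarray(token_probs, dtype=float)
    return float(np.exp(-np.mean(np.log(token_probs))))

# Hypothetical probabilities two models assign to the same observed tokens.
better_model = [0.4, 0.6, 0.5, 0.7]
worse_model  = [0.1, 0.2, 0.05, 0.3]

print(perplexity(better_model))  # ~1.9
print(perplexity(worse_model))   # ~7.6
```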
I think they are accounting for the entire context, they specifically write out:
>> P(next_word|previous_words)
So the "next_word" is conditioned on "previous_words" (plural), which I took to mean the joint distribution of all previous words.
But I think even that's too reductive. The transformer is specifically not a function acting as some incredibly high-dimensional lookup table of token conditional probabilities. It's learning a (relatively) small number of parameters that compress those learned conditional probabilities into a radically lower-dimensional embedding.
Maybe you could describe this as a discriminative model of conditional probability, but at some point, we start describing that kind of information compression as semantic understanding, right?
It's reductive because it obscures just how complicated that `P(next_word|previous_words)` is, and it obscures the fact that "previous_words" is itself a carefully-constructed (tokenized & vectorized) representation of a huge amount of text. One individual "state" in this Markov-esque chain is on the order of an entire book, in the bigger models.
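To make the lookup-table-vs-compression distinction concrete, here's a toy sketch of the explicit-table version (my own example, with a 2-word context). The number of distinct contexts explodes combinatorially as the context grows toward book length, which is exactly why a transformer has to compress rather than memorize.

```python
from collections import Counter, defaultdict

# A toy "lookup table" estimate of P(next_word | previous_words) with a
# 2-word context. Real models condition on thousands of tokens, where an
# explicit table like this is hopeless.
corpus = "the sun is hot . the sun is bright . the moon is cold .".split()

context_len = 2
counts = defaultdict(Counter)
for i in range(len(corpus) - context_len):
    context = tuple(corpus[i:i + context_len])
    counts[context][corpus[i + context_len]] += 1

def p_next(context):
    c = counts[tuple(context)]
    total = sum(c.values())
    return {word: n / total for word, n in c.items()}

print(p_next(["sun", "is"]))   # {'hot': 0.5, 'bright': 0.5}
print(p_next(["moon", "is"]))  # {'cold': 1.0}
```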
It doesn't matter how big it is, its properties don't change. E.g., it never says, "I like what you're wearing" because it likes what I'm wearing.
It seems there's an entire generation of people taken in by this word, "complexity", as if it's just magic sauce that gets sprinkled over ad copy for big tech.
We know what it means to compute P(word|words), we know what it means that P("the sun is hot") > P("the sun is cold") ... and we know that by computing this, you aren't actually modelling the temperature of the sun.
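To spell out what that comparison means mechanically (with made-up numbers): a sentence's probability is just a chain-rule product of conditionals, and nothing in that computation measures temperature.

```python
import math

# Made-up conditional probabilities P(word | previous words), purely illustrative.
cond = {
    ("the",):             {"sun": 0.02},
    ("the", "sun"):       {"is": 0.30},
    ("the", "sun", "is"): {"hot": 0.15, "cold": 0.01},
}

def sentence_logprob(words, p_first=0.05):
    logp = math.log(p_first)  # assumed probability of "the" as the first word
    for i in range(1, len(words)):
        logp += math.log(cond[tuple(words[:i])][words[i]])
    return logp

print(sentence_logprob("the sun is hot".split()))   # higher
print(sentence_logprob("the sun is cold".split()))  # lower, with no thermometer involved
```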
It's just so disheartening how everyone becomes so anthropomorphically credulous here... can we not even get sun worship out of tech? Is it not possible for people to understand that conditional probability structures do not model mental states?
No model of conditional probabilities over text tokens, no matter how many text tokens it models, ever says "the weather is nice in August" because it means the weather is nice in August. It has never been in an August, or in weather; nor does it have the mental states for preference or desire; nor has its text generation been caused by the August weather.
This is extremely obvious: simply reflect on why the people who wrote those historical texts did so, and reflect on why an LLM generates this text, and you can see that even if an LLM produced, word for word, MLK's "I Have a Dream" speech, it does not have a dream. It has not suffered any oppression, nor organised any labour, nor made demands on the moral conscience of the public.
This shouldn't need to be said to a crowd who can presumably understand what it means to take a distribution of text tokens and subset them. It doesn't matter how complex the weight structure of an NN is: that tells you only how compressed the conditional probability distribution over many TBs of all of text history is.
You're tilting at windmills here. Where in this thread do you see anyone talking about the LLM as anything other than a next-token prediction model?
Literally all of the pushback you're getting is because you're trivializing the choice of model architecture, claiming that it's all so obvious and simple and it's all the same thing in the end.
Yes, of course, these models have to be well-suited to run on our computers, in this case GPUs. And sure, it's an interesting perspective that maybe they work well because they are well-suited for GPUs and not because they have some deep fundamental meaning. But you can't act like everyone who doesn't agree with your perspective is just an AI hypebeast con artist.
Ah, well, there are actually two classes of replies, and maybe I'm confusing one for the other here.
My claim regarding architecture follows just formally: you can take any statistical model trained via gradient descent and phrase it as a kNN. The only difference is how hard it is to produce such a model by fitting to data rather than by rephrasing.
The idea that there's something special about architecture is, really, a hardware illusion. Any empirical function approximation algorithm, designed to find the same conditional probability structure, will in the limit t->inf, approximate the same structure (ie., the actual conditional joint distribution of the data).
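A toy sketch of the in-the-limit claim, under the assumption that "the same structure" means the conditional expectation E[y|x] of the data: with enough samples, a kNN regressor and a gradient-descent-trained MLP end up approximating the same function, even though their architectures look nothing alike.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

# Plenty of data drawn from a fixed conditional structure: E[y|x] = sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20_000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=20_000)

knn = KNeighborsRegressor(n_neighbors=50).fit(X, y)
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0).fit(X, y)

x_test = np.linspace(-3, 3, 7).reshape(-1, 1)
print(np.round(knn.predict(x_test), 2))
print(np.round(mlp.predict(x_test), 2))
print(np.round(np.sin(x_test[:, 0]), 2))  # both track the same underlying function
```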
> The idea that there's something special about architecture is, really, a hardware illusion. Any empirical function approximation algorithm, designed to find the same conditional probability structure, will in the limit t->inf, approximate the same structure (ie., the actual conditional joint distribution of the data).
But it's not just about hardware. Maybe it would be, if we had access to an infinite stream of perfectly noise-free training data for every conceivable ML task. But we also need to worry about actually getting useful information out of finite data, not just finite computing resources. That's the limit you should be thinking about: the information content of input data, not compute cycles.
And yes, when trying to learn something as tremendously complicated as a world-model of multiple languages and human reasoning, even a dataset as big as The Pile might not be big enough if our model is inefficient at extracting information from data. And even with the (relatively) data-efficient transformer architecture, even a huge dataset has an upper limit of usefulness if it contains a lot of junk noise or generally has a low information density.
I put together an example that should hopefully demonstrate what I mean: https://paste.sr.ht/~wintershadows/7fb412e1d05a600a0da5db2ba.... Obviously this case is very stylized, but the key point is that the right model architecture can make good use of finite and/or noisy data, and the wrong model architecture cannot, regardless of how much compute power you throw at the latter.
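The linked paste isn't reproduced here, but here's a sketch in the same spirit, under my own assumptions: with a small, noisy sample, a model whose inductive bias matches the signal recovers it, while a mismatched model can't, no matter how much compute you spend on it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Small, noisy dataset: y = sin(3x) + noise, only 40 points.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(40, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.2, size=40)

# "Right" architecture: a feature map that matches the signal's structure.
good = LinearRegression().fit(np.sin(3 * X), y)

# "Wrong" architecture: linear in raw x; extra compute cannot fix the mismatch.
bad = LinearRegression().fit(X, y)

x_test = np.linspace(-2, 2, 200).reshape(-1, 1)
true_y = np.sin(3 * x_test[:, 0])
print("matched model MSE:   ", np.mean((good.predict(np.sin(3 * x_test)) - true_y) ** 2))
print("mismatched model MSE:", np.mean((bad.predict(x_test) - true_y) ** 2))
```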
It's Shannon, not Turing, who will get you in the end.
Text is not a valid measure of the world, so there is no "informative model", i.e., a model of the data-generating process, to fit it to. There is no sine curve; indeed, there is no function from world->text -- there is an infinite family of functions, none of which is uniquely sampled by what happens to be written down.
Transformers, certainly, aren't "informative" in this sense: they start with no prior model of how text would be distributed given the structure of the world.
These arguments all make the radical assumption that we are in something like a physics experiment -- rather than scraping glyphs from books and replaying their patterns.
I read through the whole LW post, and think there's enough troubling evidence here that she shouldn't be dismissed. It certainly shouldn't be flagged.
I was initially leaning toward this being a delusion springing from a mentally unstable person, for all the reasons other commenters are mentioning. But two things in particular struck me and changed my mind:
1. She apparently mentioned the abuse to her mother as a child.
2. She describes childhood behaviour consistent with someone who has experienced sexual abuse (i.e. thoughts of suicide, weird night behaviour like taking baths, body issues as she got older).
A small child doesn't have any incentive to make accusations, or to pretend to have been assaulted. If true, this should be taken seriously. Her mother is still alive, and there may be doctors, relatives or others that would be able to substantiate these points.
Finally, why has this post (and previous related posts) been repeatedly flagged? It's very troubling. I expect this from some HN users, but I would have thought the HN moderators would have unflagged (or reposted) it given the seriousness and importance of the subject matter and its undeniable relevance to the tech industry. At minimum, you would think someone would have unflagged them to avoid the appearance of bias and favorable treatment toward the former YC president. At this point HN looks really sleazy.
I share your opinion on the likely veracity of the allegations and would also like an explanation for the flagging of this post and the repeated deletions of similar posts.
It’s also a thing that I know for sure happens in families and gets swept under the rug. The trauma it causes is deep and complex and I don’t think as a society we have any idea how often it happens. My bet is it’s far more common than we know.
> behaviour consistent with someone who has experienced sexual abuse (i.e. thoughts of suicide, weird night behaviour like taking baths, body issues as she got older)
Can you link to something about it? That behavior rings a bell.
> Finally, why has this post (and previous related posts) been repeatedly flagged? It's very troubling, I expect this from some HN users
I'm not too sure whether it's because it concerns YC in some way or just the techbro crowd being itself. Downvotes are typical here on child-abuse-related topics in general, especially if they concern tech. But there's also the possibility that this is just a random person and not really his sister.
Regardless of whether or not there was corruption, the money only buys forward guarantees if there is a hint of blackmail involved.
Donating a bunch of money to a group gives you no power over that group unless they think there is more on the table. When you’re about to get arrested they will cut all ties immediately unless there is an actual obligation to support you.
I've definitely seen several people claim it on HN. Almost invariably, such comments were downvoted into oblivion though, or had replies basically saying something along the lines of "what crack are you smoking?"
and several similar comments scattered throughout, claiming that the Dems will pardon him. So yes, delusion on HN is quite real. And crypto news attracts a deeply paranoid, cynical sort of mind: people who have seen a glimpse of some big conspiracy and can't let go of it.
I have seen some posters on HN and a lot of posters elsewhere trying to use this case to bash the Democratic Party. It's very annoying because they never provide evidence to back up their claims.
You also see a lot of people bashing Sam Bankman-Fried’s parents or accusing them of committing crimes without providing any evidence.
To be fair, 90% of all people won't admit they were wrong.
Sidenote, I think HN will be wrong on Elon Musk and X as well over the long term (personal opinion of course). But the way the discourse around X has changed on HN is incredible.
When Parag Agrawal was made CEO, everyone on HN complained about the platform and said a subscription model was the way to go. Now that Elon Musk has introduced a subscription model, (seemingly) everyone on HN agrees he's running the company into the ground.
Seems to be happening a lot more often over the past 4 or 5 years.
Wondering if it's perhaps two different subsets of people with differing opinions that haven't so much shifted, as that primarily just one has been highly motivated to engage at any given time? (In the same way that, say, surveys might draw disproportionately more engagement from those who feel extremely dissatisfied.)
If you like HN commentators being wrong, you’ll love this initial thread on Alameda being insolvent [0]. I post only in jest, as hindsight makes much of this hilarious.
Eh, he must be pretty much broke now so why would politicians from either party care about him anymore? He made his donations and they looked the other way on regulating crypto.
His campaign finance trial is March 11, sentencing March 28. I think it’s reasonable to speculate on whether he’ll plead guilty in exchange for clemency at sentencing. Who knows. I don’t think anybody reasonable has ever believed a pardon is on the table for the largest fraud ever.
I don't care either way, but I would guess most people expect the sentencing to be very light if you have friends in important places... not that you're just totally let off on all charges.
I disagree with the other commenters here arguing that there's no benefit this monitor would bring over simply reducing screen brightness. But I also think the claim made by the article is, at best, miscommunicated: that the bounced light improves the monitor's light quality itself. On this point, I think the criticism from HN is correct: there wouldn't be a meaningful difference between equivalent bounced light and light from the monitor. There might be a possible benefit from the lack of light flickering from AC-driven electric lights, but that is only true if the space is daylit.
However, I think there is a quantifiable benefit from making the monitor's light directly dependent on environmental light, which forces our perception to adjust to a lower-contrast, more diffuse environmental context.
Part of the problem is that "brightness" in the context of monitors is different from how "brightness" is used in the science of light, where it is defined as the subjective perception of light, which changes relative to differences in light levels[1]. So you can see your way to the washroom in the middle of the night with no lights, but can't see your way back after you've turned on the washroom light, because your subjective perception of light (brightness) has changed, even though objectively the amount of visible light (illuminance) has not.
Therefore, having a screen lit by the environment would shift your perception of light to better see duller, low-light conditions, which is better for our eyes, since more uniform, diffuse light causes less strain than strong, directed light.
> There might be a possible benefit from the lack of light flickering from AC-driven electric lights, but that is only true if the space is daylit.
The change isn't from AC-driven lighting to non-AC-driven lighting. It's the other way around. I'm seeing a regular AC-driven lightbulb above the screen. Monitor backlights are DC-driven.
Thanks for the correction. My understanding of electricity is shaky, and I was assuming everything connected to a wall outlet is AC-driven unless there's a boxy power adapter along the cord like a laptop's, but you're right: that's not necessarily the case, and monitor backlights are DC-driven.
Unfortunately, the HN crowd has an incredibly naive view of what journalism is. I think it's mainly a product of ignorance: typically conflating op-eds with journalism, and drawing false equivalences that lump the Washington Post into the same bucket as Fox News and People's Daily.
I'm always surprised by the degree to which this kind of anti-journalism rhetoric ignores the role of journalism in holding powerful governments and corporations accountable.
Whatever good journalists do in holding governments and corporations accountable does not absolve them of their responsibility to link to their sources.