mumblemumble's comments | Hacker News

Perhaps only if you can also be very certain that the output is correct whenever the logprobs don't trigger the filter.

If that's not the case then it might just trigger bad risk compensation behavior in the model's human operators.


I'm not an expert, either, but I've poked at this a little. From what I've seen, token logprobs are correlated enough with correctness of the answer to serve as a useful signal at scale, but it's a weak enough correlation that it probably isn't great for evaluating any single output.
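
To make "useful at scale but weak for any single output" concrete, here's a minimal sketch of the kind of triage I mean (the field name and threshold are made up for illustration; the aggregation is just a geometric-mean token probability):

  # Minimal sketch: aggregate per-token logprobs (however your model API
  # exposes them) into a crude confidence score for batch triage.
  import math

  def confidence_score(token_logprobs: list[float]) -> float:
      """Geometric-mean token probability; higher means 'more confident'."""
      if not token_logprobs:
          return 0.0
      mean_lp = sum(token_logprobs) / len(token_logprobs)
      return math.exp(mean_lp)

  def flag_for_review(outputs: list[dict], threshold: float = 0.6) -> list[dict]:
      """Route low-confidence generations to a human or a stricter check."""
      return [o for o in outputs if confidence_score(o["logprobs"]) < threshold]

In aggregate this sorts outputs in a way that correlates with correctness, but I wouldn't trust the score to tell you whether any particular answer is right.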

My best guess is that somewhere close to the root of the problem is that language models still don't really distinguish syntagmatic and paradigmatic relationships. The examples in this article are a little bit forced in that respect because the alternatives it shows in the illustrations are all paradigmatic alternatives but roughly equivalent from a syntax perspective.

This might relate to why, within a given GPT model generation, the earlier versions with more parameters tend to be more prone to hallucination than the newer, smaller, more distilled ones. At least for the old non-context-aware language models (which is the last time I spent any serious time digging deep into language models), it was definitely the case that models with more parameters would tend to latch onto syntagmatic information so firmly that it could kind of "overwhelm" the fidelity of representation of semantics. Kind of like a special case of overfitting just for language models.


Maybe this signal needs to be learned in the final step of reinforcement learning, where people decide whether "I don't know" is the right answer.
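
One way to picture that (a toy sketch; the scoring is purely illustrative, not any lab's actual recipe) is a grader that makes guessing a losing bet unless the model genuinely expects to be right:

  # Toy abstention-aware reward; the constants are illustrative only.
  def reward(answer: str, correct_answer: str) -> float:
      if answer.strip().lower() == "i don't know":
          return 0.0   # abstaining is neutral
      if answer.strip() == correct_answer:
          return 1.0   # correct answer is rewarded
      return -1.0      # confident wrong answer is penalized

Under that scheme, guessing only has positive expected reward when the model thinks it's right more than half the time, which is roughly the calibration signal an "I don't know" option is supposed to carry.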


I'm not sure it's easy to understand how big a change there has been in the perceived pace of computer technology development if you weren't there. I'm typing this on a laptop that I purchased 11 years ago, in 2013. It's still my one and only home computer, and it hasn't given me any trouble.

In 1994, though, an 11-year-old computer would already be considered vintage. In 1983 the hot new computer was the Commodore 64. In 1994 everyone was upgrading their computers with CD-ROM drives so they could play Myst.


Hilariously enough, you could still purchase a brand new Commodore 64 in 1994... albeit right before Commodore went bankrupt in May of that year. I vaguely remember some local electronics store in Pittsburgh having Commodore 64s for sale on the shelves for really low prices back in the day. Admittedly, this was an unusual sight to behold in the US, because we had long since moved on to IBM PC compatibles by then. In Europe, C64s were a bit easier to source.

It was definitely more of a curiosity and a toy than a serious computer in 1994.


Dark matter is not a theory, per se. There are many, many theories that attempt to explain dark matter. Some of them have yet to produce testable hypotheses, others have already been tested.


Thank you. Dark matter is the name for the observation in cosmology that "it appears as though undetectable matter is present in the universe, causing phenomena X, Y, and Z."

The issue that I have with people calling dark matter a theory is that they think it requires matter to solve. It doesn't. MOND is a dark matter theory. It explains (in part) why it appears as though undetectable matter is present in galaxies causing disc velocities not to match expectations.
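
For anyone who wants the back-of-the-envelope version of that claim, here's the standard deep-MOND limit (this is textbook MOND, not anything specific to the thread's paper; a_0 is the usual MOND acceleration scale):

  % Newtonian acceleration of a circular orbit at radius r around mass M:
  %   a_N = G M / r^2
  % In the deep-MOND regime (a_N << a_0) the effective acceleration is
  % taken to be a ~ sqrt(a_N a_0):
  a = \sqrt{a_N a_0} = \frac{\sqrt{G M a_0}}{r}
  % The circular-orbit condition v^2 / r = a then gives
  \frac{v^2}{r} = \frac{\sqrt{G M a_0}}{r}
  \quad\Longrightarrow\quad
  v^4 = G M a_0
  % so v is independent of r, i.e. a flat rotation curve with no extra matter.

That r-independence is exactly the "disc velocities don't match expectations" signature, which is why MOND belongs in the same bucket as particle dark matter even though it adds no matter at all.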


Testable hypotheses are at the core of the scientific method, yes. But that's not just limited to the actual testing of hypotheses. All the work that goes into formulating hypotheses is also explicitly part of the scientific method.

Worth noting, too, that the paper outlines several possible experiments. It also specifically mentions some relative shortcomings of the model, and lists existing observations that they haven't tried to reconcile with it yet.


I think that similar arguments were made about Ma Bell and Bell Labs back in the day. And it's true, a lot of great things did come out of Bell Labs.

In fact, it almost seems like the only people able to produce great things in the 1970s were massive entrenched corporations like Ma Bell.

Funny, that.

Come to think of it, wasn't there a much more vibrant browser ecosystem in the late 90s and early 2000s, before Google used its dominant position in the ad market to undercut the competition? There used to be a lot more mobile operating systems out there, too.

I wonder what happened to all that competition? It's almost like some sort of massive anti-competitive influence came into force in the tech scene somewhere in the 2000s...


I recall a video essay from years ago that made the case that companies like Bell and RCA were so successful and produced so much R&D because the tax code incentivized reinvesting profits into the company, and because they pursued patents to license to other companies that specialized in manufacturing as a revenue stream, rather than building vertical conglomerates. Wish I'd bookmarked it, because they did an excellent job citing sources as well.

My subjective experience tells me that, just like in the early days of the microcomputer revolution, anything is possible with talented nerds who don't have to worry about grinding at their day jobs to survive. Early markets are often defined by those with the privilege to innovate without needing to work to survive, who share the fruits of their labors with the masses because that's their entire intent - being able to live off of that income instead of a corporate gig is a nice bonus.

If you want more innovation, focus on eliminating societal precarity instead of slashing regulations or growing monopolies.


>Come to think of it, wasn't there a much more vibrant browser ecosystem in the late 90s and early 2000s, before Google used its dominant position in the ad market to undercut the competition?

No? It used to be IE and Mozilla, and now it's Edge, Chrome, and Mozilla. Opera existed then and exists now, and probably more people use it now, but it's still small enough that no one cares. I suppose you could make a point about Edge using Chrome's engine, but that's because the IE engine sucked, and the new one Microsoft made for Edge sucked too, so they eventually switched to using Chrome's. But the idea that the browser market was somehow better back in the day is hilarious and wrong.

>There used to be a lot more mobile operating systems out there, too.

Not really. I suppose early on in the smartphone era BlackBerry was still around, but they lost out mostly because Apple finally got decent MDM and because BlackBerry stopped improving their product after a while, not because Android was growing in popularity. Microsoft entered kinda late and never really developed their phone OS enough and eventually gave up, but that's because their product wasn't good enough, not because of anything the others were doing to stop them.


Edge is reskinned Chrome. Opera is also reskinned Chrome. The whole point of having multiple browsers is to get multiple competing implementations of web standards, so a single vendor can't unilaterally force its features or its particular interpretation of a feature on the entire market.


Before: Trident, WebKit, Gecko, Presto

After: Blink/WebKit, Gecko/Quantum

We're seeing fewer engines, which is far more important than the browser wrapper. Also, Quantum's development is pretty much driven by a desire to maintain feature parity with Blink, which means Google gets control over what the web is according to every major browser. The fact that there are a variety of companies whose browsers are under Google's control is irrelevant in terms of anti-competitive discussion.


> The fact that there are a variety of companies whose browsers are under Google's control

Que?

If Google did some heinous stuff, tomorrow Microsoft would hard-fork Chromium and Brave et al would just switch their upstream to Edge.


> Microsoft would hard-fork Chromium and Brave et al would just switch their upstream to Edge.

Doubt.

I'll believe it when I see it. Maintaining a hard fork is almost as hard as building a greenfield browser like old Edge or old Opera. There are no serious competitors maintaining an engine independent of Chromium besides Apple. (I'm afraid to admit Firefox isn't a serious competitor anymore.)


>We're seeing less engines which is far more important than the browser wrapper.

That's moving the goalposts, but honestly, in the past it was IE and sometimes Mozilla deciding how the web was going to work and everyone else playing catch-up, which is essentially still how it works.


> Microsoft entered kinda late and never really developed their phone OS enough and eventually gave up, but that's because their product wasn't good enough, not because of anything the others were doing to stop them.

Not sure if this is meant tongue-in-cheek.

Google very aggressively chased out of town any 3rd-party Windows Phone apps that were compatible with Google services, whilst refusing to release 1st-party apps themselves.

Microsoft deserves a fair share of the blame because they made developers switch frameworks like… 5 times (?) in the span of 3 OS versions. Not to mention the constant sunsetting of devices.

The UI was amazing though. All content and no dressing, performant on low-end hardware, had dark mode half a decade before Android / iOS.


Microsoft just didn't want it badly enough. It's a similar situation to when they joined the video game market, except there they wanted it, took a loss to stay in the market, and are now essentially the main console.


I agree on the Bell Labs analogy

Most browsers have consolidated over time because web standards are constantly being updated and the bar for security is so high. On top of that, everything has to be insanely backward compatible.

WebGPU is a good example. Implementing that securely is a nightmare.


It might be time more of us think about the browser/chromium like Linux/kernel

There are lots of distros out there, but we all use the same core and make it better & safer together.


> It might be time more of us think about the browser/chromium like Linux/kernel

Coming from the Enterprise Architecture world, if you're not already treating browsers as full-fledged operating systems to manage and secure, then you're operating dangerously. In fact, that's actually why I'm resistant to further "webification" of software and applications: it has the same drawbacks as nested virtualization. Now we have both the OS layer that makes the computer run and the web browser layer we interact with to worry about, each with its own performance penalties and threat profile.

As much as I love REST APIs (and boy, do I love them and their simplicity), I don't like the idea of everything running a web server when it doesn't have to be.


I agree with your statements

What I was trying to say is that we have only a single kernel in the Linux world, without complaint, so having a single browser "kernel" (Chromium) can be seen as a good thing. We have multiple distros (Chrome, Edge, Brave, etc.) for the browser as well.


For what it’s worth, I have been making the exact same argument for a few years now. At this point, Blink has become the kernel for the web, so why not focus all our efforts there?

Hell, even Firefox could relatively easily swap to running on Blink since most of their UI these days is CSS+JS.


The central premise of the article seems to me to be a likely misunderstanding of the problem. I would bet that a Scrum Product Owner who uses a "command and control" leadership style and doesn't know how to properly delegate authority would be doing the same under any other development framework, too.

In general I'm a big fan of the "single wringable neck" principle. I've seen it put to great effect in the hands of a skilled leader, in both Scrum and non-Scrum teams. Better yet, when the leader isn't managing things well, it also leaves no question that they're the one who needs to figure out how to set things straight. Same goes for their delegates.

And for the ICs it makes collaboration easier - and therefore, ironically, enables them to work more autonomously. When everyone unambiguously knows what they're in charge of and what the team's big-picture objectives are, they have everything they need to independently figure out how best to make it happen. And it's also a lot easier to figure out who to talk to when they need to call attention to a problem.

I've seen a lot less luck with shared authority and informal delegation. On the best of days, it turns decision-making into an unnecessarily political process. More likely, the team will settle into an informal consensus process that typically operates as "rule by the obdurate" in practice. And when things get tough, the leaders will tend to slide into unproductive bickering that all but precludes actually fixing the problem.

Favorite readings that touch on this kind of thing: The Tyranny of Structurelessness by Jo Freeman, and Turn the Ship Around! by David Marquet.


The old chestnut about AI just being a term for things we haven't quite figured out yet might apply here. "Products that are well-known... and are used on a daily basis by a significant amount of people" are almost by definition not AI.

But here are some examples of things that used to fall under the AI umbrella but don't really anymore:

  - Fulltext search with decent semantic hit ranking (Google)
  - Fulltext search with word sense disambiguation (Google)
  - Fulltext search with decent synonym hits (Google)
  - Machine translation
  - Text to speech
  - Speech to text
  - Automated biometric identification (Like for unlocking your phone)
If you're more specifically asking for everyday applications of GPT-style generative large language models, I don't think that's going to happen for cost reasons. These things are still far too expensive for use in everyday consumer products. There's ChatGPT, but it's kind of an open secret that OpenAI is hemorrhaging money on ChatGPT.


But do I even want an actually smart Siri?

Microsoft's been trying to ram essentially that down my throat for the better part of a year now, and it's mostly convinced me that the answer is "no". I don't want to have arbitrary conversations with my computer.

I still just want the same thing I've been wanting from my digital assistant for 30 years now: fewer "eat up Martha" moments, and handling more intents so that I can ask "When does the next east-bound bus come?" and it stops answering questions like "Will it rain today?" as if I had asked "Is it raining right now?". None of those are particularly appropriate problems for a GPT-style model.


I would like to maybe be able to say:

Set two timers, one for 20 minutes, and another for 50 minutes.

or

Turn off the lights in my living room, and my office.

That's about as advanced as I want my home assistants to be though.
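
For what it's worth, even that is mostly just compound-intent parsing. A toy sketch of what I mean (the patterns and names are made up for illustration; a real assistant would use a proper intent parser rather than regexes):

  # Toy compound-command parser; patterns and handlers are illustrative only.
  import re

  UNIT_SECONDS = {"minute": 60, "minutes": 60, "hour": 3600, "hours": 3600}

  def parse_timers(utterance: str) -> list[int]:
      """Pull every 'N minutes/hours' span out of one utterance, in seconds."""
      spans = re.findall(r"(\d+)\s+(minutes?|hours?)", utterance.lower())
      return [int(n) * UNIT_SECONDS[unit] for n, unit in spans]

  # "Set two timers, one for 20 minutes, and another for 50 minutes."
  # -> [1200, 3000], i.e. one timer per requested duration.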


Clarke's Third Law is, has always been, and always will be the best explanation for how futurists think about these kinds of things.

