Hacker News

Not for me.

Wikipedia had its day, in between print encyclopedias and quick query AI. Its place in history is now set.

Something else will come along soon enough.



Until LLMs gain the ability to cite their sources, they will be, at best, a search engine on top of Wikipedia, and not a replacement for it.


Most, if not all, LLMs have been able to cite sources for quite some time now.


Citing is not pointing at links; it's finding primary sources and fact-checking them. Kurzgesagt[1], an informational YouTube channel, has had issues with LLMs citing LLM-generated content.

[1]: https://youtube.com/watch?v=_zfN9wnPvU0&t=175


LLMs aren't arbiters of truth - they're just natural language search engines. In a way it's quite similar to Wikipedia. Something being on Wikipedia doesn't mean that it's true; it means that a "reliable source", which is not infrequently less than reliable, said so. And even that often doesn't hold, since there's another layer of indirection: it's an editor saying this is what a source said, which again is not infrequently inaccurate. And then there's 'citogenesis' [1], where imaginary facts can be circularly whisked into existence.

This is why Wikipedia is not a source, but can provide links to sources (which then, in turn, often send you down a rabbit hole trying to find their sources), and it's then up to you to determine the value and accuracy of those sources. For instance, I enjoy researching historic economic issues, and you'll often find there are like 5 layers of indirection before you can finally get to a first-party source, and at each step along the way it's like a game of telephone being played. It's the exact same with LLMs.

[1] - https://xkcd.com/978/


I've had LLMs cite me bullshit many times: links that don't exist, while claiming they do. One even cited a very realistic git commit log entry for a feature that never existed.

Haven't yet had the same issue with Wikipedia.


… And it will be worse.


Disagree. Say I'm looking for a list of countries and their populations.

Wikipedia almost certainly has this in a nice table, which I can sort by any column, and all the countries are hyperlinked to their own articles, and it probably links to the concept of population estimation too.

There will be a primary source, but would a primary source also have articles on every country? That are ad-free, that follow a consistent format? That are editable? Then it's just Wikipedia again. If not, then you have to rely on the LLM to knit these sources together.

I don't see wikis dying yet.

At work, I had rigged one of my internal tools so that when you were looking at a system's health report, it also linked to an internal wiki page where we could track human-edited notes about that system over time. I don't think an AI can do this, because you can't fine-tune it, you can't be sure it round-trips the notes losslessly, and if it has to do a web search, it has to search the very wiki you said is obsolete.

OpenStreetMap does the same thing. Their UIs automatically deep-link every key into their wiki. So if you click on a drinking fountain, it will say something like "amenity=drinking_water", and the UI doesn't know what that is, but it links you to the wiki page where someone has almost certainly put example pictures and explained the most useful ways to tag it.

There has to be a ground truth. Wikipedia and the like are a very strong middle point on the Pareto frontier between primary sources (or oral tradition, for OSM) and LLM summaries.


Without Wikipedia, where will AIs get their (factual) training data? Reddit?


Torrented books


Backups of wikipedia.


LLMs are useless without source material.

AI companies should be donating large sums of money to Wikipedia and other such sites to keep them healthy. Without good sources, we’re going to have AI training off AI slop.


They should be, yes, but they won't. We already know this from the way those same companies have treated open source projects that they depend on.

One thing that I would really like to see is some kind of hefty tax on any income derived from models trained on Wikipedia. Basically, make it legal to train on it, to share weights, and to host them locally, all freely. But the moment you start charging people for a subscription, society should start charging you to maintain the commons you are profiting from.

(This likely goes for more than Wikipedia, but that case is especially simple since there's a single legal entity that could be given the money.)



