Hacker News

> Based on the encyclopedic knowledge LLMs have of written works I assume all parties did the same.

I don't understand why you wouldn't just buy copies of the books. Seems like such a relatively inexpensive way to strengthen your legal case.



Buying a copy of the book doesn’t grant you the right to copy it. That is what copyright is for.


It grants you the right to read & study it though.


The right to read and study you have by default. It's getting your hands on a book that has legal caveats attached.


Yes, but getting your hands on the material isn't a very interesting legal question IMO.

Whether you can train your LLM on it is a very interesting question.

I've personally never been in favor of punishing people for downloading (or seeding) things.


One which buying books for your LLM doesn't answer either. In analogy to humans, you might as well give your LLM a library card.


They might even have gotten away with a legitimate use argument if it had not been seeded.


Pretty sure that even if you gave a purchasing team enough money for retail price and a list of all books ever published, they wouldn't be able to buy even a quarter of them.


Plus some people will just not sell at any price.


Buying the books won't automatically give you permission to use the content commercially.


Thanks to the byzantine copyright system, you can't easily do it. Plus, just speculating, but maybe paying establishes "consideration" for some implied contract? "You implicitly entered a contract with us by purchasing the book, then violated the contract by 'distributing' the material for commercial use"?


There must be a publisher out there that forbids you from training an AI on the copy you buy from them by now.


Anna's Archive has 40 million books and 100 million papers. It's unlikely they could achieve similar coverage.


Too much paperwork, too much effort. These are important people, doing much more important stuff than whatever book authors do.

Or so they think, I think.


I doubt they think that way, but even if they did, they'd be right - for 99% of the works in question, the biggest value they gave to the world is, by far, being part of the LLM training corpus.

There's lots of content out there. Most of it is noise. People forget because they're only ever exposed to an aggressively curated fraction of it.


No, it's not.



