Authors can state that they allow public use unless the work is used for training LLMs. All LLM training would then fall under 2, because it would go against the copyright.
I think they would then need an explicit contract every time they sell the book, though. I don’t think I’m bound by some random terms someone writes into a book I’m buying; such terms are probably only binding if a reasonable person would notice them before the sale.
If you can buy the book at all, it has already passed through the publisher’s hands, so presumably the publisher agreed to those terms, and limiting the use of the text may in fact be effective.
If it was self-published, then even more so.
But the license restriction would have to apply both to the publisher and the customer.
If I go to the bookstore, buy the book, scan it, and train an LLM on it, how would you enforce your license as an author? The customer never knew that training LLMs wasn’t allowed.
Edit: I think I misunderstood the original comment, I thought the idea was to sell books and restrict use for LLM training. If we’re only talking about stuff that’s publicly released, the restriction should be possible.
But the license doesn’t apply to me as a customer if I can’t be expected to even notice it. If I buy a book in a bookstore, no one would assume that training LLMs on it would be explicitly forbidden. And adding a note to the book would probably not be binding because no one is expected to read the legal notice in a book.
It would still be unenforceable because there's no consideration.
There is nothing of value that the license gives me that I wouldn't already have if the contract didn't exist. I can already read the book, merely by having it in front of me.
How does that give you the right to train an LLM on it?
Or are we talking about training an LLM on it and never releasing that LLM to anyone ever? Then I guess it wouldn't matter. But if that LLM is released to anyone, shouldn't the author of the book have a say on it?
I felt for a long time that it should be fair use. If an LLM can abstract what it learns from the copyrighted work, then that seems "fair" because that's what humans do.
But ... as I've thought about it more, it doesn't really feel just to me. The kind of value reaped from the works seems to suggest that the creator is due some portion of that value. Also, in practice - there's just an absolutely enormous amount of knowledge that can be consumed from the public domain. Even if Meta, OpenAI and friends decided to license a ~small handful of the long-term archives of some globally-read newspapers, they could get very broad and deep knowledge about the events, trends, terms of the last century to fill in a lot of gaps.
I see lots of people mention that it’s missing E2E, but that’s also the convenient part: you have all your chat history in the cloud and can open it on any new device, including large videos and images.
Signal is good for E2E but not comparable in terms of convenience.
Several problems: end-to-end encryption (secret chats) is only available on mobile; desktop isn’t supported. Everything else is cleartext, readable by Telegram and by the authorities. Of course it’s missing encryption!
Really, Telegram is naked 99% of the time, with an optional clothing feature that is limited and mobile-only. Signal, by contrast, is fully dressed all the time.
Signal is 100% encrypted: all groups, all messages, all contacts. The vendor knows when you registered… and that is all.
Of course, people who don’t understand the sheer dystopia of cleartext comms and visible social graphs will say “I’ll encrypt when I need it.” No: you must encrypt all the time, so that messages are indistinguishable. You don’t just encrypt the sensitive stuff; you encrypt everything.
It is effectively missing E2E, because it misses the point completely. The point of E2E (and, particularly, PFS) is to a) exist before you even realize you need it; and b) make your private and not-so-private conversations indistinguishable from each other.
In other words, you want your chats with news and memes E2E by default, so when you chat about something sensitive you a) don't have to do anything, b) won't forget about it until it's too late and c) won't give away the fact that this particular conversation suddenly went private.
Telegram management is weirdly stubborn af in this regard. Which could be "we know better" syndrome, simple ignorance, or even malice. They do, however, undeniably know their way around UX and marketing, so, as the old Russian proverb goes, we end up with a barrel of honey and a spoonful of tar: the nicest-looking but crappily implemented tech always wins.
The majority of Telegram users don't really care about E2EE, IMO. I've been using the platform since 2016 and have been very happy so far. What we care about is ease of use, the richness of the platform, and the rapid release of new features: things you don't see on E2EE-focused services.
It's unethical to be the honey in the surveillance trap of any communications system that is not e2ee always and by default. You're exposing your friends and associates to easily avoidable and unnecessary hazards.
This week Upstash, a leading serverless data platform, unveiled its latest product, Vector. With Upstash Vector you can effortlessly store and retrieve the most similar vectors based on a distance metric of your choice.
Using Upstash and Huggingface, you can build a face similarity system, all within a serverless environment.
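The core operation behind such a system, nearest-neighbor retrieval under a distance metric, can be sketched in plain NumPy, independent of Upstash's or Hugging Face's APIs. The IDs and "embeddings" below are made-up toy data; in a real face-similarity system the vectors would come from a face-embedding model.

```python
# Minimal sketch of similarity search under a cosine distance metric.
# The ids and vectors are hypothetical toy data, not a real embedding set.
import numpy as np

def top_k_cosine(query, vectors, ids, k=2):
    """Return the ids of the k stored vectors most similar to `query`."""
    v = np.asarray(vectors, dtype=float)
    q = np.asarray(query, dtype=float)
    # Cosine similarity of the query against every stored vector at once.
    sims = v @ q / (np.linalg.norm(v, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)[:k]  # indices sorted by descending similarity
    return [ids[i] for i in order]

# Toy "face embeddings" keyed by person.
ids = ["alice", "bob", "carol"]
vectors = [[1.0, 0.0, 0.0],
           [0.9, 0.1, 0.0],
           [0.0, 1.0, 0.0]]

# A query close to alice's vector returns her first, then bob.
print(top_k_cosine([1.0, 0.05, 0.0], vectors, ids, k=2))  # ['alice', 'bob']
```

A hosted vector store like Upstash Vector does the same thing at scale, with the index persisted server-side so the lookup works from any serverless function.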
I launched an open-source side project of mine on ProductHunt! I'd be really happy if you could support it with upvotes and feedback if you like the project. Thank you!
Memory-Vault
Whisper to future, Digital sticky notes, Learning machine