Hacker News | farukozderim's comments

Good distinction.

IMO there's a hack around this:

authors can state that they allow public use except for training LLMs. Then all training work would fall under 2, because it would be used against the copyright.


I think they would need some explicit contract every time they sell the book, though. I don’t think I am bound by some random terms someone writes into a book I’m buying. Those are probably only binding if a reasonable person would notice them before the sale.


If you are able to buy that book, it has already passed through the publisher's hands, so I would think the publisher was OK with those terms, and limiting the usage of the text may in fact be effective. If it was self-published, even more so.


But the license restriction would have to apply both to the publisher and the customer.

If I go to the bookstore, buy the book, make a scan, and train an LLM with it, how would you enforce your license as an author? The customer never knew they weren't allowed to train LLMs on it.

Edit: I think I misunderstood the original comment, I thought the idea was to sell books and restrict use for LLM training. If we’re only talking about stuff that’s publicly released, the restriction should be possible.


Whether you make a scan of it or not, the license applies to the IP, I guess (IANAL).

Whether the shop makes a scan should not affect you as the buyer of the actual book. What does the scan have to do with you?

Whether or not the author learns about that scan, or about some LLM being trained on it, does not change its legality.


But the license doesn’t apply to me as a customer if I can’t be expected to even notice it. If I buy a book in a bookstore, no one would assume that training LLMs on it would be explicitly forbidden. And adding a note to the book would probably not be binding because no one is expected to read the legal notice in a book.


Ah, I assumed that the clauses about using the text to train an LLM are printed somewhere inside the book.


It would still be unenforceable because there's no consideration.

There is nothing of value that the license gives me that I wouldn't already have if the contract didn't exist. I can already read the book, merely by having it in front of me.


How does that give you the right to train an LLM on it?

Or are we talking about training an LLM on it and never releasing that LLM to anyone ever? Then I guess it wouldn't matter. But if that LLM is released to anyone, shouldn't the author of the book have a say on it?


> How does that give you the right to train an LLM on it?

Fair use gives me that right, not a contract or license.


Whether that falls under fair use is highly debatable.


It's going through the courts right now. We'll probably have an answer in a year or two.


I felt for a long time that it should be fair use. If an LLM can abstract what it learns from the copyrighted work, then that seems "fair" because that's what humans do.

But ... as I've thought about it more, it doesn't really feel just to me. The kind of value reaped from the works suggests that the creator is due some portion of that value. Also, in practice, there's an absolutely enormous amount of knowledge that can be consumed from the public domain. Even if Meta, OpenAI and friends licensed only a small handful of the long-term archives of some globally read newspapers, they could get very broad and deep knowledge about the events, trends, and terms of the last century to fill in a lot of gaps.


I see that lots of people mention it's missing E2E, but that's also what makes it convenient. You have all your chat history in the cloud and can open it on any new device, including large videos and images.

Signal is good for E2E but not comparable in terms of convenience.


I don’t understand why people say Telegram is missing E2E. There is a convenient way to create an E2E chat when you need one (which is “rare” for me).


Several problems: it's mobile only, desktop is not supported, and everything else is clear text, readable by Telegram and the authorities. Of course it’s missing encryption!

Really, Telegram is naked 99% of the time, with an optional clothes feature that is limited and mobile only. Signal, by contrast, is fully dressed all the time.

Signal is 100% encrypted: all groups encrypted, all messages encrypted, all contacts encrypted. The vendor knows when you registered… that is all.

Of course, people who don’t understand the sheer dystopia of cleartext comms and visible social graphs will say “I'll encrypt when I need it.” No, you must encrypt all the time. Messages must be indistinguishable. You don’t just encrypt the sensitive stuff, you encrypt everything!


It is effectively missing E2E, because it misses the point completely. The point of E2E (and, particularly, PFS) is to a) exist before you even realize you need it; and b) make your private and not-so-private conversations indistinguishable from each other.

In other words, you want your chats with news and memes E2E by default, so when you chat about something sensitive you a) don't have to do anything, b) won't forget about it until it's too late and c) won't give away the fact that this particular conversation suddenly went private.

Telegram management is weirdly stubborn af in this regard, which could be "we know better" syndrome, simple ignorance, or even malice. They do, however, undeniably know their way around UX and marketing, so, once again, as in the old Russian proverb, we end up with a barrel of honey with a spoonful of tar in it: the nicest-looking but crappily implemented tech always wins.


The majority of Telegram users don't really care about E2EE, IMO. I've been using the platform since 2016 and have been very happy so far. What we care about is ease of use, the richness of the platform, and the rapid release of new features, things you don't see on E2EE-focused services.


It's unethical to be the honey in the surveillance trap of any communications system that is not e2ee always and by default. You're exposing your friends and associates to easily avoidable and unnecessary hazards.


The fact that E2EE chats are tied to the device you start them on makes them useless; it's clearly not a priority for them.

In my case, it is not rare to prefer that my conversations not be stored in plaintext on someone else's computer.


There are messengers where you can have both


This week, Upstash, a leading serverless data platform, unveiled its latest product: Vector. Upstash Vector lets you store vectors and retrieve the most similar ones according to your specified distance metric.

Using Upstash and Hugging Face, you can build a face similarity system, all within a serverless environment.
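To make "retrieve the most similar vectors based on your distance metric" concrete, here is a minimal in-memory sketch of the same idea: brute-force top-k search over stored vectors with either cosine similarity or Euclidean distance. This is not the Upstash Vector API; the function names (`top_k`, `cosine_similarity`, `euclidean_distance`) and the toy 3-dimensional vectors are hypothetical, and in a face similarity system the vectors would presumably be face embeddings produced by a Hugging Face model (an assumption, not something the post specifies).

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line distance between a and b; lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(store: Dict[str, Vector], query: Vector, k: int = 1,
          metric: str = "cosine") -> List[Tuple[str, float]]:
    """Hypothetical helper: return the k stored ids most similar to query.

    Brute force: score every stored vector, then sort. A real vector
    database would use an index instead of scanning everything.
    """
    if metric == "cosine":
        scored = [(key, cosine_similarity(vec, query)) for key, vec in store.items()]
        scored.sort(key=lambda kv: kv[1], reverse=True)  # best = largest similarity
    elif metric == "euclidean":
        scored = [(key, euclidean_distance(vec, query)) for key, vec in store.items()]
        scored.sort(key=lambda kv: kv[1])  # best = smallest distance
    else:
        raise ValueError(f"unknown metric: {metric}")
    return scored[:k]


if __name__ == "__main__":
    # Toy "embeddings"; real face embeddings would have hundreds of dimensions.
    faces = {
        "alice": [0.9, 0.1, 0.0],
        "bob": [0.0, 1.0, 0.0],
        "carol": [0.7, 0.7, 0.0],
    }
    query = [1.0, 0.0, 0.0]
    print(top_k(faces, query, k=2, metric="cosine"))
    print(top_k(faces, query, k=2, metric="euclidean"))
```

Note that the sort direction depends on the metric: similarity scores rank best-first in descending order, distances in ascending order.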

Try out the demo, it's fun :)


Hello there!

I launched an open source side project of mine on ProductHunt! I'd be really happy if you could support it with upvotes and feedback if you like the project. Thank you!

Memory-Vault

Whisper to the future, digital sticky notes, learning machine

ProductHunt

https://www.producthunt.com/posts/memory-vault

Landing page

https://github.com/FarukOzderim/Memory-Vault/blob/master/REA...

You can use it for

1. Habit building
2. Language learning
3. Learning the way of entrepreneurship
4. Remembering names
5. Note-taking
6. Or anything custom: Memory Vault is a very flexible, general solution!


You can now build a Space comparing Hugging Face models and Spaces, or create clones of them, with Model Comparator Space Builder :)

