And many papers in the medical field are published with closed data, because collecting that data is so expensive and everyone wants to hold onto it. Nobody can verify the results, yet they are still labeled "peer reviewed".
> The handful of megacorps that have access to the compute and troves of stolen IP to train their secret models on have no incentive to contribute back.
Meta and Anthropic both trained on pirated books, and they were not required to destroy their models. I simply don't get it. It just encourages companies to act first and see what happens later. Regulations are just a small business cost.
> Note that Anthropic has committed not to train models on logged data, so I don’t understand some of the concerns here. What exactly is your threat model? That
Just like Meta committed to respecting your privacy. Replace the company's name with any of the top 50 companies in the world and look back at how many have kept their promises, or have done just fine after breaking the rules. There is no legislation in the U.S. that can bankrupt a company for violating this, so there are no guarantees.
Meta openly torrented books, and nobody asked them to remove or destroy their AI models. Similarly, for Anthropic it was just a business cost: they were allowed to keep the models. There were no real consequences for breaking the rules.
If they’re going to retain any data, they have to allow for the possibility that the legal system will require some of it to be used in a legal proceeding at some point.
You can’t tell a judge who’s ordered you to retain something that you can’t because you said you wouldn’t.
> The "offer, then remove" aspect is a bit eyebrow-raising -- it feels like they are trying to get subscribers to switch to usage-based billing, which makes me wonder if we'll ever get it after that June 22nd window.
Because this is not related to the GDPR at all, but to the Digital Markets Act (DMA). Its purpose is to enable competition and prevent big tech from abusing its market dominance (e.g., in this case, Apple not wanting to grant competitors the same access to macOS so that it doesn't have to face competition for Siri AI).
Depending on the scenario, it can be perfectly fine, e.g., if you just need one or two function calls from the dependency. However, for some complex binary protocols it might be better to stick with libraries.