Data Poisoning could be a tool we use to identify AI that has used copyrighted material

ekZepp@lemmy.world · 8 days ago

Arthur Besse@lemmy.ml · 6 days ago

Yep. But just providing a list of millions of URLs and saying "we trained on this", as some models have done in the past, didn't make training reproducible either: by the time anyone re-fetches them all, many of the URLs will inevitably have changed or disappeared.
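A bare URL list can't tell you whether a re-fetched page is the same one that was trained on. Here is a minimal sketch, assuming the trainer also published a SHA-256 digest for each URL. The manifest format and the function names `build_manifest` and `verify_refetch` are hypothetical, not any real dataset's tooling. It shows how drift would at least become detectable:

```python
import hashlib
from typing import Dict, Optional


def content_digest(body: bytes) -> str:
    """Return a hex SHA-256 digest identifying the exact page bytes."""
    return hashlib.sha256(body).hexdigest()


def build_manifest(pages: Dict[str, bytes]) -> Dict[str, str]:
    """Hypothetical manifest: map each URL to the digest of the bytes used for training."""
    return {url: content_digest(body) for url, body in pages.items()}


def verify_refetch(
    manifest: Dict[str, str], refetched: Dict[str, Optional[bytes]]
) -> Dict[str, str]:
    """Classify each manifest URL as 'same', 'changed', or 'missing'.

    `refetched` maps URL -> bytes fetched today; a URL that is absent
    or mapped to None is treated as gone.
    """
    report = {}
    for url, digest in manifest.items():
        body = refetched.get(url)
        if body is None:
            report[url] = "missing"
        elif content_digest(body) == digest:
            report[url] = "same"
        else:
            report[url] = "changed"
    return report


# Example with made-up URLs and contents.
manifest = build_manifest({
    "https://example.com/a": b"alpha",
    "https://example.com/b": b"beta",
    "https://example.com/c": b"gamma",
})
print(verify_refetch(manifest, {
    "https://example.com/a": b"alpha",
    "https://example.com/b": b"beta, edited later",
}))
```

Digests only detect that a page drifted; they can't recover the original bytes. You still need an archived snapshot of the content itself for that.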

YourMomsTrashman@lemmy.world · 6 days ago

That's exactly why projects like Common Crawl exist, though!

