In a paper, media mogul Tim O'Reilly and economist Ilan Strauss say OpenAI likely trained GPT-4o on paywalled O'Reilly Media books without a licensing agreement.

Tea@programming.dev · 2 days ago

Echo Dot@feddit.uk · edit-2 1 day ago

The other problem is that even if their books are in the data set, there’s no evidence that they were taken directly from the source. OpenAI scrapes websites, right? And O’Reilly books are often pirated because of their predatory business model (they change their textbooks every year, meaning you can’t use a previous year’s secondhand book). So it’s entirely possible, although unlikely, that the content got in there through scraping a pirate site.

Dadifer@lemmy.world · 23 hours ago

For copyright, it doesn’t matter whether it was taken directly from the source.

Original Research: Not-So-Original Assertions about Content Appropriation