• Echo Dot@feddit.uk
    1 day ago

    The other problem is that even if their books are in the data set, there’s no evidence that they were taken directly from the source. OpenAI scrapes websites, right? And O’Reilly books are often pirated because of their predatory business model (they change their textbooks every year, meaning you can’t use a previous year’s secondhand book). So it’s entirely possible, although unlikely, that the content got in there by being scraped from a pirate site.