For one month beginning on October 5, I ran an experiment: Every day, I asked ChatGPT 5 (more precisely, its “Extended Thinking” version) to find an error in “Today’s featured article”. In 28 of these 31 featured articles (90%), ChatGPT identified what I considered a valid error, often several. I have so far corrected 35 such errors.

      • Echo Dot@feddit.uk · 3 days ago

        But we don’t know what the false positive rate is either. How many submissions were blocked that shouldn’t have been? It seems like there’s no way to even find that metric unless somebody complained about it.

    • acosmichippo@lemmy.world · 4 days ago

      “90% errors” isn’t accurate. It’s not that 90% of all facts in Wikipedia are wrong; rather, 90% of the featured articles contained at least one error, so the articles were still mostly correct.

      • pulsewidth@lemmy.world · edited 2 days ago

        And the featured articles are usually quite large. As an example, today’s featured article is on a type of crab - the article is over 3,700 words with 129 references and 30-something books in the bibliography.

        It’s neither unreasonable nor surprising that a single error can be found in articles that complex.
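
        A rough back-of-envelope sketch of that distinction, in Python. Only the 31 articles, the 28 articles with an identified error, and the 35 corrected errors come from the quoted experiment; the claims-per-article figure is an illustrative assumption, not data.

        ```python
        # Back-of-envelope comparison of article-level vs per-claim error rates.
        # From the quoted experiment: 31 featured articles, 28 with at least
        # one identified error, 35 errors corrected in total.
        articles = 31
        articles_with_error = 28
        errors_corrected = 35
        claims_per_article = 200  # assumed: a ~3,700-word article makes many checkable claims

        article_level_rate = articles_with_error / articles
        per_claim_rate = errors_corrected / (articles * claims_per_article)

        print(f"Articles with at least one error: {article_level_rate:.0%}")   # ~90%
        print(f"Estimated per-claim error rate:   {per_claim_rate:.2%}")       # ~0.56%
        ```

        Under that (assumed) density of checkable claims, a 90% article-level hit rate is consistent with well under 1% of individual claims being wrong.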