@Robin

Robin@lemmy.world · 5 days ago

Playing devil’s advocate here. Mouse movements and key presses have been commonly used as a bot detection method for a decade now. Take that captcha service that is just a checkbox: that’s part of how it guesses that you are not a bot.

Robin@lemmy.world · 5 days ago

Training data for these models used to be text off of the internet plus some manually generated Q&A examples to make the model behave more like a chat bot (instruction tuning). Because there is still a need for more data, they have started adding AI generated text to the dataset. This technique doesn’t add new knowledge, but it has been shown to reduce hallucinations, likely because this data is more focussed, truthful and structured than the median text from the existing datasets. They would probably have data from every major chat provider in there, especially the big boys.