

If I understand correctly this is essentially how condensed models like Deepseek work and how they’re able to attain similar performance on much cheaper hardware. If all still goes through the LLM but LLM is a lot lighter because it has this sort of thing built in. That’s all a vast oversimplification.
I make an intentional point not to say please and thank you to these things, voice assistants like Alexa, and other computers that want to talk to me. Do the people who insist on thanking these things also say you’re welcome to the self checkout machine at Walmart when it says “thank you for shopping at Walmart?” It’s absurd.