When AI is trained for treachery, it becomes the perfect agent

Opinion Last year, The Register reported on AI sleeper agents. A major academic study explored how to train an LLM to hide destructive behavior from its users, and how to find that behavior before it triggered. The answers were unambiguously asymmetric: the first is easy, the second very difficult. Not what anyone wanted to hear.