Google DeepMind's Latest AI Agent Learned To Play Goat Simulator 3
Will Knight, writing for Wired: Goat Simulator 3 is a surreal video game in which players take domesticated ungulates on a series of implausible adventures, sometimes involving jetpacks. That might seem an unlikely venue for the next big leap in artificial intelligence, but Google DeepMind today revealed an AI program capable of learning how to complete tasks in a number of games, including Goat Simulator 3. Most impressively, when the program encounters a game for the first time, it can reliably perform tasks by adapting what it learned from playing other games. The program is called SIMA, for Scalable Instructable Multiworld Agent, and it builds upon recent AI advances that have seen large language models produce remarkably capable chatbots like ChatGPT. [...] DeepMind's latest video game project hints at how AI systems like OpenAI's ChatGPT and Google's Gemini could soon do more than just chat and generate images or video, by taking control of computers and performing complex commands. "The paper is an interesting advance for embodied agents across multiple simulations," says Linxi "Jim" Fan, a senior research scientist at Nvidia who works on AI gameplay and was involved with World of Bits, a 2017 OpenAI project that was an early effort to train AI to perform tasks by controlling a keyboard and mouse. Fan says the Google DeepMind work reminds him of that project, as well as a 2022 effort called VPT that involved agents learning tool use in Minecraft. "SIMA takes one step further and shows stronger generalization to new games," he says. "The number of environments is still very small, but I think SIMA is on the right track." [...] For the SIMA project, the Google DeepMind team collaborated with several game studios to collect keyboard and mouse data from humans playing 10 different games with 3D environments, including No Man's Sky, Teardown, Hydroneer, and Satisfactory. DeepMind later added descriptive labels to that data to associate the clicks and keystrokes with the actions users were taking, for example whether a goat was looking for its jetpack or a human character was digging for gold. That trove of human gameplay data was then fed into a language model of the kind that powers modern chatbots, one that had already learned to process language by digesting a huge corpus of text. SIMA could then carry out actions in response to typed commands. Finally, humans evaluated SIMA's efforts inside different games, generating data that was used to fine-tune its performance. Further reading: DeepMind's blog post.
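The training recipe the article describes, with human keyboard-and-mouse demonstrations paired with text labels and used to supervise an agent built around a language model, is essentially instruction-conditioned behavioral cloning. The toy sketch below illustrates that general idea in PyTorch; the network sizes, vocabulary, action space, and random placeholder data are all illustrative assumptions and not details of how SIMA is actually built.

# A minimal sketch of instruction-conditioned behavioral cloning: map a
# (game frame, text instruction) pair to the keyboard/mouse action a human
# demonstrator took. Everything here is a toy stand-in, not DeepMind's SIMA.
import torch
import torch.nn as nn

VOCAB_SIZE = 1000   # toy instruction vocabulary (assumption)
NUM_ACTIONS = 32    # discretized keyboard/mouse actions (assumption)

class ToyInstructableAgent(nn.Module):
    def __init__(self, embed_dim=64):
        super().__init__()
        # Tiny stand-ins for the pretrained language and vision encoders
        # a real system of this kind would use.
        self.text_encoder = nn.Embedding(VOCAB_SIZE, embed_dim)
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, embed_dim),
        )
        # Fuse instruction and observation, then predict an action.
        self.policy_head = nn.Sequential(
            nn.Linear(2 * embed_dim, 128), nn.ReLU(),
            nn.Linear(128, NUM_ACTIONS),
        )

    def forward(self, frames, instruction_tokens):
        text = self.text_encoder(instruction_tokens).mean(dim=1)  # pool tokens
        vision = self.image_encoder(frames)
        return self.policy_head(torch.cat([text, vision], dim=-1))

# Behavioral cloning on (frame, instruction, action) triples from human play.
agent = ToyInstructableAgent()
optimizer = torch.optim.Adam(agent.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Random placeholder batch standing in for labeled human gameplay.
    frames = torch.rand(8, 3, 64, 64)
    instructions = torch.randint(0, VOCAB_SIZE, (8, 6))
    human_actions = torch.randint(0, NUM_ACTIONS, (8,))

    logits = agent(frames, instructions)
    loss = loss_fn(logits, human_actions)  # imitate the human's action
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

The final stage the article mentions, where humans rate the agent's attempts and those ratings are fed back into training, would sit on top of a supervised loop like this one as an additional fine-tuning step.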