Eureka: With GPT-4 overseeing training, robots can learn much faster
In this still captured from a video provided by Nvidia, a simulated robot hand learns pen tricks, trained by Eureka, using simultaneous trials. (credit: Nvidia)
On Friday, researchers from Nvidia, UPenn, Caltech, and the University of Texas at Austin announced Eureka, an algorithm that uses OpenAI's GPT-4 language model to design training goals (called "reward functions") that enhance robot dexterity. The work aims to bridge the gap between high-level reasoning and low-level motor control, allowing robots to learn complex tasks rapidly by running many trials simultaneously in massively parallel simulations. According to the team, Eureka outperforms human-written reward functions by a substantial margin.
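A reward function is ordinary code that scores how well a robot is doing at each moment of training. As a rough illustration only (this is not Eureka's output), a hand-written reward for a hypothetical "move the fingertip to the pen" task might penalize distance and add a bonus on success. The names `reach_reward`, `fingertip_pos`, `pen_pos`, and `success_radius` are all hypothetical.

```python
import math


def reach_reward(fingertip_pos, pen_pos, success_radius=0.02):
    """Hypothetical dense reward for a 'reach the pen' task.

    Positions are (x, y, z) tuples in meters; a larger return value is better.
    """
    # Euclidean distance between the fingertip and the pen
    dist = math.dist(fingertip_pos, pen_pos)
    # Dense shaping term: the reward rises as the fingertip gets closer
    shaping = -dist
    # Sparse bonus once the fingertip is within the success radius
    bonus = 1.0 if dist < success_radius else 0.0
    return shaping + bonus
```

Choosing terms and weights like these by hand is tedious and error-prone; the idea behind Eureka is to have GPT-4 write such functions instead of a human engineer.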
Before robots can interact with the real world successfully, they need to learn how to move their bodies to achieve goals, such as picking up objects or moving around. Rather than having a physical robot in a lab learn by trial and error one attempt at a time, researchers at Nvidia have been experimenting with video game-like virtual worlds (made possible by platforms called Isaac Sim and Isaac Gym) that simulate three-dimensional physics. These allow massively parallel training sessions to take place in many virtual worlds at once, dramatically reducing training time.
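To make "many virtual worlds at once" concrete, here is a minimal CPU-only sketch, assuming a toy one-dimensional "reach" task. `BatchedReachEnv` and `collect` are hypothetical stand-ins, not Isaac Gym's API: each call to `step()` advances every simulated world together, so one tick yields one trial transition per environment.

```python
class BatchedReachEnv:
    """Hypothetical toy simulator: num_envs independent 1-D 'hands', each
    trying to reach a target position. Every call to step() advances all of
    them at once, the way a vectorized simulator batches trials."""

    def __init__(self, num_envs, target=1.0, max_move=0.1):
        self.positions = [0.0] * num_envs
        self.target = target
        self.max_move = max_move

    def step(self, actions):
        """Apply one action per environment; return one reward per environment."""
        rewards = []
        for i, action in enumerate(actions):
            # Clamp each action to the allowed step size
            move = max(-self.max_move, min(self.max_move, action))
            self.positions[i] += move
            # Reward: negative distance to the target (closer is better)
            rewards.append(-abs(self.target - self.positions[i]))
        return rewards


def collect(env, policy, steps):
    """Run `steps` batched ticks and count how many trial transitions result."""
    transitions = 0
    total_reward = 0.0
    for _ in range(steps):
        actions = [policy(pos) for pos in env.positions]
        rewards = env.step(actions)
        transitions += len(rewards)
        total_reward += sum(rewards)
    return transitions, total_reward
```

For example, `collect(BatchedReachEnv(4096), lambda pos: 0.1, 10)` gathers 40,960 transitions from only 10 calls to `step()`. In a GPU-accelerated simulator, roughly speaking, the per-environment Python loop is replaced by batched tensor operations, which is where the large speedup comes from.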
"Leveraging state-of-the-art GPU-accelerated simulation in Nvidia Isaac Gym," writes Nvidia on its demonstration page, "Eureka is able to quickly evaluate the quality of a large batch of reward candidates, enabling scalable search in the reward function space." Nvidia calls this approach "rapid reward evaluation via massively parallel reinforcement learning."
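The "search in the reward function space" can be pictured with a heavily simplified sketch. Here a hypothetical `propose_candidates` stands in for GPT-4 (it just samples random weights), and a toy `train_and_score` stands in for reinforcement learning in simulation: a greedy policy follows each candidate reward, and the result is judged by a separate task-success measure, so a candidate that looks reasonable but leaves the robot idle scores poorly. All names and the reward form are assumptions for illustration, not Eureka's actual code.

```python
import random

ACTIONS = [-0.1, 0.0, 0.1]  # hypothetical discrete action set
TARGET = 1.0                # hypothetical goal position


def make_reward(w_dist, w_effort):
    """Build a candidate reward from two weights (hypothetical form)."""
    def reward(pos, action):
        next_pos = pos + action
        return -w_dist * abs(TARGET - next_pos) - w_effort * abs(action)
    return reward


def train_and_score(reward_fn, steps=20):
    """Toy stand-in for RL training: a greedy policy picks the action that
    maximizes the candidate reward; fitness is the true task outcome
    (negative final distance to the target), independent of the candidate."""
    pos = 0.0
    for _ in range(steps):
        action = max(ACTIONS, key=lambda a: reward_fn(pos, a))
        pos += action
    return -abs(TARGET - pos)  # higher is better


def propose_candidates(k, rng):
    """Stand-in for the language model: sample k random weight pairs."""
    return [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(k)]


def search(k=16, seed=0):
    rng = random.Random(seed)
    candidates = propose_candidates(k, rng)
    # Each candidate is evaluated independently, so this loop is what a
    # parallel simulator would run as one large batch.
    scored = [(train_and_score(make_reward(*w)), w) for w in candidates]
    best_fitness, best_weights = max(scored)
    return best_weights, best_fitness
```

A candidate such as `make_reward(1.0, 0.0)` drives the toy hand to the target, while `make_reward(0.0, 1.0)` rewards staying still and scores -1.0. Because candidates do not depend on one another, evaluating a larger batch costs little extra when the simulator can run them side by side.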