OpenAI Launches Operator, an AI Agent That Can Operate Your Computer

hubie

from SoylentNews on 2025-01-31 03:55 (#6TYS9)

New research "Computer-Use Agent" AI model can perform multi-step tasks through a web browser:

On Thursday, OpenAI released a research preview of "Operator," a web automation tool that uses a new AI model called Computer-Using Agent (CUA) to control computers through a visual interface. The system performs tasks by viewing and interacting with on-screen elements like buttons and text fields similar to how a human would.
Operator is available today for subscribers of the $200-per-month ChatGPT Pro plan at operator.chatgpt.com. The company plans to expand to Plus, Team, and Enterprise users later. OpenAI intends to integrate these capabilities directly into ChatGPT and later release CUA through its API for developers.
Operator watches on-screen content while you use your computer and executes tasks through simulated keyboard and mouse inputs. The Computer-Using Agent processes screenshots to understand the computer's state and then makes decisions about clicking, typing, and scrolling based on its observations.
OpenAI's release follows other tech companies as they push into what are often called "agentic" AI systems, which can take actions on a user's behalf. Google announced Project Mariner in December 2024, which performs automated tasks through the Chrome browser, and two months earlier, in October 2024, Anthropic launched a web automation tool called "Computer Use" focused on developers that can control a user's mouse cursor and take actions on a computer.
"The Operator interface looks very similar to Anthropic's Claude Computer Use demo from October," wrote AI researcher Simon Willison on his blog, "even down to the interface with a chat panel on the left and a visible interface being interacted with on the right."
To use your PC like you would, the Computer-Using Agent works in multiple steps. First, it captures screenshots to monitor your screen, then analyzes those images (using GPT-4o's vision capabilities with additional reinforcement learning) to process raw pixel data. Next, it determines what actions to take and then performs virtual inputs to control the computer. This iterative loop design reportedly lets the system recover from errors and handle complex tasks across different applications.
While it's working, Operator shows a miniature browser window of its actions.

Source	RSS or Atom Feed
Feed Location	https://soylentnews.org/index.rss
Feed Title	SoylentNews
Feed Link	https://soylentnews.org/
Feed Copyright	Copyright 2014, SoylentNews