Anthropic’s Updated AI Models Can Control Your Computer for You
If you've always wanted to offload some of your tedious computing busywork to artificial intelligence, that future is now a little closer: The updated Claude 3.5 Sonnet AI model that Anthropic just released is capable of taking over your mouse and keyboard, and completing tasks on its own.
Right now, this is only in beta testing, and only available to developers with access to the Claude API, but further down the line, we could all be getting AI to fill out forms, move files around, look for information on the web, and do all the other tasks we've previously relied on our fingers and thumbs for.
First up though, the updated Claude models: Anthropic has now pushed out the updated Claude 3.5 Sonnet to users, which it says offers "across-the-board improvements" and particularly significant upgrades in coding capabilities, with performance bumps across standard benchmarking tests (including SWE-bench, which is built from real GitHub issues).
Then there's Claude 3.5 Haiku, a new version of the faster, more lightweight, less expensive, and less powerful AI model offered by Anthropic. Again, all-around performance has been improved, the company says, and as with Sonnet, there are particular gains in terms of coding capabilities.
The computer use capabilities, enabled as part of the Claude 3.5 Sonnet update, are going to get the most attention though, and they promise to take desktop automation to the next level. For now, Anthropic emphasizes that it's very much a beta product.
Computer use in Claude 3.5 Sonnet
In Anthropic's demo video, you can see the Claude AI being tasked with filling out a form. The information the form requires has to be pulled from different databases and browser tabs, but all the user has to do is ask for the form to be filled out, and give an indication of where the necessary info can be found.
As Claude works through the task, it takes screenshots and studies them to see what it's looking at, an approach similar to the image recognition and analysis capabilities that AI is already well known for. It then figures out what it needs to do next based on what's on screen and the instructions it has been given.
In this case, the AI is smart enough to realize that it needs to switch to a different browser tab and run a search for a company name to find some of the information it's looking for. Cursor moving, cursor clicking, and typing are all handled by Claude along the way. The bot is able to identify the right data, and copy it over to the correct fields on the form.
At the end, Claude is smart enough to spot and select the form submission button on screen, which then completes the task, all while the user looks on. Right out of the gate, it seems the AI model is capable of understanding what's on screen, and figuring out how to manipulate that to complete tasks.
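For readers who want a feel for the pattern, the screenshot-then-act cycle described above can be sketched in a few lines of Python. This is a toy illustration only, not Anthropic's implementation or API: the names `Action`, `run_agent_loop`, and `demo` are hypothetical, the "screenshot" is just a text string, and a hard-coded rule stands in for the AI model that would normally decide the next step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step the agent wants to take (hypothetical structure)."""
    kind: str          # "switch_tab", "type", or "done"
    detail: str = ""   # tab name or text to type


def run_agent_loop(
    take_screenshot: Callable[[], str],
    choose_action: Callable[[str, List[Action]], Action],
    perform: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Observe the screen, pick an action, carry it out, and repeat."""
    history: List[Action] = []
    for _ in range(max_steps):          # hard cap so a confused agent stops
        screen = take_screenshot()      # 1. look at the screen
        action = choose_action(screen, history)  # 2. decide what to do next
        history.append(action)
        if action.kind == "done":
            break
        perform(action)                 # 3. move, click, or type
    return history


def demo() -> List[Action]:
    """Fake desktop: a form tab with one empty field, plus a search tab."""
    state = {"tab": "form", "company_field": ""}

    def take_screenshot() -> str:
        # Stand-in for a real screenshot: a text description of the screen.
        return f"tab={state['tab']};company_field={state['company_field']}"

    def choose_action(screen: str, history: List[Action]) -> Action:
        # Stand-in for the model: simple rules instead of image analysis.
        fields = dict(part.split("=", 1) for part in screen.split(";"))
        if fields["company_field"]:
            return Action("done")
        if fields["tab"] == "search":
            return Action("switch_tab", "form")
        if not any(a.detail == "search" for a in history):
            return Action("switch_tab", "search")
        return Action("type", "Example Corp")  # made-up value for the demo

    def perform(action: Action) -> None:
        if action.kind == "switch_tab":
            state["tab"] = action.detail
        elif action.kind == "type":
            state["company_field"] = action.detail

    return run_agent_loop(take_screenshot, choose_action, perform)


if __name__ == "__main__":
    for step in demo():
        print(step.kind, step.detail)
```

The `max_steps` cap is a design choice worth noting: because each decision depends on a fresh look at the screen, a loop like this needs a stopping rule in case the agent never concludes the task is finished.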
However, Anthropic notes that basic tasks like scrolling, dragging, and zooming still "present challenges" for Claude, and beta testers are being encouraged to test it using "low-risk" scenarios for the time being. In the OSWorld benchmark, which measures how well AI can perform computing tasks, Claude 3.5 Sonnet apparently scores 14.9% (humans typically score around 70-75%).
Claude can now follow prompts to carry out computer tasks. Credit: Anthropic
The developers behind the new capabilities haven't been afraid to point out some of the errors that can occur: In one test, Claude cancelled a screen recording for no apparent reason. In another, the bot suddenly and randomly switched from a coding task to start browsing online photos of Yellowstone National Park.
Anthropic also notes that each step forward in AI can bring with it new safety worries. As per an audit by its internal trust and safety teams, the computer use capabilities as they stand right now don't pose a heightened risk to system security-though this will be continually re-evaluated. What's more, no user-submitted data (including captured screenshots) will be used to train the Claude AI models.