
Google's AI-enabled mouse pointer understands 'this' and 'that'

from www.theregister.com
Google doesn't design mouse traps, so it's trying to design a better mouse. Google DeepMind announced a research effort to transform the standard computer mouse cursor into a context-aware, AI-powered tool, marking what the company described as the first major rethinking of the cursor in more than 50 years.

The project, by researchers Adrien Baranes and Rob Marchant, integrates Google's Gemini AI model with an experimental context-aware mouse pointer. In this way, the company said, the system can understand where a user clicks, what they are clicking on, and the likely intent behind the interaction.

The researchers said there is persistent friction in how people currently interact with AI tools. Most AI assistants today live in a separate window, requiring users to copy, paste, or drag content into a chat interface before receiving help. The new approach aims to reverse that dynamic. "We want the opposite: intuitive AI that meets users across all the tools they use, without interrupting their flow," the researchers stated in the blog post.

The mouse pointer works alongside the computer's microphone, allowing Gemini to listen as the user points. This lets users refer to features on the screen with demonstrative pronouns like "this" and "that." In a demonstration website, a user can hover the cursor over a crab and say "move this here," and the system understands enough context to grab the crab and move it to where the cursor indicates.

The first computer mouse, a one-button prototype with metal wheels for the x- and y-axes, was built out of wood in 1964 and patented in 1970 by its inventors Doug Engelbart and Bill English, who worked at the Stanford Research Institute. Engelbart foresaw a day when humans and computers would interact more easily and naturally, which he talked about during his 1997 acceptance speech for the Lemelson-MIT Prize.

"The computer technology, the digital capabilities, it's affecting communications, displays, storage, computer processing. It's affecting the way you can interface to things a lot more flexibly," he said. "That's going to be so pervasively high-impact in our society and our organizations that it's more than anything we've had to cope with, evolutionary-wise."

Maintain the flow

At Google, the team laid out four design principles guiding the project. The first, which the researchers called "Maintain the flow," states that AI capabilities should work across all applications rather than forcing users into separate AI-specific environments. Under this principle, a user could point at a PDF and request a summary, or hover over a statistics table and ask for a chart, all without leaving the current application.

The next, "Show and tell," addresses the burden of prompt writing. The researchers stated that an AI-enabled pointer could capture visual and semantic context from the screen, reducing the need for users to write detailed text instructions to the model. They also developed the AI cursor based on how humans naturally communicate, using short phrases and gestures like "this" and "that." The researchers said the system would allow users to issue commands like "Fix this" or "Move that here" while the AI fills in the contextual gaps.

The fourth principle, "Turn pixels into actionable entities," lets the pointer recognize structured objects within on-screen content. The researchers stated that this capability could turn a photo of a handwritten note into an interactive to-do list, or convert a paused video frame showing a restaurant into a booking link.

In the blog, the researchers said Google DeepMind has already begun integrating the lessons learned into products. A feature called Magic Pointer will soon roll out on the forthcoming Googlebook laptop platform, which The Chocolate Factory introduced earlier this week.
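The point-and-speak interaction described above, where words like "this" and "here" are grounded in whatever the pointer is over when they are spoken, can be sketched in miniature. The following Python toy is an assumption about the general shape of such a system, not Google's implementation: it pairs each deictic word in a spoken command with a recorded pointer position and swaps in the on-screen object found there.

```python
from dataclasses import dataclass

@dataclass
class ScreenObject:
    """A named rectangular region on screen (a hypothetical scene model)."""
    name: str
    x: int
    y: int
    w: int
    h: int

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

def resolve_command(command, pointer_trail, objects):
    """Replace deictic words ('this', 'that', 'here') with concrete targets
    by pairing each one, in order, with the pointer position recorded
    when it was spoken."""
    resolved = []
    trail = iter(pointer_trail)
    for word in command.split():
        if word.lower().strip(",.") in {"this", "that", "here"}:
            px, py = next(trail)
            # Hit-test the scene; fall back to raw coordinates for empty space.
            hit = next((o.name for o in objects if o.contains(px, py)), f"({px}, {py})")
            resolved.append(hit)
        else:
            resolved.append(word)
    return " ".join(resolved)

# Toy version of the crab demo: the user says "move this here",
# pointing first at the crab, then at an empty patch of sand.
scene = [ScreenObject("crab", 100, 200, 40, 30)]
print(resolve_command("move this here", [(110, 210), (400, 300)], scene))
# "move crab (400, 300)"
```

A real system would of course use a vision model rather than hand-built rectangles to decide what sits under the pointer, which is presumably where Gemini's screen understanding comes in.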
The company said the technology will also allow users of Gemini in Chrome to point at specific parts of a webpage and ask questions, rather than composing a full text prompt. Experimental demos of the AI-enabled pointer are currently available through Google AI Studio, where users can test image-editing and map-based interactions using the point-and-speak approach. The company said it plans to continue testing the concept across additional platforms, including Google Labs' Disco. ®