MIT's Computer Vision (CV) Algorithm Identifies Images Down to the Pixel

by janrinok from SoylentNews (#5YGVE)

upstart writes:

MIT's newest computer vision algorithm identifies images down to the pixel:

For humans, identifying items in a scene [...] is as simple as looking at them. But for artificial intelligence and computer vision systems, developing a high-fidelity understanding of their surroundings takes a bit more effort. Well, a lot more effort: around 800 hours of hand-labeling training images, if we're being specific. To help machines better see the way people do, a team of researchers at MIT CSAIL, in collaboration with Cornell University and Microsoft, has developed STEGO, an algorithm able to identify images down to the individual pixel.

Normally, creating CV training data involves a human drawing boxes around specific objects within an image - say, a box around the dog sitting in a field of grass - and labeling those boxes with what's inside ("dog"), so that the AI trained on it will be able to tell the dog from the grass. STEGO (Self-supervised Transformer with Energy-based Graph Optimization), conversely, uses a technique known as semantic segmentation, which applies a class label to each pixel in the image to give the AI a more accurate view of the world around it.

Whereas a labeled box would have the object plus other items in the surrounding pixels within the boxed-in boundary, semantic segmentation labels every pixel in the object, but only the pixels that comprise the object - you get just dog pixels, not dog pixels plus some grass too. It's the machine learning equivalent of using the Smart Lasso in Photoshop versus the Rectangular Marquee tool.
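The difference between a box label and a per-pixel label can be made concrete with a small sketch. The 6x6 scene below is invented purely for illustration (it is not from the STEGO work): "D" marks dog pixels and "." marks grass. The code compares what a tight bounding box captures against what a segmentation mask captures.

```python
# Toy 6x6 scene, invented for illustration: "D" is a dog pixel, "." is grass.
SCENE = [
    "......",
    "..DD..",
    ".DDDD.",
    ".DDDD.",
    ".D..D.",
    "......",
]

def dog_pixels(scene):
    """Segmentation-style label: the exact set of (row, col) pixels that are dog."""
    return {
        (r, c)
        for r, row in enumerate(scene)
        for c, ch in enumerate(row)
        if ch == "D"
    }

def bounding_box(pixels):
    """Box-style label: tightest axis-aligned box as (top, left, bottom, right), inclusive."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)

def box_pixels(box):
    """Every pixel that falls inside the box, whatever it actually shows."""
    top, left, bottom, right = box
    return {(r, c) for r in range(top, bottom + 1) for c in range(left, right + 1)}

mask = dog_pixels(SCENE)
box = bounding_box(mask)
boxed = box_pixels(box)
grass_in_box = boxed - mask  # pixels the box labels "dog" that are really grass

print(f"box covers {len(boxed)} pixels, {len(mask)} are dog, {len(grass_in_box)} are grass")
```

In this toy scene the box sweeps in 16 pixels, of which only 12 are dog; the mask contains exactly those 12.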

The problem with this technique is one of scope. Conventional multi-shot supervised systems often demand thousands, if not hundreds of thousands, of labeled images with which to train the algorithm. Multiply that by the 65,536 individual pixels that make up even a single 256x256 image, all of which now need to be individually labeled as well, and the workload required quickly spirals into impossibility.
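To put rough numbers on that, here is a back-of-the-envelope calculation. The dataset sizes of 1,000 and 100,000 images are illustrative picks from the "thousands, if not hundreds of thousands" range quoted above, and `labels_needed` is a helper name made up for this sketch.

```python
def labels_needed(num_images, width=256, height=256):
    """Per-pixel labels required for a dataset of equally sized images."""
    return num_images * width * height

# Illustrative dataset sizes; only the 256x256 image size comes from the article.
for n in (1, 1_000, 100_000):
    print(f"{n:>7,} images -> {labels_needed(n):>13,} pixel labels")
```

A single 256x256 image needs 65,536 labels, a thousand images need about 65.5 million, and a hundred thousand images need roughly 6.55 billion.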

Instead, "STEGO looks for similar objects that appear throughout a dataset," the CSAIL team wrote in a press release Thursday. "It then associates these similar objects together to construct a consistent view of the world across all of the images it learns from."
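The general idea of associating similar things across images without any labels can be sketched in a few lines. This is emphatically not STEGO's method (which, per its name, involves a transformer and energy-based graph optimization); it is a toy that groups made-up feature vectors by cosine similarity. The patch names, vectors, threshold, and the greedy grouping rule are all assumptions chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical feature vectors for patches taken from two different images.
# A real system would get these from a learned network; here they are invented.
patches = {
    ("img1", "patch_a"): (0.9, 0.1, 0.0),
    ("img1", "patch_b"): (0.1, 0.9, 0.1),
    ("img2", "patch_c"): (0.8, 0.2, 0.1),
    ("img2", "patch_d"): (0.0, 1.0, 0.2),
}

def group_similar(patches, threshold=0.9):
    """Greedily put each patch in the first group whose representative it resembles.

    No labels are used: groups emerge only from feature similarity.
    """
    groups = []  # list of (representative_vector, [patch keys])
    for key, vec in patches.items():
        for rep, members in groups:
            if cosine(rep, vec) >= threshold:
                members.append(key)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append((vec, [key]))
    return [members for _, members in groups]

groups = group_similar(patches)
for i, members in enumerate(groups):
    print(f"group {i}: {members}")
```

With these invented vectors, `patch_a` and `patch_c` end up in one group and `patch_b` and `patch_d` in another, even though each pair spans two different images and nobody told the code what either group depicts.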
