AI learns to decipher images based on spoken words—almost like a toddler

by Timothy B. Lee, from Ars Technica

Given this picture and audio of the word "airliner," a neural network identifies the portions of the image where there's an airplane (indicated by the red lines). The software learned to do this entirely by looking at 400,000 pictures, each paired with a brief, free-form spoken description of the scene. (credit: David Harwath et al.)

Babies learn words by matching images to sounds. A mother says "dog" and points to a dog. She says "tree" and points to a tree. After repeating this process thousands of times, babies learn to recognize both common objects and the words associated with them.

Researchers at MIT have developed software with a similar ability: it learns to recognize objects in the world using nothing but raw images and spoken audio. The software examined about 400,000 images, each paired with a brief audio clip describing the scene. By studying these image-audio pairs, it learned to correctly identify which portions of a picture contain each object mentioned in the audio description.

For example, this image comes with the caption "a white and blue jet airliner near trees at the base of a low mountain."
