What this futuristic Olympics video says about the state of generative AI

James O'Donnell

from MIT Technology Review on 2024-09-03 15:08 (#6QEMW)

The Olympic Games in Paris just finished last month and the Paralympics are still underway, so the 2028 Summer Olympics in Los Angeles feel like a lifetime from now. But the prospect of watching the games in his home city has Josh Kahn, a filmmaker in the sports entertainment world who has worked in content creation for both LeBron James and the Chicago Bulls, thinking even further into the future: What might an LA Olympics in the year 3028 look like?

It's the perfect type of creative exercise for AI video generation, which came into the mainstream with the debut of OpenAI's Sora earlier this year. By typing prompts into generators like Runway or Synthesia, users can generate fairly high-definition video in minutes. It's fast and cheap, and it presents few technical obstacles compared with traditional creation techniques like CGI or animation. Even if every frame isn't perfect (distortions like hands with six fingers or objects that disappear are common), there are, at least in theory, a host of commercial applications. Ad agencies, companies, and content creators could use the technology to create videos quickly and cheaply.

Kahn, who has been toying with AI video tools for some time, used the latest version of Runway to dream up what the Olympics of the future could look like, entering a new prompt in the model for each shot. The video is just over one minute long and features sweeping aerial views of a futuristic version of LA where sea levels have risen sharply, leaving the city crammed right up to the coastline. A football stadium sits perched on top of a skyscraper, while a dome in the middle of the harbor contains courts for beach volleyball.

The video, which was shared exclusively with MIT Technology Review, is meant less as a road map for the city and more as a demonstration of what's possible now with AI.

"We were watching the Olympics and the amount of care that goes into the cultural storytelling of the host city," Kahn says. "There's a culture of imagination and storytelling in Los Angeles that has kind of set the tone for the rest of the world. Wouldn't it be cool if we could showcase what the Olympics would look like if they returned to LA 1,000 years from now?"

More than anything, the video shows what a boon the generative technology may be for creators. However, it also indicates what's holding it back. Though Kahn declined to share his prompts for the shots or specify how many prompts it took to get each take right, he did caution that anyone wishing to create good content with AI must be comfortable with trial and error. Particularly challenging in his futuristic project was getting the AI model to think outside the box in terms of architecture. A stadium hovering above water, for example, is not something most AI models have seen many examples of in their training data.

With each shot requiring a new set of prompts, it's also hard to instill a sense of continuity throughout a video. The color, angle of the sun, and shapes of buildings are difficult for a video generation model to keep consistent. The video also lacks any close-ups of people, which Kahn says AI models still tend to struggle with.

"These technologies are always better on large-scale things right now as opposed to really nuanced human interaction," he says. For this reason, Kahn imagines that early filmmaking applications of generative video might be for wide shots of landscapes or crowds.

Alex Mashrabov, an AI video expert who left his role as director of generative AI at Snap last year to found a new AI video company called Higgsfield AI, agrees on the current failures and flaws of AI video. He also points out that good dialogue-heavy content is hard to produce with AI, as it tends to hinge upon subtle facial expressions and body language.

Some content creators may be reluctant to adopt generative video simply because of the amount of time required to prompt the models again and again to get the end result right.

"Typically, the success rate is one out of 20," Mashrabov says, but it's not uncommon to need 50 or 100 attempts.

For many purposes, though, that's good enough. Mashrabov says he's seen an uptick in AI-generated video advertisements from massive suppliers like Temu. In goods-producing countries like China, video generators are in high demand to quickly make in-your-face video ads for particular products. Even if an AI model might require lots of prompts to yield a usable ad, filming it with real people, cameras, and equipment might be 100 times more expensive. Applications like this might be the first use of generative video at scale as the technology slowly improves, he says.

"Although I think this is a very long path, I'm very confident there are low-hanging fruits," Mashrabov says. "We're figuring out the genres where generative AI is already good today."
