From Toy to Tool: DALL-E 3 is a Wake-Up Call for Visual Artists—and the Rest of Us

by hubie, from SoylentNews (#6GGK8)

Freeman writes:

https://arstechnica.com/information-technology/2023/11/from-toy-to-tool-dall-e-3-is-a-wake-up-call-for-visual-artists-and-the-rest-of-us/

In October, OpenAI launched its newest AI image generator, DALL-E 3, into wide release for ChatGPT subscribers. DALL-E can pull off media generation tasks that would have seemed absurd just two years ago, and although it can inspire delight with its unexpectedly detailed creations, it also brings trepidation for some. Science fiction forecast tech like this long ago, but seeing machines upend the creative order feels different when it's actually happening before our eyes.

"It's impossible to dismiss the power of AI when it comes to image generation," says Aurich Lawson, Ars Technica's creative director. "With the rapid increase in visual acuity and ability to get a usable result, there's no question it's beyond being a gimmick or toy and is a legit tool."

[...] ChatGPT and DALL-E 3 currently work hand-in-hand, making AI art generation into an interactive and conversational experience. You tell ChatGPT (through the GPT-4 large language model) what you'd like it to generate, and it writes ideal prompts for you and submits them to the DALL-E backend. DALL-E returns the images (usually two at a time), and you see them appear through the ChatGPT interface, whether through the web or via the ChatGPT app.

[...] However, those scraped captions, written by humans, aren't always detailed or accurate, which leads to some faulty associations that reduce an AI model's ability to follow a written prompt.

To get around that problem, OpenAI decided to use AI to improve itself. As detailed in the DALL-E 3 research paper, the team at OpenAI trained this new model to surpass its predecessor by using synthetic (AI-written) image captions generated by GPT-4V, the visual version of GPT-4. With GPT-4V writing the captions, the team generated far more accurate and detailed descriptions for the DALL-E model to learn from during the training process.
