This article was published on April 8, 2022

OpenAI’s new image generator sparks both excitement and fear

Commercial artists may want to look away now



OpenAI has unveiled a new AI tool that turns text into images — and the results are stunning.

Named DALL-E 2, the system is the successor to the original DALL-E, which was unveiled last year. While its predecessor generated some impressive outputs, the new version is a major upgrade.

This picture in the style of Claude Monet illustrates the improvements of the new model. Credit: OpenAI

DALL-E 2 adds enhanced textual comprehension, faster image generation, and four times greater resolution.

“When approaching DALL-E 2 we focused on improving the image resolution quality and improving latency, rather than building a bigger system,” OpenAI researcher Aditya Ramesh told TNW.

The new tool also introduces two extra capabilities: reinterpretations of existing images and an editing feature called inpainting.

Inpainting edits an existing image according to a natural language caption.

It can add and remove components while accounting for the expected changes to shadows, reflections, and textures.

In this example, a corgi is placed in the user’s choice of location. Credit: OpenAI

DALL·E 2 was trained on pairs of images and their corresponding captions, which taught the model about the relationships between pictures and words.
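To get a feel for the idea, here is a minimal sketch of one common way to learn image-caption relationships: score every image in a batch against every caption and reward the true pairs, a contrastive objective in the spirit of OpenAI’s CLIP. The random embeddings below are hypothetical stand-ins for encoder outputs, not DALL-E 2’s actual model or code.

import numpy as np

rng = np.random.default_rng(0)
batch, dim = 4, 16

# Stand-ins for encoder outputs: one embedding per image and one per caption.
# Matched captions are deliberately placed close to their images.
image_embeds = rng.standard_normal((batch, dim))
text_embeds = image_embeds + 0.1 * rng.standard_normal((batch, dim))

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Similarity of every image to every caption; the diagonal holds the true pairs.
logits = normalize(image_embeds) @ normalize(text_embeds).T

# Contrastive loss: each image should score highest against its own caption.
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.diag(log_probs).mean()
print(f"toy contrastive loss: {loss:.3f}")

Training nudges the real encoders until this kind of loss falls, which is what ties words to the pictures they describe.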

New images are generated through a process called diffusion.

This begins with a pattern of random dots. The system then gradually transforms the pattern into a picture as it recognizes specific aspects of that image.
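As an illustration only, the toy loop below mimics that process: it starts from pure noise and repeatedly blends in what a denoiser predicts the clean image should be, trusting the prediction more with every step. The denoise_step function and the square “target” are hypothetical stand-ins; DALL-E 2’s real denoiser is a large neural network guided by the caption.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "clean image" the toy denoiser always predicts: a bright square.
target = np.zeros((8, 8))
target[2:6, 2:6] = 1.0

def denoise_step(x, t, total_steps):
    # One reverse-diffusion step: move the noisy image toward the prediction
    # and re-inject a little noise, relying on the prediction more over time.
    predicted_clean = target  # a real model would predict this from x and the caption
    blend = 1.0 / (total_steps - t)
    x = (1 - blend) * x + blend * predicted_clean
    noise_scale = 0.1 * (1 - t / total_steps)
    return x + noise_scale * rng.standard_normal(x.shape)

steps = 50
x = rng.standard_normal((8, 8))  # begin with a pattern of random dots
for t in range(steps):
    x = denoise_step(x, t, steps)

print(np.round(x, 1))  # the noise has been shaped into the square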

 

DALL-E 2 can combine concepts, attributes, and styles in its creations. Credit: OpenAI

Some of DALL-E 2’s creations look almost too good to be true. Yet the researchers say the system tends to generate visually coherent images for most captions that people try.

The above pictures of an astronaut, for example, were curated from a set of nine produced by the model. Prafulla Dhariwal, a research scientist at OpenAI, said the results are generally consistent:

Sometimes, it can be helpful to iterate with the model in a feedback loop by modifying the prompt based on its interpretation of the previous one or by trying a different style like ‘an oil painting,’ ‘digital art,’ ‘a photo,’ ‘an emoji,’ etcetera. This can be helpful for achieving a desired style or aesthetic.

A reimagining of Gustav Klimt’s “The Kiss.” Budding fraudsters may note that the artist’s less renowned Adele Bloch-Bauer I sold for $135 million in 2006. Credit: OpenAI

DALL-E 2’s potential uses are vast.

Graphic designers, app developers, media outlets, architects, commercial illustrators, and product designers could all use the tool for inspiration, new creations, and editing.

Commercial artists may be nervous about their future employment prospects. Ramesh acknowledges that many jobs could change:

We have seen AI be a good tool for people in the creative space. For example, as photo editing software has become more powerful and accessible it has allowed more people to enter the photography field. In recent years, we’ve also seen artists use AI to create new kinds of art.

It’s hard to predict the future, but we do know AI will have an impact on jobs much like personal computers did. The nature of many jobs will change, jobs that never existed before will be created, and others may be eliminated.

 

The system hasn’t yet been released to the public. OpenAI CEO Sam Altman hopes to launch the product this summer, but the researchers first want to investigate the risks.

They plan to integrate safeguards that prevent the system from generating deceptive and otherwise harmful content.

In addition, DALL·E 2 inherits various biases from its training data — and its outputs sometimes reinforce societal stereotypes.

The team has already removed explicit content from the training data, and its content policy bans violent, hateful, and adult material.

If filters identify images and text prompts that break the rules, the system won’t generate the outputs. Automated and human monitoring systems have also been implemented as safeguards against misuse.

 

 


Altman believes DALL-E’s mechanism could change how we interact with machines.

“This is another example of what I think is going to be a new computer interface trend: you say what you want in natural language or with contextual clues, and the computer does it,” he said in a blog post.

DALL-E may also boost our understanding of how AI sees the world. OpenAI hopes this helps them create systems that benefit humanity — and aren’t manipulated to foster hatred and deception.
