OpenAI, the company co-founded by Elon Musk and backed by Microsoft, has already mastered Dota 2 and the art of writing fake news. Now, it has reached another milestone with DALL-E (a portmanteau of “Wall-E” and “Dali”), an AI app that can create an image out of nearly any description. For example, if you ask for “a cat made of sushi” or a “high quality illustration of a giraffe turtle chimera,” it will deliver those things, often with startlingly good quality (and sometimes not).
DALL-E can create images based on a description of its attributes, like “a pentagonal green clock,” or “a collection of glasses is sitting on a table.” In the latter example, it places both drinking and eye glasses on a table with varying degrees of success.
It can also draw and combine multiple objects and provide different points of view, including cutaways and object interiors. Unlike past text-to-image programs, it even infers details that aren’t mentioned in the description but would be required for a realistic image. For instance, with the description “a painting of a fox sitting in a field during winter,” the agent was able to determine that a shadow was needed.
“Unlike a 3D rendering engine, whose inputs must be specified unambiguously and in complete detail, DALL·E is often able to ‘fill in the blanks’ when the caption implies that the image must contain a certain detail that is not explicitly stated,” according to the OpenAI team.
OpenAI also exploits a capability called “zero-shot reasoning.” This allows an agent to generate an answer from a description and cue without any additional training, and has been used for translation and other chores. This time, the researchers applied it to the visual domain to perform both image-to-image and text-to-image translation. In one example, it was able to generate an image of a cat from a sketch, with the cue “the exact same cat on the top as the sketch on the bottom.”
The system has numerous other talents, like understanding how telephones and other objects change over time, grasping geographic facts and landmarks and creating images in photographic, illustration and even clip-art styles.
For now, DALL-E is pretty limited. Sometimes, it delivers what you expect from the description and other times you just get some weird or crappy images. As with other AI systems, even the researchers themselves don’t understand exactly how it produces certain images due to the black box nature of the system.
Still, if developed further, DALL-E has vast potential to disrupt fields like stock photography and illustration, with all the good and bad that entails. “In the future, we plan to analyze how models like DALL·E relate to societal issues like economic impact on certain work processes and professions, the potential for bias in the model outputs, and the longer term ethical challenges implied by this technology,” the team wrote. To play with DALL-E yourself, check out OpenAI’s blog.