The system doesn't find an existing image based on your input, but creates a real drawing. "If you go to Bing and you search for a bird, you get a bird picture. But here, the pictures are created by the computer, pixel by pixel, from scratch," said principal researcher Xiaodong He in a statement. "These birds may not exist in the real world; they are just an aspect of our computer's imagination of birds."
While the current form of this drawing technology isn't perfect, it's not hard to imagine a future where it could function as a sketch assistant for painters and interior designers or a tool to refine photos based on voice input. Farther out, researcher He imagines animated movies generated from a written script.
The team began its research into computer vision and natural language processing with CaptionBot, an AI system that automatically writes captions for photos, then created a system called SeeingAI that answers questions people ask about images, which can be especially helpful for people who are blind. The current technology is built on a Generative Adversarial Network (GAN), which consists of two parts: a generator that produces images and a discriminator that judges the quality of the images generated. The drawing bot was trained on pairs of images and captions, which teach the AI which words go with which images. The team also created a mathematical representation of human attention, which is what we all use when we draw pictures from complex descriptions: a red wing, a sharp beak, a yellow wing. "Attention is a human concept; we use math to make attention computational," said He.
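To give a rough sense of what "making attention computational" can look like, here is a minimal Python sketch of generic dot-product attention. It is not the team's actual model: the word vectors, the `region_query` vector, and the function names `softmax` and `attend` are all hypothetical, chosen only for illustration. The idea is that each part of the image being drawn scores every word in the caption, and the scores are turned into weights that say how much that word should influence that part.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_query, word_vectors):
    """Score each caption word against one image region.

    Returns the attention weights and the weighted average of the
    word vectors (the "context" for that region).
    """
    # Dot product between the region's query and each word vector.
    scores = [
        sum(q * w for q, w in zip(region_query, vec))
        for vec in word_vectors
    ]
    weights = softmax(scores)
    dim = len(region_query)
    context = [
        sum(wt * vec[i] for wt, vec in zip(weights, word_vectors))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical toy data: 2-dimensional "embeddings" made up for this sketch.
words = ["red", "wing", "sharp", "beak"]
word_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
region_query = [2.0, 0.0]  # a region assumed to be part of the wing

weights, context = attend(region_query, word_vectors)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.3f}")
```

With these made-up numbers, the region gives most of its weight to "red" and "wing" and little to "sharp" and "beak", which is the kind of word-by-word focus the article describes. A real system would learn the vectors from the image and caption pairs rather than have them written by hand.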