AI Learns Semantic Image Manipulation | Two Minute Papers #217

January 14, 2020 | By Peter Engel


Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér. This technique is about creating high-resolution
images from semantic maps. A semantic map is a colorful image where each
color denotes an object class, such as pedestrians, cars, traffic signs and lights,
buildings, and so on.
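To make this concrete, here is a minimal sketch, assuming a hypothetical color palette and made-up class IDs, of how such a color-coded map could be unpacked into the per-class label channels a learning algorithm typically consumes:

```python
import numpy as np

# Hypothetical palette: each RGB color in the semantic map stands for one
# object class. The colors and IDs here are illustrative, not the paper's.
PALETTE = {
    (128, 64, 128): 0,  # road
    (220, 20, 60):  1,  # pedestrian
    (0, 0, 142):    2,  # car
    (70, 70, 70):   3,  # building
}

def semantic_map_to_onehot(rgb_map):
    """Turn an H x W x 3 color-coded map into an H x W x C one-hot
    label tensor, one channel per object class."""
    h, w, _ = rgb_map.shape
    onehot = np.zeros((h, w, len(PALETTE)), dtype=np.float32)
    for color, class_id in PALETTE.items():
        mask = np.all(rgb_map == np.array(color), axis=-1)
        onehot[mask, class_id] = 1.0
    return onehot
```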
Normally, we use light simulation programs or rasterization to render such an image, but AI researchers asked the question: why
do we even need a renderer if we can code up a learning algorithm that synthesizes the
images by itself? Whoa. This generative adversarial network takes
this input semantic map and synthesizes a high-resolution, photorealistic image from it.
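As a rough sketch of that interface only, and not the paper's actual architecture, a conditional generator is a network that maps label channels to RGB pixels; everything below (class count, layer sizes) is made up for illustration:

```python
import torch
import torch.nn as nn

class TinyLabelToImageGenerator(nn.Module):
    """Toy conditional generator: C one-hot label channels in, 3 RGB
    channels out. The real generator is far deeper and coarse-to-fine."""

    def __init__(self, num_classes=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),
            nn.Tanh(),  # pixel values in [-1, 1], the usual GAN convention
        )

    def forward(self, onehot_labels):
        # (batch, num_classes, H, W) -> (batch, 3, H, W)
        return self.net(onehot_labels)

# Shape check: fake = TinyLabelToImageGenerator()(torch.zeros(1, 4, 128, 128))
```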
Previous techniques were mostly capable of creating coarser, lower-resolution images, and they were rarely photorealistic. And get this: this one produces 2k by 1k pixel
outputs, which is close to full HD in terms of pixel count. If we wish to change something in a photorealistic
image, we’ll likely need a graphic designer and lots of expertise in Photoshop and similar tools. In the end, even simple edits are very laborious
to make because the human eye is very difficult to fool. An advantage of working with these semantic
maps is that they are super easy to edit without any expertise. For instance, we can exert control over the outputs by choosing from a number of different possible options to fill in the labels. These are often not just reskinned versions
of the same car or road but can represent a vastly different solution, like changing
the material of the road from concrete to dirt. Or, it is super easy to replace trees with
buildings: all we have to do is rename the labels in the input image.
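A minimal sketch of such an edit, assuming an integer label map and made-up class IDs: renaming a label is nothing more than swapping one ID for another before the map is handed to the generator.

```python
import numpy as np

TREE, BUILDING = 8, 3  # hypothetical class IDs, for illustration only

def relabel(label_map, old_id, new_id):
    """Return a copy of an integer label map with one class swapped for
    another, e.g. turning every tree pixel into a building pixel."""
    edited = label_map.copy()
    edited[edited == old_id] = new_id
    return edited

label_map = np.array([[8, 8, 3],
                      [8, 0, 0]])
edited_map = relabel(label_map, TREE, BUILDING)  # all the 8s become 3s
```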
These results are not restricted to outdoor traffic images. Individual parts of human faces are also editable. For instance, adding a moustache has never
been easier. The results are compared to a previous technique
by the name of pix2pix and against cascaded refinement networks. You can see that the outputs vastly outperform both of them in quality, and the images are also of visibly higher resolution. It is quite interesting to say that these
are “previous work”, because both of these papers came out this year; for instance, our episode on pix2pix came out 9 months ago, and it has already been improved upon by a significant
margin. The joys of machine learning research. Part of the trick is that the semantic map
is not the only input: a boundary map is also created to encourage the algorithm
to create outputs with better segmentation. This boundary information turned out to be
just as useful as the labels themselves.
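One simple way to compute such a boundary map, in the spirit of the paper's instance boundary maps, is to mark every pixel whose instance ID differs from one of its 4-connected neighbors. A minimal sketch, assuming an H x W integer instance-ID image:

```python
import numpy as np

def instance_boundary_map(ids):
    """Mark a pixel as boundary if its instance ID differs from any of
    its 4-connected neighbors; ids is an H x W integer image."""
    edge = np.zeros(ids.shape, dtype=bool)
    edge[:, 1:]  |= ids[:, 1:] != ids[:, :-1]    # differs from left neighbor
    edge[:, :-1] |= ids[:, :-1] != ids[:, 1:]    # differs from right neighbor
    edge[1:, :]  |= ids[1:, :] != ids[:-1, :]    # differs from upper neighbor
    edge[:-1, :] |= ids[:-1, :] != ids[1:, :]    # differs from lower neighbor
    return edge.astype(np.float32)
```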
Another trick is to create multiple discriminator networks and run them on the image at a variety of scales, from coarse to fine.
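A sketch of that multi-scale idea, with a deliberately simplified PatchGAN-style stand-in for the discriminator architecture: the same image is judged at full, half, and quarter resolution.

```python
import torch
import torch.nn as nn

class MultiScaleDiscriminator(nn.Module):
    """Identical discriminators judge the image at progressively coarser
    scales. The tiny network below is a stand-in, not the paper's design."""

    def __init__(self, in_channels=3, num_scales=3):
        super().__init__()
        self.discriminators = nn.ModuleList(
            [self._make_patch_discriminator(in_channels) for _ in range(num_scales)]
        )
        self.downsample = nn.AvgPool2d(kernel_size=3, stride=2, padding=1)

    @staticmethod
    def _make_patch_discriminator(in_channels):
        return nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, kernel_size=4, padding=1),  # per-patch real/fake score
        )

    def forward(self, image):
        scores = []
        for disc in self.discriminators:
            scores.append(disc(image))
            image = self.downsample(image)  # the next one sees a coarser view
        return scores

# scores = MultiScaleDiscriminator()(torch.randn(1, 3, 256, 256))  # 3 score maps
```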
There is much, much more in the paper, so make sure to have a look for more details. Since it is difficult to mathematically evaluate
the quality of these images, a user study was carried out in the paper. In the end, if we take a practical mindset,
these tools are meant to be used by artists, so it is reasonable to say that whichever one is favored by humans should be accepted as the superior method for now. This tool is going to be a complete powerhouse
for artists in the industry. And by this, I mean right now, because the source code of this project is available to everyone, free of charge. Yippee! In the meantime, we have an opening at our
institute at the Vienna University of Technology for one PhD student and one postdoc. The link is available in the video description; read it carefully to make sure you qualify, and if you apply through the e-mail address
of Professor Michael Wimmer, make sure to mention Two Minute Papers in your message. This is an excellent opportunity to turn your
life around, live in an amazing city, learn a lot, and write amazing papers. It doesn’t get any better than that. The deadline is the end of January. Thanks for watching and for your generous
support, and I’ll see you next time!