[arxiv 1612.05424] Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks [PDF] [notes]
Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, Dilip Krishnan
read 03/08/2017
Maps synthetic images to realistic ones at the pixel level, providing new labeled samples that allow training to generalize to real images
Allows for the creation of an effectively unlimited quantity of adapted synthetic images
Avoids mode collapse by enforcing a pixel-similarity regularization
The generated image is conditioned on both the source image and a noise vector; this increases the variability of the generated images, since varying the noise vector yields different outputs from a single source image
The adversarial objective encourages the production of images that are similar to the target-domain images
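As a concrete illustration of the conditioning, here is a minimal PyTorch sketch of one plausible way to feed both inputs to the generator. The tiling-and-concatenation scheme below is an assumption (the paper may wire the noise in differently, e.g. through a fully connected layer first).

```python
import torch

def condition_generator_input(x, z):
    """Combine a source image x (B, C, H, W) with a noise vector z (B, Z).

    Hypothetical scheme: tile z over the spatial grid and concatenate it
    to the image as extra channels, so the generator sees both the source
    content and the noise at every pixel.
    """
    b, _, h, w = x.shape
    z_map = z.view(b, -1, 1, 1).expand(b, z.size(1), h, w)
    return torch.cat([x, z_map], dim=1)  # shape (B, C + Z, H, W)

# Varying z while keeping x fixed yields different adapted images
# from a single source image.
x = torch.randn(4, 3, 32, 32)
z = torch.randn(4, 10)
inp = condition_generator_input(x, z)  # (4, 13, 32, 32)
```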
Generator: during training, maps a source image and a noise vector to an adapted image; it is a ResNet-based convolutional network
Discriminator: tries to distinguish real target-domain images from generated ones
Classifier: assigns task-specific labels to images from both the generated and the target distributions
The objective is to minimize the classifier and generator losses while maximizing the discriminator loss, in the usual adversarial min-max fashion
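A minimal sketch of these losses in PyTorch, assuming standard non-saturating GAN objectives; the networks G, D, T and the weights alpha/beta are hypothetical placeholders, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, x_target, x_fake):
    # D is trained to label real target images 1 and adapted images 0.
    real = D(x_target)
    fake = D(x_fake.detach())  # do not backprop into the generator here
    return (F.binary_cross_entropy_with_logits(real, torch.ones_like(real))
            + F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake)))

def generator_task_loss(D, T, x_fake, x_source, y_source, alpha=1.0, beta=1.0):
    # G tries to make D label adapted images as real (adversarial term),
    # while T classifies both adapted and original source images (task term).
    fake = D(x_fake)
    adv = F.binary_cross_entropy_with_logits(fake, torch.ones_like(fake))
    task = (F.cross_entropy(T(x_fake), y_source)
            + F.cross_entropy(T(x_source), y_source))
    return alpha * adv + beta * task
```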
An additional loss, the content-similarity loss, penalizes large differences between foreground pixels in the original and generated images (the foreground being the part rendered by the engine).
This loss is a masked pairwise mean squared error, which penalizes the differences between corresponding foreground pixels. It is scale-invariant: for the loss to be small, the differences between pairs of pixels in the original image must be close to the corresponding differences in the generated image, regardless of the absolute pixel values (more on this loss on page 4 of Depth Map Prediction from a Single Image using a Multi-Scale Deep Network).
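In code, the masked pairwise MSE can be written as follows; this is a sketch assuming a binary foreground mask of the same shape as the images. Subtracting the squared mean term is what makes the loss invariant to a uniform shift of the foreground:

```python
import torch

def content_similarity_loss(x, g, m):
    """Masked pairwise mean squared error between source x and generated g.

    x, g, m: tensors of shape (B, C, H, W); m is a binary foreground mask
    (1 where the rendering engine produced the pixel, 0 elsewhere).
    """
    d = (x - g) * m                               # masked per-pixel differences
    k = m.sum(dim=(1, 2, 3)).clamp(min=1.0)       # foreground count per image
    sq = (d ** 2).sum(dim=(1, 2, 3)) / k          # (1/k) * ||d||^2
    mean_sq = d.sum(dim=(1, 2, 3)) ** 2 / k ** 2  # (1/k^2) * (sum of d)^2
    return (sq - mean_sq).mean()
```

Shifting every foreground pixel of g by a constant changes both terms equally, so only the pairwise differences are penalized.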
The classifier is trained with both adapted and non-adapted source images
Training alternates between two steps (sketched in code after this list):
- Update the task-specific and discriminator parameters while keeping the generator fixed
- Update the generator parameters while keeping the discriminator and task-specific parameters fixed
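A rough sketch of the alternating updates, reusing the hypothetical losses above; G, D, T, their optimizers, z_dim, and the loader are assumed to exist and are illustrative names:

```python
# opt_dt optimizes discriminator + classifier parameters,
# opt_g optimizes generator parameters (hypothetical optimizers).
for x_s, y_s, m_s, x_t in loader:        # source images/labels/masks, target images
    z = torch.randn(x_s.size(0), z_dim)  # fresh noise for this batch
    x_f = G(x_s, z)                      # adapted source images

    # Step 1: update discriminator and task classifier, generator fixed
    opt_dt.zero_grad()
    loss_dt = (discriminator_loss(D, x_t, x_f)
               + F.cross_entropy(T(x_f.detach()), y_s)
               + F.cross_entropy(T(x_s), y_s))
    loss_dt.backward()
    opt_dt.step()

    # Step 2: update generator, discriminator and classifier fixed
    opt_g.zero_grad()
    x_f = G(x_s, z)                      # recompute so gradients reach G
    loss_g = (generator_task_loss(D, T, x_f, x_s, y_s)
              + content_similarity_loss(x_s, x_f, m_s))
    loss_g.backward()
    opt_g.step()
```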
State of the art on MNIST to USPS (95% accuracy)
State of the art on Synthetic Cropped LineMod to Cropped LineMod (where the task is instance recognition and 3D pose estimation, and the synthetic data is generated from the 3D models of the instances): almost 100% classification accuracy and a mean angle error of 23 degrees (vs. a minimum of 53 degrees for the other methods)