[arxiv 1612.05424] Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks [PDF] [notes]
Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, Dilip Krishnan
read 03/08/2017
Maps synthetic images to realistic ones at the pixel level, providing new labeled samples that allow training to generalize to real images
Allows for the creation of an effectively unlimited quantity of adapted synthetic images
Avoids mode collapse by enforcing a pixel-similarity regularization
The generated image is conditioned on both the source image and a noise vector; this increases the variability of the generated images, since varying the noise vector yields different outputs from a single source image
The adversarial objective encourages the production of images that are similar to the target-domain images
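As a concrete illustration of the conditioning, here is a minimal PyTorch sketch of one plausible way to feed both inputs to the generator. The tiling-and-concatenation scheme below is an assumption (the paper may wire the noise in differently, e.g. through a fully connected layer first).

```python
import torch

def condition_generator_input(x, z):
    """Combine a source image x (B, C, H, W) with a noise vector z (B, Z).

    Hypothetical scheme: tile z over the spatial grid and concatenate it
    to the image as extra channels, so the generator sees both the source
    content and the noise at every pixel.
    """
    b, _, h, w = x.shape
    z_map = z.view(b, -1, 1, 1).expand(b, z.size(1), h, w)
    return torch.cat([x, z_map], dim=1)  # shape (B, C + Z, H, W)

# Varying z while keeping x fixed yields different adapted images
# from a single source image.
x = torch.randn(4, 3, 32, 32)
z = torch.randn(4, 10)
inp = condition_generator_input(x, z)  # (4, 13, 32, 32)
```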
Generator: during training, maps a source image and a noise vector to an adapted image; it is a ResNet-based convolutional network
Discriminator: tries to distinguish real target-domain images from generated ones
Classifier: assigns task-specific labels to images from both the generated and the target distributions
The objective is to minimize the classifier and generator losses while maximizing the discriminator loss, in the usual adversarial min-max fashion
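A minimal sketch of these losses in PyTorch, assuming standard non-saturating GAN objectives; the networks G, D, T and the weights alpha/beta are hypothetical placeholders, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, x_target, x_fake):
    # D is trained to label real target images 1 and adapted images 0.
    real = D(x_target)
    fake = D(x_fake.detach())  # do not backprop into the generator here
    return (F.binary_cross_entropy_with_logits(real, torch.ones_like(real))
            + F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake)))

def generator_task_loss(D, T, x_fake, x_source, y_source, alpha=1.0, beta=1.0):
    # G tries to make D label adapted images as real (adversarial term),
    # while T classifies both adapted and original source images (task term).
    fake = D(x_fake)
    adv = F.binary_cross_entropy_with_logits(fake, torch.ones_like(fake))
    task = (F.cross_entropy(T(x_fake), y_source)
            + F.cross_entropy(T(x_source), y_source))
    return alpha * adv + beta * task
```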
An additional loss, the content-similarity loss, penalizes large differences between foreground pixels in the original and generated images (the foreground being the part rendered by the engine).
This loss is a masked pairwise mean squared error, which penalizes the differences between corresponding foreground pixels. It is scale-invariant: for the loss to be small, the differences between pairs of pixels in the original image must be close to the corresponding differences in the generated image, regardless of the absolute pixel values (more on this loss on page 4 of Depth Map Prediction from a Single Image using a Multi-Scale Deep Network).
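In code, the masked pairwise MSE can be written as follows; this is a sketch assuming a binary foreground mask of the same shape as the images. Subtracting the squared mean term is what makes the loss invariant to a uniform shift of the foreground:

```python
import torch

def content_similarity_loss(x, g, m):
    """Masked pairwise mean squared error between source x and generated g.

    x, g, m: tensors of shape (B, C, H, W); m is a binary foreground mask
    (1 where the rendering engine produced the pixel, 0 elsewhere).
    """
    d = (x - g) * m                               # masked per-pixel differences
    k = m.sum(dim=(1, 2, 3)).clamp(min=1.0)       # foreground count per image
    sq = (d ** 2).sum(dim=(1, 2, 3)) / k          # (1/k) * ||d||^2
    mean_sq = d.sum(dim=(1, 2, 3)) ** 2 / k ** 2  # (1/k^2) * (sum of d)^2
    return (sq - mean_sq).mean()
```

Shifting every foreground pixel of g by a constant changes both terms equally, so only the pairwise differences are penalized.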
The classifier is trained with both adapted and non-adapted source images
Training alternates between two steps (sketched in code after this list):
- Update the task-specific and discriminator parameters while keeping the generator fixed
- Update the generator parameters while keeping the discriminator and task-specific parameters fixed
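A rough sketch of the alternating updates, reusing the hypothetical losses above; G, D, T, their optimizers, z_dim, and the loader are assumed to exist and are illustrative names:

```python
# opt_dt optimizes discriminator + classifier parameters,
# opt_g optimizes generator parameters (hypothetical optimizers).
for x_s, y_s, m_s, x_t in loader:        # source images/labels/masks, target images
    z = torch.randn(x_s.size(0), z_dim)  # fresh noise for this batch
    x_f = G(x_s, z)                      # adapted source images

    # Step 1: update discriminator and task classifier, generator fixed
    opt_dt.zero_grad()
    loss_dt = (discriminator_loss(D, x_t, x_f)
               + F.cross_entropy(T(x_f.detach()), y_s)
               + F.cross_entropy(T(x_s), y_s))
    loss_dt.backward()
    opt_dt.step()

    # Step 2: update generator, discriminator and classifier fixed
    opt_g.zero_grad()
    x_f = G(x_s, z)                      # recompute so gradients reach G
    loss_g = (generator_task_loss(D, T, x_f, x_s, y_s)
              + content_similarity_loss(x_s, x_f, m_s))
    loss_g.backward()
    opt_g.step()
```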
State of the art on MNIST to USPS (95% accuracy)
State of the art on Synthetic Cropped LineMod to Cropped LineMod (where the task is instance recognition and 3D pose estimation, and the synthetic data is generated from the 3D models of the instances): almost 100% classification accuracy and a mean angle error of 23 degrees (vs. a minimum of 53 degrees for the other methods)