How does DeepNude work?

DeepNude uses a slightly modified version of the pix2pixHD GAN architecture. If you are interested in the details of the network you can study this amazing project provided by NVIDIA.

A GAN can be trained on either a paired or an unpaired dataset. Paired datasets give better results and are the only choice if you want photorealistic output, but there are cases in which such datasets do not exist and are impossible to create. DeepNude is one of these cases. A database in which a person appears both naked and dressed, in the same position, is extremely difficult to obtain, if not impossible.

We overcame the problem using a divide-et-impera (divide and conquer) approach. Instead of relying on a single network, we divided the problem into three simpler sub-problems:

Generation of a mask that selects clothes
Generation of an abstract representation of anatomical attributes
Generation of the fake nude photo

Original problem:

Divide-et-impera problem:

This approach makes the construction of the sub-datasets accessible and feasible. Web scrapers can download thousands of images from the web, dressed and nude, and with Photoshop you can apply the appropriate masks and details to build the dataset that solves a particular sub-problem. Because the work is done on stylized and abstract graphic fields, building these datasets becomes a matter of hours spent in Photoshop masking photos and applying geometric elements. Although some automation is possible, the creation of these datasets still requires considerable, repetitive manual effort.

Computer Vision Optimization

To optimize the result, simple computer vision transformations are performed with OpenCV before each GAN phase. The nature and meaning of these transformations are not very important; they were found through numerous trial-and-error attempts.

Considering these additional transformations, and including the final insertion of watermarks, the phases of the algorithm are the following:

dress -> correct [OPENCV]
correct -> mask [GAN]
mask -> maskref [OPENCV]
maskref -> maskdet [GAN]
maskdet -> maskfin [OPENCV]
maskfin -> nude [GAN]
nude -> watermark [OPENCV]