The model is now available HERE
(Requires signing the End User License Agreement)
Baris Gecer 1,2, Alexander Lattas 1,2, Stylianos Ploumpis 1,2, Jiankang Deng 1,2, Athanasios Papaioannou 1,2, Stylianos Moschoglou 1,2, & Stefanos Zafeiriou 1,2
1 Imperial College London
2 FaceSoft.io
Generating realistic 3D faces is of high importance for computer graphics and computer vision applications. Generally, research on 3D face generation revolves around linear statistical models of the facial surface. Nevertheless, these models cannot faithfully represent either the facial texture or the normals of the face, both of which are crucial for photo-realistic face synthesis. Recently, it was demonstrated that Generative Adversarial Networks (GANs) can be used for generating high-quality textures of faces. Nevertheless, the generation process either omits the geometry and normals, or independent processes are used to produce 3D shape information. In this paper, we present the first methodology that generates high-quality texture, shape, and normals jointly, which can be used for photo-realistic synthesis. To do so, we propose a novel GAN that can generate data from different modalities while exploiting their correlations. Furthermore, we demonstrate how we can condition the generation on the expression and create faces with various facial expressions. The qualitative results shown in this paper are compressed due to size limitations; full-resolution results and the accompanying video can be found in the supplementary documents.
- Download the model after signing the agreement and place it under the '/results' directory.
- Install menpo3d with
pip install menpo3d
- Then run the test script:
python test.py
The TBGAN code repository contains a command-line tool for recreating bit-exact replicas of the datasets that we used in the paper. The tool also provides various utilities for operating on the datasets:
usage: dataset_tool.py [-h] <command> ...
display Display images in dataset.
extract Extract images from dataset.
compare Compare two datasets.
create_from_pkl_img_norm Create dataset from a directory full of texture, normals and shape.
Type "dataset_tool.py <command> -h" for more information.
Please ignore the other functions. The main function for preparing tf_records is 'create_from_pkl_img_norm'.
The datasets are represented by directories containing the same image data in several resolutions to enable efficient streaming. There is a separate *.tfrecords file for each resolution, and if the dataset contains labels, they are stored in a separate file as well. A dataset is created with a command of the form:
> python dataset_tool.py create_from_pkl_img_norm datasets/tf_records datasets/texture(/*.png) dataset/shape(/*.pkl) dataset/normals(/*.pkl)
The create_* commands take the standard version of a given dataset as input and produce the corresponding *.tfrecords files as output.
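Before building the records, it can help to confirm that the three input directories line up. The sketch below is a hypothetical helper, not part of the repository. It assumes that a texture PNG and its shape and normals pickles share the same base filename, which this README does not state, and it only assembles the command shown above rather than running it.

```python
from pathlib import Path


def common_stems(texture_dir, shape_dir, normals_dir):
    """Return base filenames present in all three modality directories.

    Assumption (not confirmed by the repository): a texture '<name>.png'
    pairs with '<name>.pkl' in both the shape and normals directories.
    """
    textures = {p.stem for p in Path(texture_dir).glob("*.png")}
    shapes = {p.stem for p in Path(shape_dir).glob("*.pkl")}
    normals = {p.stem for p in Path(normals_dir).glob("*.pkl")}
    return sorted(textures & shapes & normals)


def build_create_command(tfrecord_dir, texture_dir, shape_dir, normals_dir):
    """Assemble the argument list in the positional order shown above."""
    return [
        "python", "dataset_tool.py", "create_from_pkl_img_norm",
        str(tfrecord_dir), str(texture_dir), str(shape_dir), str(normals_dir),
    ]
```

The returned list can be passed to subprocess.run once the set of common names looks right; files that appear in only one directory are worth inspecting first.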
Training follows the PROGAN procedure described below.
Additionally, you will need to add the
> "dynamic_range=[-1,1],dtype = 'float32'"
arguments to the 'dataset' EasyDict() in config.py.
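As a rough illustration, the dataset entry might end up looking like the sketch below. The EasyDict class here is a minimal stand-in so that the snippet runs on its own (config.py defines its own version), and the tfrecord_dir key with its value is an assumption for illustration; only dynamic_range and dtype come from the instruction above.

```python
class EasyDict(dict):
    """Minimal stand-in for the attribute-style dict that config.py uses."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value


# 'tfrecord_dir' and its value are assumptions for illustration; the two
# remaining arguments are the ones this README asks you to add.
dataset = EasyDict(
    tfrecord_dir="tf_records",
    dynamic_range=[-1, 1],
    dtype="float32",
)
```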
Once the necessary datasets are set up, you can proceed to train your own networks. The general procedure is as follows:
- Edit config.py to specify the dataset and training configuration by uncommenting/editing specific lines.
- Run the training script with python train.py.
- The results are written into a newly created subdirectory under config.result_dir.
- Wait several days (or weeks) for the training to converge, and analyze the results.
By default, config.py is configured to train a 1024x1024 network for CelebA-HQ using a single GPU. This is expected to take about two weeks even on the highest-end NVIDIA GPUs. The key to enabling faster training is to employ multiple GPUs and/or go for a lower-resolution dataset. To this end, config.py contains several examples for commonly used datasets, as well as a set of "configuration presets" for multi-GPU training. All of the presets are expected to yield roughly the same image quality for CelebA-HQ, but their total training time can vary considerably:
- preset-v1-1gpu: Original config that was used to produce the CelebA-HQ and LSUN results shown in the paper. Expected to take about 1 month on NVIDIA Tesla V100.
- preset-v2-1gpu: Optimized config that converges considerably faster than the original one. Expected to take about 2 weeks on 1xV100.
- preset-v2-2gpus: Optimized config for 2 GPUs. Takes about 1 week on 2xV100.
- preset-v2-4gpus: Optimized config for 4 GPUs. Takes about 3 days on 4xV100.
- preset-v2-8gpus: Optimized config for 8 GPUs. Takes about 2 days on 8xV100.
For reference, the expected output of each configuration preset for CelebA-HQ can be found in networks/tensorflow-version/example_training_runs
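To compare the presets at a glance, the timings quoted above can be turned into a small lookup. This is only a back-of-the-envelope sketch: the helper names are hypothetical, "1 month" is taken as 30 days and "1 week" as 7, and the figures are the CelebA-HQ estimates listed above, so other datasets may behave differently.

```python
# Approximate training times quoted above, as (GPU count, days on V100).
# "1 month" is taken as 30 days and "1 week" as 7; both are rough conversions.
PRESETS = {
    "preset-v1-1gpu": (1, 30),
    "preset-v2-1gpu": (1, 14),
    "preset-v2-2gpus": (2, 7),
    "preset-v2-4gpus": (4, 3),
    "preset-v2-8gpus": (8, 2),
}


def fastest_preset(available_gpus):
    """Return the quickest preset that fits in the given number of GPUs."""
    fitting = {name: v for name, v in PRESETS.items() if v[0] <= available_gpus}
    if not fitting:
        raise ValueError("at least one GPU is required")
    return min(fitting, key=lambda name: fitting[name][1])


def gpu_days(name):
    """Total compute for a preset: number of GPUs times wall-clock days."""
    gpus, days = PRESETS[name]
    return gpus * days


for name in PRESETS:
    print(f"{name}: {gpu_days(name)} GPU-days")
```

By this arithmetic the v2 presets cost between 12 and 16 GPU-days each, so adding GPUs mainly shortens wall-clock time rather than reducing total compute.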
Other noteworthy config options:
- fp16: Enable FP16 mixed-precision training to reduce the training times even further. The actual speedup is heavily dependent on GPU architecture and cuDNN version, and it can be expected to increase considerably in the future.
- BENCHMARK: Quickly iterate through the resolutions to measure the raw training performance.
- BENCHMARK0: Same as BENCHMARK, but only use the highest resolution.
- syn1024rgb: Synthetic 1024x1024 dataset consisting of just black images. Useful for benchmarking.
- VERBOSE: Save image and network snapshots very frequently to facilitate debugging.
- GRAPH and HIST: Include additional data in the TensorBoard report.
Training results can be analyzed in several ways:
- Manual inspection: The training script saves a snapshot of randomly generated images at regular intervals in fakes*.png and reports the overall progress in log.txt.
- TensorBoard: The training script also exports various running statistics in a *.tfevents file that can be visualized in TensorBoard with tensorboard --logdir <result_subdir>.
- Generating images and videos: At the end of config.py, there are several pre-defined configs to launch utility scripts (generate_*). For example:
  - Suppose you have an ongoing training run titled 010-pgan-celebahq-preset-v1-1gpu-fp32, and you want to generate a video of random interpolations for the latest snapshot.
  - Uncomment the generate_interpolation_video line in config.py, replace run_id=10, and run python train.py
  - The script will automatically locate the latest network snapshot and create a new result directory containing a single MP4 file.
- Quality metrics: Similar to the previous example, config.py also contains pre-defined configs to compute various quality metrics (Sliced Wasserstein distance, Fréchet inception distance, etc.) for an existing training run. The metrics are computed for each network snapshot in succession and stored in metric-*.txt in the original result directory.
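The utility scripts locate the run and its latest snapshot for you. If you want to inspect a run by hand, the lookup can be approximated as in the sketch below. It is not the repository's code: the function names are hypothetical, the zero-padded prefix is inferred from the example run name 010-pgan-celebahq-preset-v1-1gpu-fp32, and the snapshot filename pattern is an assumption that may need adjusting.

```python
import glob
import os


def locate_run_dir(result_dir, run_id):
    """Find the result subdirectory whose name starts with the zero-padded id."""
    matches = [
        path for path in glob.glob(os.path.join(result_dir, f"{run_id:03d}-*"))
        if os.path.isdir(path)
    ]
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected one run for id {run_id}, found {len(matches)}"
        )
    return matches[0]


def latest_snapshot(run_dir, pattern="network-snapshot-*.pkl"):
    """Return the last snapshot in sorted filename order.

    The filename pattern is an assumption; adjust it to match your run.
    Sorting by name only gives the newest file if counters are zero-padded.
    """
    snapshots = sorted(glob.glob(os.path.join(run_dir, pattern)))
    if not snapshots:
        raise FileNotFoundError(f"no snapshots matching {pattern!r} in {run_dir}")
    return snapshots[-1]
```

For example, latest_snapshot(locate_run_dir("results", 10)) would return the path of the newest snapshot for run 10, assuming the results live under a directory named results.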
Baris Gecer is supported by the Turkish Ministry of National Education, Stylianos Ploumpis by the EPSRC Project EP/N007743/1 (FACER2VM), and Stefanos Zafeiriou by EPSRC Fellowship DEFORM (EP/S010203/1).
The code borrows heavily from NVIDIA's PRO-GAN implementation. Please check and comply with its license, and cite their paper:
@inproceedings{karras2018progressive,
title={Progressive Growing of GANs for Improved Quality, Stability, and Variation},
author={Karras, Tero and Aila, Timo and Laine, Samuli and Lehtinen, Jaakko},
booktitle={International Conference on Learning Representations},
year={2018}
}
If you find this work useful for your research, please cite our paper:
@inproceedings{gecer2020tbgan,
title={Synthesizing Coupled 3D Face Modalities by Trunk-Branch Generative Adversarial Networks},
author={{Gecer}, Baris and {Lattas}, Alexander and {Ploumpis}, Stylianos and
{Deng}, Jiankang and {Papaioannou}, Athanasios and
{Moschoglou}, Stylianos and {Zafeiriou}, Stefanos},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
year={2020},
organization={Springer},
doi = {10.1007/978-3-030-58526-6_25}
}