Detecting fake images

Towards Universal Fake Image Detectors that Generalize Across Generative Models
Utkarsh Ojha*, Yuheng Li*, Yong Jae Lee
(*Equal contribution)
CVPR 2023

[Project Page] [Paper]

Using images from one type of generative model (e.g., GAN), detect fake images from other breeds (e.g., Diffusion models)

Setup

  1. Clone this repository
git clone https://github.com/Yuheng-Li/UniversalFakeDetect
cd UniversalFakeDetect
  2. Install the necessary libraries
pip install torch torchvision

Data

  • Of the 19 models studied overall (Table 1/2 in the main paper), 11 are taken from a previous work. Download the test set, i.e., the real/fake images for those 11 models provided by its authors, from here (dataset size ~19GB).
  • Download the file and unzip it in datasets/test. You could also use the bash scripts provided by the authors, as described here in their code repository.
  • This should create a directory structure as follows:

datasets
└── test
      ├── progan
      ├── cyclegan
      ├── biggan
      │      .
      │      .
	  
  • Each directory (e.g., progan) will contain real/fake images under 0_real and 1_fake folders respectively.
  • The dataset for the diffusion models (e.g., LDM/Glide) can be found here. Note that in the paper (Table 2/3), we reported results over 10k randomly sampled images. Since providing that many images for all the domains would take up too much space, we are only releasing 1k images for each domain, i.e., 1k fake images and 1k real images for each domain (e.g., LDM-200).
  • Download and unzip the file into ./diffusion_datasets directory.
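Before running the evaluation, it can help to confirm that the unzipped data matches the layout described above. The following is a minimal sketch, not part of the repository: the helper names `count_images` and `summarize_domains` and the list of image extensions are assumptions made for illustration. It counts the images found under any `0_real` and `1_fake` folder inside each domain directory, at any depth.

```python
from pathlib import Path

# Assumed set of image extensions; adjust if the datasets use others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def count_images(folder: Path) -> int:
    """Count image files anywhere below `folder`."""
    return sum(
        1
        for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def summarize_domains(root: str) -> dict:
    """Map each domain directory under `root` to (real_count, fake_count).

    Images are counted under every folder named 0_real or 1_fake found
    at any depth inside the domain directory.
    """
    summary = {}
    for domain in sorted(Path(root).iterdir()):
        if not domain.is_dir():
            continue
        real = sum(count_images(d) for d in domain.rglob("0_real") if d.is_dir())
        fake = sum(count_images(d) for d in domain.rglob("1_fake") if d.is_dir())
        summary[domain.name] = (real, fake)
    return summary
```

Calling `summarize_domains("datasets/test")` or `summarize_domains("diffusion_datasets")` returns one entry per domain; a domain with a count of zero for either class most likely points to an incomplete download or an unexpected unzip location.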

Evaluation

  • You can evaluate the model on all the datasets at once by running:
python validate.py  --arch=CLIP:ViT-L/14   --ckpt=pretrained_weights/fc_weights.pth   --result_folder=clip_vitl14 
  • You can also evaluate the model on a single generative model by specifying the paths of the real and fake datasets:
python validate.py  --arch=CLIP:ViT-L/14   --ckpt=pretrained_weights/fc_weights.pth   --result_folder=clip_vitl14  --real_path datasets/test/progan/0_real --fake_path datasets/test/progan/1_fake

Note that if no arguments are provided for real_path and fake_path, the script will perform the evaluation on all the domains specified in dataset_paths.py.

  • The results will be stored in results/<folder_name> in two files: ap.txt stores the Average Precision for each of the test domains, and acc.txt stores the accuracy (with 0.5 as the threshold) for the same domains.
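To make the two reported metrics concrete, here is a small self-contained sketch of how Average Precision and thresholded accuracy can be computed from per-image scores. It is an illustration only and not the code in validate.py, which may compute these values differently (for example, through a library). The function names are hypothetical, labels are assumed to be 0 for real and 1 for fake (following the 0_real/1_fake folder names), and tied scores are not handled specially.

```python
def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each fake image.

    `labels` holds 0 (real) or 1 (fake); `scores` holds the predicted
    probability of being fake. Images are ranked by descending score.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    if hits == 0:
        raise ValueError("need at least one sample with label 1")
    return precision_sum / hits


def accuracy_at_threshold(labels, scores, threshold=0.5):
    """Fraction of images classified correctly when score > threshold means fake."""
    predictions = [1 if s > threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

For example, with labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.1]`, the fake images sit at ranks 1 and 3, so the Average Precision is (1/1 + 2/3) / 2 ≈ 0.833, while the accuracy at a 0.5 threshold is 0.75 because one real image scores above the threshold. Average Precision depends only on the ranking, whereas accuracy depends on where the scores fall relative to the threshold.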

Training

  • Our main model is trained on the same dataset used by the authors of this work. Download the official training dataset provided here (dataset size ~ 72GB).

  • Download and unzip the dataset in datasets/train directory. The overall structure should look like the following:

datasets
└── train
      └── progan
           ├── airplane
           ├── bird
           ├── boat
           │      .
           │      .
  • There are 20 object categories in total; each category folder contains the corresponding real and fake images in 0_real and 1_fake folders.
  • The model can then be trained with the following command:
python train.py --name=clip_vitl14 --wang2020_data_path=datasets/ --data_mode=wang2020  --arch=CLIP:ViT-L/14  --fix_backbone
  • Important: do not forget to use the --fix_backbone argument during training, which ensures that only the linear layer's parameters are trained.

Deploying Model

The provided Dockerfile can be used to create an image:

export DOCKER_REGISTRY="hannahyk" # Put your Docker Hub username here  
# Build the Docker image for runtime
docker build -t "$DOCKER_REGISTRY/hannah-ufd" -f Dockerfile .

Run this Docker image locally on a GPU to test that it can run inference as expected:

docker run --gpus=all -d --rm -p 80:8000 --env SERVER_PORT=8000  --name "hannah-ufd" "$DOCKER_REGISTRY/hannah-ufd"

In a separate terminal, run the following command one or more times

curl -X GET http://localhost:80/healthcheck

until you see {"healthy":true}.

Then, test that inference can be run as expected:

curl -X POST http://localhost:80/predict \
    -H "Content-Type: application/json" \
    --data '{"file_path":"https://uploads.civai.org/files/jhxTVhsg/b751515306e7.jpg"}'
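The same two checks can be scripted instead of repeating the curl commands by hand. The sketch below uses only the Python standard library; the helper names (`http_get`, `wait_until_healthy`, `build_predict_request`) and the retry defaults are assumptions for illustration, and the format of the /predict response is not assumed.

```python
import json
import time
import urllib.request


def http_get(url: str) -> str:
    """Fetch `url` and return the response body as text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def wait_until_healthy(url, attempts=30, delay=2.0, fetch=http_get) -> bool:
    """Poll the health endpoint until it reports {"healthy": true}.

    Returns True on success, False once `attempts` polls have failed.
    `fetch` can be swapped out, e.g. for testing without a server.
    """
    for _ in range(attempts):
        try:
            payload = json.loads(fetch(url))
            if isinstance(payload, dict) and payload.get("healthy") is True:
                return True
        except (OSError, ValueError):
            pass  # server not reachable yet, or body is not valid JSON
        time.sleep(delay)
    return False


def build_predict_request(base_url: str, file_path: str) -> urllib.request.Request:
    """Build the POST request equivalent to the curl /predict call."""
    body = json.dumps({"file_path": file_path}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the container running, `wait_until_healthy("http://localhost:80/healthcheck")` blocks until the server is ready. Afterwards, passing `build_predict_request("http://localhost:80", image_url)` to `urllib.request.urlopen` and reading the response returns whatever body the server sends back.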

Finally, if successful, push the Docker image to Docker Hub:

docker login

docker push "$DOCKER_REGISTRY/hannah-ufd:latest"

Acknowledgement

We would like to thank Sheng-Yu Wang for releasing the real/fake images from different generative models. Our training pipeline is also inspired by his open-source code. We would also like to thank CompVis for releasing the pre-trained LDMs and LAION for open-sourcing the LAION-400M dataset.

Citation

If you find our work helpful in your research, please cite it using the following:

@inproceedings{ojha2023fakedetect,
      title={Towards Universal Fake Image Detectors that Generalize Across Generative Models}, 
      author={Ojha, Utkarsh and Li, Yuheng and Lee, Yong Jae},
      booktitle={CVPR},
      year={2023},
}