[ICLR 2025] EdgeRunner: Auto-regressive Auto-encoder for Efficient Mesh Generation

EdgeRunner

This is the official implementation of EdgeRunner: Auto-regressive Auto-encoder for Efficient Mesh Generation.

  • Release training and inference code.
  • Release pretrained checkpoints.

Install

Make sure torch with CUDA is correctly installed. For training, we rely on flash-attn (requires at least Ampere GPUs like A100). For inference, older GPUs like V100 are also supported, although slower.

# clone
git clone https://github.com/NVlabs/EdgeRunner
cd EdgeRunner

# install flash-attn
pip install flash-attn --no-build-isolation

# install meto, the mesh tokenizer
pip install -e ./meto

# install other dependencies
pip install -r requirements.txt

We also provide a Dockerfile for easy setup.
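
Before installing flash-attn, it can help to confirm that the local GPU meets the requirement described above. The following is a minimal sketch, not part of this repository: the helper names are hypothetical, and it assumes that "Ampere or newer" corresponds to CUDA compute capability 8.0 or higher (A100 is 8.0, V100 is 7.0).

```python
# Sketch: check whether the local GPU meets the flash-attn requirement.
# Assumption: "Ampere or newer" means CUDA compute capability >= 8.0.


def supports_flash_attn(major: int, minor: int = 0) -> bool:
    """Return True for Ampere-class (8.x) or newer compute capabilities."""
    return (major, minor) >= (8, 0)


def describe_gpu() -> str:
    """Summarize what the first visible GPU can be used for."""
    try:
        import torch  # imported lazily so the helper above works without torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "torch is installed, but CUDA is not available"
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    if supports_flash_attn(major, minor):
        return f"{name} (sm_{major}{minor}): suitable for training and inference"
    return f"{name} (sm_{major}{minor}): inference only, expect slower runs"


if __name__ == "__main__":
    print(describe_gpu())
```

Running the script prints a one-line summary for the first visible GPU.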

Training

NOTE: The dataset used in our training is hosted on AWS, so it cannot be used directly in a new environment. We provide the training code framework; please check and modify the dataset implementation to match your own data.

# debug training
accelerate launch --config_file acc_configs/gpu1.yaml main.py ArAE --workspace workspace_train
accelerate launch --config_file acc_configs/gpu1.yaml main_dit.py DiT --workspace workspace_train_dit

# single-node training (use slurm for multi-node training)
accelerate launch --config_file acc_configs/gpu8.yaml main.py ArAE --workspace workspace_train
accelerate launch --config_file acc_configs/gpu8.yaml main_dit.py DiT --workspace workspace_train_dit

Training details:

  • We train the ArAE model on 64 A100 (80GB) GPUs for approximately one week. At a batch size of 4, each training iteration takes about 4 seconds. The training loss should converge to around 0.315.
  • We train the DiT model on 16 A100 (40GB) GPUs for approximately one week. At a batch size of 32, each training iteration takes about 7 seconds. The training loss should converge to around 0.0018.
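
To put these numbers in perspective, the figures above can be turned into a rough compute budget. This is back-of-the-envelope arithmetic only: it assumes exactly one week of uninterrupted wall-clock time and a constant per-iteration time, neither of which is guaranteed by the training details. The function name is hypothetical.

```python
# Rough compute budget derived from the training details above.
# Assumptions: exactly one week of wall-clock time, constant iteration time.

SECONDS_PER_WEEK = 7 * 24 * 3600  # 604800 seconds


def training_budget(num_gpus: int, seconds_per_iter: float, weeks: float = 1.0) -> dict:
    """Return approximate GPU-hours and iteration count for a training run."""
    wall_seconds = weeks * SECONDS_PER_WEEK
    return {
        "gpu_hours": num_gpus * wall_seconds / 3600,
        "iterations": wall_seconds / seconds_per_iter,
    }


arae = training_budget(num_gpus=64, seconds_per_iter=4)
dit = training_budget(num_gpus=16, seconds_per_iter=7)

print(arae)  # {'gpu_hours': 10752.0, 'iterations': 151200.0}
print(dit)   # {'gpu_hours': 2688.0, 'iterations': 86400.0}
```

Under these assumptions, the ArAE run costs roughly 10.8k A100 GPU-hours over about 151k iterations, and the DiT run roughly 2.7k GPU-hours over about 86k iterations.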

Inference

Inference takes about 16GB GPU memory.

### point cloud conditioned 
# --workspace: path to save outputs.
# --resume: path to pretrained ArAE checkpoint.
# --test_path: either a directory of meshes or a single mesh file. Surface points are randomly sampled from each mesh.
# --generate_mode: choose from ['greedy', 'sample'], the decoding strategy for auto-regressive generation.
# --test_num_face: target number of faces to generate, choose from [-1, 1000, 2000, 4000]; 1000 usually gives the most robust results.
# --test_repeat: number of times to repeat the inference with different random seeds.
# --seed: initial random seed.
python infer.py ArAE --workspace workspace --resume pretrained/ArAE.safetensors --test_path data_mesh/ --generate_mode sample --test_num_face 1000 --test_repeat 3 --seed 42

### image conditioned
# --resume2: path to pretrained DiT checkpoint.
# --test_path: either a directory of images or a single image file.
python infer_dit.py DiT --workspace workspace --resume pretrained/ArAE.safetensors --resume2 pretrained/DiT.safetensors --test_path data_images/ --generate_mode sample --test_num_face 1000 --test_repeat 3
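
For the point cloud conditioned mode, the input mesh is turned into a point cloud by randomly sampling its surface. The sketch below shows one common way to do this: area-weighted triangle selection followed by uniform barycentric sampling. It is an illustration under stated assumptions, not the sampler used in `infer.py`, which may differ in point count, normalization, and extra attributes such as normals. The function names and the `(vertices, faces)` input format are hypothetical.

```python
# Illustrative sketch of uniform surface sampling on a triangle mesh.
# Not the repository's implementation; names and input format are assumed.
import math
import random


def triangle_area(a, b, c):
    """Area of triangle abc, computed from the cross product of two edges."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def sample_surface(vertices, faces, num_points, seed=42):
    """Sample points uniformly over the mesh surface.

    vertices: list of (x, y, z) tuples; faces: list of (i, j, k) vertex indices.
    """
    rng = random.Random(seed)
    # Pick triangles with probability proportional to their area.
    areas = [triangle_area(*(vertices[i] for i in face)) for face in faces]
    chosen = rng.choices(range(len(faces)), weights=areas, k=num_points)
    points = []
    for f in chosen:
        a, b, c = (vertices[i] for i in faces[f])
        # The square-root trick yields uniform barycentric coordinates.
        r1, r2 = rng.random(), rng.random()
        s = math.sqrt(r1)
        w0, w1, w2 = 1 - s, s * (1 - r2), s * r2
        points.append(tuple(w0 * a[k] + w1 * b[k] + w2 * c[k] for k in range(3)))
    return points


# Usage: a unit square in the z = 0 plane, split into two triangles.
square_vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
square_faces = [(0, 1, 2), (0, 2, 3)]
cloud = sample_surface(square_vertices, square_faces, num_points=100)
```

Fixing the seed makes the sampled cloud reproducible, which mirrors the role of `--seed` in the commands above.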

Acknowledgement

This work is built on many amazing research works and open-source projects. Thanks a lot to all the authors for sharing!

Citation

@article{tang2024edgerunner,
  title={EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation},
  author={Tang, Jiaxiang and Li, Zhaoshuo and Hao, Zekun and Liu, Xian and Zeng, Gang and Liu, Ming-Yu and Zhang, Qinsheng},
  journal={arXiv preprint arXiv:2409.18114},
  year={2024}
}
