This repo contains code for the paper Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation
Use the environment.yml file to create the conda environment and install the dependencies.
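For reference, a typical conda workflow would look like the following (assuming a standard conda installation; the environment name is whatever is defined inside environment.yml):

conda env create -f environment.yml
conda activate <env name from environment.yml>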
To train the point track prediction model, run the following, adjusting the number of nodes, GPUs per node, and batch size as needed:
torchrun --nnodes=1 --nproc_per_node=8 train_track_pred.py --global-batch-size=480 --data-path=<folder with data files>
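As a hypothetical example, a two-node run with 8 GPUs per node could be launched on each node using torchrun's rendezvous flags, with the global batch size scaled to match; the job id, endpoint, and batch size below are placeholders, not values taken from the paper or repo:

torchrun --nnodes=2 --nproc_per_node=8 --rdzv_id=<job id> --rdzv_backend=c10d --rdzv_endpoint=<master host>:29500 train_track_pred.py --global-batch-size=960 --data-path=<folder with data files>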
For inference, specify the paths to the initial image, the goal image, and the model checkpoint (the trained model is available at this link). The visualization will be saved in the folder save_tracK_pred.
python inference_track_pred.py --ckpt=<path to model> --init=<path to initial image> --goal=<path to goal image>
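For example, a concrete invocation might look like the following (all paths are hypothetical placeholders for your own files):

python inference_track_pred.py --ckpt=checkpoints/track_pred.pt --init=examples/init_frame.png --goal=examples/goal_frame.png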
For any questions about the project, feel free to email Homanga Bharadhwaj at [email protected].
The code is licensed under the CC-BY-NC license (see License.md).
The code in this repo is based on Diffusion Transformers (https://github.com/facebookresearch/DiT) and uses open-source packages such as diffusers, scipy, opencv, numpy, and pytorch.
If you find the repository helpful, please consider citing our paper:
@misc{bharadhwaj2024track2act,
      title={Track2Act: Predicting Point Tracks from Internet Videos enables Diverse Zero-shot Robot Manipulation},
      author={Homanga Bharadhwaj and Roozbeh Mottaghi and Abhinav Gupta and Shubham Tulsiani},
      year={2024},
      eprint={2405.01527},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}