New dataset and metadata integration (#292)
* initial changes

* using vehicle_size struct and comments

* datatype indexing changes

* removed comment

* minor cleanup

* new dataset yay

* fixed downloads and added links

* extract script for large dataset

* added hf to env

* update dataset size

* added hf to env

* Fix typo

* minor docstring update

* Add agent ids and remove degrees conversion

* Wrap yaws to be in [-pi, pi] + linting

* updated debug scene

* Remove degrees to radians conversion

* Zero initialize trajectory for padding agents

* Add agent id support

* collect object metadata in json

* ERR_VAL bug

* metadata tensor export

* env_config args to init all agents

* warning msg if all agents not init

* new metadata dataclass

* cleanup

* reverted tutorial notebook

* QOL and docstring

* Resample and render fixes

* Small rendering fix: agent colors and classification.

* metadata v2

* minor fix to init_all

* new example scenes

* Update tutorial 1

* Update tutorial 2

* Silence metadata warning

* Update tutorial 3

* Update tutorials 4 + 5

* Mini update: training configs

* hardcoded file path

* readability, warnings, remove metadata.id

* Collapse data downloading details

* Small fixes for resampling (sorry, last change)

* Update example scenarios

* tests changes

---------

Co-authored-by: kevin <[email protected]>
Co-authored-by: Aarav Pandya <[email protected]>
3 people authored Dec 19, 2024
1 parent c353cb7 commit ecb236c
Showing 51 changed files with 1,986 additions and 380,198 deletions.
53 changes: 47 additions & 6 deletions README.md
@@ -211,16 +211,55 @@ We are open-sourcing a policy trained on 1,000 randomly sampled scenarios. You c

### Download the dataset

Two versions of the dataset are available:
- Two versions of the dataset are available: a [mini version](https://huggingface.co/datasets/EMERGE-lab/GPUDrive_mini) with 1,000 training files and 300 test/validation files, and a [large dataset](https://huggingface.co/datasets/EMERGE-lab/GPUDrive) with 100k unique scenes.
- Replace `GPUDrive_mini` with `GPUDrive` below if you wish to download the full dataset.

- a mini-one that is about 1 GB and consists of 1000 training files and 100 validation / test files at: [Dropbox Link](https://www.dropbox.com/sh/8mxue9rdoizen3h/AADGRrHYBb86pZvDnHplDGvXa?dl=0).
- the full dataset (150 GB) and consists of 134453 training files and 12205 validation / test files: [Dropbox Link](https://www.dropbox.com/sh/wv75pjd8phxizj3/AABfNPWfjQdoTWvdVxsAjUL_a?dl=0)
<details>
<summary>Download the dataset</summary>

- To download the dataset, you need the `huggingface_hub` library (if you initialized from `environment.yml`, you can skip this step):
```bash
pip install huggingface_hub
```
Then you can download the dataset using Python or the `huggingface-cli`.

- **Option 1**: Using Python
```python
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="EMERGE-lab/GPUDrive_mini", repo_type="dataset", local_dir="data/processed")
```

- **Option 2**: Use the huggingface-cli

1. Log in to your Hugging Face account:
```bash
huggingface-cli login
```

2. Download the dataset:
```bash
huggingface-cli download EMERGE-lab/GPUDrive_mini --local-dir data/processed --repo-type "dataset"
```

- **Option 3**: Manual Download

The simulator supports initializing scenes from the `Nocturne` dataset. The input parameter for the simulator `json_path` takes in a path to a directory containing the files in the Nocturne format. The `SceneConfig` dataclass in `pygpudrive/env/config.py` dataclass is used to configure how scenes are selected from a folder with traffic scenarios.
1. Visit https://huggingface.co/datasets/EMERGE-lab/GPUDrive_mini
2. Navigate to the Files and versions tab.
3. Download the desired files/directories.

### Re-building the dataset
_NOTE_: If you downloaded the full-sized dataset, it is grouped into subdirectories of 10k files each (due to Hugging Face constraints). For the paths to work with GPUDrive, you need to run
```bash
python data_utils/extract_groups.py  # use --help if you used a custom download path
```
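Conceptually, the grouping constraint above is undone by a small flattening pass. Below is a minimal sketch of what such a step does; the `group_*` subdirectory naming is an assumption for illustration, and the real logic lives in `data_utils/extract_groups.py`:

```python
import shutil
from pathlib import Path

def flatten_groups(root: str) -> int:
    """Move files out of group subdirectories (e.g. training/group_0/*)
    up into the parent split directory, then remove the empty groups.

    Returns the number of files moved.
    """
    root_path = Path(root)
    moved = 0
    # Treat every immediate subdirectory as one "group" bucket.
    for group_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        for f in list(group_dir.iterdir()):
            shutil.move(str(f), str(root_path / f.name))
            moved += 1
        group_dir.rmdir()  # bucket is empty now
    return moved
```

Run once per split, e.g. `flatten_groups("data/processed/training")`.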

</details>

GPUDrive is compatible with the complete [Waymo Open Motion Dataset](https://github.com/waymo-research/waymo-open-dataset), which contains over 100,000 scenarios. To download new files and create scenarios for the simulator, follow these three steps.
### Re-build the dataset

If you wish to manually generate the dataset, GPUDrive is compatible with the complete [Waymo Open Motion Dataset](https://github.com/waymo-research/waymo-open-dataset), which contains well over 100,000 scenarios. To download new files and create scenarios for the simulator, follow the steps below.

<details>
<summary>Re-build the dataset in 3 steps</summary>

1. First, head to [https://waymo.com/open/](https://waymo.com/open/) and click on the "download" button at the top. After registering, click on the files from `v1.2.1 March 2024`, the newest version of the dataset at the time of writing (10/2024). This will lead you to a Google Cloud page. From here, you should see a folder structure like this:

@@ -278,6 +317,8 @@ and that's it!

> **🧐 Caveat**: A single Waymo tfrecord file contains approximately 500 traffic scenarios. Processing speed is about 250 scenes/min on a 16-core CPU. Processing the entire validation set, for example (150 tfrecords), takes a LOT of time.
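The caveat above can be made concrete with back-of-the-envelope arithmetic (the ~500 scenes per tfrecord and ~250 scenes/min figures come from the text; everything else follows from them):

```python
tfrecords = 150            # validation set size, per the caveat above
scenes_per_tfrecord = 500  # approximate scenarios per tfrecord
scenes_per_minute = 250    # rough throughput on a 16-core CPU

total_scenes = tfrecords * scenes_per_tfrecord  # 75_000 scenes
minutes = total_scenes / scenes_per_minute      # 300 minutes
print(f"{total_scenes} scenes -> ~{minutes / 60:.0f} hours")  # ~5 hours
```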
</details>

## 📜 Citations

If you use GPUDrive in your work, please cite us:
17 changes: 9 additions & 8 deletions baselines/ippo/config/ippo_ff_puffer.yaml
@@ -2,7 +2,7 @@ mode: "train"
use_rnn: false
eval_model_path: null
baseline: false
data_dir: "data/processed/validation"
data_dir: "data/processed/examples"

environment: # Overrides default environment configs (see pygpudrive/env/config.py)
name: "gpudrive"
@@ -16,8 +16,8 @@ environment: # Overrides default environment configs (see pygpudrive/env/config.
remove_non_vehicles: true # If false, all agents are included (vehicles, pedestrians, cyclists)
use_lidar_obs: false # NOTE: Setting this to true currently turns off the other observation types
reward_type: "weighted_combination"
collision_weight: -0.025
off_road_weight: -0.025
collision_weight: -0.035
off_road_weight: -0.035
goal_achieved_weight: 1.0
dynamics_model: "classic"
collision_behavior: "ignore" # Options: "remove", "stop"
@@ -28,7 +28,7 @@ environment: # Overrides default environment configs (see pygpudrive/env/config.
wandb:
entity: ""
project: "gpudrive"
group: "rl_scale"
group: "my_group"
mode: "online" # Options: online, offline, disabled
tags: ["ppo", "ff"]

@@ -48,12 +48,13 @@ train:
# # # Data sampling # # #
resample_scenes: false
resample_criterion: "global_step"
resample_interval: 3_000_000
resample_interval: 5_000_000
resample_limit: 10000 # Resample until the limit is reached; set to a large number to continue resampling indefinitely
resample_mode: "random" # Options: random

# # # PPO # # #
torch_deterministic: false
total_timesteps: 500_000_000
total_timesteps: 1_000_000_000
batch_size: 131_072
minibatch_size: 16_384
learning_rate: 3e-4
@@ -65,7 +66,7 @@ train:
clip_coef: 0.2
clip_vloss: false
vf_clip_coef: 0.2
ent_coef: 0.0003
ent_coef: 0.0001
vf_coef: 0.5
max_grad_norm: 0.5
target_kl: null
@@ -77,7 +78,7 @@ train:
# # # Rendering # # #
render: false # Determines whether to render the environment (note: will slow down training)
render_interval: 500 # Render every k iterations
render_k_scenarios: 5 # Number of scenarios to render
render_k_scenarios: 10 # Number of scenarios to render
render_simulator_state: true # Plot the simulator state from bird's eye view
render_agent_obs: false # Debugging tool, plot what an agent is seeing
render_fps: 15 # Frames per second
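The `weighted_combination` reward in the config above suggests a per-step reward assembled from the three weights. The following is a hypothetical sketch for illustration only; the function name and structure are assumptions, not GPUDrive's actual implementation:

```python
def weighted_reward(collided: bool, off_road: bool, goal_achieved: bool,
                    collision_weight: float = -0.035,
                    off_road_weight: float = -0.035,
                    goal_achieved_weight: float = 1.0) -> float:
    """Combine per-step event indicators using the weights from
    ippo_ff_puffer.yaml (bools act as 0/1 in the products)."""
    return (collision_weight * collided
            + off_road_weight * off_road
            + goal_achieved_weight * goal_achieved)
```

With these weights, reaching the goal yields `1.0`, while colliding or driving off-road each subtracts `0.035`.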


