update readme, work with exporting
KeplerC committed Apr 8, 2024
1 parent 906e558 commit a417c55
Showing 5 changed files with 14 additions and 8 deletions.
15 changes: 10 additions & 5 deletions README.md
@@ -3,7 +3,9 @@
[![codecov](https://codecov.io/gh/KeplerC/fog_rtx/branch/main/graph/badge.svg?token=fog_rtx_token_here)](https://codecov.io/gh/KeplerC/fog_rtx)
[![CI](https://github.com/KeplerC/fog_rtx/actions/workflows/main.yml/badge.svg)](https://github.com/KeplerC/fog_rtx/actions/workflows/main.yml)

-An Efficient and Scalable Data Collection and Management Framework For Robotics Learning. Support RT-X, HuggingFace.
+An Efficient and Scalable Data Collection and Management Framework For Robotics Learning. Supports Open-X-Embodiment and HuggingFace.

+🦊fox achieves memory efficiency and speed by working with trajectory-level metadata and a lazily-loaded dataset. Implemented on top of the [Apache PyArrow](https://arrow.apache.org/docs/python/index.html) dataset format, it allows flexible partitioning of the dataset on distributed storage.
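
For intuition, here is what lazy, partitioned reading looks like in plain PyArrow. This is a sketch of the mechanism fox builds on, not its actual internals; the path and the `episode_id` partition field are hypothetical:

```python
import pyarrow.dataset as ds

# Discover fragments under a partitioned directory; nothing is read yet.
# (hypothetical path and hive-style layout)
dataset = ds.dataset("/tmp/rtx", format="parquet", partitioning="hive")

# Materialize only the matching partitions and columns.
table = dataset.to_table(
    columns=["arm_view", "camera_pose"],
    filter=ds.field("episode_id") == 3,  # hypothetical partition field
)
print(table.num_rows)
```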

## Install

@@ -19,10 +21,13 @@ import fog_rtx as fox
# create a new dataset
dataset = fox.Dataset(
    name="test_rtx", path="/tmp/rtx",
+    # dataset is automatically partitioned, allowing
+    # distributed storage across different directories and cloud
+    load_from = ["/tmp/rtx", "s3://fox_storage/"]
)

# Data collection:
-# create a new episode / trajectory
+# create a new trajectory
episode = dataset.new_episode(
    description = "grasp teddy bear from the shelf"
)
@@ -31,7 +36,7 @@ episode = dataset.new_episode(
episode.add(feature = "arm_view", value = "image1.jpg")
episode.add(feature = "camera_pose", value = "image1.jpg")

-# mark the current state as terminal state
+# mark the current trajectory as finished and save it
episode.close()

# Alternatively,
@@ -46,13 +51,13 @@ episode_info = dataset.get_episode_info()
metadata = episode_info.filter(episode_info["collector"] == "User 2")
episodes = dataset.read_by(metadata)

-# export and share the dataset as standard RT-X format
+# export and share the dataset in the standard Open-X-Embodiment format
dataset.export(episodes, format="rtx")
```


## More Coming Soon!
-Currently we see a 60\% space saving on some existing RT-X datasets. This can be even more with re-paritioning the dataset. Our next steps can be found in the [planning doc](./design_doc/planning_doc.md). Feedback welcome through issues or PR to planning doc!
+Currently we see a more than 60% space saving on some existing RT-X datasets. This can be even greater by re-partitioning the dataset. Our next steps can be found in the [planning doc](./design_doc/planning_doc.md). Feedback is welcome through issues or PRs to the planning doc!
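
For a concrete picture of what re-partitioning means here, a minimal plain-PyArrow sketch (not the fog_rtx API; the `episode_id` column and both paths are hypothetical):

```python
import pyarrow.dataset as ds

src = ds.dataset("/tmp/rtx", format="parquet")  # hypothetical source path

# Rewrite the same rows grouped by a trajectory-level column, so each
# trajectory's steps land in their own directory of files.
ds.write_dataset(
    src,
    "/tmp/rtx_repartitioned",
    format="parquet",
    partitioning=["episode_id"],  # hypothetical partition column
    partitioning_flavor="hive",
)
```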

## Development

1 change: 1 addition & 0 deletions design_doc/planning_doc.md
@@ -3,6 +3,7 @@
### Small Steps
5. efficient image storage
6. compare with standard tfds on loading and storage
+7. recover schema from saved data

### known bugs
3. sql part is completely broken
2 changes: 1 addition & 1 deletion design_doc/system_assumptions.md
@@ -2,7 +2,7 @@
Fox manages a trajectory information table that contains summaries, tags, etc., and a step data table that contains all the data (images, etc.).
1. Episode information metadata should fit in memory.
2. Trajectory data can go beyond memory or hardware disks.
-3. All trajectory data within an episode should fit in memory (TODO: this constraint should be relaxed)
+3. All trajectory data within an episode should fit in memory (TODO: this constraint should be relaxed, but the `env_logger` package is the bottleneck here. On my dev machine (4 GB RAM), its `tfds.core.SequentialWriter` experiences a memory explosion when writing multiple sequences to the partition.)
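
On the read side, this kind of constraint can already be worked around by streaming record batches from the PyArrow dataset rather than materializing a whole table. A minimal sketch, assuming a plain-PyArrow dataset at a hypothetical path (this is not fog_rtx code):

```python
import pyarrow.dataset as ds

dataset = ds.dataset("/tmp/rtx", format="parquet")  # hypothetical path

# Stream fixed-size record batches instead of materializing the whole
# step table, so an episode's data never has to sit in memory at once.
total_rows = 0
for batch in dataset.to_batches(batch_size=1024):
    total_rows += batch.num_rows  # stand-in for real per-batch work
print(total_rows)
```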

### Consistency
1. Data can be collected in a distributed fashion across multiple robots/processes.
2 changes: 1 addition & 1 deletion examples/rtx_example/load.py
@@ -7,7 +7,7 @@

dataset.load_rtx_episodes(
name="berkeley_autolab_ur5",
split="train[:10]",
split="train[:1]",
)

dataset.export(format="rtx")
2 changes: 1 addition & 1 deletion fog_rtx/rlds/writer.py
@@ -96,7 +96,7 @@ def __init__(
        data_directory: str,
        ds_config: tfds.rlds.rlds_base.DatasetConfig,
        ds_identity: tfds.core.dataset_info.DatasetIdentity,
-        max_episodes_per_file: int = 1000,
+        max_episodes_per_file: int = 1,
        split_name: Optional[str] = None,
        version: str = "0.0.1",
        store_ds_metadata: bool = False,
