update readme, work with exporting
KeplerC committed Apr 8, 2024
1 parent 906e558 commit a417c55
Showing 5 changed files with 14 additions and 8 deletions.
15 changes: 10 additions & 5 deletions README.md
@@ -3,7 +3,9 @@
[![codecov](https://codecov.io/gh/KeplerC/fog_rtx/branch/main/graph/badge.svg?token=fog_rtx_token_here)](https://codecov.io/gh/KeplerC/fog_rtx)
[![CI](https://github.com/KeplerC/fog_rtx/actions/workflows/main.yml/badge.svg)](https://github.com/KeplerC/fog_rtx/actions/workflows/main.yml)

-An Efficient and Scalable Data Collection and Management Framework For Robotics Learning. Support RT-X, HuggingFace.
+An Efficient and Scalable Data Collection and Management Framework For Robotics Learning. Supports Open-X-Embodiment and HuggingFace.

+🦊fox achieves memory efficiency and speed by working with trajectory-level metadata and a lazily-loaded dataset. Implemented on top of the [Apache PyArrow](https://arrow.apache.org/docs/python/index.html) dataset format, it allows flexible partitioning of the dataset on distributed storage.
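
For intuition, here is what lazy, partitioned reading looks like in plain PyArrow. This is a sketch of the mechanism fox builds on, not its actual internals; the path and the `episode_id` partition field are hypothetical:

```python
import pyarrow.dataset as ds

# Discover fragments under a partitioned directory; nothing is read yet.
# (hypothetical path and hive-style layout)
dataset = ds.dataset("/tmp/rtx", format="parquet", partitioning="hive")

# Materialize only the matching partitions and columns.
table = dataset.to_table(
    columns=["arm_view", "camera_pose"],
    filter=ds.field("episode_id") == 3,  # hypothetical partition field
)
print(table.num_rows)
```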

## Install

@@ -19,10 +21,13 @@ import fog_rtx as fox
# create a new dataset
dataset = fox.Dataset(
    name="test_rtx", path="/tmp/rtx",
+    # dataset is automatically partitioned, allowing
+    # distributed storage across different directories and cloud
+    load_from = ["/tmp/rtx", "s3://fox_storage/"]
)

# Data collection:
-# create a new episode / trajectory
+# create a new trajectory
episode = dataset.new_episode(
    description = "grasp teddy bear from the shelf"
)
@@ -31,7 +36,7 @@ episode = dataset.new_episode(
episode.add(feature = "arm_view", value = "image1.jpg")
episode.add(feature = "camera_pose", value = "image1.jpg")

-# mark the current state as terminal state
+# mark the current trajectory as finished and save it
episode.close()

# Alternatively,
@@ -46,13 +51,13 @@ episode_info = dataset.get_episode_info()
metadata = episode_info.filter(episode_info["collector"] == "User 2")
episodes = dataset.read_by(metadata)

-# export and share the dataset as standard RT-X format
+# export and share the dataset in the standard Open-X-Embodiment format
dataset.export(episodes, format="rtx")
```


## More Coming Soon!
-Currently we see a 60\% space saving on some existing RT-X datasets. This can be even more with re-paritioning the dataset. Our next steps can be found in the [planning doc](./design_doc/planning_doc.md). Feedback welcome through issues or PR to planning doc!
+Currently we see a more than 60% space saving on some existing RT-X datasets. This can be even greater by re-partitioning the dataset. Our next steps can be found in the [planning doc](./design_doc/planning_doc.md). Feedback is welcome through issues or PRs to the planning doc!
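
For a concrete picture of what re-partitioning means here, a minimal plain-PyArrow sketch (not the fog_rtx API; the `episode_id` column and both paths are hypothetical):

```python
import pyarrow.dataset as ds

src = ds.dataset("/tmp/rtx", format="parquet")  # hypothetical source path

# Rewrite the same rows grouped by a trajectory-level column, so each
# trajectory's steps land in their own directory of files.
ds.write_dataset(
    src,
    "/tmp/rtx_repartitioned",
    format="parquet",
    partitioning=["episode_id"],  # hypothetical partition column
    partitioning_flavor="hive",
)
```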

## Development

1 change: 1 addition & 0 deletions design_doc/planning_doc.md
@@ -3,6 +3,7 @@
### Small Steps
5. efficient image storage
6. compare with standard tfds on loading and storage
+7. recover schema from saved data

### known bugs
3. sql part is completely broken
2 changes: 1 addition & 1 deletion design_doc/system_assumptions.md
@@ -2,7 +2,7 @@
Fox manages a trajectory information table that contains summaries, tags, etc., and a step data table that contains all the data (images, etc.).
1. Episode information metadata should fit in memory.
2. Trajectory data can go beyond memory or hardware disks.
-3. All trajectory data within an episode should fit in memory (TODO: this constraint should be relaxed)
+3. All trajectory data within an episode should fit in memory (TODO: this constraint should be relaxed, but the `env_logger` package is the bottleneck here. On my dev machine (4 GB RAM), its `tfds.core.SequentialWriter` experiences a memory explosion when writing multiple sequences to the partition.)
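
On the read side, this kind of constraint can already be worked around by streaming record batches from the PyArrow dataset rather than materializing a whole table. A minimal sketch, assuming a plain-PyArrow dataset at a hypothetical path (this is not fog_rtx code):

```python
import pyarrow.dataset as ds

dataset = ds.dataset("/tmp/rtx", format="parquet")  # hypothetical path

# Stream fixed-size record batches instead of materializing the whole
# step table, so an episode's data never has to sit in memory at once.
total_rows = 0
for batch in dataset.to_batches(batch_size=1024):
    total_rows += batch.num_rows  # stand-in for real per-batch work
print(total_rows)
```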

### Consistency
1. Data can be collected in a distributed fashion across multiple robots/processes.
2 changes: 1 addition & 1 deletion examples/rtx_example/load.py
@@ -7,7 +7,7 @@

dataset.load_rtx_episodes(
name="berkeley_autolab_ur5",
split="train[:10]",
split="train[:1]",
)

dataset.export(format="rtx")
2 changes: 1 addition & 1 deletion fog_rtx/rlds/writer.py
@@ -96,7 +96,7 @@ def __init__(
        data_directory: str,
        ds_config: tfds.rlds.rlds_base.DatasetConfig,
        ds_identity: tfds.core.dataset_info.DatasetIdentity,
-        max_episodes_per_file: int = 1000,
+        max_episodes_per_file: int = 1,
        split_name: Optional[str] = None,
        version: str = "0.0.1",
        store_ds_metadata: bool = False,
