ASE Data Format
This page provides an overview of Aria Synthetic Environments (ASE) data formats and organization.
Using the code snippets and tools listed in Data Tools and Visualization, researchers should be able to quickly onboard this data into ML pipelines.
Overall Data Organization
- Each scene has its own subdirectory with a unique ID (0-100K)
- Each scene directory contains separate files and directories for each type of data
<sceneID>
├── rgb
│ └── vignette0000000.jpg
│ └── vignette0000001.jpg
│ ...
│ └── vignette0xxn.jpg
├── depth
│ └── depth0000000.png
│ └── depth0000001.png
│ ...
│ └── depth0xxn.png
├── instances
│ └── instance0000000.png
│ └── instance0000001.png
│ ...
│ └── instance0xxn.png
├── ase_scene_language.txt
├── trajectory.txt
├── semidense_points.csv.gz
├── semidense_observations.csv.gz
└── object_instances_to_classes.json
rgb
- 2D RGB fisheye images
  - Synthetically generated Aria RGB images at 10 FPS
  - Each image is saved in JPEG format
depth
- 2D depth maps (16 bit)
  - Each depth image is the same size as the corresponding synthetic RGB image. Each pixel value is an integer expressing the depth along the pixel’s ray direction, in units of mm.
    - This should not be confused with ADT depth images, which describe the depth along the camera’s Z-axis
  - Each image is saved in PNG format
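The difference between ray depth and Z-axis depth can be illustrated with a minimal sketch. It assumes the direction of each pixel's ray in the camera frame is already known (for example from unprojecting the pixel with the camera calibration, which is not shown here). The function names are illustrative and are not part of the ASE tooling.

```python
import math

def ray_depth_mm_to_point_m(depth_mm, ray_dir):
    """Return the 3D point (metres, camera frame) seen by one pixel.

    depth_mm: pixel value from an ASE depth image (distance along the ray, in mm).
    ray_dir:  (x, y, z) direction of the pixel's ray in the camera frame.
              It does not need to be normalised. Obtaining it requires the
              camera model, which is assumed to be available elsewhere.
    """
    norm = math.sqrt(sum(c * c for c in ray_dir))
    if norm == 0.0:
        raise ValueError("ray_dir must be non-zero")
    # Convert mm to metres and scale the unit ray by that distance.
    scale = (depth_mm / 1000.0) / norm
    return tuple(c * scale for c in ray_dir)

def ray_depth_mm_to_z_depth_m(depth_mm, ray_dir):
    """Return the Z-axis depth (metres) equivalent to a ray depth."""
    return ray_depth_mm_to_point_m(depth_mm, ray_dir)[2]
```

For a pixel on the optical axis the two depths coincide. For off-axis pixels the Z-axis depth is smaller than the ray depth.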
instances
- 2D segmentation maps (16 bit)
  - Each segmentation image is the same size as the corresponding synthetic RGB image. Each pixel value is an integer expressing the ID of the object observed by that pixel.
  - Each image is saved in PNG format
ase_scene_language.txt
- 3D floor plan definition
  - Describes the scene in the form of a language.
  - Each row is a command with its own set of parameters. Together, these commands describe the geometry of the scene.
  - Go to ASE scene language format below for more details
trajectory.txt
- Ground-truth trajectory
  - Go to MPS Output - Trajectory for how the data is structured
    - While the file structure is the same, please note that this is the ground-truth trajectory, not an output generated by MPS
semidense_points.csv.gz
- Semi-dense map points
  - Go to MPS Output - Semi-Dense Point Cloud for how the data is structured
  - Produced by MPS run on synthetic SLAM (mono scene) camera data
semidense_observations.csv.gz
- Semi-dense map observations
  - Go to MPS Output - Semi-Dense Point Cloud for how the data is structured
  - Produced by MPS run on synthetic SLAM (mono scene) camera data
object_instances_to_classes.json
- Per-scene mappings from the object instance image IDs to object classes
- Given an instance image pixel value/object ID, one will then be able to look up the class from this mapping
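The lookup described above might be sketched as follows. This assumes the file is a flat JSON object whose keys are instance IDs and whose values are class names, which is not confirmed on this page. The helper names are hypothetical.

```python
import json

def load_instance_to_class(path):
    """Load the per-scene mapping and convert the keys to int.

    JSON object keys are always strings, while instance image pixels are
    integers, so the keys are normalised once here.
    Assumes a flat {instance_id: class_name} layout.
    """
    with open(path, "r", encoding="utf-8") as f:
        raw = json.load(f)
    return {int(k): v for k, v in raw.items()}

def class_of(mapping, instance_id, default="unknown"):
    """Return the class for one instance image pixel value."""
    return mapping.get(int(instance_id), default)
```

Typical use is `class_of(mapping, pixel_value)` for a pixel value read from the corresponding instance image.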
Aria RGB Sensor - Image, Depth and Instance Segmentation
For each frame from the RGB sensor we provide:
- A vignetted sensor image
- Simulated 16 bit metric depth (mm) in PNG image format
- A segmentation image (16 bit PNG)
The images in each folder are in sync, so each folder contains the same number of images. We also provide example data visualizers to load these images and/or associate them.
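Because the folders are in sync, frames can be associated by the number embedded in each file name. The sketch below is not the provided visualizer code. It assumes that matching RGB, depth and instance images share the same frame number in their file names, and the function names are illustrative.

```python
import re
from pathlib import Path

# Trailing digits before the file extension, e.g. "0000012" in "depth0000012.png".
_FRAME_RE = re.compile(r"(\d+)\.\w+$")

def index_frames(folder):
    """Map frame number -> file path for every numbered file in `folder`."""
    frames = {}
    for p in Path(folder).iterdir():
        m = _FRAME_RE.search(p.name)
        if p.is_file() and m:
            frames[int(m.group(1))] = p
    return frames

def associate_frames(scene_dir):
    """Return a sorted list of (frame, rgb_path, depth_path, instance_path)."""
    scene = Path(scene_dir)
    rgb = index_frames(scene / "rgb")
    depth = index_frames(scene / "depth")
    inst = index_frames(scene / "instances")
    # The three folders should contain exactly the same frame numbers.
    if not (rgb.keys() == depth.keys() == inst.keys()):
        raise ValueError("rgb, depth and instances folders are not in sync")
    return [(i, rgb[i], depth[i], inst[i]) for i in sorted(rgb)]
```

Raising an error on a mismatch is a design choice for this sketch. A pipeline that tolerates partial downloads could instead take the intersection of the frame numbers.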
ASE Scene Language Format
The ASE Scene Language format is a set of hand-designed procedural commands in pure text form. To handle commonly encountered static indoor layout elements, we use three commands:
- make_wall: the full set of parameters specifies a gravity-aligned oriented box
- make_door: specifies box-based cutouts from walls
- make_window: specifies box-based cutouts from walls
Each command includes its own set of parameters, as described below. Given the command’s full set of parameters, a geometry is completely specified.
A single scene is described via a sequence of multiple commands stored in ase_scene_language.txt. The sequence length is arbitrary and follows no specific ordering. The interpretation of each command and its arguments is carried out by a customized interpreter responsible for parsing the sequence and generating a 3D mesh of the scene.
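As a rough illustration of the parsing step, the sketch below reads a command sequence into Python structures. It is not the customized interpreter mentioned above and does not generate a mesh. It assumes each row has the form `command, key=value, key=value, ...`, which this page does not confirm, and the parameter names in the usage note are hypothetical.

```python
def parse_scene_language_line(line):
    """Parse one row into (command, params), or None for a blank row.

    Assumes a 'command, key=value, ...' layout (an assumption of this sketch).
    Numeric values are converted to float; anything else is kept as text.
    """
    parts = [p.strip() for p in line.strip().split(",") if p.strip()]
    if not parts:
        return None
    command, params = parts[0], {}
    for item in parts[1:]:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        try:
            params[key.strip()] = float(value)
        except ValueError:
            params[key.strip()] = value.strip()
    return command, params

def parse_scene_language(text):
    """Parse a whole file's text into a list of (command, params) tuples."""
    commands = []
    for line in text.splitlines():
        parsed = parse_scene_language_line(line)
        if parsed is not None:
            commands.append(parsed)
    return commands
```

For example, under these assumptions `parse_scene_language("make_wall, id=0, height=2.5")` yields one `("make_wall", {...})` entry. Since the sequence follows no specific ordering, a consumer that needs walls before their cutouts would have to sort or group the result itself.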
Trajectory and Semi-Dense Map Points
Ground-truth trajectory data provides poses for each frame generated from a simulation at 10 FPS. We follow the same trajectory format as the closed loop trajectory used by Machine Perception Services (MPS).
For semi-dense map point clouds and their observations, we follow the same points and observations formats as MPS. The semi-dense map point cloud is generated using the same algorithm as MPS, with the addition of the ground-truth trajectory and simulated SLAM camera images.
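The compressed CSV files can be read with the Python standard library alone. The sketch below assumes the files start with a header row and makes no assumption about the column names, which are defined by the MPS formats referenced above. The function name is illustrative.

```python
import csv
import gzip

def read_csv_gz(path):
    """Yield each row of a gzip-compressed CSV as a dict keyed by the header row.

    Values are returned as strings; convert them to numbers as needed.
    """
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)
```

Rows are streamed one at a time, for example `for row in read_csv_gz("semidense_points.csv.gz"): ...`, so large point clouds do not need to fit in memory at once.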