Skip to main content

Project Aria VRS Format

This page provides information about how Aria data streams are identified in VRS, Aria sensor data configuration, followed by useful VRS tools for common use cases.

Project Aria data is stored in VRS, which is optimized to record and playback streams of multi-modal sensor data, such as images, audio, and other discrete sensors (IMU, temperature, etc.). Data is stored in per-device streams of time-stamped records.

The most intuitive way to access Aria data is via our loaders and visualizers. Go to Data Utilities for Python and C++ interfaces to easily access VRS data. You may also want to use vrstools to extract or inspect VRS data.

Aria data streams

In VRS, data is organized by streams. Each stream stores the data measured by a specific sensor, except for Eye Tracking (both camers share a single data stream) and microphones (up to 7 microphones share a single stream).

The streams are identified by their stream ID. Each stream ID is composed of two parts, a recordable type ID to categorize the type of the sensor and a stream ID for identify the specific sensor instance. E.g. the first SLAM (aka Mono Scene) camera is identified as 1201-1 where 1201 is the numerical ID for SLAM camera data type, and 1 identifies the cameras as the first instance.

Streams can also be identified by a short label. Labels are used to identify sensors in calibration.

The following table lists the Stream ID and Recordable Type ID, as well as their label. GPS, Wi-Fi and Bluetooth have labels, but are not calibrated. If you use projectaria tools loaders, you do not have to memorize this mapping as there is an API that converts between Stream ID and labels.

Table 1: IDs Used for Sensors

Sensor Stream ID Recordable Type ID label
ET camera 211-1 EyeCameraRecordableClass camera-et
RGB camera 214-1 RgbCameraRecordableClass camera-rgb
Microphone 231-1 StereoAudioRecordableClass mic
Barometer 247-1 BarometerRecordableClass baro
GPS 281-1 GpsRecordableClass gps
Wi-Fi 282-1 WifiBeaconRecordableClasswps
Bluetooth 283-1 BluetoothBeaconRecordableClassbluetooth
SLAM/Mono Scene camera left 1201-1 SlamCameraData camera-slam-left
SLAM/Mono Scene camera right1201-2 SlamCameraData camera-slam-right
IMU (1kHz) 1202-1 SlamImuData imu-right
IMU (800Hz) 1202-2 SlamImuData imu-left
Magnetometer 1203-1 SlamMagnetometerData mag

Each stream also contains a configuration blob that stores sensor-specific information such as image resolution and nominal frame rate.

All data in VRS is timestamped. Go to Timestamps in Aria VRS for more details.

Aria sensor data and configuration

Sensor data includes:

  • Sensor readout
  • Timestamps
  • Acquisition parameters (exposure and gain settings)
  • Conditions (e.g. temperature) during data collection

Most sensor data of a single stream and at a specific timestamp is stored as a single piece, except for image and audio.

How data is stored for image recordings

  • Each camera stores a single image frame at a time, with the exception of the ET camera.
    • ET cameras pair share a single image frame by concatenating horizontally.
  • The image frame contains two parts, the image itself and the image record.
    • The image record stores timestamps, frame id, and acquisition parameters, such as exposure and gain. This avoids having to read image data to get the information in the record.

How data is stored for audio recordings

  • The audio data is grouped into data chunks of 4096 audio samples from all 7 microphones.
  • Each chunk contains two parts, the data part for the audio signal, and the report part for the timestamps of each audio signal.

Sensor configuration blob

The sensor configuration blob stores the static information of a stream. Common sensor configuration stores information, such as sensor model, sensor serial (if available) as well as frame rate.

Stream-specific information, such as image resolution, is also stored in configurations.

Go to the source code for the detailed implementation of sensor data and configurations. Go to the Advanced Code Snippets for example sensor data and how to access sensor data using Python data utilities.

Useful VRS tools

The most intuitive way to access Aria data is via our loaders and visualizers. We provide Python and C++ interface to easily access VRS data.

You may also want to use VRS tools to extract or inspect VRS data. Here are a few common use cases:

Check the VRS file’s validity and integrity

The check command decodes every record in the VRS file and prints how many records were decoded successfully. It proves that the VRS file is correct at the VRS level. You can also compute a checksum to ensure you have valid VRS files. For more information go to VRS File Validation.

$ vrs check <file.vrs>
$ vrs checksum <file.vrs>

If the file is not valid, it is normally because there is missing data that could lead to invalid behavior with the tooling. All files in our open datasets are valid, so if you encounter issues with these, re-downloading the files should resolve the issue.

Extract image or audio content to folders

Use the following commands to extract JPEG or WAV files. Use the --to <folder_path> to specify a destination folder where the data will be extracted, or it will be added to the current working directory.

$ vrs extract-images <file.vrs> --to <image_folder>
$ vrs extract-audio <file.vrs> --to <audio_folder>

To extract RAW image files, use:

vrs extract-images <file.vrs> --raw-images --to <image_folder>

Extract all content to folders and JSONs

This command lets you extract all images, audio, and metadata into files:

vrs extract-all <file.vrs> --to <folder>

The metadata is extracted into a single JSONS file that contains a succession of json messages, one per line. Each line corresponds to a single record, in timestamp order, so it is possible to parse it even if the number of records is huge. Saving all the data in a single file prevents saturating your disk with possibly millions of small files. Once extracted, your file will look like this:

    ├── file.vrs
├── all_data
* `NNNN-MM` folders: image folders, one folder per stream containing images.
├── 1201-1 # SLAM Left images
├── *.jpg
├── 1201-2 # SLAM Right images
├── *.jpg
├── 211-1 # Eye Tracking images
├── *.jpg
├── 214-1 # RGB (Color) Camera images
├── *.jpg
├── metadata.jsons
└── ReadMe.md

For more information, go to VRS Data Extraction.

Inspect how many data recordings there are by type

vrs <file.vrs> | grep "] records."

Will get you a return like this:

623 Eye Camera Class #1 - device/aria [211-1] records.
1244 RGB Camera Class #1 - device/aria [214-1] records.
729 Stereo Audio Class #1 - device/aria [231-1] records.
3101 Barometer Data Class #1 - device/aria [247-1] records.
65 Time Domain Mapping Class #1 - device/aria [285-1] records.
623 Camera Data (SLAM) #1 - device/aria [1201-1] records.
623 Camera Data (SLAM) #2 - device/aria [1201-2] records.
61965 IMU Data (SLAM) #1 - device/aria [1202-1] records.
50002 IMU Data (SLAM) #2 - device/aria [1202-2] records.
619 Magnetometer Data (SLAM) #1 - device/aria [1203-1] records.

Each line reports how many data records are stored in each data stream as well as the stream ID. For example, in this line:

623 Camera Data (SLAM) #2 - device/aria [1201-2] records.

We can see that:

  • The stream name is Camera Data (SLAM) #2 (Mono Scene camera on the right) and identified by numerical ID [1201-2]
  • SLAM camera #2 has recorded 623 frames