
Desktop Activities Overview

caution

Aria Pilot Dataset documentation is stored in Archive: Aria Data Tools because it was Project Aria's first open source initiative and uses a different data structure than our latest open releases. For the most up-to-date tooling, and to find out about our other open datasets, go to Project Aria Tools.

This website will be deleted in September 2024.

Desktop Activities recorded a single operator manipulating objects over a desktop space. Data was captured simultaneously by a Project Aria device worn by the operator and by a multi-view motion capture system. Most of the manipulated objects were 10 commonly used objects from the YCB Benchmark object set.

Figure 1 presents these objects with the corresponding YCB object indices. During recording, we also randomized the desktop appearance by placing non-YCB objects on the desk.

Figure 1: YCB Objects Used

Note. The objects used were Cracker Box (003), Sugar Box (004), Mustard Bottle (006), Banana (011), Bleach Cleanser (021), Mug (025), Power Drill (035), Large Marker (040), Large Clamp (051), and Extra Large Clamp (052). The number in brackets corresponds to the original YCB object index.

Statistics and Scenarios

For Desktop Activities, we captured 16 sequences in total, with each recording taking two minutes on average. In this dataset:

  • Multiple objects were manipulated in 5 sequences
  • A single object was manipulated in 10 sequences
  • Non-YCB objects were tidied up and stacked in 1 sequence

In total, the dataset provides 23,066 frames across the 4 camera streams of the Project Aria device and 78,042 frames across the 12 RGB camera streams of the multi-view system. Of these, 19,471 frames were trigger-aligned across all 16 cameras, for a total of 311,536 accurately aligned images. The Project Aria device recorded at 15 FPS and the multi-view cameras recorded at 60 FPS. The devices, although time-aligned, started and stopped asynchronously.
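The image totals follow directly from the per-frame image counts (4 images per Project Aria frame, 12 per multi-view frame, 16 per trigger-aligned frame). A minimal sanity-check sketch:

```python
# Sanity check of the image totals reported above, using the frame counts
# and per-frame image counts stated in this section.
aria_frames = 23_066        # Project Aria frames (1 RGB + 2 SLAM + 1 ET image each)
multiview_frames = 78_042   # multi-view frames (12 RGB images each)
aligned_frames = 19_471     # frames trigger-aligned across all 16 cameras

print(aria_frames * 4)        # 92,264 Project Aria images
print(multiview_frames * 12)  # 936,504 multi-view images
print(aligned_frames * 16)    # 311,536 accurately aligned images
```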

Table 1: Desktop Activity Statistics and YCB Objects per Recording ID

| Recording ID | Number of Project Aria device frames | Number of multi-view camera frames | Number of trigger-aligned frames per present camera | Tracked YCB object(s) |
|---|---|---|---|---|
| 01 | 1,218 | 4,000 | 1,000 | None, sorting non-YCB objects |
| 02 | 1,170 | 4,641 | 1,128 | Power drill, bleach cleanser and mustard bottle |
| 03 | 1,603 | 5,652 | 1,406 | Power drill, bleach cleanser, large clamp and extra large clamp |
| 04 | 1,464 | 4,800 | 1,200 | Power drill, bleach cleanser, large clamp and extra large clamp |
| 05 | 1,380 | 4,900 | 1,225 | Cracker box, sugar box, banana and mustard bottle |
| 06 | 1,860 | 6,700 | 1,675 | Cracker box, sugar box, banana and mustard bottle |
| 07 | 969 | 3,200 | 800 | Banana |
| 08 | 964 | 3,130 | 783 | Bleach cleanser |
| 09 | 1,264 | 4,300 | 1,075 | Cracker box |
| 10 | 1,206 | 4,000 | 1,000 | Large clamp |
| 11 | 1,409 | 4,900 | 1,225 | Power drill |
| 12 | 1,624 | 5,800 | 1,450 | Large clamp |
| 13 | 1,585 | 2,500 | 625 | Mug |
| 14 | 1,476 | 5,270 | 1,317 | Large marker |
| 15 | 1,909 | 6,999 | 1,749 | Sugar box |
| 16 | 1,966 | 7,250 | 1,813 | Mustard bottle |
| Total number of frames | 23,066 | 78,042 | 19,471 | |
| Total number of images | 92,264 | 936,504 | 311,536 | |

Note. For Number of Project Aria device frames, each frame consists of 4 trigger-aligned images (1 RGB, 2 SLAM, and 1 ET). For Number of multi-view camera frames, each frame consists of 12 trigger-aligned RGB images.

File Structure

Each recording uses this structure:

Recording number
├── aria_recording.vrs
├── aria_checksum.txt
├── aria_timestamp.csv
├── aria_trajectory.csv
├── multiview_recording.vrs
└── multiview_recording_checksum.txt
  • aria_recording.vrs — Provides the Project Aria recordings in VRS format. The cameras’ calibration parameters are stored as metadata inside the VRS file. The VRS file contains 1 RGB camera stream, 2 SLAM camera streams, 1 eye tracking (ET) camera stream and 2 IMU streams. The sensor timestamps reflect the device’s local time.
  • multiview_recording.vrs — Provides the multi-view recordings in VRS format. This file contains 12 RGB camera streams, recorded at 60 FPS. The cameras’ calibration parameters are stored as metadata inside the VRS file. The intrinsic calibration follows the OpenCV camera calibration convention, describing radial and tangential distortion for a pinhole camera model. The sensor timestamps reflect the synchronized global time.
  • aria_timemap.csv — Provides the time mapping from the Project Aria device timestamps to the synchronized timeline, using the Timestamps Mapping Data format (a minimal mapping sketch follows this list).
  • aria_trajectory.csv — Provides the 6DoF SE(3) transformation of the Project Aria device with respect to the world coordinate frame. The world coordinate frame is defined by the multi-view camera extrinsic calibration. The trajectory is stored at 60 FPS with synchronized global timestamps.
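To relate Project Aria device timestamps to the synchronized global timeline, the samples in aria_timemap.csv can be linearly interpolated. The following is a minimal sketch only, assuming the CSV exposes one device-time column and one synchronized-time column; the column names deviceTimestampNs and syncedTimestampNs are placeholders, not the documented Timestamps Mapping Data format.

```python
# Hedged sketch: map Project Aria device timestamps (ns) onto the synchronized
# global timeline by interpolating the aria_timemap.csv samples.
# The column names below are assumptions, not the documented format.
import numpy as np
import pandas as pd

timemap = pd.read_csv("aria_timemap.csv")
device_ns = timemap["deviceTimestampNs"].to_numpy(dtype=np.int64)  # assumed column name
synced_ns = timemap["syncedTimestampNs"].to_numpy(dtype=np.int64)  # assumed column name

def device_to_synced(query_device_ns: np.ndarray) -> np.ndarray:
    """Linearly interpolate device timestamps (ns) to synchronized global time (ns)."""
    return np.interp(query_device_ns, device_ns, synced_ns)

# Example: convert a batch of device timestamps taken from the VRS sensor streams.
print(device_to_synced(np.array([device_ns[0], device_ns[-1]])))
```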

Table 2: aria_trajectory.csv file format

| Column(s) | Description |
|---|---|
| timestamp | Global synchronized timestamp (ns) |
| tx, ty, tz | Position of the device in world coordinates, t_world_device [meters] |
| qx, qy, qz, qw | Rotation of the device in world coordinates, R_world_device (quaternion) |
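Each trajectory row can be assembled into an SE(3) pose. A minimal parsing sketch, assuming the CSV header uses the column names listed in Table 2 (timestamp, tx, ty, tz, qx, qy, qz, qw):

```python
# Hedged sketch: read aria_trajectory.csv and build 4x4 T_world_device matrices.
# The column names/order are assumed from Table 2.
import numpy as np
import pandas as pd
from scipy.spatial.transform import Rotation

trajectory = pd.read_csv("aria_trajectory.csv")

def row_to_T_world_device(row) -> np.ndarray:
    """Assemble a 4x4 SE(3) transform from one trajectory row."""
    T = np.eye(4)
    # scipy expects quaternions in (x, y, z, w) order.
    T[:3, :3] = Rotation.from_quat(
        [row["qx"], row["qy"], row["qz"], row["qw"]]
    ).as_matrix()
    T[:3, 3] = [row["tx"], row["ty"], row["tz"]]
    return T

poses = [row_to_T_world_device(row) for _, row in trajectory.iterrows()]
print(len(poses), "poses at 60 FPS on the synchronized global timeline")
```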

Figure 2: Desktop Activity Visualization