Desktop Activities Capture Setup

caution

Aria Pilot Dataset documentation is stored in Archive: Aria Data Tools, because it was Project Aria's first open source initiative and it uses a different data structure from our latest open releases. For the most up-to-date tooling, and to find out about our other open datasets, go to Project Aria Tools.

This website will be deleted in September 2024.

The Desktop Activities dataset was captured with a Project Aria device and a multi-view motion capture system.

Hardware setup

To record this dataset we built a system with 16 OptiTrack Prime X 13 W motion tracking cameras and 12 OptiTrack Prime Color FS color cameras with 1080 x 1920 pixel resolution. These cameras are mounted on a desktop rig, similar to the setup used to collect the Assembly101 dataset.

The multi-view system is calibrated with OptiTrack Motive to obtain the intrinsics and extrinsics of all cameras. We attached markers to the Project Aria device and objects being manipulated to track their motion. We also calibrated the Project Aria device to obtain the sensor trajectories with respect to multi-view camera coordinates.

Project Aria device sensor profile

Sensor profiles allow researchers to choose which sensors on the Project Aria device to use when collecting data.

For Desktop Activities, we used Sensor Profile M, so each Project Aria device recording contains:

  • One RGB camera stream with 1408x1408 pixel resolution
  • Two SLAM camera streams with 640x480 pixel resolution
  • One eye tracking (ET) camera stream with 320x240 pixel resolution
  • Two IMU sensor streams (1 kHz and 800 Hz)

Table 1: Sensor Profile M

Sensor | Profile M
SLAM - resolution | 640x480
SLAM - FPS | 15
SLAM - encoding format | RAW
SLAM - bits per pixel (bpp) | 8
RGB - gain, exposure and temperature | Yes
ET - resolution | 320x240
ET - FPS | 15
ET - encoding format | JPEG
ET - bpp | 8
ET - gain and exposure | Yes
RGB - resolution | 1408x1408
RGB - FPS | 15
RGB - encoding format | JPEG
RGB - bpp | 8
RGB - gain, exposure and temperature | Yes
IMU - RIGHT acc/gyro - rate | 1 kHz
IMU - RIGHT temperature - rate | ~1 Hz
IMU - LEFT acc/gyro - rate | 800 Hz
IMU - LEFT temperature - rate | ~1 Hz
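
As a rough worked example of what these settings imply, the uncompressed data rate of the two SLAM streams can be estimated from the resolution, bits per pixel, and frame rate listed for Profile M. This is only a back-of-the-envelope sketch: the helper name `raw_stream_bytes_per_second` is hypothetical, the estimate ignores any container or metadata overhead, and it does not apply to the JPEG-encoded RGB and ET streams, whose size depends on image content.

```python
def raw_stream_bytes_per_second(width, height, bits_per_pixel, fps, num_cameras=1):
    """Estimate the payload rate of RAW (uncompressed) camera streams.

    Hypothetical helper for illustration; not part of any Project Aria tooling.
    """
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps * num_cameras


# Values taken from the Profile M rows for the SLAM cameras:
# 640x480 pixels, 8 bpp, 15 FPS, two cameras.
slam_rate = raw_stream_bytes_per_second(640, 480, 8, 15, num_cameras=2)
print(slam_rate)  # 9216000 bytes per second, roughly 9.2 MB/s
```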

Trigger alignment and synchronized frames

During recording, the multi-view system and the Project Aria device operated at different frame rates: the Project Aria device recorded at 15 FPS and the multi-view cameras recorded at 60 FPS. When recording an activity, the multi-view system and the Project Aria device started and stopped recording asynchronously. Using SMPTE timecode, all sensors were synchronized to a global timeline. In addition, all cameras were trigger aligned while recording.
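
To illustrate how a shared timecode makes asynchronous start and stop times comparable, the sketch below converts non-drop-frame `HH:MM:SS:FF` timecode strings to seconds and computes the window in which both systems were recording. It is an assumption-laden illustration, not the pipeline used for this dataset: the function names, the 30 FPS timecode rate, and the example timecodes are all hypothetical.

```python
def timecode_to_seconds(timecode, fps=30):
    """Convert a non-drop-frame 'HH:MM:SS:FF' timecode to seconds.

    The timecode rate (fps) is an assumed value for illustration only.
    """
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not 0 <= frames < fps:
        raise ValueError(f"frame field {frames} is out of range for {fps} FPS")
    return hours * 3600 + minutes * 60 + seconds + frames / fps


def overlap_window(span_a, span_b):
    """Return the (start, end) interval covered by both spans, or None."""
    start = max(span_a[0], span_b[0])
    end = min(span_a[1], span_b[1])
    return (start, end) if start < end else None


# Made-up recording spans: the two systems start and stop at different times.
aria_span = (timecode_to_seconds("10:00:02:00"), timecode_to_seconds("10:05:00:15"))
multiview_span = (timecode_to_seconds("10:00:00:00"), timecode_to_seconds("10:04:58:00"))

window = overlap_window(aria_span, multiview_span)
print(window)  # (36002.0, 36298.0)
```

Only frames whose timestamps fall inside the overlapping window can be paired across the two systems.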

Using Sensor Profile M, the Project Aria device produced 4 synchronized camera images (1 RGB, 2 SLAM, and 1 ET image) per frame. The multi-view system produced 12 synchronized RGB camera images per frame.

During the overlapping capture time, the camera trigger alignment let us accurately associate frames from the Project Aria device to frames from the multi-view system. With 15 FPS for the Project Aria device and 60 FPS for the multi-view system, 1 out of every 4 multi-view frames was trigger aligned to one Project Aria device frame.
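
The 1-in-4 association can be sketched as a nearest-timestamp lookup on the shared timeline. This is a minimal illustration on synthetic timestamps, assuming both timestamp lists are sorted and expressed in seconds on the global timeline; `associate_frames` and the 1 ms tolerance are hypothetical choices, not part of the dataset tooling.

```python
from bisect import bisect_left


def associate_frames(aria_ts, multiview_ts, tolerance_s=1e-3):
    """Pair each Aria frame with the nearest multi-view frame in time.

    Both inputs are sorted lists of timestamps in seconds.
    Returns a list of (aria_index, multiview_index) pairs; Aria frames with
    no multi-view frame within the tolerance are skipped.
    """
    pairs = []
    for i, t in enumerate(aria_ts):
        j = bisect_left(multiview_ts, t)
        # The nearest timestamp is either just before or at the insertion point.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(multiview_ts)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(multiview_ts[k] - t))
        if abs(multiview_ts[best] - t) <= tolerance_s:
            pairs.append((i, best))
    return pairs


# Synthetic example: 4 s of multi-view frames at 60 FPS, and 2 s of Aria
# frames at 15 FPS that start 1 s later on the same timeline.
multiview_ts = [k / 60.0 for k in range(240)]
aria_ts = [1.0 + k / 15.0 for k in range(30)]

pairs = associate_frames(aria_ts, multiview_ts)
print(pairs[:3])  # [(0, 60), (1, 64), (2, 68)]
```

In this synthetic case, consecutive Aria frames map to multi-view frames that are four indices apart, matching the 60 / 15 = 4 ratio described above.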