
Project Aria Machine Perception Services

To accelerate research with Project Aria, we provide several Spatial AI machine perception capabilities that help form the foundation for future Contextualized AI applications and analysis of egocentric data. These capabilities are powered by a set of proprietary machine perception algorithms, designed for Project Aria glasses, that provide superior accuracy and robustness on Aria data compared to off-the-shelf open source algorithms.

All Machine Perception Services (MPS) are offered as post-processing of VRS files via a cloud service. Use the MPS CLI or the Desktop App to request derived data from any VRS file that contains the necessary sensor data.

note

When research partners submit data for processing, the data is only used to serve MPS requests. Partner data is not made available to Meta researchers or Meta’s affiliates. Go to MPS Data Lifecycle for more details about how partner data is processed and stored.

Current MPS offerings

The following MPS can be requested, as long as the data has been recorded with a compatible Recording Profile. The Recording Profile Guide provides a quick list of compatible sensor profiles, or go to Recording Profiles in Technical Specifications for more granular information about each profile.

MPS offerings are grouped into SLAM, Eye Gaze and Hand Tracking services.

SLAM services

To get these outputs, the recording profile must have SLAM cameras + IMU enabled.

6DoF trajectory

MPS provides two types of high frequency (1kHz) trajectories:

  • Open loop trajectory - local odometry estimation from visual-inertial odometry (VIO)
  • Closed loop trajectory - created via batch optimization using multi-sensor input (SLAM, IMU, barometer, Wi-Fi and GPS); it is fully optimized and provides poses in a consistent frame of reference.
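
As a minimal sketch of how these trajectories can be consumed, the projectaria_tools Python package provides loaders for both outputs. The file names below follow the typical MPS output layout and are assumptions; adapt them to your own results folder.

```python
from projectaria_tools.core import mps

# File names are assumptions based on the usual MPS SLAM output layout.
open_loop_path = "mps_sample/slam/open_loop_trajectory.csv"
closed_loop_path = "mps_sample/slam/closed_loop_trajectory.csv"

open_loop_traj = mps.read_open_loop_trajectory(open_loop_path)
closed_loop_traj = mps.read_closed_loop_trajectory(closed_loop_path)

# Each closed loop sample carries a device pose in the world frame and a timestamp.
pose = closed_loop_traj[0]
T_world_device = pose.transform_world_device
print(pose.tracking_timestamp, T_world_device.translation())
```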

Semi-dense point cloud

Semi-dense point cloud data supports researchers who need static scene 3D reconstructions, reliable 2D image tracks or a representative visualization of the environment.
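
For reference, a minimal loading sketch using the projectaria_tools Python API; the file name and the confidence thresholds are illustrative assumptions, not fixed values.

```python
from projectaria_tools.core import mps

# Path is an assumption based on the usual MPS output layout.
points = mps.read_global_point_cloud("mps_sample/slam/semidense_points.csv.gz")

# Each point has a position in the world frame plus uncertainty estimates that
# can be used to drop low-confidence points (thresholds shown are illustrative).
filtered = [
    p for p in points
    if p.inverse_distance_std < 0.001 and p.distance_std < 0.15
]
print(f"kept {len(filtered)} of {len(points)} points")
```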

Online sensor calibration

The time-varying intrinsic and extrinsic calibrations of cameras and IMUs are estimated at the frequency of the SLAM (mono scene) cameras by our multi-sensor state estimation pipeline.
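
A minimal sketch of reading these calibrations with the projectaria_tools Python API; the file name is an assumption matching the usual MPS output layout.

```python
from projectaria_tools.core import mps

# Path is an assumption based on the usual MPS SLAM output layout.
online_calibs = mps.read_online_calibration("mps_sample/slam/online_calibration.jsonl")

# Each sample holds the time-varying camera and IMU calibrations at one timestamp.
sample = online_calibs[0]
for cam_calib in sample.camera_calibs:
    print(sample.tracking_timestamp, cam_calib.get_label())
```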

Multi-SLAM

Multi-SLAM can be requested on two or more recordings. It creates all of the above SLAM outputs in a shared coordinate frame.

Multi-SLAM can only be requested using the MPS CLI SDK.

Eye Gaze services

Eye Gaze data is generated using Aria's Eye Tracking (ET) camera images to estimate the direction the user is looking. These outputs can be generated for any recording that had ET cameras enabled.

In March 2024, we updated our eye gaze model to support depth estimation. We do this by providing left and right eye gaze directions (yaw values) along with the depth at which these gaze directions intersect (translation values).

If you have made a recording with In-Session Eye Gaze Calibration, you will receive a second .csv file with calibrated eye gaze outputs.
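
As a rough sketch, eye gaze outputs can be loaded and turned into a 3D gaze point with the projectaria_tools Python API; the file name and the 1 m fallback depth are assumptions for illustration.

```python
from projectaria_tools.core import mps
from projectaria_tools.core.mps.utils import get_eyegaze_point_at_depth

# File name is an assumption; calibrated outputs, when available, arrive as a separate CSV.
gaze_records = mps.read_eyegaze("mps_sample/eye_gaze/general_eye_gaze.csv")

gaze = gaze_records[0]
# Combine the gaze angles with the estimated depth to get a 3D point in the
# central pupil frame (CPF); fall back to 1 m if no depth was estimated.
depth_m = gaze.depth if gaze.depth > 0 else 1.0
point_cpf = get_eyegaze_point_at_depth(gaze.yaw, gaze.pitch, depth_m)
print(gaze.tracking_timestamp, point_cpf)
```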

Hand Tracking

To compute hand tracking outputs, the recording profile must have SLAM cameras enabled.

Wrist and Palm Tracking

Wrist and Palm Tracking data is generated from SLAM camera images to estimate the wearer's hand movements. The wrist and palm poses are given in the device frame, in meters.
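
A loading sketch using the projectaria_tools Python API; the submodule layout and file name reflect the usual hand tracking output structure and should be treated as assumptions.

```python
from projectaria_tools.core import mps

# File name is an assumption based on the usual MPS hand tracking output layout.
poses = mps.hand_tracking.read_wrist_and_palm_poses(
    "mps_sample/hand_tracking/wrist_and_palm_poses.csv"
)

sample = poses[0]
# Each sample may contain a left and/or right hand; positions are in the device
# frame, in meters.
if sample.left_hand is not None and sample.left_hand.confidence > 0:
    print(sample.tracking_timestamp, sample.left_hand.wrist_position_device)
```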

About MPS Data Loader APIs

Please refer to our MPS data loader APIs (with C++ and Python support) to load the MPS outputs into your application. The visualization guide shows how to visualize all of the MPS outputs.
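
As an illustration of the Python side of these APIs, the sketch below queries MPS outputs by timestamp through MpsDataPathsProvider and MpsDataProvider; the folder path and the query timestamp are placeholders.

```python
from projectaria_tools.core.mps import MpsDataPathsProvider, MpsDataProvider

# Point this at the root folder of the downloaded MPS results (placeholder path).
paths_provider = MpsDataPathsProvider("/path/to/mps_output")
provider = MpsDataProvider(paths_provider.get_data_paths())

query_timestamp_ns = 1_000_000_000  # placeholder device timestamp in nanoseconds

# Query the outputs closest to the given device timestamp, if they were generated.
if provider.has_closed_loop_poses():
    pose = provider.get_closed_loop_pose(query_timestamp_ns)
if provider.has_general_eyegaze():
    gaze = provider.get_general_eyegaze(query_timestamp_ns)
```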

Questions & Feedback

If you have feedback you'd like to provide, be it overall trends and experiences or where we can improve, we'd love to hear from you. Go to our Support page for different ways to get in touch.