Aria Gen 2 Pilot Dataset
The Aria Gen 2 Pilot Dataset is a multi-participant, egocentric dataset collected with Aria Gen 2 glasses. Four participants (a primary wearer and three co-participants) simultaneously recorded a variety of daily activities, yielding rich, time-synchronized multimodal data. The dataset is structured to demonstrate the Gen 2 device's capabilities and its potential applications in computer vision, multimodal learning, robotics, and contextual AI.
Dataset Content
The Aria Gen 2 Pilot Dataset comprises four primary content types:
1. raw sensor streams acquired directly from Aria Gen 2 devices, recorded with Profile 8;
2. real-time machine perception outputs generated on-device by embedded algorithms during data collection;
3. offline machine perception results produced by Machine Perception Services (MPS) during post-processing; and
4. outputs from additional offline perception algorithms, detailed in the table below.
Content types (1) and (2) are obtained natively from the device, whereas (3) and (4) are derived through post-hoc processing; a minimal sketch of how the raw sensor streams can be inspected follows below.
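As a quick orientation, the sketch below shows one way the raw sensor streams (content type 1) could be inspected. It assumes the sequences are distributed as VRS files readable with the open-source projectaria_tools Python package; the file path and the stream label are illustrative placeholders rather than guaranteed names from this dataset, so check the GitHub repository for the actual layout.

```python
# Minimal sketch: enumerate the sensor streams in a recording and fetch one RGB frame.
# Assumes projectaria_tools is installed (pip install projectaria-tools) and that the
# sequence ships a VRS file; "sequence/recording.vrs" is a placeholder path, and the
# stream label "camera-rgb" may differ for Gen 2 recordings.
from projectaria_tools.core import data_provider

provider = data_provider.create_vrs_data_provider("sequence/recording.vrs")

# List every stream recorded by the device (cameras, IMUs, audio, PPG, ...).
for stream_id in provider.get_all_streams():
    print(stream_id, provider.get_label_from_stream_id(stream_id))

# Grab the first RGB frame by index and read its capture timestamp.
rgb_stream = provider.get_stream_id_from_label("camera-rgb")
image_data, record = provider.get_image_data_by_index(rgb_stream, 0)
print("image shape:", image_data.to_numpy_array().shape)
print("capture time (ns):", record.capture_timestamp_ns)
```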
Additional Perception Algorithms
| Algorithm | Description | Output |
|---|---|---|
| Directional Automatic Speech Recognition (ASR) | Distinguishes the wearer's speech from that of other speakers, generating timestamped transcripts for all sequences. Enables analysis of conversational dynamics and social context. | Timestamped transcripts of speech. |
| Heart Rate Estimation | Uses the device's PPG sensor to estimate continuous heart rate, reflecting physical activity and physiological state. Covers over 95% of the recording duration. | Timestamped heart rate in beats per minute. |
| Hand-Object Interaction Recognition | Segments left/right hands and interacted objects, enabling analysis of manipulation patterns and object usage. | Segmentation masks for hands and objects per RGB image. |
| 3D Object Detection (Egocentric Voxel Lifting) | Detects 2D and 3D bounding boxes for objects in indoor scenes using multi-camera data. Supports spatial understanding and scene reconstruction. | 2D and 3D bounding boxes with class prediction. |
| Depth Estimation (Foundation Stereo) | Generates depth maps from the overlapping CV (computer vision) cameras, enabling research in 3D scene understanding and object localization. | Depth images, rectified CV images, and corresponding camera intrinsics/extrinsics. |
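To illustrate how these derived outputs might be consumed together, here is a small, non-authoritative sketch that aligns heart-rate estimates with ASR transcript segments on the shared device timeline. The file names (heart_rate.csv, transcript.csv) and column names are hypothetical placeholders; the actual on-disk schemas are documented in the GitHub repository rather than here.

```python
# Hypothetical sketch: join heart-rate estimates to speech segments by timestamp.
# File names and column names below are assumptions for illustration only; consult
# the dataset's GitHub repository for the actual schemas.
import pandas as pd

# Assumed columns: timestamp_ns, bpm
heart_rate = pd.read_csv("sequence/heart_rate.csv").sort_values("timestamp_ns")

# Assumed columns: start_ns, end_ns, speaker, text
transcript = pd.read_csv("sequence/transcript.csv").sort_values("start_ns")

# For each speech segment, look up the nearest preceding heart-rate sample,
# giving a rough physiological context for every utterance.
aligned = pd.merge_asof(
    transcript,
    heart_rate,
    left_on="start_ns",
    right_on="timestamp_ns",
    direction="backward",
)

for row in aligned.itertuples():
    print(f"[{row.speaker}] {row.text!r} @ ~{row.bpm:.0f} bpm")
```

merge_asof is used here because the heart-rate stream and the transcript segments are sampled at different rates; a nearest-preceding-sample join avoids interpolating physiological values across long gaps.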
Resources
For more information about the Aria Gen 2 Pilot Dataset:
- 📄 ArXiv Paper: https://arxiv.org/abs/2510.16134
- 🌐 Project Website: https://www.projectaria.com/datasets/gen2pilot/
- 💻 GitHub Repository: https://github.com/facebookresearch/projectaria_gen2_pilot_dataset
- 🔍 Dataset Explorer: https://explorer.projectaria.com/gen2pilot
Citation
If you use the Aria Gen 2 Pilot Dataset in your research, please cite the following:
```bibtex
@misc{kong2025ariagen2pilot,
  title         = {Aria Gen 2 Pilot Dataset},
  author        = {Chen Kong and James Fort and Aria Kang and Jonathan Wittmer and Simon Green and Tianwei Shen and Yipu Zhao and Cheng Peng and Gustavo Solaira and Andrew Berkovich and Nikhil Raina and Vijay Baiyya and Evgeniy Oleinik and Eric Huang and Fan Zhang and Julian Straub and Mark Schwesinger and Luis Pesqueira and Xiaqing Pan and Jakob Julian Engel and Carl Ren and Mingfei Yan and Richard Newcombe},
  year          = {2025},
  eprint        = {2510.16134},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2510.16134},
}
```