HOT3D: An Egocentric Dataset for 3D Hand and Object Tracking

HOT3D is a dataset for benchmarking egocentric tracking of hands and objects in 3D. The dataset includes 833 minutes of multi-view image streams, which show 19 subjects interacting with 33 diverse rigid objects and are annotated with accurate 3D poses and shapes of hands and objects. HOT3D is recorded with two head-mounted devices from Meta: Project Aria, a research prototype of light-weight AR/AI glasses, and Quest 3, a production VR headset sold in millions of units. We aim to support research on egocentric hand-object interaction by making HOT3D publicly available and by co-organizing public challenges on the dataset.

A website focusing on the HOT3D content from Project Aria is available at projectaria.com.

News

August 19, 2025 - All object-onboarding sequences (including new ones from Aria) are now available on Hugging Face (folders starting with "object_ref", documentation).
June 1, 2025 - HOT3D poster is ready to be presented at CVPR 2025 next week.
May 1, 2025 - HOT3D paper on arXiv.org updated with the CVPR 2025 camera-ready version.
March 19, 2025 - HOT3D featured in an NVIDIA keynote by Jensen Huang.
February 26, 2025 - HOT3D paper accepted to CVPR 2025 as highlight (top 13.5% of accepted papers).
December 2, 2024 - Full HOT3D paper is now available on arXiv.org.
September 16, 2024 - HOT3D can now be browsed in Aria Dataset Explorer (Aria data, Quest 3 data).
September 11, 2024 - The submission deadline of the BOP Challenge 2024 is extended to November 29, and that of the Egocentric Hand Tracking Challenge to September 22. While the 2024 winners will be selected from results submitted before these deadlines, the submission forms will stay open indefinitely.
July 1, 2024 - HOT3D-Clips used in two ECCV 2024 challenges: BOP Challenge 2024 on object detection and pose estimation, and Egocentric Hand Tracking Challenge on hand pose and shape estimation.
June 17, 2024 - HOT3D publicly announced at EgoVis CVPR 2024 workshop in Seattle.
June 13, 2024 - HOT3D whitepaper is now available on arXiv.org.

Dataset properties

833 minutes of egocentric, multi-view, synchronized recordings

HOT3D offers 1.5M multi-view frames (3.7M+ images) recorded at 30 FPS with Project Aria and Quest 3. Each Aria frame includes one 1408×1408 RGB image and two 640×480 monochrome images. Each Quest 3 frame includes two 1280×1024 monochrome images. Aria recordings also include 3D scene point clouds from SLAM and an eye gaze signal.
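
As a quick sanity check on these numbers, the sketch below recomputes the approximate frame count from the stated duration and frame rate, and the number of images and pixels per multi-view frame from the stated resolutions. All names in it are illustrative and are not part of the HOT3D Toolkit.

```python
# Back-of-the-envelope check of the dataset statistics quoted above.
# All names here are illustrative; nothing is taken from the HOT3D Toolkit.

TOTAL_MINUTES = 833
FPS = 30

# (width, height) of every image in one multi-view frame, per device.
IMAGES_PER_FRAME = {
    "Aria": [(1408, 1408), (640, 480), (640, 480)],
    "Quest 3": [(1280, 1024), (1280, 1024)],
}


def total_frames(minutes: int, fps: int) -> int:
    """Number of frames in a recording of the given length and frame rate."""
    return minutes * 60 * fps


def pixels_per_frame(resolutions) -> int:
    """Total pixel count over all images of one multi-view frame."""
    return sum(width * height for width, height in resolutions)


frames = total_frames(TOTAL_MINUTES, FPS)
print(f"multi-view frames: {frames:,}")  # 1,499,400, i.e. about 1.5M
for device, resolutions in IMAGES_PER_FRAME.items():
    print(f"{device}: {len(resolutions)} images, "
          f"{pixels_per_frame(resolutions):,} pixels per frame")
```

The frame count agrees with the quoted 1.5M, and the two devices deliver a similar number of pixels per multi-view frame (about 2.6M each).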

19 subjects, 4 everyday scenarios

To ensure diversity, we recruited 19 participants with different hand shapes and nationalities. In addition to simple pick-up/observe/put-down actions, recordings show scenarios resembling typical actions in a kitchen, office, and living room. All scenarios were captured in the same lab with scenario-specific furniture and regularly randomized decorative elements and lighting. In each ~2-minute recording, a participant interacts with up to 6 objects.

Accurate 3D ground-truth annotations of hands and objects

Hands and objects are annotated with ground-truth 3D poses and models. Models were scanned with in-house 3D scanners, and poses were obtained with a professional motion-capture system using small optical markers. Hand annotations are provided in the UmeTrack and MANO formats.

High-fidelity 3D object models

HOT3D includes high-fidelity 3D models of 33 rigid objects. Each model is captured with high-resolution geometry and PBR materials, using an in-house 3D scanner. The collection includes household and office objects of diverse appearance, size, and affordances.

Sequences for object onboarding

To support research on model-free 3D object tracking and 3D object reconstruction, HOT3D offers two types of reference sequences which show all possible views of each object: (1) sequences showing a static object on a desk, with the object standing upright and upside-down, and (2) sequences showing an object manipulated by hands.

HOT3D-Clips

HOT3D-Clips is a set of curated sub-sequences of the HOT3D dataset, provided to enable straightforward comparison of various tracking and pose estimation methods. Each clip has 150 frames (5 seconds) with ground-truth annotations that are available for all modeled objects and hands and that passed our visual inspection. There are 3832 clips in total, with 1983 extracted from Aria recordings and 1849 extracted from Quest 3 recordings. Documentation of HOT3D-Clips and Python utilities for working with the clips are in the HOT3D Toolkit.
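
The clip statistics can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in this README; the variable names are illustrative.

```python
# Cross-check of the HOT3D-Clips statistics, using only numbers from this README.
FPS = 30
FRAMES_PER_CLIP = 150
CLIPS_PER_DEVICE = {"Aria": 1983, "Quest 3": 1849}

clip_seconds = FRAMES_PER_CLIP / FPS
total_clips = sum(CLIPS_PER_DEVICE.values())
total_clip_minutes = total_clips * clip_seconds / 60

print(clip_seconds)                  # 5.0
print(total_clips)                   # 3832
print(round(total_clip_minutes, 1))  # 319.3
```

The clips therefore cover roughly 319 minutes of footage in total.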

Clips extracted from Aria recordings. Only the RGB image stream is shown (Aria recordings additionally include two monochrome image streams). Contours of 3D models of hands and objects in the ground-truth poses are shown in white and green, respectively.

Clips extracted from Quest 3 recordings. Only one of the monochrome image streams available in Quest 3 recordings is shown. Contours of 3D models are shown in white and green as in Aria clips.

Public challenges on HOT3D

We co-organize two public challenges on HOT3D at ECCV 2024: BOP Challenge 2024, focused on model-based and model-free 2D object detection and 6DoF pose estimation, and Multiview Egocentric Hand Tracking Challenge, focused on hand pose and shape estimation. To enable benchmarking of methods for joint hand and object tracking, the two challenges use the same training and test splits of HOT3D-Clips. We invite authors of relevant methods to participate in the challenges.

Download

Download the full HOT3D dataset from projectaria.com
- Download instructions
- Documentation of data format (based on VRS)
- Released versions
Download HOT3D-Clips from Hugging Face
- Download instructions
- Documentation of data format (based on WebDataset)
- Used in BOP Challenge 2024 and Multiview Egocentric Hand Tracking Challenge
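
For orientation, WebDataset stores samples as plain tar archives in which files that share a name up to the first dot belong to one sample. The sketch below illustrates that grouping rule with the Python standard library only. The file names in the demo archive are made up and do not reflect the actual layout of HOT3D-Clips; for real loading, refer to the linked data format documentation and the HOT3D Toolkit.

```python
import io
import tarfile
from collections import defaultdict


def group_tar_members(tar: tarfile.TarFile) -> dict:
    """Group regular files in a tar archive by WebDataset-style sample key.

    The key is the file path up to the first dot of the file name;
    the rest of the file name is used as the field name.
    """
    samples = defaultdict(dict)
    for member in tar.getmembers():
        if not member.isfile():
            continue
        dirname, _, basename = member.name.rpartition("/")
        stem, _, field = basename.partition(".")
        key = f"{dirname}/{stem}" if dirname else stem
        samples[key][field] = tar.extractfile(member).read()
    return dict(samples)


def make_demo_tar() -> bytes:
    """Build a tiny in-memory tar archive with hypothetical file names."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in [
            ("000000.image.jpg", b"fake-jpeg-bytes"),
            ("000000.meta.json", b"{}"),
            ("000001.image.jpg", b"fake-jpeg-bytes"),
        ]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


with tarfile.open(fileobj=io.BytesIO(make_demo_tar()), mode="r") as tar:
    demo = group_tar_members(tar)

print(sorted(demo))            # ['000000', '000001']
print(sorted(demo["000000"]))  # ['image.jpg', 'meta.json']
```

Reading the archive with `tarfile` keeps the sketch dependency-free; the `webdataset` Python package applies the same grouping rule and adds decoding and streaming on top.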

HOT3D Toolkit

HOT3D Toolkit is available on GitHub and provides a Python API for downloading and using the full HOT3D dataset, and for loading and visualizing HOT3D-Clips.

License

By using the dataset, you acknowledge and agree to comply with the HOT3D license agreement.

Citation

If you find the dataset useful, please cite the HOT3D paper:

@article{banerjee2024hot3d,
  title={{HOT3D}: Hand and Object Tracking in {3D} from Egocentric Multi-View Videos},
  author={Banerjee, Prithviraj and Shkodrani, Sindi and Moulon, Pierre and Hampali, Shreyas and Han, Shangchen and Zhang, Fan and Zhang, Linguang and Fountain, Jade and Miller, Edward and Basol, Selen and Newcombe, Richard and Wang, Robert and Engel, Jakob Julian and Hodan, Tomas},
  journal={CVPR},
  year={2025}
}