An egocentric dataset for 3D hand and object tracking from Meta.
HOT3D is a dataset for benchmarking egocentric tracking of hands and objects in 3D. The dataset includes 833 minutes of multi-view image streams, which show 19 subjects interacting with 33 diverse rigid objects and are annotated with accurate 3D poses and shapes of hands and objects. HOT3D is recorded with two head-mounted devices from Meta: Project Aria, a research prototype of lightweight AR/AI glasses, and Quest 3, a production VR headset that has sold millions of units. We aim to support research on egocentric hand-object interaction by making HOT3D publicly available and by co-organizing public challenges on the dataset.
A website focusing on the HOT3D content from Project Aria is available at projectaria.com.
HOT3D offers 1.5M multi-view frames (over 3.7M images) recorded at 30 FPS with Project Aria and Quest 3. Each Aria frame includes one 1408×1408 RGB image and two 640×480 monochrome images; each Quest 3 frame includes two 1280×1024 monochrome images. Aria recordings additionally include 3D scene point clouds from SLAM and eye-gaze signals.
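For illustration, the per-frame structure described above could be represented roughly as follows. This is only a sketch with hypothetical names (the stream labels and the MultiViewFrame class are not part of the HOT3D Toolkit API); the resolutions come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

import numpy as np

# Image streams per device as described above (width x height).
ARIA_STREAMS: Dict[str, Tuple[int, int]] = {
    "rgb": (1408, 1408),        # one RGB camera
    "slam_left": (640, 480),    # two monochrome cameras
    "slam_right": (640, 480),
}
QUEST3_STREAMS: Dict[str, Tuple[int, int]] = {
    "mono_left": (1280, 1024),  # two monochrome cameras
    "mono_right": (1280, 1024),
}


@dataclass
class MultiViewFrame:
    """One 30 FPS multi-view frame: a timestamp plus one image per camera stream."""
    device: str                 # "aria" or "quest3" (labels are hypothetical)
    timestamp_ns: int
    images: Dict[str, np.ndarray] = field(default_factory=dict)

    def num_images(self) -> int:
        return len(self.images)
```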
To ensure diversity, we recruited 19 participants of different nationalities and with different hand shapes. In addition to simple pick-up/observe/put-down actions, the recordings show scenarios resembling typical actions in a kitchen, office, and living room. All scenarios were captured in the same lab with scenario-specific furniture and regularly randomized decorative elements and lighting. In each ~2-minute recording, a participant interacts with up to six objects.
Hands and objects are annotated with ground-truth 3D poses and models. The models were scanned with in-house 3D scanners, and the poses were obtained by a professional motion-capture system using small optical markers. Hand annotations are provided in the UmeTrack and MANO formats.
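To make the pose annotations concrete: a rigid-object pose is a 6DoF transformation that maps the object model into the world (or camera) frame. The sketch below applies such a pose with NumPy; it is a generic illustration, not code from the HOT3D Toolkit, and metric units are assumed.

```python
import numpy as np


def apply_rigid_pose(vertices_model: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map object-model vertices (N, 3) into the world frame with a 6DoF pose:
    X_world = R @ X_model + t, where R is a 3x3 rotation and t a 3-vector.
    """
    return vertices_model @ R.T + t


# Example: identity rotation and a 10 cm translation along x (metric units assumed).
verts = np.zeros((4, 3))
posed = apply_rigid_pose(verts, np.eye(3), np.array([0.1, 0.0, 0.0]))
```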
HOT3D includes high-fidelity 3D models of 33 rigid objects. Each model is captured with high-resolution geometry and PBR materials, using an in-house 3D scanner. The collection includes household and office objects of diverse appearance, size, and affordances.
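As a rough sketch of inspecting one of the scanned object models, the snippet below loads a mesh with the third-party trimesh library; the file name is a placeholder, and the assumption that the models ship in a standard mesh format such as glTF/GLB should be checked against the dataset documentation.

```python
import trimesh  # third-party dependency; one option among many for mesh I/O

# Placeholder file name; actual model files come with the dataset.
mesh = trimesh.load("object_000001.glb", force="mesh")

print(f"vertices: {len(mesh.vertices)}, faces: {len(mesh.faces)}")
print(f"watertight: {mesh.is_watertight}, extents: {mesh.extents}")
```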
To support research on model-free 3D object tracking and 3D object reconstruction, HOT3D offers two types of reference sequences that show each object from all possible viewpoints: (1) sequences showing a static object on a desk, standing upright and then upside down, and (2) sequences showing an object manipulated by hands.
HOT3D-Clips is a set of curated sub-sequences of the HOT3D dataset, provided to enable straightforward comparison of tracking and pose-estimation methods. Each clip has 150 frames (5 seconds at 30 FPS) in which ground-truth annotations are available for all modeled objects and hands and have passed our visual inspection. There are 3832 clips in total: 1983 extracted from Aria recordings and 1849 from Quest 3 recordings. Documentation of HOT3D-Clips and Python utilities for working with the clips are available in the HOT3D Toolkit.
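A minimal sketch of the per-clip evaluation loop that HOT3D-Clips is designed for is shown below; the clip/frame representation and the estimate_poses callback are placeholders, and the actual clip-loading utilities are in the HOT3D Toolkit.

```python
from typing import Callable, Dict, List, Sequence

CLIP_LENGTH = 150  # frames per clip: 5 seconds at 30 FPS


def evaluate_on_clips(
    clips: Sequence[Sequence[dict]],                      # each clip: a list of per-frame samples
    estimate_poses: Callable[[dict], Dict[str, object]],  # your method: frame -> predictions
) -> List[Dict[str, object]]:
    """Run a pose-estimation method frame by frame over curated clips and
    collect its per-frame predictions for later scoring."""
    predictions: List[Dict[str, object]] = []
    for clip in clips:
        assert len(clip) == CLIP_LENGTH, "each HOT3D clip has 150 frames"
        for frame in clip:
            predictions.append(estimate_poses(frame))
    return predictions
```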
We co-organize two public challenges on HOT3D at ECCV 2024: BOP Challenge 2024, focused on model-based and model-free 2D object detection and 6DoF pose estimation, and Multiview Egocentric Hand Tracking Challenge, focused on hand pose and shape estimation. To enable benchmarking methods for joint hand and object tracking, the two challenges use the same training and test splits of HOT3D-Clips. We invite authors of relevant methods to participate in the challenges.
The HOT3D Toolkit is available on GitHub and provides a Python API for downloading and using the full HOT3D dataset, and for loading and visualizing HOT3D-Clips.
By using the dataset, you acknowledge and agree to comply with the HOT3D license agreement.
If you find the dataset useful, please cite the HOT3D whitepaper:
@article{banerjee2024introducing,
  title={Introducing HOT3D: An Egocentric Dataset for 3D Hand and Object Tracking},
  author={Banerjee, Prithviraj and Shkodrani, Sindi and Moulon, Pierre and Hampali, Shreyas and Zhang, Fan and Fountain, Jade and Miller, Edward and Basol, Selen and Newcombe, Richard and Wang, Robert and Engel, Jakob Julian and Hodan, Tomas},
  journal={arXiv preprint arXiv:2406.09598},
  year={2024}
}