3D Coordinate Frame Conventions for Project Aria Glasses

This page provides an overview of 3D Coordinate Frame Conventions used for Project Aria glasses, covering:

  • Representation of 6-DoF poses
  • 3D Coordinate frame and system conventions
  • Non-visual sensor coordinate systems
  • Central Pupil Frame (CPF)

Go to the Project Aria FAQ for more calibration information and resources.

SE(3) Lie groups

Extrinsics in calibration refer to the 6-DoF poses among the sensors. These 6-DoF poses are represented as elements of the SE(3) Lie group. The quaternion part of SE(3) uses the Hamilton convention, following the Eigen library; the exact formula for converting a quaternion to the rotation matrix of the SE(3) can be found in the Eigen code repository.
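For illustration, the quaternion-to-rotation-matrix conversion can be sketched in plain numpy. This is the standard textbook formula for a unit Hamilton quaternion, not code copied from Eigen, and the function name quat_wxyz_to_rotation_matrix is a hypothetical helper introduced here.

```python
import numpy as np

def quat_wxyz_to_rotation_matrix(q_wxyz):
    """Convert a Hamilton quaternion given as (w, x, y, z) to a 3x3 rotation matrix."""
    q = np.asarray(q_wxyz, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)  # normalize defensively
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

# Example: a 90-degree rotation about the Z axis maps the X axis onto the Y axis.
half_angle = np.pi / 4
R = quat_wxyz_to_rotation_matrix([np.cos(half_angle), 0.0, 0.0, np.sin(half_angle)])
x_axis_rotated = R @ np.array([1.0, 0.0, 0.0])
```

Under the Hamilton convention the rotation is active and counter-clockwise for a positive angle, which is why the X axis ends up on +Y in this example.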

We use the SE3d class in the Sophus Library to represent SE(3) Lie groups, and provide a minimal pybind for the class.

A note on sensor naming and motivation

T_sensor1_sensor2 represents the relative SE(3) transformation from the sensor2 frame to the sensor1 frame. An easy mnemonic is the chaining principle: T_sensor1_sensor2 * T_sensor2_sensor3 * p_sensor3 = p_sensor1, where p_sensor is a 3D point expressed in that sensor's frame.

Code

transform_a_b represents an SE(3) rigid transformation from the b coordinate frame to the a coordinate frame. p_a represents an R^3 point (or vector) in the coordinate system of a. The chaining principle gives easy mnemonics (a, b, c are coordinate frames):

transform_a_c = transform_a_b @ transform_b_c
p_a = transform_a_b @ p_b
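The chaining principle above can be checked numerically. The following is a minimal sketch that uses plain 4x4 homogeneous matrices in numpy as a stand-in for SE3d; the helpers make_transform, rot_z and apply_transform and all numeric values are hypothetical, chosen only for illustration.

```python
import numpy as np

def make_transform(rotation, translation):
    """Build a 4x4 homogeneous matrix from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def rot_z(angle_rad):
    """Rotation matrix for a counter-clockwise rotation about the Z axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def apply_transform(T, point):
    """Apply a 4x4 rigid transform to a 3D point."""
    return (T @ np.append(point, 1.0))[:3]

# Hypothetical transforms between three frames a, b, c.
transform_a_b = make_transform(rot_z(np.pi / 2), [1.0, 0.0, 0.0])
transform_b_c = make_transform(rot_z(-np.pi / 2), [0.0, 2.0, 0.0])

# The inner frame name (b) cancels when chaining.
transform_a_c = transform_a_b @ transform_b_c

p_c = np.array([0.5, 0.0, 0.0])
p_b = apply_transform(transform_b_c, p_c)
p_a_stepwise = apply_transform(transform_a_b, p_b)
p_a_direct = apply_transform(transform_a_c, p_c)
```

Both routes give the same point in frame a. The naming convention makes it easy to spot a mistake: adjacent frame names in a product must match.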

If you need the quaternion from an SE3d, note that the component order is w, x, y, z, consistent with numpy:

quaternion_a_b = transform_a_b.to_quat() # order is w, x, y, z
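Some other libraries expect the scalar component last (x, y, z, w); SciPy's Rotation.from_quat, for example, uses that order by default. A minimal sketch of reordering helpers, using only numpy (the function names are hypothetical):

```python
import numpy as np

def wxyz_to_xyzw(q_wxyz):
    """Reorder a quaternion from (w, x, y, z) to (x, y, z, w)."""
    return np.roll(np.asarray(q_wxyz, dtype=float), -1)

def xyzw_to_wxyz(q_xyzw):
    """Reorder a quaternion from (x, y, z, w) to (w, x, y, z)."""
    return np.roll(np.asarray(q_xyzw, dtype=float), 1)

# Placeholder values that make the reordering easy to see.
q_wxyz = np.array([0.1, 0.2, 0.3, 0.4])
q_xyzw = wxyz_to_xyzw(q_wxyz)
q_roundtrip = xyzw_to_wxyz(q_xyzw)
```

Only the storage order changes here; the rotation that the quaternion represents is unaffected.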

3D Coordinate frame conventions

Every sensor on Aria glasses has its own local coordinate system. We represent the 6-DoF pose of each sensor as its relative pose (rotation and translation) with respect to the “Device frame”. By default, the Device frame is the local frame of the left Mono Scene (SLAM) camera.
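Because every sensor pose is expressed relative to the Device frame, the relative pose between any two sensors follows from the chaining principle: T_sensor1_sensor2 = inverse(T_device_sensor1) * T_device_sensor2. A minimal numpy sketch with 4x4 homogeneous matrices, where the extrinsic values are hypothetical and not real Aria calibration data:

```python
import numpy as np

def invert_transform(T):
    """Invert a 4x4 rigid transform using the closed form [R^T, -R^T t]."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

# Hypothetical extrinsics: sensor1 is rotated 90 degrees about the device X axis.
T_device_sensor1 = np.eye(4)
T_device_sensor1[:3, :3] = np.array([[1.0, 0.0, 0.0],
                                     [0.0, 0.0, -1.0],
                                     [0.0, 1.0, 0.0]])
T_device_sensor1[:3, 3] = [0.01, 0.0, 0.0]

# Hypothetical extrinsics: sensor2 is only translated relative to the device frame.
T_device_sensor2 = np.eye(4)
T_device_sensor2[:3, 3] = [0.05, 0.02, 0.0]

# Pose of sensor2 expressed in the sensor1 frame.
T_sensor1_sensor2 = invert_transform(T_device_sensor1) @ T_device_sensor2
```

Chaining T_device_sensor1 with the result recovers T_device_sensor2, which is a quick sanity check on the direction of each transform.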

Project Aria device with the x, y and z axes for each sensor

Camera coordinate system convention

A camera's local frame has its origin at the camera's optical center. Coarsely, when the camera is placed upright, the axes of the camera coordinate frame point left, up and forward.

More rigorously, we define a camera's local frame based on the optical axis and the entrance pupil of its lens. Both are uniquely defined for each camera according to the camera's lens prescription. The origin of a camera's local frame is at the center of the camera's entrance pupil. The frame's Z axis is aligned with the optical axis. The frame's X axis is aligned with the projection of the image plane's X axis onto the entrance pupil plane. The cross product of the X and Z axes defines the frame's Y axis.
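The construction above can be sketched as follows. This sketch makes two assumptions that are not stated in the text: the entrance pupil plane is perpendicular to the optical axis, and the cross product is ordered as Y = Z x X so that the frame is right-handed. The function name camera_frame_axes and the input vectors are hypothetical.

```python
import numpy as np

def camera_frame_axes(optical_axis, image_x_axis):
    """Return a 3x3 matrix whose columns are the camera frame's X, Y and Z axes."""
    optical_axis = np.asarray(optical_axis, dtype=float)
    image_x_axis = np.asarray(image_x_axis, dtype=float)

    # Z axis: unit vector along the optical axis.
    z = optical_axis / np.linalg.norm(optical_axis)

    # X axis: image X projected onto the plane perpendicular to Z (assumption).
    x_projected = image_x_axis - np.dot(image_x_axis, z) * z
    x = x_projected / np.linalg.norm(x_projected)

    # Y axis: completes a right-handed frame (assumed ordering Z x X).
    y = np.cross(z, x)
    return np.column_stack([x, y, z])

# Hypothetical inputs: image X axis slightly tilted out of the pupil plane.
axes = camera_frame_axes(optical_axis=[0.0, 0.0, 2.0], image_x_axis=[1.0, 0.0, 0.1])
```

With X pointing left and Z pointing forward, this ordering yields a Y axis that points up, which matches the coarse description of the camera frame.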

Diagram of Project Aria coordinates, showing the Optical center and the Lens FOV

Non-visual sensor coordinate system

We choose the IMU coordinate systems to have their origins at the position of the accelerometer and to be oriented along the accelerometer's sensitive axes, which are then orthogonalized to compensate for sensor non-orthogonality error. We use a similar arrangement for the magnetometer.

Diagram of Project Aria glasses IMU coordinates

The nominal Central Pupil Frame (CPF)

The CPF is placed at the midpoint between the eye boxes of the left and right eyes. The CPF's X axis points left, its Y axis points up and its Z axis points forward, from the person's perspective. Aria's eye tracking (ET) gaze is defined as a vector in the CPF space originating at (0, 0, 0) of the CPF.
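When a gaze ray is moved from the CPF into another frame, its origin transforms as a point (rotation plus translation) while its direction transforms as a vector (rotation only). A minimal numpy sketch, assuming a hypothetical T_device_cpf whose values are illustrative and not real calibration data:

```python
import numpy as np

# Hypothetical pose of the CPF in the Device frame (not real calibration data).
T_device_cpf = np.eye(4)
T_device_cpf[:3, :3] = np.array([[1.0, 0.0, 0.0],
                                 [0.0, 0.0, -1.0],
                                 [0.0, 1.0, 0.0]])  # 90 degrees about X
T_device_cpf[:3, 3] = [0.03, -0.01, 0.02]

# Gaze ray in the CPF: starts at the CPF origin and looks straight ahead (+Z).
gaze_origin_cpf = np.zeros(3)
gaze_direction_cpf = np.array([0.0, 0.0, 1.0])

rotation = T_device_cpf[:3, :3]
translation = T_device_cpf[:3, 3]

# Points get rotation and translation; directions get rotation only.
gaze_origin_device = rotation @ gaze_origin_cpf + translation
gaze_direction_device = rotation @ gaze_direction_cpf
```

Because the gaze originates at (0, 0, 0) of the CPF, its origin in the Device frame is simply the translation part of T_device_cpf.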

3D model of a head showing the CPF used in Project Aria data