Eye Gaze Data Format

Project Aria's Machine Perception Services (MPS) uses Aria's Eye Tracking (ET) camera images to estimate the direction the user is looking. This eye gaze estimation is expressed in the Central Pupil Frame (CPF). Eye Gaze outputs may be part of Open Dataset releases, or Project Aria partners can request MPS services on their own data.

Eye Gaze outputs are available for all recordings made with Eye Tracking cameras enabled. Partner data is not made available to Meta researchers or Meta’s affiliates. Go to MPS Data Lifecycle for more details about how partner data is processed and stored.

In March 2024, we updated our eye gaze model to support depth estimation. We do this by providing left and right eye gaze directions (yaw values) and eye positions (translation values), along with the depth at which these gaze directions intersect.

Figure 1: Diagram of the March 2024 eye gaze model, showing the vergence of the left, right, and combined eye gaze directions.

In the new model, the convergence points and distances are derived from the predicted gaze directions. The combined direction's yaw is used to populate the yaw field of the EyeGaze object for backwards compatibility. The pitch is common to the left, right, and combined gaze directions.

Eye Gaze MPS file outputs are:

  • summary.json - high level report on MPS eye gaze generation
  • general_eye_gaze.csv - based on the standard eye gaze configuration
  • personalized_eye_gaze.csv - only if you’ve made the recording with in-session Eye Gaze Calibration

general_eye_gaze.csv

general_eye_gaze.csv outputs are available for all recordings made with Eye Tracking cameras.

At this time, all Open Dataset eye gaze data has been computed using the older model. Aria Digital Twin (ADT) used its ground truth system to compute eye gaze depth for its release.

Where a cell in the table below shows 0, that value is not provided by that model; to ensure backwards compatibility, it is represented as 0 in Project Aria Tools readers.

| Column | Type | Pre-March 2024 Model | New Model |
|---|---|---|---|
| tracking_timestamp_us | int | The timestamp, in microseconds, of the eye tracking camera frame in the device time domain. The MPS location output contains pose estimations in the same time domain, so these timestamps can be used directly to infer the device pose from the MPS location output. | Same. |
| yaw_rads_cpf | float | Eye gaze yaw angle in radians in the CPF frame. The yaw angle is the angle between the projection of the eye gaze vector (originating at CPF) onto the XZ plane and the Z axis of the CPF frame. | Not provided directly; this value can be computed using helper functions and is filled in automatically when parsed with our data utilities. |
| pitch_rads_cpf | float | Eye gaze pitch angle in radians in the CPF frame. The pitch angle is the angle between the projection of the eye gaze vector (originating at CPF) onto the YZ plane and the Z axis of the CPF frame. | Now corresponds to the common pitch of the left and right gaze directions. |
| depth_m | float | Not available. | Absolute depth in meters of the 3D gaze point in the CPF frame. Depth values are capped at 4 m. If the predicted values at a timestamp are degenerate, this cell is empty. |
| yaw_low_rads_cpf | float | Lower bound of the confidence interval for the yaw estimation. | 0 |
| pitch_low_rads_cpf | float | Lower bound of the confidence interval for the pitch estimation. | 0 |
| yaw_high_rads_cpf | float | Upper bound of the confidence interval for the yaw estimation. | 0 |
| pitch_high_rads_cpf | float | Upper bound of the confidence interval for the pitch estimation. | 0 |
| left_yaw_rads_cpf | float | 0 | Left eye gaze yaw angle in radians in the CPF frame. The left yaw angle is the angle between the projection of the left eye gaze vector (originating at CPF) onto the XZ plane and the Z axis of the CPF frame. |
| right_yaw_rads_cpf | float | 0 | Right eye gaze yaw angle in radians in the CPF frame. The right yaw angle is the angle between the projection of the right eye gaze vector (originating at CPF) onto the XZ plane and the Z axis of the CPF frame. |
| left_yaw_low_rads_cpf | float | 0 | Lower bound of the left eye gaze yaw angle in radians in the CPF frame. |
| right_yaw_low_rads_cpf | float | 0 | Lower bound of the right eye gaze yaw angle in radians in the CPF frame. |
| left_yaw_high_rads_cpf | float | 0 | Upper bound of the left eye gaze yaw angle in radians in the CPF frame. |
| right_yaw_high_rads_cpf | float | 0 | Upper bound of the right eye gaze yaw angle in radians in the CPF frame. |
| tx_left_eye_cpf | float | 0 | Translation along the X direction from the CPF origin to the left eye position. |
| ty_left_eye_cpf | float | 0 | Translation along the Y direction from the CPF origin to the left eye position. |
| tz_left_eye_cpf | float | 0 | Translation along the Z direction from the CPF origin to the left eye position. |
| tx_right_eye_cpf | float | 0 | Translation along the X direction from the CPF origin to the right eye position. |
| ty_right_eye_cpf | float | 0 | Translation along the Y direction from the CPF origin to the right eye position. |
| tz_right_eye_cpf | float | 0 | Translation along the Z direction from the CPF origin to the right eye position. |
| session_uid | string | Unique identifier for a session within the VRS file. | Same. |
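
The yaw and pitch definitions above determine the gaze direction up to the CPF axis conventions. As a minimal sketch (assuming Z points forward per those definitions, and treating depth_m as the distance along the gaze ray, which is an assumption of this sketch), a 3D gaze point can be recovered as follows; projectaria_tools ships equivalent helper functions:

```python
import numpy as np

def gaze_direction_cpf(yaw_rads: float, pitch_rads: float) -> np.ndarray:
    """Unit gaze direction in CPF. Per the column definitions above,
    tan(yaw) = x/z on the XZ plane and tan(pitch) = y/z on the YZ plane."""
    d = np.array([np.tan(yaw_rads), np.tan(pitch_rads), 1.0])
    return d / np.linalg.norm(d)

# Hypothetical row values: yaw = 0.10 rad, pitch = -0.05 rad, depth_m = 1.2.
gaze_point_cpf = 1.2 * gaze_direction_cpf(0.10, -0.05)
```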

personalized_eye_gaze.csv

personalized_eye_gaze.csv outputs are only generated if the recording has in-session Eye Gaze Calibration data. The schema is exactly the same as general_eye_gaze.csv, and the session_uid values match between the general and personalized outputs.

The in-session calibration is used to compute user-specific calibration (gaze correction parameters). The yaw and pitch values are adjusted based on this calibration. If the instructions for in-session calibration are followed correctly, the calibrated eye gaze is expected to be more accurate than the general eye gaze.

Personalized calibration parameters, pre-March 2024 model

Four coefficients are used to generate the output [s_y, s_p, o_y, o_p]:

  • corrected_yaw = s_y * yaw + o_y
  • corrected_pitch = s_p * pitch + o_p
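
In code, this is a per-angle affine correction; a minimal sketch (the coefficient values below are hypothetical):

```python
def correct_gaze(yaw: float, pitch: float, coeffs: list[float]) -> tuple[float, float]:
    """Pre-March-2024 personalized correction.
    coeffs = [s_y, s_p, o_y, o_p]; angles in radians."""
    s_y, s_p, o_y, o_p = coeffs
    return s_y * yaw + o_y, s_p * pitch + o_p

# Hypothetical coefficients for illustration only.
corrected_yaw, corrected_pitch = correct_gaze(0.10, -0.05, [1.02, 1.05, 0.012, 0.014])
```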

Personalized calibration parameters, new model

Six coefficients are used in the new model output [s_ly, s_ry, s_p, o_ly, o_ry, o_p]:

  • corrected_left_yaw = s_ly * left_yaw + o_ly
  • corrected_right_yaw = s_ry * right_yaw + o_ry
  • corrected_pitch = s_p * pitch + o_p

We also use a diopter delta parameter that corrects the depth:

  • corrected_depth = 1 / ((1 / depth) - diopter_delta)

The diopter delta, calculated during calibration (see Stage 2 below), is the median of the differences between the 1 / predicted_depth and 1 / groundtruth_depth values.
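
Putting the six coefficients and the diopter delta together, a sketch of the full new-model correction (all values hypothetical):

```python
def correct_gaze_new(left_yaw: float, right_yaw: float, pitch: float,
                     depth_m: float, coeffs: list[float], diopter_delta: float):
    """New-model personalized correction.
    coeffs = [s_ly, s_ry, s_p, o_ly, o_ry, o_p]; angles in radians."""
    s_ly, s_ry, s_p, o_ly, o_ry, o_p = coeffs
    corrected_left_yaw = s_ly * left_yaw + o_ly
    corrected_right_yaw = s_ry * right_yaw + o_ry
    corrected_pitch = s_p * pitch + o_p
    # The depth correction operates in diopters (1 / meters).
    corrected_depth = 1.0 / (1.0 / depth_m - diopter_delta)
    return corrected_left_yaw, corrected_right_yaw, corrected_pitch, corrected_depth
```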

General Principles

The following principles apply to both general_eye_gaze.csv and personalized_eye_gaze.csv.

Confidence Intervals

The confidence intervals represent the model's uncertainty estimate. A smaller interval represents higher confidence and a wider interval represents lower confidence. The confidence interval angles are in radians in the CPF frame. Some common factors that increase uncertainty include the following (a small sketch using the interval width appears after this list):

  • Blinking
  • Hair occluding the eye tracking cameras
  • Re-adjusting glasses or taking them off to clean them
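
As a simple illustration, the interval width can serve as a per-sample reliability filter (the column names are from the pre-March-2024 schema above; the threshold is hypothetical and should be tuned per application):

```python
# Hypothetical threshold in radians; tune for your application.
MAX_YAW_INTERVAL_RADS = 0.1

def is_confident(yaw_low_rads: float, yaw_high_rads: float) -> bool:
    """Treat samples with a narrow yaw confidence interval as reliable."""
    return (yaw_high_rads - yaw_low_rads) < MAX_YAW_INTERVAL_RADS
```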

For utility functions to load eye gaze data in Python and C++, please check the code examples.
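
As a minimal alternative, the CSV can also be read with only Python's standard library; this sketch assumes a new-model file in the current directory (the file name is whatever your MPS download contains), and the depth_m handling reflects the empty-cell convention described above:

```python
import csv

with open("general_eye_gaze.csv", newline="") as f:
    for row in csv.DictReader(f):
        t_us = int(row["tracking_timestamp_us"])
        pitch = float(row["pitch_rads_cpf"])
        left_yaw = float(row["left_yaw_rads_cpf"])
        right_yaw = float(row["right_yaw_rads_cpf"])
        # depth_m may be empty when the predicted values are degenerate.
        depth_m = float(row["depth_m"]) if row.get("depth_m") else None
```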

Session_uid

When there are multiple users in the same VRS file (users handing the glasses off to a different user without stopping the recording), session_uid identifies the intervals corresponding to the different calibration sessions, provided in-app calibration is performed during the hand-offs. A grouping sketch follows the examples below.

  • All the rows with the same session_uid belong to the same session within the VRS file
  • If there are multiple calibration sessions, the session_uid will be unique for each session

general_eye_gaze.csv

  • There will be a single session_uid value when there is either no in-session eye calibration or only one in-session calibration
  • The session_uid column values will always match those in personalized_eye_gaze.csv

Examples

  • No calibrated eye gaze - general_eye_gaze will have one session_uid across all rows
  • One in-session calibration - general_eye_gaze will have one session_uid across all rows, and this value will be identical in personalized_eye_gaze
  • k > 1 in-session calibrations - both general and personalized eye gaze will have k unique session_uid values, each starting when an in-session calibration begins, and these values will be identical between the two files
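
As a sketch, grouping rows by session_uid cleanly separates the calibration sessions (the file name is hypothetical):

```python
import csv
from collections import defaultdict

sessions: dict[str, list[dict]] = defaultdict(list)
with open("personalized_eye_gaze.csv", newline="") as f:
    for row in csv.DictReader(f):
        sessions[row["session_uid"]].append(row)

# One entry per in-session calibration (k entries for k calibrations).
for uid, rows in sessions.items():
    print(uid, len(rows), "samples")
```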

summary.json

The summary.json file provides a high level overview of the output for each of the major stages. This is similar to the operator summary output from the MPS location pipeline.

For each stage of the ET pipeline, there will be one section in this file. If the section is missing, that means that the stage is not applicable or was not run.
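
A small sketch that checks which stages ran, based on this missing-section convention:

```python
import json

with open("summary.json") as f:
    summary = json.load(f)

for stage in ("GazeInference", "InSessionCalibration", "CalibrationCorrection"):
    print(stage, "ran" if stage in summary else "not applicable / not run")
```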

Stage 1: GazeInference (all recordings)

This stage indicates that uncalibrated eye gaze data has been generated. If you are able to download the data and view the .json file, the status will say SUCCESS.

| Name | Type | Description |
|---|---|---|
| status | string | SUCCESS (if you are able to download the data and view this file) |
| message | string | Any further details, if available |

Stage 2: InSessionCalibration (if in-session calibration available)

If the recording contains one or more valid in-session calibration intervals, the ET pipeline will compute the calibration parameters.

Each calibration session found in the VRS file will generate the following information:

| Name | Type | Description |
|---|---|---|
| status | string | SUCCESS / FAIL |
| message | string | Any further details, if available |
| session_uid | string | Unique ID representing the session |
| start_time_us | int | When the first wearer starts using the Aria glasses, or when a subsequent wearer begins in-session calibration (second eye calibration onwards) |
| end_time_us | int | When a wearer session or the recording ends |
| params | Array[float] | The calibration parameters (4 floats) |
Note: The status should be SUCCESS, unless the wearer began the in-session calibration but did not generate the necessary data; in that case it will be FAIL.

Stage 3: CalibrationCorrection

If Stage 2 has been successful, CalibrationCorrection will contain details about calibrated eye gaze. For each calibration session, we will output the following information:

| Name | Type | Description |
|---|---|---|
| status | string | SUCCESS / FAIL |
| message | string | Any further details, if available |
| session_uid | string | Unique ID representing the session |
| generalized_gaze_error_rads | dict | General gaze error statistics in radians |
| calibrated_gaze_error_rads | dict | Calibrated gaze error statistics in radians |

If the previous stages completed successfully, the status for this stage should always be SUCCESS.
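
For instance, the per-session error statistics can be compared to verify that calibration improved accuracy; a sketch that reads summary.json directly:

```python
import json

with open("summary.json") as f:
    summary = json.load(f)

for entry in summary.get("CalibrationCorrection", []):
    if entry["status"] == "SUCCESS":
        general = entry["generalized_gaze_error_rads"]["mean"]
        calibrated = entry["calibrated_gaze_error_rads"]["mean"]
        print(f'{entry["session_uid"]}: {general:.4f} rad -> {calibrated:.4f} rad')
```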

Example summary.json files

Scenario 1: No calibration available

This report is quite short, as no in-session calibration data is available. Eye Gaze MPS was successfully created:

```json
{
  "GazeInference": {
    "status": "SUCCESS"
  }
}
```

Scenario 2: In-session calibration available

In this example, there were multiple calibration sessions:

  • In session one, calibration completed successfully
  • In session two, the user began the in-session calibration but did not generate the necessary data

```json
{
  "GazeInference": {
    "status": "SUCCESS"
  },
  "InSessionCalibration": [
    {
      "status": "SUCCESS",
      "session_uid": "01ac9bf2-334a-49c6-9dc6-fdc07ab08a2a",
      "message": "",
      "start_time_us": 147588973,
      "end_time_us": 208304973,
      "num_calibu_frames": 1000,
      "parameters": [1.02361481, 1.05426864, 0.01158671, 0.01403982]
    },
    {
      "status": "FAIL",
      "message": "Couldn't compute GT gaze vectors for the interval [487241235, 508304973]",
      "session_uid": "6063bf11-84ef-4ed5-a785-ac44b4328fdc",
      "start_time_us": 487241235,
      "end_time_us": 508304973,
      "num_calibu_frames": 10
    }
  ],
  "CalibrationCorrection": [
    {
      "status": "SUCCESS",
      "message": "",
      "session_uid": "01ac9bf2-334a-49c6-9dc6-fdc07ab08a2a",
      "generalized_gaze_error_rads": {
        "mean": 0.047444001119500284,
        "std": 0.015775822542178554,
        "min": 0.009264740570696107,
        "max": 0.16895371875829926,
        "p25": 0.036160872560797655,
        "p50": 0.04529629090291307,
        "p75": 0.05761677117669144,
        "p95": 0.0675233675673802
      },
      "calibrated_gaze_error_rads": {
        "mean": 0.037444001119500284,
        "std": 0.005775822542178554,
        "min": 0.006364740570696107,
        "max": 0.06835371875829926,
        "p25": 0.026060872560797655,
        "p50": 0.02519329090291307,
        "p75": 0.03760677117669144,
        "p95": 0.0474232675673802
      }
    },
    {
      "status": "FAIL",
      "message": "No calibration available for this session",
      "session_uid": "6063bf11-84ef-4ed5-a785-ac44b4328fdc"
    }
  ]
}
```