Eye Gaze Data Format

Project Aria's Machine Perception Services (MPS) uses Aria's Eye Tracking (ET) camera images to estimate the direction the user is looking. This eye gaze estimation is expressed in the Central Pupil Frame (CPF). Eye Gaze outputs may be part of Open Dataset releases, or Project Aria partners can request MPS services on their own data.

Eye Gaze outputs are available for all recordings made with Eye Tracking cameras enabled. Partner data is not made available to Meta researchers or Meta’s affiliates. Go to MPS Data Lifecycle for more details about how partner data is processed and stored.

In March 2024, we updated our eye gaze model to support depth estimation. We do this by providing left and right eye gaze directions (yaw values) and eye positions (translation values), along with the depth at which these gaze directions intersect.

Figure 1: Diagram of the March 2024 eye gaze model, showing the vergence of the left, right, and combined eye gaze directions.

In the new model, the convergence points and distances are derived from the predicted gaze directions. The combined direction's yaw is used to populate the yaw field of the EyeGaze object for backwards compatibility. The pitch is common to the left, right, and combined gaze directions.

Eye Gaze MPS file outputs are:

  • summary.json - high level report on MPS eye gaze generation
  • general_eye_gaze.csv - based on the standard eye gaze configuration
  • personalized_eye_gaze.csv - only if you’ve made the recording with in-session Eye Gaze Calibration

general_eye_gaze.csv

general_eye_gaze.csv outputs are available for all recordings made with Eye Tracking cameras.

At this time, all Open Dataset eye gaze data has been computed using the older model. Aria Digital Twin (ADT) used its ground truth system to compute eye gaze depth for its release.

Where a cell in the table below shows 0, that value is not provided by that model; to ensure backwards compatibility, it is represented as 0 in Project Aria Tools readers.

| Column | Type | Pre-March 2024 Model | New Model |
|---|---|---|---|
| tracking_timestamp_us | int | The timestamp, in microseconds, of the eye tracking camera frame in the device time domain. The MPS location output contains pose estimations in the same time domain, so these timestamps can be used directly to infer the device pose from the MPS location output. | Same. |
| yaw_rads_cpf | float | Eye gaze yaw angle in radians in the CPF frame. The yaw angle is the angle between the projection of the eye gaze vector (originating at CPF) onto the XZ plane and the Z axis of the CPF frame. | Not provided directly; this value can be computed using helper functions and is filled in automatically when parsed with our data utilities. |
| pitch_rads_cpf | float | Eye gaze pitch angle in radians in the CPF frame. The pitch angle is the angle between the projection of the eye gaze vector (originating at CPF) onto the YZ plane and the Z axis of the CPF frame. | Now corresponds to the common pitch of the left and right gaze directions. |
| depth_m | float | Not available. | Absolute depth in meters of the 3D gaze point in the CPF frame. Depth values are capped at 4 m. If the predicted values at a timestamp are degenerate, this cell is empty. |
| yaw_low_rads_cpf | float | Lower bound of the confidence interval for the yaw estimation. | 0 |
| pitch_low_rads_cpf | float | Lower bound of the confidence interval for the pitch estimation. | 0 |
| yaw_high_rads_cpf | float | Upper bound of the confidence interval for the yaw estimation. | 0 |
| pitch_high_rads_cpf | float | Upper bound of the confidence interval for the pitch estimation. | 0 |
| left_yaw_rads_cpf | float | 0 | Left eye gaze yaw angle in radians in the CPF frame. The left yaw angle is the angle between the projection of the left eye gaze vector (originating at CPF) onto the XZ plane and the Z axis of the CPF frame. |
| right_yaw_rads_cpf | float | 0 | Right eye gaze yaw angle in radians in the CPF frame. The right yaw angle is the angle between the projection of the right eye gaze vector (originating at CPF) onto the XZ plane and the Z axis of the CPF frame. |
| left_yaw_low_rads_cpf | float | 0 | Lower bound of the left eye gaze yaw angle in radians in the CPF frame. |
| right_yaw_low_rads_cpf | float | 0 | Lower bound of the right eye gaze yaw angle in radians in the CPF frame. |
| left_yaw_high_rads_cpf | float | 0 | Upper bound of the left eye gaze yaw angle in radians in the CPF frame. |
| right_yaw_high_rads_cpf | float | 0 | Upper bound of the right eye gaze yaw angle in radians in the CPF frame. |
| tx_left_eye_cpf | float | 0 | Translation along the X direction from the CPF origin to the left eye position. |
| ty_left_eye_cpf | float | 0 | Translation along the Y direction from the CPF origin to the left eye position. |
| tz_left_eye_cpf | float | 0 | Translation along the Z direction from the CPF origin to the left eye position. |
| tx_right_eye_cpf | float | 0 | Translation along the X direction from the CPF origin to the right eye position. |
| ty_right_eye_cpf | float | 0 | Translation along the Y direction from the CPF origin to the right eye position. |
| tz_right_eye_cpf | float | 0 | Translation along the Z direction from the CPF origin to the right eye position. |
| session_uid | string | Unique identifier for a session within the VRS file. | Same. |
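
The yaw and pitch definitions above determine the gaze direction up to the CPF axis conventions. As a minimal sketch (assuming Z points forward per those definitions, and treating depth_m as the distance along the gaze ray, which is an assumption of this sketch), a 3D gaze point can be recovered as follows; projectaria_tools ships equivalent helper functions:

```python
import numpy as np

def gaze_direction_cpf(yaw_rads: float, pitch_rads: float) -> np.ndarray:
    """Unit gaze direction in CPF. Per the column definitions above,
    tan(yaw) = x/z on the XZ plane and tan(pitch) = y/z on the YZ plane."""
    d = np.array([np.tan(yaw_rads), np.tan(pitch_rads), 1.0])
    return d / np.linalg.norm(d)

# Hypothetical row values: yaw = 0.10 rad, pitch = -0.05 rad, depth_m = 1.2.
gaze_point_cpf = 1.2 * gaze_direction_cpf(0.10, -0.05)
```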

personalized_eye_gaze.csv

personalized_eye_gaze.csv outputs are only generated if the recording has in-session Eye Gaze Calibration data. The schema is exactly the same as general_eye_gaze.csv, and the session_uid values match between the general and personalized outputs.

The in-session calibration is used to compute user-specific calibration (gaze correction parameters). The yaw and pitch values are adjusted based on this calibration. If the instructions for in-session calibration are followed correctly, the calibrated eye gaze is expected to be more accurate than the general eye gaze.

Personalized calibration parameters, pre-March 2024 model

Four coefficients are used to generate the output [s_y, s_p, o_y, o_p]:

  • corrected_yaw = s_y * yaw + o_y
  • corrected_pitch = s_p * pitch + o_p
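
In code, this is a per-angle affine correction; a minimal sketch (the coefficient values below are hypothetical):

```python
def correct_gaze(yaw: float, pitch: float, coeffs: list[float]) -> tuple[float, float]:
    """Pre-March-2024 personalized correction.
    coeffs = [s_y, s_p, o_y, o_p]; angles in radians."""
    s_y, s_p, o_y, o_p = coeffs
    return s_y * yaw + o_y, s_p * pitch + o_p

# Hypothetical coefficients for illustration only.
corrected_yaw, corrected_pitch = correct_gaze(0.10, -0.05, [1.02, 1.05, 0.012, 0.014])
```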

Personalized calibration parameters, new model

Six coefficients are used in the new model output [s_ly, s_ry, s_p, o_ly, o_ry, o_p]:

  • corrected_left_yaw = s_ly * left_yaw + o_ly
  • corrected_right_yaw = s_ry * right_yaw + o_ry
  • corrected_pitch = s_p * pitch + o_p

We also use a diopter delta parameter that corrects the depth:

  • corrected_depth = 1 / ((1 / depth) - diopter_delta)

The diopter delta, calculated during calibration (see Stage 2 below), is the median of the differences between the 1 / predicted_depth and 1 / groundtruth_depth values.
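
Putting the six coefficients and the diopter delta together, a sketch of the full new-model correction (all values hypothetical):

```python
def correct_gaze_new(left_yaw: float, right_yaw: float, pitch: float,
                     depth_m: float, coeffs: list[float], diopter_delta: float):
    """New-model personalized correction.
    coeffs = [s_ly, s_ry, s_p, o_ly, o_ry, o_p]; angles in radians."""
    s_ly, s_ry, s_p, o_ly, o_ry, o_p = coeffs
    corrected_left_yaw = s_ly * left_yaw + o_ly
    corrected_right_yaw = s_ry * right_yaw + o_ry
    corrected_pitch = s_p * pitch + o_p
    # The depth correction operates in diopters (1 / meters).
    corrected_depth = 1.0 / (1.0 / depth_m - diopter_delta)
    return corrected_left_yaw, corrected_right_yaw, corrected_pitch, corrected_depth
```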

General Principles

The following principles apply to both general_eye_gaze.csv and personalized_eye_gaze.csv.

Confidence Intervals

The confidence intervals represent the model's uncertainty estimate. A smaller interval represents higher confidence and a wider interval represents lower confidence. The confidence interval angles are in radians in the CPF frame. Some common factors that increase uncertainty include the following (a small sketch using the interval width appears after this list):

  • Blinking
  • Hair occluding the eye tracking cameras
  • Re-adjusting glasses or taking them off to clean them
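
As a simple illustration, the interval width can serve as a per-sample reliability filter (the column names are from the pre-March-2024 schema above; the threshold is hypothetical and should be tuned per application):

```python
# Hypothetical threshold in radians; tune for your application.
MAX_YAW_INTERVAL_RADS = 0.1

def is_confident(yaw_low_rads: float, yaw_high_rads: float) -> bool:
    """Treat samples with a narrow yaw confidence interval as reliable."""
    return (yaw_high_rads - yaw_low_rads) < MAX_YAW_INTERVAL_RADS
```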

For utility functions to load eye gaze data in Python and C++, please check the code examples.
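
As a minimal alternative, the CSV can also be read with only Python's standard library; this sketch assumes a new-model file in the current directory (the file name is whatever your MPS download contains), and the depth_m handling reflects the empty-cell convention described above:

```python
import csv

with open("general_eye_gaze.csv", newline="") as f:
    for row in csv.DictReader(f):
        t_us = int(row["tracking_timestamp_us"])
        pitch = float(row["pitch_rads_cpf"])
        left_yaw = float(row["left_yaw_rads_cpf"])
        right_yaw = float(row["right_yaw_rads_cpf"])
        # depth_m may be empty when the predicted values are degenerate.
        depth_m = float(row["depth_m"]) if row.get("depth_m") else None
```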

Session_uid

When there are multiple users in the same VRS file (users handing the glasses off to a different user without stopping the recording), session_uid identifies the intervals corresponding to the different calibration sessions, provided in-app calibration is performed during the hand-offs. A grouping sketch follows the examples below.

  • All the rows with the same session_uid belong to the same session within the VRS file
  • If there are multiple calibration sessions, the session_uid will be unique for each session

general_eye_gaze.csv

  • There will be a single session_uid value when there is either no in-session eye calibration or only one in-session calibration
  • The session_uid column values will always match those in personalized_eye_gaze.csv

Examples

  • No calibrated eye gaze - general_eye_gaze will have one session_uid across all rows
  • One in-session calibration - general_eye_gaze will have one session_uid across all rows, and this value will be identical in personalized_eye_gaze
  • k > 1 in-session calibrations - both general and personalized eye gaze will have k unique session_uid values, each starting when an in-session calibration begins, and these values will be identical between the two files
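
As a sketch, grouping rows by session_uid cleanly separates the calibration sessions (the file name is hypothetical):

```python
import csv
from collections import defaultdict

sessions: dict[str, list[dict]] = defaultdict(list)
with open("personalized_eye_gaze.csv", newline="") as f:
    for row in csv.DictReader(f):
        sessions[row["session_uid"]].append(row)

# One entry per in-session calibration (k entries for k calibrations).
for uid, rows in sessions.items():
    print(uid, len(rows), "samples")
```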

summary.json

The summary.json file provides a high level overview of the output for each of the major stages. This is similar to the operator summary output from the MPS location pipeline.

For each stage of the ET pipeline, there will be one section in this file. If the section is missing, that means that the stage is not applicable or was not run.
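
A small sketch that checks which stages ran, based on this missing-section convention:

```python
import json

with open("summary.json") as f:
    summary = json.load(f)

for stage in ("GazeInference", "InSessionCalibration", "CalibrationCorrection"):
    print(stage, "ran" if stage in summary else "not applicable / not run")
```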

Stage 1: GazeInference (all recordings)

This stage indicates that uncalibrated eye gaze data has been generated. If you are able to download the data and view the .json file, the status will say SUCCESS.

| Name | Type | Description |
|---|---|---|
| status | string | SUCCESS (if you are able to download the data and view this file) |
| message | string | Any further details, if available |

Stage 2: InSessionCalibration (if in-session calibration available)

If the recording contains one or more valid in-session calibration intervals, the ET pipeline will compute the calibration parameters.

Each calibration session found in the VRS file will generate the following information:

| Name | Type | Description |
|---|---|---|
| status | string | SUCCESS / FAIL |
| message | string | Any further details, if available |
| session_uid | string | Unique ID representing the session |
| start_time_us | int | When the first wearer starts using the Aria glasses, or when a subsequent wearer begins in-session calibration (second eye calibration onwards) |
| end_time_us | int | When a wearer session or the recording ends |
| params | Array[float] | The calibration parameters (4 floats) |
Note: The status should be SUCCESS, unless the wearer began the in-session calibration but did not generate the necessary data; in that case it will be FAIL.

Stage 3: CalibrationCorrection

If Stage 2 has been successful, CalibrationCorrection will contain details about calibrated eye gaze. For each calibration session, we will output the following information:

| Name | Type | Description |
|---|---|---|
| status | string | SUCCESS / FAIL |
| message | string | Any further details, if available |
| session_uid | string | Unique ID representing the session |
| generalized_gaze_error_rads | dict | General gaze error statistics in radians |
| calibrated_gaze_error_rads | dict | Calibrated gaze error statistics in radians |

If the previous stages completed successfully, the status for this stage should always be SUCCESS.
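
For instance, the per-session error statistics can be compared to verify that calibration improved accuracy; a sketch that reads summary.json directly:

```python
import json

with open("summary.json") as f:
    summary = json.load(f)

for entry in summary.get("CalibrationCorrection", []):
    if entry["status"] == "SUCCESS":
        general = entry["generalized_gaze_error_rads"]["mean"]
        calibrated = entry["calibrated_gaze_error_rads"]["mean"]
        print(f'{entry["session_uid"]}: {general:.4f} rad -> {calibrated:.4f} rad')
```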

Example summary.json files

Scenario 1: No calibration available

This report is quite short, as no in-session calibration data is available. Eye Gaze MPS was successfully created:

```json
{
  "GazeInference": {
    "status": "SUCCESS"
  }
}
```

Scenario 2: In-session calibration available

In this example, there were multiple calibration sessions:

  • In session one, calibration completed successfully
  • In session two, the user began the in-session calibration but did not generate the necessary data

```json
{
  "GazeInference": {
    "status": "SUCCESS"
  },
  "InSessionCalibration": [
    {
      "status": "SUCCESS",
      "session_uid": "01ac9bf2-334a-49c6-9dc6-fdc07ab08a2a",
      "message": "",
      "start_time_us": 147588973,
      "end_time_us": 208304973,
      "num_calibu_frames": 1000,
      "parameters": [1.02361481, 1.05426864, 0.01158671, 0.01403982]
    },
    {
      "status": "FAIL",
      "message": "Couldn't compute GT gaze vectors for the interval [487241235, 508304973]",
      "session_uid": "6063bf11-84ef-4ed5-a785-ac44b4328fdc",
      "start_time_us": 487241235,
      "end_time_us": 508304973,
      "num_calibu_frames": 10
    }
  ],
  "CalibrationCorrection": [
    {
      "status": "SUCCESS",
      "message": "",
      "session_uid": "01ac9bf2-334a-49c6-9dc6-fdc07ab08a2a",
      "generalized_gaze_error_rads": {
        "mean": 0.047444001119500284,
        "std": 0.015775822542178554,
        "min": 0.009264740570696107,
        "max": 0.16895371875829926,
        "p25": 0.036160872560797655,
        "p50": 0.04529629090291307,
        "p75": 0.05761677117669144,
        "p95": 0.0675233675673802
      },
      "calibrated_gaze_error_rads": {
        "mean": 0.037444001119500284,
        "std": 0.005775822542178554,
        "min": 0.006364740570696107,
        "max": 0.06835371875829926,
        "p25": 0.026060872560797655,
        "p50": 0.02519329090291307,
        "p75": 0.03760677117669144,
        "p95": 0.0474232675673802
      }
    },
    {
      "status": "FAIL",
      "message": "No calibration available for this session",
      "session_uid": "6063bf11-84ef-4ed5-a785-ac44b4328fdc"
    }
  ]
}
```