Temporal Alignment of Aria Sensor Data

This page provides an overview of how Project Aria data is temporally aligned and provides information about how to finely align IMU, barometer and magnetometer data.

Go to Timestamps in Aria VRS for how Aria data is formatted. Go to Project Aria Device Timestamping for how the hardware is configured.

tip

We recommend using Trajectory MPS outputs instead of raw IMU data wherever possible. Go to MPS Code Snippets for how to load open loop or closed loop trajectory data.

Temporal Alignment

Although each sensor data pieces are timestamped in the same time domain, there might be an offset between the instant represented by their timestamps and the actual instant for which the enclosed measurement is valid. For proper sensor fusion where precise temporal alignment of the data is required, we must take this offset into account. We need an estimate of this offset for each sensor.

Per sensor time offset : $\text{dt}_{\text{Device}}^{\text{Sensor}}$

Since the SLAM (mono scene) camera timestamps are direct measurements of a well defined instant (the center of exposure of our image sensor), we choose the SLAM camera timestamps as the reference timestamp and define $\text{dt}_{\text{Device}}^{\text{Slam}} = 0$ . The same goes for the RGB camera : $\text{dt}_{\text{Device}}^{\text{RGB}} = 0$ . Note that the next subsection describes in more details the temporal aspect of the image formation model.

For IMUs, the time offset is estimated in the factory by our camera-imu calibration process and is so that the following relation approximately holds (after IMU intrinsics compensation):

\hat{a}_{t_{\text{Device}}} = \tilde{a}\left(t_{\text{Device}} + \text{dt}_{\text{Device}}^{\text{Accel}} + 0.5\Delta_t \right)

where $t_{\text{Device}}$ is a timestamp in the device time domain, $\hat{a}_{t_{\text{Device}}}$ is an estimate of the true acceleration at the instant represented by the timestamp $t_{\text{Device}}$ , $\text{dt}_{\text{Device}}^{\text{Accel}}$ is the estimate of the time offset, $\Delta_t$ is the sampling period of the sensor. Finally, the operator $t\xrightarrow{}\tilde{a}(t)$ is a temporal interpolation of the compensated IMU sample time series $(t_k, \tilde{a}_k)$ . Note the appearance of $0.5\Delta_t$ in previous equation, which stems from internal implementation choice. The same relation exists for the gyrometer.

For magnetometer, we estimate $\text{dt}_{\text{Device}}^{\text{Mag}}$ to be around $+5ms$ with the following relation:

\hat{B}_{t_{\text{Device}}} = \tilde{B}\left(t_{\text{Device}} + \text{dt}_{\text{Device}}^{\text{Mag}} \right)

Where notation are similar as above with $\hat{B}$ representing the magnetic field vector.

For barometer, audio signal and GPS data, such an offset is undetermined.

Images formation temporal model: rolling shutter and PLS artifact

In practice, the images obtained from our environment facing camera are not well described as captured at a unique timestamp for the most demanding applications. First, the regular exposure duration window can range from 19us up to 14ms, second the RGB sensor has a rolling shutter, where each row is being captured at different time per design, finally the SLAM cameras, even if specified as global shutter sensors are impacted by a row-dependent parasitic light sensitivity: a proportion of the photons forming the image are captured outside of the regular exposure window. This proportion depends on the row.

For the RGB sensor, we characterize the rolling shutter behavior through the read-out time $\Delta \text{dt}_{\text{readout}}$ , we define is as the time between the readout of the first row and the readout of the last row. This time depends on the sensor binning configuration, that can differ on a recording basis. The readout time is specified to be $16.26ms$ for the full resolution ( $2880\times2880$ ) and $5ms$ for the binned/cropped configuration ( $1408\times1408$ ). We can account for rolling shutter by assigning a center of exposure timestamp to each pixel with the following formulae:

t_{\text{Device}}(\boldsymbol{p}) = t_{\text{Device}}^{\text{Image}} + \left(\frac{\text{row}(\boldsymbol{p})}{H} - 0.5\right) \cdot \Delta \text{dt}_{\text{readout}}

Where $t_{\text{Device}}(\boldsymbol{p})$ is the timestamp of the observation at pixel $\boldsymbol{p}$ , $t_{\text{Device}}^{\text{Image}}$ is the device timestamp of the image assigned by the MCU, $\boldsymbol{p}\xrightarrow{}\text{row}(\boldsymbol{p})$ is the projection of the pixel coordinate on the image dimension aligned with sensor rows, $H$ is the size of the image along this dimension, $\Delta \text{dt}_{\text{readout}}$ is the readout time value mentioned above.

For even more demanding applications, one might need to compensate timestamp of pixels observation for Slam cameras too. On Aria camera sensors, readout of the image is done row by row. Each row can still accumulate charges after the regular exposure time and until it is fully read out, an effect sometimes called parasitic light sensitivity (PLS). The readout time of the last row $\Delta \text{dt}_{\text{pls}}$ , i.e. the time a pixel on the last row still accumulates electrons before being discharged is specified by the manufacturer as being 9.12ms. The ratio $S_{\text{pls}}$ of the sensitivity during readout over the sensitivity during regular exposure was estimated to be ~0.01, instead of the ideal 0 value. From this, it results that when dealing with pixel observation time, it might be necessary to take effect into account by assigning to each pixel their effective center of exposure, we suggest the the following formulae:

t_{\text{Device}}(\boldsymbol{p}) = t_{\text{Device}}^{\text{Image}} + \frac{1}{2} \cdot r\left(\boldsymbol{p}\right) \cdot S_{\text{pls}} \cdot \frac{1 + r_e\left({\boldsymbol{p}}\right)}{1 + r_e\left({\boldsymbol{p}}\right) \cdot S_{\text{pls}}}

Where $t_{\text{Device}}(\boldsymbol{p})$ represents the time of the pixel observation (effective center of exposure), $r\left({\boldsymbol{p}}\right)$ represents the readout time of the current pixel $\boldsymbol{p}$ and is computed as $\Delta \text{dt}_{\text{pls}} \frac{\text{row}(\boldsymbol{p})}{H}$ , $S_{\text{pls}}$ represents the sensitivity ratio of the readout phase over the exposure phase, $r_e\left({\boldsymbol{p}}\right)$ represents the ratio of the current row readout time over the regular exposure duration.

Temporal Alignment​

Per sensor time offset : dtDeviceSensor\text{dt}_{\text{Device}}^{\text{Sensor}}dtDeviceSensor​​

Images formation temporal model: rolling shutter and PLS artifact​

Temporal Alignment

Per sensor time offset : $\text{dt}_{\text{Device}}^{\text{Sensor}}$

Images formation temporal model: rolling shutter and PLS artifact