Speech2Text Output Data

caution

Aria Pilot Dataset documentation is stored in Archive: Aria Data Tools, because it was Project Aria's first open source initiative and it uses a different data structure compared to our latest open releases. For the most up to date tooling and to find out about our other open datasets go to Project Aria Tools.

This website will be deleted in September 2024.

Speech2Text Output Data

Speech2Text Output Data provides text strings generated by Automatic Speech Recognition with timestamps and confidence rating.

Each recording has two .csv files that are the same, except speech2text/speech.csv uses the wav file time domain and speech2text/speech_aria_domain.csv uses Aria time domain.

Table 1: speech.csv Structure

startTime_ms	endTime_ms	written	confidence
54040	55040	I’m	0.25608
72920	73920	looking	0.84339

Note: token in wav file time domain (start = 0)

Table 2: speech_aria_domain.csv Structure

startTime_ns	endTime_ns	written	confidence
56511040	56512040	I’m	0.25608
56529920	56530920	looking	0.84339

Note: token in Aria file time domain (start = 0)