Project Aria Pilot Dataset

The Aria Pilot Dataset is a collection of 159 sequences captured using Project Aria, intended to advance the state of the art in machine perception and AI.

A multi-purpose egocentric dataset created using Project Aria

The Aria Pilot Dataset and accompanying tools give computer vision researchers access to anonymized Aria sequences captured in a variety of scenarios, such as cooking, playing games, or exercising. In addition to ‘Everyday Activities’, the dataset also includes ‘Desktop Activities’ captured with a multi-view motion capture system, helping to accelerate research into human-object interactions.

We believe this dataset will provide a baseline for external researchers to build and foster reproducible research on egocentric Computer Vision and AI/ML algorithms for scene perception, reconstruction and understanding.

Learn more about Project Aria
Sensor Data
  • 1 x 110 degree FOV Rolling Shutter RGB camera
  • 2 x 150 degree FOV Global Shutter mono cameras for SLAM and hand tracking
  • 2 x 80 degree FOV Global Shutter mono cameras for eye-tracking with IR illumination
  • 2 x 1 kHz IMU + barometer & magnetometer environmental sensors
  • 7 x 48 kHz spatial microphones
  • 1 x GPS at 1 Hz
How is it annotated?

Automatic and manual annotations

In addition to providing sensor data from Project Aria, the Pilot Dataset also contains derived results from machine perception services, which provide additional context to the spatio-temporal reference frames.

Multi-user poses in shared reference frame

In addition to providing a per-frame trajectory for every recording, sequences captured within the same environment have been aligned to the same reference frame, allowing those sequences to be understood within the same context.
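
As a rough illustration of what a shared reference frame makes possible, the sketch below expresses a point observed by one device in the frame of another device. The pose representation (a rotation matrix and a translation that map device coordinates into the shared frame) and all function names are assumptions made for this example; they are not the dataset's file format or tooling.

```python
# Hypothetical pose representation: (R, t), where R is a 3x3 rotation
# (nested lists) and t a 3-vector mapping device coordinates into the
# shared reference frame. This is an assumption, not the dataset's format.

def transform(pose, p):
    """Apply a rigid transform to a point: R @ p + t."""
    R, t = pose
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def inverse(pose):
    """Invert a rigid transform: (R, t) -> (R^T, -R^T t)."""
    R, t = pose
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    return Rt, [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]

def point_in_other_device(pose_shared_a, pose_shared_b, p_b):
    """Express a point given in device B's frame in device A's frame."""
    p_shared = transform(pose_shared_b, p_b)          # B -> shared frame
    return transform(inverse(pose_shared_a), p_shared)  # shared frame -> A

# Made-up example: A sits at the origin, B is one metre along x, both unrotated.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pose_a = (I3, [0.0, 0.0, 0.0])
pose_b = (I3, [1.0, 0.0, 0.0])
p_in_a = point_in_other_device(pose_a, pose_b, [0.0, 0.0, 2.0])
```

Because both poses are expressed in the same frame, no extra registration step between the two recordings is needed in this sketch.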

Camera calibration

For a high-quality egocentric dataset, it is essential to understand how cameras perceive the world. The Project Aria Pilot Dataset provides full camera calibration parameters, including both intrinsics and extrinsics of every sensor.
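
To make the roles of intrinsics and extrinsics concrete, here is a minimal sketch that maps a 3D point into pixel coordinates. It assumes an ideal, distortion-free pinhole camera, which is a simplification: this page does not state which camera model the calibration uses, and wide field-of-view cameras usually need a fisheye or distortion model. The parameter names (fx, fy, cx, cy, R_cam_world, t_cam_world) are hypothetical.

```python
def world_to_camera(R_cam_world, t_cam_world, p_world):
    """Extrinsics: move a point from the world frame into the camera frame."""
    return [
        sum(R_cam_world[i][j] * p_world[j] for j in range(3)) + t_cam_world[i]
        for i in range(3)
    ]

def project_pinhole(p_cam, fx, fy, cx, cy):
    """Intrinsics: project a camera-frame point to pixel coordinates.

    Returns None for points at or behind the camera plane.
    """
    x, y, z = p_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)

# Made-up calibration values, for illustration only.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
p_cam = world_to_camera(identity, [0.0, 0.0, 0.0], [0.5, -0.25, 2.0])
pixel = project_pinhole(p_cam, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```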

Multi-view motion capture

To facilitate research into human-object interactions, the Aria Pilot Dataset includes a subset of “Desktop Activities” captured using a multi-view motion capture system.

Multi-device time sync

In addition to aligning the trajectories of sequences captured within the same environment, the Project Aria Pilot Dataset also provides precise time-alignment between sequences captured simultaneously.
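
One common use of time-aligned recordings is pairing up frames from two devices that were captured at nearly the same instant. The sketch below shows one way to do that, assuming each recording yields a sorted list of integer timestamps that are already on a common clock; the function names and the nanosecond unit are assumptions for illustration, not part of the dataset tools.

```python
from bisect import bisect_left

def nearest_index(sorted_ts, query):
    """Index of the timestamp in sorted_ts closest to query (non-empty list)."""
    i = bisect_left(sorted_ts, query)
    if i == 0:
        return 0
    if i == len(sorted_ts):
        return len(sorted_ts) - 1
    before, after = sorted_ts[i - 1], sorted_ts[i]
    # Ties go to the earlier timestamp.
    return i if after - query < query - before else i - 1

def match_frames(ts_a, ts_b, max_gap_ns):
    """Pair each frame of A with the nearest frame of B within max_gap_ns."""
    if not ts_b:
        return []
    pairs = []
    for ia, t in enumerate(ts_a):
        ib = nearest_index(ts_b, t)
        if abs(ts_b[ib] - t) <= max_gap_ns:
            pairs.append((ia, ib))
    return pairs

# Made-up timestamps: the third frame of A has no close partner in B.
pairs = match_frames([0, 100, 200], [5, 98, 250], max_gap_ns=10)
```

The tolerance keeps frames without a close counterpart unpaired instead of forcing a poor match.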


Speech-to-text annotation

For sequences where actors speak, we provide speech-to-text annotation. This supports egocentric communications research, such as predicting turn-taking in conversations and multi-speaker transcription.

Calibrated eye-gaze

Using data from Project Aria’s eye-tracking cameras, the Pilot Dataset includes an estimate of the wearer’s eye-gaze. This can be used to accelerate research into user-object interactions.

How will the dataset be used?

Accelerating the state of Machine Perception and Artificial Intelligence

The Project Aria Pilot Dataset consists of 159 sequences, which can support several areas of research that advance the state of machine perception and AI, including camera relocalization and scene reconstruction.

Studying these research areas is crucial for researchers to engage with the challenges associated with AR devices.


To demonstrate a few representative scenarios of all-day activities with always-on sensing, multiple sequences were recorded with actors in five locations across the USA.

Access Project Aria Pilot Dataset and accompanying Tools

If you are a researcher in AI or ML, enter your email here to access the Project Aria Pilot Dataset and accompanying tools.

By submitting your email and accessing the Project Aria Pilot Dataset, you agree to abide by the dataset license agreement and to receive emails in relation to the dataset.