Foundations

This section provides a comprehensive exploration of the core concepts and architectural foundations that power the Agents Research Environments (ARE). Understanding these fundamentals is essential for effectively using ARE and creating your own scenarios.

Overview

ARE is a research-focused environment designed to simulate complex, realistic tasks that span several minutes and require multiple steps to complete. Unlike static simulation frameworks, ARE provides a dynamic, ever-evolving setting where the state of the environment changes over time and new information is continuously introduced. Today, agent evaluation is often constrained by several limitations:

  • There are currently no open environments with a built-in reward signal beyond web-based benchmarks such as GAIA or BrowserComp.

  • Most existing benchmarks focus on narrow, domain-specific tasks intended for expert systems rather than general agents.

  • Many simulations are not grounded in real user scenarios or everyday tasks that reflect actual human workflows.

  • There is no flexible environment for testing new agents directly on real-world applications or APIs.

Meta Agents Research Environments is designed to address these gaps by offering strong foundations and abstractions that are general enough to support a wide range of tasks and benchmarks. By doing so, it aims to make it easier for researchers to evaluate new directions in agentic research in settings that better reflect the complexity and unpredictability of the real world. Finally, ARE offers a clear decoupling between Environment and Agent, a separation that is often blurred in recent benchmarks, by providing an ARE interface for agents.

High-level View of ARE Abstractions

ARE is built around four fundamental concepts that work together to create dynamic, realistic simulations. Understanding these concepts is essential for effectively using ARE and creating your own scenarios. These four concepts are:

  • Environment: The environment in which agents and users interact and collaborate.

  • Apps: Similar to a desktop or mobile phone, the environment can support a set of apps that users rely on in their daily lives, which the agent can interact with. People can add their own apps, and it is also possible to have only one app, or even none beyond the internal ones (e.g. for chat or assistant tasks).

  • Events: Central to ARE, events are what make it dynamic. In ARE, everything is an event: any action taken by the user or the agent, as well as external occurrences generated by the environment itself, is an event. The execution of the environment is driven by these events for dynamics, conditional execution, and validation.

  • Scenarios: A scenario starts from an initial state, with events scheduled to happen at specific points in time (the first event is usually a user message that gives the task to the agent). Each scenario has its own validation/verification logic that states how to evaluate the agent's trajectory with respect to the user request and the environment state.

The diagram below shows a high-level view of the ARE abstractions interacting with each other:

ARE is an event-based environment in which we can load scenarios. All interactions with the environment, whether from an agent or a user, go through the same interfaces.

The Environment is the central unit in ARE. It manages registered Apps, maintains the global simulation timeline, and coordinates all interactions through an internal event loop. The environment provides the necessary abstractions to completely decouple the agent from the environment by offering an ARE Interface to connect external actors (Agents, Users, …). Key responsibilities of the environment include:

  • Instantiating and registering Apps and their exposed APIs as tools available to the agent.

  • Managing the flow of time in discrete increments, evaluating events and scheduling execution accordingly.

  • Recording all executed Events in an immutable log for later inspection and evaluation.

  • Exposing the current state of the simulation at any point in time.

The Environment runs its event loop on a separate thread, ensuring that event processing and time progression do not block the main agent process, mirroring how an agent would be deployed in a real-world setting.

Core Concepts

Meta Agents Research Environments is built around fundamental concepts that work together to create dynamic, realistic simulations. Understanding these concepts is essential for effectively using the system and creating your own scenarios.

Agents

Agents are the AI entities that interact with the environment to complete tasks. They serve as the intelligent actors in your simulations, capable of reasoning, planning, and taking actions through available tools.

Event-Based Agent Architecture

Agents are built around an event-based architecture that enables them to operate seamlessly within the dynamic simulation environment:

  • Turn-Based Execution: Agents operate in discrete turns, responding to user messages and environment notifications as they arrive

  • Notification-Driven: Agents wait for and react to notifications from the environment, including user messages, system events, and environment changes

  • Synchronous Processing: Each agent turn runs to completion before processing the next set of notifications, ensuring predictable behavior

  • Event Logging: All agent actions, tool calls, and observations are logged as events for analysis and replay

  • Message Queue Integration: Agents continuously monitor a notification system for new tasks and environmental changes

ReAct Framework

Agents operate using a ReAct (Reasoning + Acting) framework that cycles through:

  1. Think: Agent analyzes the current situation and plans next steps

  2. Act: Agent executes actions using available tools (app APIs)

  3. Observe: Agent processes the results and updates its understanding

This iterative process continues until the task is completed or termination conditions are met.
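
The cycle above can be sketched as a minimal loop. Here, `llm` and `tools` are stand-in callables for the model and the app APIs; this is an illustrative sketch, not ARE's actual agent implementation:

```python
# Minimal ReAct-style loop. `llm` returns a dict with an "action" name and
# an "input"; `tools` maps action names to callables. Both are stand-ins.
def react_loop(task, llm, tools, max_steps=10):
    observations = [f"Task: {task}"]
    for _ in range(max_steps):
        # Think: the model reasons over everything observed so far
        thought = llm("\n".join(observations))
        if thought["action"] == "final_answer":
            return thought["input"]
        # Act: execute the chosen tool with the chosen input
        result = tools[thought["action"]](thought["input"])
        # Observe: feed the result back into the context
        observations.append(f"Observation: {result}")
    return None  # termination condition: step budget exhausted
```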

Key Capabilities

  • Task Execution: Understanding and completing assigned objectives

  • Tool Usage: Interacting with apps through their exposed APIs

  • Reasoning: Making decisions based on available information and context

  • Learning: Adapting behavior based on feedback and results

  • Communication: Sending messages to users and logging progress

Getting Started with Agents

For most use cases, we recommend using the default agent by specifying the --agent default parameter when running CLI commands. This provides:

  • Robust ReAct loop implementation

  • Comprehensive logging and error handling

  • Flexible tool integration

  • Customizable system prompts and behavior

The default agent will automatically handle the event-based interactions with the environment, logging all actions and observations for later analysis.

For more details about the default agent implementation, see the ARE_react_json_agent function in ARE/agents/default_agent/agent_factory.py. You can also learn more about customizing agents in Agents API and explore practical examples in the tutorials.

Universes

Universes represent distinct, fully-populated instantiations of the simulated environment, each reflecting a specific user’s digital world. They provide the realistic foundation from which scenarios emerge, containing comprehensive synthetic data that mirrors authentic usage patterns.

Core Concept

Universes embody a data-first approach to scenario development:

  • Rich Environmental Foundation: Each universe contains realistic application content including message histories, email threads, calendar events, and contacts

  • Persona-Driven: Built around detailed user personas that ensure coherence across all applications and data

  • Scenario Foundation: Well-populated universes naturally inspire compelling use cases and challenges for agents

  • Reusable Context: Multiple scenarios can emerge from a single universe without extensive manual environment construction

Relationship to Scenarios

  • Universe: Static initial state with all environmental setup and baseline data

  • Scenario: Dynamic simulation that evolves over time, beginning from a universe’s state at t=0
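
This split can be pictured with a toy sketch in which the universe is a static snapshot and each scenario evolves its own deep copy starting from t=0. The data fields and helper below are hypothetical, not ARE's actual schema:

```python
# Toy illustration of the universe/scenario split; field names are
# hypothetical, not ARE's actual data model.
import copy

universe = {
    "contacts": ["Alice", "Bob"],
    "emails": [{"from": "Alice", "subject": "Lunch?"}],
    "calendar": [],
}


def new_scenario(universe, task):
    """Each scenario evolves its own deep copy; the universe stays static."""
    return {"state": copy.deepcopy(universe), "task": task, "t": 0}


s1 = new_scenario(universe, "Reply to Alice")
s2 = new_scenario(universe, "Schedule lunch with Bob")
s1["state"]["emails"].clear()  # s1 evolves without touching the universe
```

Because each scenario copies the universe, many scenarios can reuse the same environmental foundation without interfering with one another.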

For comprehensive details about universe generation and the data-first approach, see Universes.

Scenarios

Scenarios are complete tasks given to agents within the simulation environment. They combine universes and all the previous concepts into cohesive, evaluable challenges that unfold dynamically over time.

Scenario Components

Every scenario consists of five main components:

  1. Apps: The applications available to the agent

  2. Data: Initial state and content populating the apps and environment (often from a universe)

  3. Events: Dynamic occurrences that happen during the scenario

  4. Task: A clear prompt defining what the agent needs to accomplish

  5. Validation Function: Logic to determine if the task was completed successfully
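
A minimal sketch of these five components, assuming a simple dataclass shape (not ARE's actual scenario schema):

```python
# Hypothetical grouping of the five scenario components; field names and
# the example values are illustrative only.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    apps: list                        # applications available to the agent
    data: dict                        # initial state, often from a universe
    events: list                      # dynamic occurrences during the run
    task: str                         # prompt defining the objective
    validate: Callable[[dict], bool]  # did the agent succeed?


scenario = Scenario(
    apps=["email", "calendar"],
    data={"emails": []},
    events=[{"time": 32, "type": "ENV", "action": "new_email"}],
    task="Reply to the incoming email",
    validate=lambda state: len(state.get("sent", [])) > 0,
)
```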

Dynamic Nature

Unlike static benchmarks, scenarios are dynamic and potentially require multiple agent steps or user interactions to complete.

  • The environment state evolves during execution

  • Agents must explore to gather necessary information

  • Tasks may not be self-contained and require environmental interaction

  • Multiple complexity levels can be designed for the same scenario

Scenario Format

The Agents Research Environments supports two main scenario formats:

Learn more through Scenarios for comprehensive details on scenario anatomy, evaluation goals, and creation processes.

Environment

The Environment is the core system that orchestrates the entire simulation. It acts as the central coordinator that manages all components and ensures the simulation runs smoothly.

Core Responsibilities

The environment is responsible for:

  • App Management: Registering apps and handling their API call events

  • Simulation Control: Starting, pausing, and stopping the simulation

  • Time Management: Managing the flow of time and events through an event loop

  • Event Processing: Checking the event queue and processing events at each tick

  • Event Logging: Adding completed events to the event log

  • State Management: Providing the current state of the simulation at any given step

Event Loop

The environment operates through a discrete-time event simulation, which is essentially a while loop where time advances by time_increment_in_seconds at each tick. Within each tick, the event loop:

  1. Checks Event Triggers: Determines if any event_triggers need to be fired

  2. Processes Events: Checks the event_queue for events that need to be processed and processes them

  3. Advances Time: Moves time forward to the next tick
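
The three steps above can be sketched as follows. The names `event_triggers`, `event_queue`, and the tick logic follow the description in this section, but the code is an illustrative assumption, not ARE's implementation:

```python
# Sketch of a discrete-time event loop: check triggers, process due
# events, advance time. A real implementation would also retire
# one-shot triggers after they fire.
import heapq


def run_event_loop(env, until):
    while env["time"] < until:
        # 1. Check event triggers: conditions that may schedule new events
        for trigger in env["event_triggers"]:
            if trigger["condition"](env):
                heapq.heappush(env["event_queue"],
                               (env["time"], trigger["event"]))
        # 2. Process events whose scheduled time has arrived
        while env["event_queue"] and env["event_queue"][0][0] <= env["time"]:
            _, event = heapq.heappop(env["event_queue"])
            env["event_log"].append(event)  # immutable history of the run
        # 3. Advance simulated time to the next tick
        env["time"] += env["time_increment_in_seconds"]
```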

Important

This event_loop runs in a thread separate from the main thread, which means event processing happens in the background and does not block the main thread. For example, an Agent can be running, solving a task, and calling tools while the event_loop handles how the environment should change in parallel.

Important

The simulated environment does not run in real time; it simulates time and can compress long simulations into a short period. This lets you run scenarios whose events span weeks or months in a matter of minutes.

The discrete-time approach ensures predictable and reproducible simulations while allowing complex interactions between agents and the dynamic environment.

Apps

Apps are interactive applications that function similarly to apps on your phone. They provide specific functionality and expose APIs that agents can use as tools to interact with the environment.

Key Characteristics

  • Data Population: Each app contains relevant data for its domain

  • API Exposure: Apps provide APIs that agents can call as tools

  • Event Registration: App interactions generate events that are logged

  • Extensibility: Anyone can build custom apps and integrate them into the platform
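
One way to picture API exposure is that an app's public methods become callable tools. The class and helper below use hypothetical names to illustrate the idea, not ARE's actual registration mechanism:

```python
# Toy app whose public methods are mapped to agent tools; names are
# illustrative, not ARE's API.
import inspect


class CalendarApp:
    """A toy app whose public methods become agent tools."""

    def __init__(self):
        self.events = []

    def add_event(self, title: str) -> str:
        """Schedule an event with the given title."""
        self.events.append(title)
        return f"added {title}"


def register_app_as_tools(app):
    """Map every public method of the app to a namespaced tool."""
    prefix = type(app).__name__.lower()
    return {
        f"{prefix}.{name}": method
        for name, method in inspect.getmembers(app, inspect.ismethod)
        if not name.startswith("_")
    }


tools = register_app_as_tools(CalendarApp())
```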

Common App Types

The platform includes various built-in apps:

  • Email Client: Send, receive, and manage emails

  • File System: Navigate and manipulate files and directories

  • Calendar: Schedule and manage appointments

  • Messaging: Send and receive messages

  • Shopping: Browse products and make purchases

Learn more about apps through App Implementation Tutorial for hands-on examples, or explore Apps for comprehensive details on their stateful design and creation.

Events

Events are the dynamic elements that make environments evolve over time. They represent things that happen in the simulation and can be triggered in various ways.

Event Types

Events can be categorized by their origin:

  • Scheduled Events: Happen at predefined times in the simulation

  • Triggered Events: Fire when specific conditions are met (with optional delays)

  • Agent-Initiated Events: Result from agent actions through API calls

Event Categories

  • EventType.AGENT: Events initiated by agent tool calls

  • EventType.ENV: Events defined in scenario scripts

  • EventType.USER: Events simulating user interactions
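
These categories can be pictured as an enum tagging each event. The EventType values mirror the list above, while the other field names are assumptions, not ARE's actual event schema:

```python
# Illustrative event structure; EventType mirrors the categories above,
# the remaining fields are hypothetical.
from dataclasses import dataclass, field
from enum import Enum


class EventType(Enum):
    AGENT = "agent"  # initiated by agent tool calls
    ENV = "env"      # defined in scenario scripts
    USER = "user"    # simulated user interactions


@dataclass
class Event:
    time: int              # simulated time at which the event fires
    event_type: EventType
    payload: dict = field(default_factory=dict)


task_msg = Event(time=0, event_type=EventType.USER,
                 payload={"message": "Book a table for two"})
```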

Event Management

The environment manages events through two main data structures:

  • Event Queue: Stores future events waiting to be processed

  • Event Log: Contains the history of completed events

Event Graphs

ARE supports Event Graphs, DAG (Directed Acyclic Graph) representations that enable complex scenario design with:

  • Event Dependencies: Chain events with timing relationships

  • Condition Monitoring: Check environment state and trigger responses

  • Validation Logic: Verify that agents complete expected actions

Here’s an example of a simple event dependency chain after 32 seconds of simulation:

Basic Event Graph Tutorial

For more complex scenarios, event graphs can represent sophisticated dependency chains:

Complex Event Graph DAG

This complex graph shows how multiple events can be interconnected with various dependencies, timing constraints and validation, allowing for realistic and sophisticated scenario design.

Learn more about events through Events API Reference for technical API details, or explore Events for comprehensive coverage of the event-driven architecture and lifecycle.

LLM Inference

LLM Inference powers the AI agents in ARE through flexible integration with various language model providers. Agents use LLMs for reasoning, planning, and generating responses within the simulation environment.

Role in ARE

LLMs serve as the core intelligence layer for agents, enabling them to:

  • Understand Tasks: Process user requests and scenario context to determine what needs to be accomplished

  • Reason About Actions: Analyze available tools and environment state to make informed decisions

  • Generate Tool Calls: Create appropriate API calls to interact with apps and modify the environment

  • Adapt to Changes: Respond to dynamic events and environmental updates throughout scenario execution

Flexible Provider Support

ARE integrates with multiple LLM providers through LiteLLM, supporting:

  • Hosted APIs: Including Llama API, Hugging Face providers, and commercial services

  • Local Models: Self-hosted deployments for privacy and cost control

  • Custom Endpoints: Integration with private or specialized model deployments

The system automatically handles provider-specific configurations and API differences, allowing you to focus on agent behavior rather than infrastructure details.

For detailed configuration instructions, provider setup, and CLI usage examples, see LLM Configuration Guide.

Notifications

Notifications serve as the secondary interface between agents and their environment, complementing the primary tool-based interaction model. Similar to a mobile device notification system, this framework alerts agents to important environmental changes without requiring constant monitoring.

Notification System Architecture

The notification system operates as a selective observability mechanism that implements partial rather than complete environmental awareness:

  • Filtered Information Flow: Not every event generates a notification; the system filters based on relevance and configured policies

  • Configurable Verbosity: Three levels (LOW, MEDIUM, HIGH) control how much environmental activity becomes visible to agents

  • Pull-Based Interaction: Agents retrieve notifications at the beginning of each step, integrating them into their context

  • Priority Queue: Notifications are ordered by timestamp to ensure temporal consistency
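
The filtering, verbosity, and pull-based retrieval described above can be sketched as a small queue. The verbosity levels come from the text; the thresholds and API are assumptions:

```python
# Illustrative notification queue: events below the verbosity threshold
# are dropped, the rest are surfaced to the agent ordered by timestamp.
import heapq

VERBOSITY = {"LOW": 2, "MEDIUM": 1, "HIGH": 0}  # minimum priority to surface


class NotificationQueue:
    def __init__(self, verbosity="MEDIUM"):
        self.threshold = VERBOSITY[verbosity]
        self._heap = []

    def push(self, timestamp, priority, message):
        """Only events at or above the threshold become notifications."""
        if priority >= self.threshold:
            heapq.heappush(self._heap, (timestamp, message))

    def drain(self):
        """Pull-based: the agent retrieves all pending notifications."""
        out = []
        while self._heap:
            out.append(heapq.heappop(self._heap)[1])  # timestamp order
        return out
```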

Integration with Agent Workflow

Notifications are injected into the agent’s context at each ReAct step:

  1. Environment Processing: Events occur and are filtered through the notification policy

  2. Queue Management: Relevant events are added to the notification queue with timestamps

  3. Agent Integration: At each agent step, pending notifications are retrieved and added to context

  4. Contextual Awareness: Agents can respond to environmental changes they might otherwise miss

This system enables agents to maintain awareness of dynamic environmental changes while focusing on their primary tasks, creating more realistic and responsive agent behavior.

For detailed information about the notification system architecture and implementation, see Notifications.

Understanding these core concepts provides the foundation for effectively using the platform, whether you’re running existing scenarios, creating benchmarks, or developing your own custom content.

Learn the Foundations

To fully understand the framework, it’s essential to grasp its core concepts. The following subsections provide a comprehensive explanation of the foundations of the framework.

We highly encourage you to read through them, starting with Apps.

Once you have a solid understanding of the core concepts, you can move on to the next section, which covers the practical aspects of using the Meta Agents Research Environments.

Next Steps

Now that you understand the core concepts:

For hands-on examples and tutorials, see the practical examples in the ARE repository’s tutorials/ directory.