This section provides a comprehensive exploration of the core concepts and architectural foundations that power the Agents Research Environments (ARE).
Understanding these fundamentals is essential for effectively using ARE and creating your own scenarios.
ARE is a research-focused environment designed to simulate complex, realistic tasks that span several minutes and require multiple steps to complete.
Unlike static simulation frameworks, ARE provides a dynamic, ever-evolving setting where the state of the environment changes over time and new information
is continuously introduced.
Today, agent evaluation is often constrained by several limitations:
There are currently no open environments with a built-in reward signal beyond web-based environments such as GAIA or BrowserComp.
Most existing benchmarks focus on narrow, domain-specific tasks intended for expert systems rather than general agents.
Many simulations are not grounded in real user scenarios or everyday tasks that reflect actual human workflows.
There is no flexible environment for testing new agents directly on real-world applications or APIs.
Meta Agents Research Environments is designed to address these gaps by offering strong foundations and abstractions that are general enough to support a wide range of tasks and benchmarks.
By doing so, it aims to make it easier for researchers to evaluate new directions in agentic research in settings that better reflect the complexity and unpredictability
of the real world.
Finally, ARE offers a clear decoupling between Environment and Agent, which is not always explicit in recent benchmarks, by providing an ARE interface for agents.
ARE is built around four fundamental concepts that work together to create dynamic, realistic simulations. Understanding these concepts is essential for effectively
using ARE and creating your own scenarios. These four concepts are:
Environment: The environment in which agents and users interact and collaborate.
Apps: Similar to a desktop or mobile phone, the environment supports a set of apps that users rely on in their daily lives, and the agent can interact with them. You can add your own apps, and it is also possible to have only one app, or even none beyond the internal ones (e.g., for chat or assistant tasks).
Events: Central to ARE and what makes it dynamic. In ARE, everything is an event: any action taken by the user or the agent, as well as external occurrences generated by the environment itself. The execution of the environment is driven by these events for dynamics, conditional execution, and validation.
Scenarios: A scenario starts from an initial state, with events scheduled to happen at specific points (the first event is usually a user message that gives the task to the agent). Each scenario has its own validation/verification logic that defines how to evaluate the agent's trajectory with respect to the user request and the environment state.
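To make these four abstractions concrete, here is a minimal sketch of how they compose. The class and field names (Event, Scenario, initial_state, scheduled_events) are illustrative stand-ins, not the actual ARE API:

```python
# Hypothetical sketch of the core abstractions; names are illustrative only.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Event:
    time: float                       # simulated time at which the event fires
    kind: str                         # e.g. "USER", "AGENT", "ENV"
    action: Callable[[], object]      # what happens when the event executes

@dataclass
class Scenario:
    initial_state: dict               # the universe's data at t=0
    scheduled_events: list[Event] = field(default_factory=list)

# The first scheduled event is typically a user message carrying the task.
scenario = Scenario(
    initial_state={"inbox": []},
    scheduled_events=[Event(0.0, "USER", lambda: "Book me a flight")],
)
```

In this sketch, apps would populate initial_state, and validation would inspect the final state after all events have run.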
The diagram below shows a high-level view of the ARE abstractions interacting with each other:
The Environment is the central unit in ARE. It manages registered Apps, maintains the global simulation timeline, and coordinates all interactions through an
internal event loop.
The environment provides the necessary abstractions to completely decouple the agent from the environment by offering an ARE Interface to connect external
actors (Agents, Users, …). Key responsibilities of the environment include:
Instantiating and registering Apps and their exposed APIs as tools available to the agent.
Managing the flow of time in discrete increments, evaluating events and scheduling execution accordingly.
Recording all executed Events in an immutable log for later inspection and evaluation.
Exposing the current state of the simulation at any point in time.
The Environment runs its event loop on a separate thread, ensuring that event processing and time progression do not block the main agent process,
just as an agent deployed in a real-world setting would run alongside a live environment.
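The background-thread design can be sketched as follows. This is a toy illustration, not ARE's actual Environment class; the queue, log, and threading details are assumptions:

```python
# Minimal sketch of an event loop running on a background thread.
import queue
import threading
import time

class MiniEnvironment:
    def __init__(self):
        self.event_queue = queue.Queue()   # future events waiting to run
        self.event_log = []                # record of executed events
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self):
        self._thread.start()

    def _loop(self):
        # Runs off the main thread, so the agent is never blocked.
        while not self._stop.is_set():
            try:
                event = self.event_queue.get(timeout=0.05)
            except queue.Empty:
                continue
            event()                        # execute the event's action
            self.event_log.append(event)   # record it for later inspection

    def stop(self):
        self._stop.set()
        self._thread.join()

env = MiniEnvironment()
env.start()
env.event_queue.put(lambda: None)          # schedule a no-op event
time.sleep(0.3)                            # let the background loop process it
env.stop()
```

The main thread (where the agent would run) only enqueues work and reads the log; all processing happens in the loop thread.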
Meta Agents Research Environments is built around fundamental concepts that work together to create dynamic, realistic simulations.
Understanding these concepts is essential for effectively using the system and creating your own scenarios.
Agents
Agents are the AI entities that interact with the environment to complete tasks.
They serve as the intelligent actors in your simulations, capable of reasoning, planning, and taking actions through available tools.
Event-Based Agent Architecture
Agents are built around an event-based architecture that enables them to operate seamlessly within the dynamic simulation environment:
Turn-Based Execution: Agents operate in discrete turns, responding to user messages and environment notifications as they arrive
Notification-Driven: Agents wait for and react to notifications from the environment, including user messages, system events, and environment changes
Synchronous Processing: Each agent turn runs to completion before processing the next set of notifications, ensuring predictable behavior
Event Logging: All agent actions, tool calls, and observations are logged as events for analysis and replay
Message Queue Integration: Agents continuously monitor a notification system for new tasks and environmental changes
ReAct Framework
Agents operate using a ReAct (Reasoning + Acting) framework that cycles through:
Think: Agent analyzes the current situation and plans next steps
Act: Agent executes actions using available tools (app APIs)
Observe: Agent processes the results and updates its understanding
This iterative process continues until the task is completed or termination conditions are met.
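The think/act/observe cycle can be sketched as a small loop. The tool set, the fake model, and the termination convention below are all illustrative assumptions, not ARE's default agent implementation:

```python
# Sketch of a ReAct-style loop: Think -> Act -> Observe until done.
def react_loop(task, tools, llm_think, max_steps=5):
    history = [("task", task)]
    for _ in range(max_steps):
        thought, tool_name, args = llm_think(history)   # Think
        history.append(("thought", thought))
        if tool_name == "finish":                       # termination condition
            return history
        observation = tools[tool_name](*args)           # Act (call a tool)
        history.append(("observation", observation))    # Observe the result
    return history

# Toy stand-in for an LLM: adds two numbers, then finishes.
def fake_llm(history):
    if any(kind == "observation" for kind, _ in history):
        return "I have the answer", "finish", ()
    return "I should add the numbers", "add", (2, 3)

trace = react_loop("add 2 and 3", {"add": lambda a, b: a + b}, fake_llm)
```

Each observation is fed back into the history, so the next Think step reasons over everything seen so far.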
Key Capabilities
Task Execution: Understanding and completing assigned objectives
Tool Usage: Interacting with apps through their exposed APIs
Reasoning: Making decisions based on available information and context
Learning: Adapting behavior based on feedback and results
Communication: Sending messages to users and logging progress
Getting Started with Agents
For most use cases, we recommend using the default agent by specifying the --agent default
parameter when running CLI commands. This provides:
Robust ReAct loop implementation
Comprehensive logging and error handling
Flexible tool integration
Customizable system prompts and behavior
The default agent will automatically handle the event-based interactions with the environment, logging all actions and observations for later analysis.
For more details about the default agent implementation, see the ARE_react_json_agent function in ARE/agents/default_agent/agent_factory.py.
You can also learn more about customizing agents in Agents API and explore practical examples in the tutorials.
Universes
Universes represent distinct, fully-populated instantiations of the simulated environment, each reflecting a specific user’s digital world.
They provide the realistic foundation from which scenarios emerge, containing comprehensive synthetic data that mirrors authentic usage patterns.
Core Concept
Universes embody a data-first approach to scenario development:
Rich Environmental Foundation: Each universe contains realistic application content including message histories, email threads, calendar events, and contacts
Persona-Driven: Built around detailed user personas that ensure coherence across all applications and data
Scenario Foundation: Well-populated universes naturally inspire compelling use cases and challenges for agents
Reusable Context: Multiple scenarios can emerge from a single universe without extensive manual environment construction
Relationship to Scenarios
Universe: Static initial state with all environmental setup and baseline data
Scenario: Dynamic simulation that evolves over time, beginning from a universe’s state at t=0
For comprehensive details about universe generation and the data-first approach, see Universes.
Scenarios
Scenarios are complete tasks given to agents within the simulation environment. They combine universes and all the previous concepts into cohesive, evaluable challenges that unfold dynamically over time.
Scenario Components
Every scenario consists of five main components:
Apps: The applications available to the agent
Data: Initial state and content populating the apps and environment (often from a universe)
Events: Dynamic occurrences that happen during the scenario
Task: A clear prompt defining what the agent needs to accomplish
Validation Function: Logic to determine if the task was completed successfully
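The five components can be pictured as fields of a single structure. The ScenarioSketch class and its field names below are hypothetical, intended only to show how the pieces fit together, not ARE's actual scenario API:

```python
# Illustrative sketch of a scenario's five components.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ScenarioSketch:
    apps: list[str]                       # applications available to the agent
    data: dict[str, Any]                  # initial state (often from a universe)
    events: list[tuple[float, str]]       # (time, description) of dynamic events
    task: str                             # the prompt given to the agent
    validate: Callable[[dict], bool]      # success check on the final state

scenario = ScenarioSketch(
    apps=["email", "calendar"],
    data={"calendar": []},
    events=[(60.0, "user sends a follow-up message")],
    task="Schedule a meeting with Alice tomorrow at 10am",
    validate=lambda state: len(state["calendar"]) == 1,
)
```

Note that validation operates on the environment state (and, in ARE, the agent's trajectory), not just on the agent's final answer.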
Dynamic Nature
Unlike static benchmarks, scenarios are dynamic and potentially require multiple agent steps or user interactions to complete.
The environment state evolves during execution
Agents must explore to gather necessary information
Tasks may not be self-contained and require environmental interaction
Multiple complexity levels can be designed for the same scenario
Scenario Format
The Agents Research Environments supports two main scenario formats:
JSON Scenarios: Ready-to-use scenarios like the Gaia2 benchmark dataset - see Scenario JSON Format
Python Scenarios: Custom scenarios with full programming control - see Working with Scenarios
Learn more through Scenarios for comprehensive details on scenario anatomy, evaluation goals, and creation processes.
Environment
The Environment is the core system that orchestrates the entire simulation.
It acts as the central coordinator that manages all components and ensures the simulation runs smoothly.
Core Responsibilities
The environment is responsible for:
App Management: Registering apps and handling their API call events
Simulation Control: Starting, pausing, and stopping the simulation
Time Management: Managing the flow of time and events through an event loop
Event Processing: Checking the event queue and processing events at each tick
Event Logging: Adding completed events to the event log
State Management: Providing the current state of the simulation at any given step
Event Loop
The environment operates through a discrete time event simulation, essentially a while loop in which time advances by time_increment_in_seconds on each tick.
Within each tick, the event loop:
Checks Event Triggers: Determines if any event_triggers need to be fired
Processes Events: Checks the event_queue for events that need to be processed and processes them
Advances Time: Moves time forward to the next tick
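The tick loop above can be sketched as a short function. This is a simplified stand-in for ARE's event loop (no triggers, single-threaded), keeping only the discrete-time mechanics; the parameter name time_increment_in_seconds is taken from the text:

```python
# Toy discrete-time event loop: time advances by a fixed increment per tick,
# and any event whose fire time has been reached is processed that tick.
import heapq

def run_simulation(events, end_time, time_increment_in_seconds=1.0):
    """events: list of (fire_time, action) pairs; returns the event log."""
    event_queue = list(events)
    heapq.heapify(event_queue)             # future events, ordered by time
    event_log, now = [], 0.0
    while now <= end_time:
        # Process every event due at or before the current tick.
        while event_queue and event_queue[0][0] <= now:
            fire_time, action = heapq.heappop(event_queue)
            event_log.append((fire_time, action()))
        now += time_increment_in_seconds   # advance to the next tick
    return event_log

log = run_simulation(
    [(3.0, lambda: "reminder"), (1.0, lambda: "email")],
    end_time=5.0,
)
```

Because time is simulated rather than real, an event scheduled days apart in scenario time costs nothing extra in wall-clock time.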
Important
This event_loop runs in a thread separate from the main thread, which means event processing happens in the background and does not block the main thread.
For example, an Agent can be running, solving a task, and calling tools while the event_loop handles how the environment should change in parallel.
Important
The simulated environment does not run in real time; it simulates time and can compress long simulations into a short wall-clock period. This lets you run
scenarios whose events span weeks or months in a matter of minutes.
The discrete time approach ensures predictable and reproducible simulations while allowing complex interactions between agents and the dynamic environment.
Apps
Apps are interactive applications that function similarly to apps on your phone.
They provide specific functionality and expose APIs that agents can use as tools to interact with the environment.
Key Characteristics
Data Population: Each app contains relevant data for its domain
API Exposure: Apps provide APIs that agents can call as tools
Event Registration: App interactions generate events that are logged
Extensibility: Anyone can build custom apps and integrate them into the platform
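A custom app boils down to domain data plus methods exposed as tools. The sketch below assumes a hypothetical registration convention (a get_tools method returning named callables); ARE's real app base class differs, so treat this as the shape of the idea only:

```python
# Sketch of a custom app: holds its own data and exposes APIs as tools.
class TodoApp:
    def __init__(self):
        self.items = []                    # the app's domain data

    # Each public method is an API the agent can call as a tool.
    def add_item(self, text: str) -> str:
        self.items.append(text)
        return f"added: {text}"

    def list_items(self) -> list:
        return list(self.items)

    def get_tools(self) -> dict:
        """Expose the app's APIs so the environment can register them."""
        return {
            "todo.add_item": self.add_item,
            "todo.list_items": self.list_items,
        }

app = TodoApp()
tools = app.get_tools()
tools["todo.add_item"]("buy milk")         # an agent tool call mutates app state
```

In ARE, each such call would also be recorded as an event in the environment's log.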
Common App Types
The platform includes various built-in apps:
Email Client: Send, receive, and manage emails
File System: Navigate and manipulate files and directories
Calendar: Schedule and manage appointments
Messaging: Send and receive messages
Shopping: Browse products and make purchases
Learn more about apps through App Implementation Tutorial for hands-on examples, or explore Apps for comprehensive details on their stateful design and creation.
Events
Events are the dynamic elements that make environments evolve over time. They represent things that happen in the simulation and can be triggered in various ways.
Event Types
Events can be categorized by their origin:
Scheduled Events: Happen at predefined times in the simulation
Triggered Events: Fire when specific conditions are met (with optional delays)
Agent-Initiated Events: Result from agent actions through API calls
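Triggered events with optional delays can be sketched as a small state machine. The Trigger class below is illustrative (ARE's trigger API is not shown in this text); it fires once, a fixed delay after its condition starts holding:

```python
# Sketch of a condition-based trigger with an optional delay.
class Trigger:
    def __init__(self, condition, delay=0.0):
        self.condition = condition       # predicate over the environment state
        self.delay = delay               # seconds the condition must hold
        self._armed_at = None            # time the condition first held
        self.fired = False

    def check(self, state, now):
        """Returns True exactly once, `delay` seconds after condition holds."""
        if self.fired:
            return False
        if self.condition(state):
            if self._armed_at is None:
                self._armed_at = now     # arm the trigger
            if now - self._armed_at >= self.delay:
                self.fired = True
                return True
        else:
            self._armed_at = None        # condition must hold continuously
        return False

# Fire 2 simulated seconds after the inbox receives its first email.
t = Trigger(lambda s: s["emails"] >= 1, delay=2.0)
results = [t.check({"emails": 1}, now) for now in (0.0, 1.0, 2.0, 3.0)]
```

The environment would call check once per tick, turning a state condition into a scheduled event.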
Event Categories
EventType.AGENT: Events initiated by agent tool calls
EventType.ENV: Events defined in scenario scripts
EventType.USER: Events simulating user interactions
Event Management
The environment manages events through two main data structures:
Event Queue: Stores future events waiting to be processed
Event Log: Contains the history of completed events
Event Graphs
ARE supports Event Graphs - DAG (Directed Acyclic Graph) representations that enable complex scenario design with:
Event Dependencies: Chain events with timing relationships
Condition Monitoring: Check environment state and trigger responses
Validation Logic: Verify that agents complete expected actions
Here’s an example of a simple event dependency chain after 32 seconds of simulation:
For more complex scenarios, event graphs can represent sophisticated dependency chains:
This complex graph shows how multiple events can be interconnected with various dependencies, timing constraints and validation, allowing for realistic and sophisticated scenario design.
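The core property of such a graph, that an event fires only after all of its parent events have completed, can be sketched as a topological traversal. The graph encoding below is an illustrative simplification (no timing or validation logic), not ARE's event-graph format:

```python
# Illustrative event-dependency DAG: each event waits for all its parents.
from collections import deque

def run_event_graph(graph):
    """graph: {event: set(parent_events)}; returns a valid firing order."""
    pending = {e: set(parents) for e, parents in graph.items()}
    ready = deque(e for e, parents in pending.items() if not parents)
    order = []
    while ready:
        event = ready.popleft()
        order.append(event)              # "fire" the event
        for other, parents in pending.items():
            if event in parents:
                parents.discard(event)   # dependency satisfied
                if not parents and other not in order and other not in ready:
                    ready.append(other)
    return order

# user_message -> agent_reply -> validation; a timer event is independent.
order = run_event_graph({
    "user_message": set(),
    "agent_reply": {"user_message"},
    "validation": {"agent_reply"},
    "timer": set(),
})
```

Timing constraints and validation nodes layer on top of this ordering: a real event graph also records when each dependency may fire and what state checks must pass.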
Learn more about events through Events API Reference for technical API details, or explore Events for comprehensive coverage of the event-driven architecture and lifecycle.
LLM Inference
LLM Inference powers the AI agents in ARE through flexible integration with various language model providers.
Agents use LLMs for reasoning, planning, and generating responses within the simulation environment.
Role in ARE
LLMs serve as the core intelligence layer for agents, enabling them to:
Understand Tasks: Process user requests and scenario context to determine what needs to be accomplished
Reason About Actions: Analyze available tools and environment state to make informed decisions
Generate Tool Calls: Create appropriate API calls to interact with apps and modify the environment
Adapt to Changes: Respond to dynamic events and environmental updates throughout scenario execution
Flexible Provider Support
ARE integrates with multiple LLM providers through LiteLLM, supporting:
Hosted APIs: Including Llama API, Hugging Face providers, and commercial services
Local Models: Self-hosted deployments for privacy and cost control
Custom Endpoints: Integration with private or specialized model deployments
The system automatically handles provider-specific configurations and API differences, allowing you to focus on agent behavior rather than infrastructure details.
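The routing idea, one completion interface in front of interchangeable providers, can be sketched as follows. The providers here are fake in-process stand-ins and the function names are assumptions; no real API calls are made and this is not LiteLLM's or ARE's actual configuration code:

```python
# Sketch of provider routing behind a single completion interface,
# in the spirit of LiteLLM-style "provider/model" identifiers.
def completion(model: str, messages: list, providers: dict) -> str:
    # The prefix of the model string selects the backend.
    provider_name = model.split("/", 1)[0] if "/" in model else "default"
    backend = providers.get(provider_name, providers["default"])
    return backend(messages)

providers = {
    "default": lambda msgs: "hosted response",   # e.g. a hosted API
    "local":   lambda msgs: "local response",    # e.g. a self-hosted model
}

reply = completion("local/my-model", [{"role": "user", "content": "hi"}], providers)
```

Swapping providers then means changing the model string, not the agent code, which is the property the text describes.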
For detailed configuration instructions, provider setup, and CLI usage examples, see LLM Configuration Guide.
Notifications
Notifications serve as the secondary interface between agents and their environment, complementing the primary tool-based interaction model.
Similar to a mobile device notification system, this framework alerts agents to important environmental changes without requiring constant monitoring.
Notification System Architecture
The notification system operates as a selective observability mechanism that implements partial rather than complete environmental awareness:
Filtered Information Flow: Not every event generates a notification - the system filters based on relevance and configured policies
Configurable Verbosity: Three levels (LOW, MEDIUM, HIGH) control how much environmental activity becomes visible to agents
Pull-Based Interaction: Agents retrieve notifications at the beginning of each step, integrating them into their context
Priority Queue: Notifications are ordered by timestamp to ensure temporal consistency
Integration with Agent Workflow
Notifications are injected into the agent’s context at each ReAct step:
Environment Processing: Events occur and are filtered through the notification policy
Queue Management: Relevant events are added to the notification queue with timestamps
Agent Integration: At each agent step, pending notifications are retrieved and added to context
Contextual Awareness: Agents can respond to environmental changes they might otherwise miss
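The filtered, timestamp-ordered, pull-based behavior described above can be sketched with a small priority queue. The importance levels and filtering policy below are illustrative assumptions, not ARE's actual notification implementation:

```python
# Sketch of a pull-based notification queue with importance filtering.
import heapq

LEVELS = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}

class NotificationQueue:
    """Timestamp-ordered queue; the agent drains it at each step."""
    def __init__(self, min_importance="MEDIUM"):
        self.threshold = LEVELS[min_importance]
        self._heap = []                  # (timestamp, message), oldest first

    def push(self, timestamp: float, importance: str, message: str):
        # Filtered information flow: low-importance events never surface.
        if LEVELS[importance] >= self.threshold:
            heapq.heappush(self._heap, (timestamp, message))

    def pull_all(self) -> list:
        """Drain pending notifications in temporal order (pull-based)."""
        out = []
        while self._heap:
            out.append(heapq.heappop(self._heap)[1])
        return out

q = NotificationQueue(min_importance="MEDIUM")
q.push(2.0, "HIGH", "new email arrived")
q.push(1.0, "LOW", "background sync finished")   # filtered out by policy
q.push(0.5, "MEDIUM", "calendar event started")
```

At the start of a ReAct step, the agent would call pull_all and prepend the results to its context.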
This system enables agents to maintain awareness of dynamic environmental changes while focusing on their primary tasks, creating more realistic and responsive agent behavior.
For detailed information about the notification system architecture and implementation, see Notifications.
Understanding these core concepts provides the foundation for effectively using the platform,
whether you’re running existing scenarios, creating benchmarks, or
developing your own custom content.
To fully understand the framework, it’s essential to grasp its core concepts.
The following subsections provide a comprehensive explanation of the foundations of the framework.
We highly encourage you to read through them, starting with Apps.
Once you have a solid understanding of the core concepts, you can move on to the next section, which covers the practical aspects of using the Meta Agents Research Environments.