CLI Reference

Meta Agents Research Environments provides three main command-line interfaces for different use cases:

  • are-run: Simple scenario runner for individual scenarios

  • are-benchmark: Benchmark runner for dataset evaluation

  • are-gui: GUI server for interactive scenario management

All CLIs share common parameters for consistency and ease of use.

Note

Recommended Usage: We recommend using uvx --from meta-agents-research-environments to run these commands without installing the package locally:

# Instead of: are-run -s scenario_name
uvx --from meta-agents-research-environments are-run -s scenario_name

# Instead of: are-benchmark run --hf dataset
uvx --from meta-agents-research-environments are-benchmark run --hf dataset

# Instead of: are-gui -s scenario_name
uvx --from meta-agents-research-environments are-gui -s scenario_name

For users who want to dig deeper into the library or develop custom scenarios, local installation is available (see Installation).

Common Parameters

The following parameters are available across all Meta Agents Research Environments CLI tools:

Model Configuration

-m, --model <MODEL>

Model name to use for the agent. This specifies which language model will be used to power the AI agent during scenario execution.

-mp, --provider <PROVIDER>

Provider of the model (e.g., ‘openai’, ‘anthropic’, ‘meta’). This determines which API or service will be used to access the specified model.

--endpoint <URL>

URL of the endpoint to contact for running the agent’s model. Use this when connecting to custom model endpoints or local model servers.
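
For example, to run against a self-hosted model server (the model name and endpoint URL below are placeholders):

# Hypothetical local endpoint - substitute your own model name and URL
are-run -s scenario_name -m my-model -mp local --endpoint http://localhost:8000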

Agent Configuration

-a, --agent <AGENT>

Agent to use for running the scenarios. This specifies which agent implementation will be used to interact with the model and execute scenario actions.

Logging Configuration

--log-level <LEVEL>

Set the logging level. Available levels: DEBUG, INFO, WARNING, ERROR, CRITICAL. Default: INFO
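
For example, to enable verbose logging while debugging a scenario:

are-run -s scenario_name --log-level DEBUG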

Runtime Configuration

-o, --oracle

Run scenarios in Oracle mode where oracle events (user-defined agent events) are executed. This is useful for testing and validation scenarios.

--simulated_generation_time_mode <MODE>

Mode for simulating LLM generation time. Available modes: measured, fixed, random. Default: measured

--noise

Enable noise augmentation using the tool-augmentation and environment-events configurations. This adds realistic variability to scenario execution.

--max_concurrent_scenarios <NUMBER>

Maximum number of concurrent scenarios to run. If not specified, this is set automatically based on the number of available CPUs.
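
A run combining several runtime options might look like the following (the scenario name is illustrative):

# Oracle mode with noise augmentation, fixed generation time, and at most 2 concurrent scenarios
are-run -s scenario_name --oracle --noise --simulated_generation_time_mode fixed --max_concurrent_scenarios 2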

Output Configuration

--output_dir <DIRECTORY>

Directory to dump the scenario states and logs.

JSON Configuration

--kwargs <JSON>

Additional keyword arguments as a JSON string to pass to the scenario initialization function. Default: {}

--scenario_kwargs <JSON>

Additional keyword arguments as a JSON string to pass when initializing the scenario. Default: {}
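
Both flags take a JSON object whose keys depend on the scenario being run; the keys below are purely illustrative:

# "seed" and "difficulty" are hypothetical scenario parameters
are-run -s scenario_name --kwargs '{"seed": 42}' --scenario_kwargs '{"difficulty": "hard"}'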

are-run CLI

The main scenario runner for executing individual scenarios.

are-run

Main entry point for the Meta Agents Research Environments scenario runner CLI.

This function processes command line arguments and runs scenarios using the MultiScenarioRunner. It supports running scenarios by ID from the registry or by providing JSON scenario files.

are-run [OPTIONS]

Options

--log-level <log_level>

Set the logging level

Options:

DEBUG | INFO | WARNING | ERROR | CRITICAL

-a, --agent <agent>

Agent to use for running the Scenario

Options:

default

--endpoint <endpoint>

URL of the endpoint to contact for running the agent’s model

-mp, --provider, --model_provider <provider>

Provider of the model

Options:

azure | meta | local | llama-api | huggingface | mock | black-forest-labs | cerebras | cohere | fal-ai | featherless-ai | fireworks-ai | groq | hf-inference | hyperbolic | nebius | novita | nscale | openai | replicate | sambanova | together

-m, --model <model>

Model used in the agent

--max_concurrent_scenarios <max_concurrent_scenarios>

Maximum number of concurrent scenarios to run. If not specified, this is set automatically based on the number of CPUs

--noise

Enable noise augmentation using the tool-augmentation and environment-events configurations

--simulated_generation_time_mode <simulated_generation_time_mode>

Mode for simulating LLM generation time

Options:

measured | fixed | random

-o, --oracle

Run the scenario in Oracle mode where oracle events (i.e., user-defined agent events) are run

--scenario_kwargs <scenario_kwargs>

Additional keyword arguments, as a JSON string, to pass when initializing the scenario

--multi_scenario_kwargs <multi_scenario_kwargs>

A list of additional keyword arguments, as an array of JSON strings, to pass when initializing the scenario. Initializes the same scenario with different arguments.

--multi_kwargs <multi_kwargs>

A list of additional keyword arguments, as a JSON string, to pass to the scenario creation function. Creates multiple scenarios with the same kwargs, differing only in the arguments given in this list.

--kwargs <kwargs>

Additional keyword arguments to pass to the scenario initialization function as a JSON string

-s, --scenario-id <scenario_id>

Scenarios to run from registry (can be specified multiple times)

--scenario-file <scenario_file>

JSON scenario files to run (can be specified multiple times)

--hf-url <hf_url>

HuggingFace dataset URLs in format: hf://datasets/dataset_name/config/split/scenario_id (can be specified multiple times)

--output_dir, --dump_dir <output_dir>

Directory to dump the scenario states and logs

-e, --export

Export the trace to a JSON file.

-w, --wait-for-user-input-timeout <wait_for_user_input_timeout>

Timeout for user inputs in seconds (no timeout by default).

--list-scenarios

List all available scenarios and exit.

Run Usage Examples

Run a scenario by ID:

are-run --scenario-id example_scenario --model Llama-3.1-70B-Instruct --provider llama-api

Run scenarios from JSON files:

are-run --scenario-file scenario1.json --scenario-file scenario2.json --model Llama-3.1-70B-Instruct --provider llama-api

Run with custom output directory and oracle mode:

are-run --scenario-id test_scenario --model Llama-3.1-70B-Instruct --provider llama-api --oracle --output_dir ./results
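
Run a scenario directly from a Hugging Face dataset (the URL segments follow the hf:// format described above and are placeholders here):

are-run --hf-url hf://datasets/dataset_name/config/split/scenario_id --model Llama-3.1-70B-Instruct --provider llama-api

Export the trace and allow up to 60 seconds for user input:

are-run --scenario-id test_scenario --model Llama-3.1-70B-Instruct --provider llama-api --export --wait-for-user-input-timeout 60

List all scenarios available in the registry:

are-run --list-scenarios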

are-benchmark CLI

The benchmark runner for evaluating your agent against datasets (e.g., Gaia2).

For comprehensive documentation, examples, and best practices, see: Benchmarking with Meta Agents Research Environments
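
A typical invocation follows the same flag conventions as are-run; the dataset name below is a placeholder:

are-benchmark run --hf-dataset my_dataset --hf-split test --model Llama-3.1-70B-Instruct --provider llama-api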

are-gui CLI

The GUI server for interactive scenario management and execution.

are-gui

Main entry point for the Meta Agents Research Environments GUI server CLI.

This function starts the Meta Agents Research Environments GUI server with the specified configuration, providing a web-based interface for running and managing scenarios.

are-gui [OPTIONS]

Options

--log-level <log_level>

Set the logging level

Options:

DEBUG | INFO | WARNING | ERROR | CRITICAL

-a, --agent <agent>

Agent to use for running the Scenario

Options:

default

--endpoint <endpoint>

URL of the endpoint to contact for running the agent’s model

-mp, --provider, --model_provider <provider>

Provider of the model

Options:

azure | meta | local | llama-api | huggingface | mock | black-forest-labs | cerebras | cohere | fal-ai | featherless-ai | fireworks-ai | groq | hf-inference | hyperbolic | nebius | novita | nscale | openai | replicate | sambanova | together

-m, --model <model>

Model used in the agent

--kwargs <kwargs>

Additional keyword arguments to pass to the scenario initialization function as a JSON string

-s, --scenario_id <scenario_id>

Scenario to run. Can be a scenario ID from the registry or a HuggingFace URL in format: hf://datasets/dataset_name/config/split/scenario_id

-h, --hostname <hostname>

Server hostname

-p, --port <port>

Server port

-c, --certfile <certfile>

Server SSL certificate path

-k, --keyfile <keyfile>

Server SSL key path

-d, --debug

Enable debugging mode.

--profile

Enable cProfile profiler.

--ui_view <ui_view>

Default UI mode to start the client in. Examples: ‘SCENARIOS’, ‘PLAYGROUND’

--inactivity-limit <inactivity_limit>

Session inactivity limit in seconds before cleanup

--cleanup-interval <cleanup_interval>

Interval in seconds between session cleanup checks

--dataset-path <dataset_path>

Path to the dataset directory containing JSON scenario files organized in subfolders

GUI Usage Examples

Start GUI server on default port:

are-gui --model Llama-3.1-70B-Instruct --provider llama-api

Start with custom hostname and port:

are-gui --hostname 0.0.0.0 --port 8080 --model Llama-3.1-70B-Instruct --provider llama-api

Start with SSL support:

are-gui --certfile cert.pem --keyfile key.pem --model Llama-3.1-70B-Instruct --provider llama-api
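
Start with a local dataset directory and the playground view (the directory path is illustrative):

are-gui --dataset-path ./my_scenarios --ui_view PLAYGROUND --model Llama-3.1-70B-Instruct --provider llama-api

Tune session cleanup for long-running deployments:

are-gui --inactivity-limit 3600 --cleanup-interval 300 --model Llama-3.1-70B-Instruct --provider llama-api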

Troubleshooting

Common Issues

Parameter Conflicts

If you see errors about conflicting parameters, make sure you’re not mixing old and new parameter names:

# Wrong - mixing old and new names
are-run --scenario-id test --scenario_id backup

# Correct - use consistent naming
are-run --scenario-id test --scenario-id backup

Model Provider Issues

Ensure your model provider is correctly specified:

# For OpenAI models
are-run --model gpt-4 --provider openai

# For Anthropic models
are-run --model claude-3-sonnet --provider anthropic

Dataset Loading Issues

For Hugging Face datasets, make sure to specify the split:

# Wrong - missing split
are-benchmark run --hf-dataset my_dataset

# Correct - with split specified
are-benchmark run --hf-dataset my_dataset --hf-split test

Getting Help

Use the --help flag with any CLI command to see detailed usage information:

are-run --help
are-benchmark --help
are-gui --help

For specific subcommands:

are-benchmark run --help
are-benchmark judge --help