CLI Reference

Meta Agents Research Environments provides three main command-line interfaces for different use cases:

  • are-run: Simple scenario runner for individual scenarios

  • are-benchmark: Benchmark runner for dataset evaluation

  • are-gui: GUI server for interactive scenario management

All CLIs share common parameters for consistency and ease of use.

Note

Recommended Usage: We recommend using uvx --from meta-agents-research-environments to run these commands without installing the package locally:

# Instead of: are-run -s scenario_name
uvx --from meta-agents-research-environments are-run -s scenario_name

# Instead of: are-benchmark run --hf dataset
uvx --from meta-agents-research-environments are-benchmark run --hf dataset

# Instead of: are-gui -s scenario_name
uvx --from meta-agents-research-environments are-gui -s scenario_name

For users who want to dig deeper into the library or develop custom scenarios, local installation is available (see Installation).

Common Parameters

The following parameters are available across all Meta Agents Research Environments CLI tools:

Model Configuration

-m, --model <MODEL>

Model name to use for the agent. This specifies which language model will be used to power the AI agent during scenario execution.

-mp, --provider <PROVIDER>

Provider of the model (e.g., ‘openai’, ‘anthropic’, ‘meta’). This determines which API or service will be used to access the specified model.

--endpoint <URL>

URL of the endpoint to contact for running the agent’s model. Use this when connecting to custom model endpoints or local model servers.
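
For example, to run against a self-hosted model server (the model name and endpoint URL below are placeholders):

# Hypothetical local endpoint - substitute your own model name and URL
are-run -s scenario_name -m my-model -mp local --endpoint http://localhost:8000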

Agent Configuration

-a, --agent <AGENT>

Agent to use for running the scenarios. This specifies which agent implementation will be used to interact with the model and execute scenario actions.

Logging Configuration

--log-level <LEVEL>

Set the logging level. Available levels: DEBUG, INFO, WARNING, ERROR, CRITICAL. Default: INFO
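
For example, to enable verbose logging while debugging a scenario:

are-run -s scenario_name --log-level DEBUG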

Runtime Configuration

-o, --oracle

Run scenarios in Oracle mode where oracle events (user-defined agent events) are executed. This is useful for testing and validation scenarios.

--simulated_generation_time_mode <MODE>

Mode for simulating LLM generation time. Available modes: measured, fixed, random. Default: measured

--noise

Enable noise augmentation using the tool-augmentation and environment-events configurations. This adds realistic variability to scenario execution.

--max_concurrent_scenarios <NUMBER>

Maximum number of concurrent scenarios to run. If not specified, this is set automatically based on the number of available CPUs.
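
A run combining several runtime options might look like the following (the scenario name is illustrative):

# Oracle mode with noise augmentation, fixed generation time, and at most 2 concurrent scenarios
are-run -s scenario_name --oracle --noise --simulated_generation_time_mode fixed --max_concurrent_scenarios 2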

Output Configuration

--output_dir <DIRECTORY>

Directory to dump the scenario states and logs.

JSON Configuration

--kwargs <JSON>

Additional keyword arguments as a JSON string to pass to the scenario initialization function. Default: {}

--scenario_kwargs <JSON>

Additional keyword arguments as a JSON string to pass when initializing the scenario. Default: {}
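
Both flags take a JSON object whose keys depend on the scenario being run; the keys below are purely illustrative:

# "seed" and "difficulty" are hypothetical scenario parameters
are-run -s scenario_name --kwargs '{"seed": 42}' --scenario_kwargs '{"difficulty": "hard"}'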

are-run CLI

The main scenario runner for executing individual scenarios.

are-run

Main entry point for the Meta Agents Research Environments scenario runner CLI.

This function processes command line arguments and runs scenarios using the MultiScenarioRunner. It supports running scenarios by ID from the registry or by providing JSON scenario files.

are-run [OPTIONS]

Options

--log-level <log_level>

Set the logging level

Options:

DEBUG | INFO | WARNING | ERROR | CRITICAL

-a, --agent <agent>

Agent to use for running the Scenario

Options:

default

--endpoint <endpoint>

URL of the endpoint to contact for running the agent’s model

-mp, --provider, --model_provider <provider>

Provider of the model

Options:

azure | meta | local | llama-api | huggingface | mock | black-forest-labs | cerebras | cohere | fal-ai | featherless-ai | fireworks-ai | groq | hf-inference | hyperbolic | nebius | novita | nscale | openai | replicate | sambanova | together

-m, --model <model>

Model used in the agent

--max_concurrent_scenarios <max_concurrent_scenarios>

Maximum number of concurrent scenarios to run. If not specified, this is set automatically based on the number of CPUs

--noise

Enable noise augmentation using the tool-augmentation and environment-events configurations

--simulated_generation_time_mode <simulated_generation_time_mode>

Mode for simulating LLM generation time

Options:

measured | fixed | random

-o, --oracle

Run the scenario in Oracle mode where oracle events (i.e., user-defined agent events) are run

--scenario_kwargs <scenario_kwargs>

Additional keyword arguments, as a JSON string, to pass when initializing the scenario

--multi_scenario_kwargs <multi_scenario_kwargs>

A list of additional keyword arguments, as an array of JSON strings, to pass when initializing the scenario. Initializes the same scenario with different arguments.

--multi_kwargs <multi_kwargs>

A list of additional keyword arguments, as a JSON string, to pass to the scenario creation function. Creates multiple scenarios with the same kwargs, differing only in the arguments given in this list.

--kwargs <kwargs>

Additional keyword arguments to pass to the scenario initialization function as a JSON string

-s, --scenario-id <scenario_id>

Scenarios to run from registry (can be specified multiple times)

--scenario-file <scenario_file>

JSON scenario files to run (can be specified multiple times)

--hf-url <hf_url>

HuggingFace dataset URLs in format: hf://datasets/dataset_name/config/split/scenario_id (can be specified multiple times)

--output_dir, --dump_dir <output_dir>

Directory to dump the scenario states and logs

-e, --export

Export the trace to a JSON file.

-w, --wait-for-user-input-timeout <wait_for_user_input_timeout>

Timeout for user inputs in seconds (no timeout by default).

--list-scenarios

List all available scenarios and exit.

Run Usage Examples

Run a scenario by ID:

are-run --scenario-id example_scenario --model Llama-3.1-70B-Instruct --provider llama-api

Run scenarios from JSON files:

are-run --scenario-file scenario1.json --scenario-file scenario2.json --model Llama-3.1-70B-Instruct --provider llama-api

Run with custom output directory and oracle mode:

are-run --scenario-id test_scenario --model Llama-3.1-70B-Instruct --provider llama-api --oracle --output_dir ./results
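
Run a scenario directly from a Hugging Face dataset (the URL segments follow the hf:// format described above and are placeholders here):

are-run --hf-url hf://datasets/dataset_name/config/split/scenario_id --model Llama-3.1-70B-Instruct --provider llama-api

Export the trace and allow up to 60 seconds for user input:

are-run --scenario-id test_scenario --model Llama-3.1-70B-Instruct --provider llama-api --export --wait-for-user-input-timeout 60

List all scenarios available in the registry:

are-run --list-scenarios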

are-benchmark CLI

The benchmark runner for evaluating your agent against datasets (e.g., Gaia2).

For comprehensive documentation, examples, and best practices, see: Benchmarking with Meta Agents Research Environments
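
A typical invocation follows the same flag conventions as are-run; the dataset name below is a placeholder:

are-benchmark run --hf-dataset my_dataset --hf-split test --model Llama-3.1-70B-Instruct --provider llama-api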

are-gui CLI

The GUI server for interactive scenario management and execution.

are-gui

Main entry point for the Meta Agents Research Environments GUI server CLI.

This function starts the Meta Agents Research Environments GUI server with the specified configuration, providing a web-based interface for running and managing scenarios.

are-gui [OPTIONS]

Options

--log-level <log_level>

Set the logging level

Options:

DEBUG | INFO | WARNING | ERROR | CRITICAL

-a, --agent <agent>

Agent to use for running the Scenario

Options:

default

--endpoint <endpoint>

URL of the endpoint to contact for running the agent’s model

-mp, --provider, --model_provider <provider>

Provider of the model

Options:

azure | meta | local | llama-api | huggingface | mock | black-forest-labs | cerebras | cohere | fal-ai | featherless-ai | fireworks-ai | groq | hf-inference | hyperbolic | nebius | novita | nscale | openai | replicate | sambanova | together

-m, --model <model>

Model used in the agent

--kwargs <kwargs>

Additional keyword arguments to pass to the scenario initialization function as a JSON string

-s, --scenario_id <scenario_id>

Scenario to run. Can be a scenario ID from the registry or a HuggingFace URL in format: hf://datasets/dataset_name/config/split/scenario_id

-h, --hostname <hostname>

Server hostname

-p, --port <port>

Server port

-c, --certfile <certfile>

Server SSL certificate path

-k, --keyfile <keyfile>

Server SSL key path

-d, --debug

Enable debugging mode.

--profile

Enable cProfile profiler.

--ui_view <ui_view>

Default UI mode to start the client in. Examples: ‘SCENARIOS’, ‘PLAYGROUND’

--inactivity-limit <inactivity_limit>

Session inactivity limit in seconds before cleanup

--cleanup-interval <cleanup_interval>

Interval in seconds between session cleanup checks

--dataset-path <dataset_path>

Path to the dataset directory containing JSON scenario files organized in subfolders

GUI Usage Examples

Start GUI server on default port:

are-gui --model Llama-3.1-70B-Instruct --provider llama-api

Start with custom hostname and port:

are-gui --hostname 0.0.0.0 --port 8080 --model Llama-3.1-70B-Instruct --provider llama-api

Start with SSL support:

are-gui --certfile cert.pem --keyfile key.pem --model Llama-3.1-70B-Instruct --provider llama-api
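
Start with a local dataset directory and the playground view (the directory path is illustrative):

are-gui --dataset-path ./my_scenarios --ui_view PLAYGROUND --model Llama-3.1-70B-Instruct --provider llama-api

Tune session cleanup for long-running deployments:

are-gui --inactivity-limit 3600 --cleanup-interval 300 --model Llama-3.1-70B-Instruct --provider llama-api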

Troubleshooting

Common Issues

Parameter Conflicts

If you see errors about conflicting parameters, make sure you’re not mixing old and new parameter names:

# Wrong - mixing old and new names
are-run --scenario-id test --scenario_id backup

# Correct - use consistent naming
are-run --scenario-id test --scenario-id backup

Model Provider Issues

Ensure your model provider is correctly specified:

# For OpenAI models
are-run --model gpt-4 --provider openai

# For Anthropic models
are-run --model claude-3-sonnet --provider anthropic

Dataset Loading Issues

For Hugging Face datasets, make sure to specify the split:

# Wrong - missing split
are-benchmark run --hf-dataset my_dataset

# Correct - with split specified
are-benchmark run --hf-dataset my_dataset --hf-split test

Getting Help

Use the --help flag with any CLI command to see detailed usage information:

are-run --help
are-benchmark --help
are-gui --help

For specific subcommands:

are-benchmark run --help
are-benchmark judge --help