Quickstart: Getting Your Agent to Run
Prerequisites
To use GPT-4o to complete tasks, set your OpenAI API key:
export OPENAI_API_KEY=YOUR_KEY
Alternatively, edit the corresponding line in config/agent/GPT-4o.yaml.
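If you start runs from a Python script, a small pre-flight check can catch a missing key before the agent launches. The sketch below is illustrative only: `require_api_key` is a hypothetical helper, not part of this repository, and it only checks the environment variable, not the YAML file.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key found in `env`, or raise if it is missing or blank.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it or edit config/agent/GPT-4o.yaml"
        )
    return key
```

Calling `require_api_key()` with no arguments reads the current process environment; passing a plain dict makes the check easy to test.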
Supported Clients
We support multiple client types:
- vLLM
- OpenAI
- Azure
- AWS
To use a different client, change the client_type argument in your configuration. See /src/open_apps/agent/vLLM_agent.py, line 49 onward, for how each client is called internally.
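A typo in client_type is easy to catch before launch. The sketch below validates a value against the four clients listed above. It assumes the accepted spellings match that list, which this page does not confirm, so check the agent source for the exact strings. `normalize_client_type` is a hypothetical helper, not part of the repository.

```python
# Client names as listed above; the exact strings that client_type accepts
# are an assumption and should be checked against vLLM_agent.py.
SUPPORTED_CLIENTS = ("vLLM", "OpenAI", "Azure", "AWS")


def normalize_client_type(value):
    """Map a case-insensitive client name onto its listed spelling."""
    lookup = {name.lower(): name for name in SUPPORTED_CLIENTS}
    try:
        return lookup[value.strip().lower()]
    except KeyError:
        expected = ", ".join(SUPPORTED_CLIENTS)
        raise ValueError(
            f"Unsupported client_type {value!r}; expected one of: {expected}"
        ) from None
```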
Running Your Agent
uv run launch_agent.py agent=GPT-4o
To run a local model with vLLM:
- Launch your local vLLM model:
  vllm serve [MODEL_NAME]
  vLLM will tell you your hostname.
- Launch your agent:
  uv run launch_agent.py agent=AGENT_CONFIG agent.hostname=VLLM_HOSTNAME
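To script the launch step, the command can be assembled as an argument list and passed to `subprocess.run`. This is a minimal sketch: `build_launch_command` is a hypothetical helper, and it only reproduces the two command forms shown above.

```python
import shlex


def build_launch_command(agent_config, hostname=None):
    """Build the `uv run launch_agent.py ...` command as an argument list.

    Hypothetical helper; it mirrors the commands shown in this section.
    """
    cmd = ["uv", "run", "launch_agent.py", f"agent={agent_config}"]
    if hostname:
        # Only needed for a local vLLM model; use the hostname vLLM reports.
        cmd.append(f"agent.hostname={hostname}")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_launch_command("AGENT_CONFIG", "VLLM_HOSTNAME")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` from the repository root would start the agent.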
Configuring Your Policy
Our agent policies are built on top of AgentLab. Our setup enables automatic configuration of your prompt with config flags. Here are some key flags you can configure in your agent's YAML file:
Observation Flags:
- use_axtree: Enable AXTree observation (accessibility tree)
- use_screenshot: Enable screenshot observation
- use_som: Add visual marks to screenshots for element identification
- extract_coords: Include element coordinates in observations
History & Memory Flags:
- use_history: Enable action/thought history tracking
- use_action_history: Track previous actions taken by the agent
- use_think_history: Track previous thoughts/reasoning steps
Reasoning & Examples Flags:
- use_thinking: Enable chain-of-thought reasoning before actions
- use_concrete_example: Include concrete examples in the prompt
- use_abstract_example: Include abstract reasoning examples in the prompt
Custom Prompts:
- prompt_txt.system_prompt: Override the default system prompt
- prompt_txt.action_prompt: Define custom action instructions
- prompt_txt.think_prompt: Define custom thinking/reasoning instructions
For the complete set of configuration options, see config/agent/default.yaml.
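For quick experiments it can be handy to toggle flags from the command line instead of editing YAML. The sketch below renders a dict of flags as dotted key=value arguments in the same style as agent.hostname. It rests on two assumptions that this page does not confirm: that the launcher accepts such overrides for arbitrary keys, and that these flags sit directly under `agent` in the config. Check config/agent/default.yaml for the real nesting. `to_overrides` is a hypothetical helper.

```python
def to_overrides(flags, prefix="agent"):
    """Render {flag: value} pairs as dotted key=value command-line arguments.

    The `agent.` prefix and flat nesting are assumptions; adjust them to
    match the layout of config/agent/default.yaml.
    """
    overrides = []
    for key, value in flags.items():
        if isinstance(value, bool):
            # Lowercase booleans, as they would appear in YAML.
            rendered = "true" if value else "false"
        else:
            rendered = str(value)
        overrides.append(f"{prefix}.{key}={rendered}")
    return overrides


if __name__ == "__main__":
    flags = {"use_axtree": True, "use_screenshot": False, "use_thinking": True}
    print(" ".join(to_overrides(flags)))
```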
Creating Your Own Agent
If AgentLab's capabilities don't meet your needs, you can create a custom agent.
- Navigate to src/open_apps/agent/
- Copy and modify the following files:
  - vLLM_agent.py
  - vLLM_prompt.py
This allows you to build rich, custom agent implementations tailored to your specific requirements.
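The copy step can be scripted as follows. This is a sketch under stated assumptions: `copy_agent_templates` is a hypothetical helper, and naming the copies by replacing the `vLLM` prefix is a convention chosen here for illustration, not one the repository prescribes.

```python
import shutil
from pathlib import Path

# The two template files named above.
TEMPLATE_FILES = ("vLLM_agent.py", "vLLM_prompt.py")


def copy_agent_templates(agent_dir, new_prefix):
    """Copy the vLLM agent and prompt files to `<new_prefix>_agent.py` etc.

    Hypothetical helper; the naming scheme for the copies is an assumption.
    """
    agent_dir = Path(agent_dir)
    created = []
    for name in TEMPLATE_FILES:
        source = agent_dir / name
        target = agent_dir / name.replace("vLLM", new_prefix, 1)
        if target.exists():
            raise FileExistsError(f"{target} already exists; refusing to overwrite")
        shutil.copyfile(source, target)
        created.append(target)
    return created
```

For example, `copy_agent_templates("src/open_apps/agent", "my")` would create my_agent.py and my_prompt.py next to the originals, ready to edit.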