LLM Configuration Guide

This guide covers how to configure and use different Large Language Model (LLM) providers with the Agents Research Environments (ARE).

Overview

LLM inference in ARE is powered by LiteLLM, providing flexible access to various language model providers and local models. The system supports multiple inference backends to accommodate different deployment scenarios and model preferences.

Supported Providers

ARE integrates with multiple LLM providers through LiteLLM:

  • Llama API: Meta’s hosted Llama models via Llama API

  • Local Models: Self-hosted models running locally

  • Hugging Face: Models hosted on Hugging Face Hub

  • Hugging Face Providers: Various inference providers including:

    • black-forest-labs

    • fal-ai

    • fireworks-ai

    • hf-inference

    • hyperbolic

    • nebius

    • novita

    • replicate

    • sambanova

    • together

Configuration

LLM engines are configured through the LLMEngineConfig, which specifies:

  • provider: The inference provider to use

  • model_name: The specific model identifier

  • endpoint: Optional custom endpoint URL for local or private deployments

The system automatically creates the appropriate engine based on the provider (see the sketch after this list):

  • LiteLLMEngine: Used for most providers including llama-api, local, and huggingface

  • HuggingFaceLLMEngine: Used for Hugging Face inference providers
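
As a rough sketch of how this fits together, the CLI flags used in the examples below populate these fields; the flag-to-field mapping shown in the comments is an assumption based on the field names above:

# --provider -> provider ("local" here, so a LiteLLMEngine is created)
# -m         -> model_name
# --endpoint -> endpoint (custom URL for the local deployment)
are-run -s scenario_find_image_file -a default --provider local -m llama3.1-8b-instruct --endpoint http://localhost:8000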

CLI Usage Examples

Note

In most CLI examples throughout this documentation, we omit the LLM connection arguments (-p, --provider, --endpoint) for brevity. You can choose any provider and model combination that suits your needs by adding the appropriate arguments shown below.

Using Llama API (Recommended):

# Run with Llama 4 Maverick via Llama API
are-run -s scenario_find_image_file -a default --provider llama-api -m Llama-4-Maverick-17B-128E-Instruct-FP8

# Benchmark with Llama API
are-benchmark -s scenario_find_image_file -a default --provider llama-api -m Llama-4-Maverick-17B-128E-Instruct-FP8

Using Local Models:

# Run with local model
are-run -s scenario_find_image_file -a default --provider local -m llama3.1-8b-instruct --endpoint http://localhost:8000

# Run with Hugging Face local deployment
are-run -s scenario_find_image_file -a default --provider huggingface -m meta-llama/Llama-3.1-8B-Instruct

Using Hugging Face Providers:

# Run with Together AI
are-run -s scenario_find_image_file -a default --provider together -m meta-llama/Llama-3.1-70B-Instruct

# Run with Fireworks AI
are-run -s scenario_find_image_file -a default --provider fireworks-ai -m accounts/fireworks/models/llama-v3p1-70b-instruct

Environment Variables

Different providers may require specific environment variables (an example export snippet follows this list):

  • Llama API: LLAMA_API_KEY (required), LLAMA_API_BASE (optional)

  • Hugging Face: HF_TOKEN (for private models)

  • Provider-specific: Each provider may have its own API key requirements
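
For example, when using Llama API together with gated Hugging Face models, the relevant variables can be exported before invoking the CLI; the values below are placeholders:

# Llama API credentials (LLAMA_API_BASE is only needed for a non-default endpoint)
export LLAMA_API_KEY="<your-llama-api-key>"
export LLAMA_API_BASE="<custom-base-url>"

# Hugging Face token for private or gated models
export HF_TOKEN="<your-hf-token>"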

Model Selection

Choose models based on your requirements:

  • Performance: Larger models (70B, 405B) for complex reasoning tasks

  • Speed: Smaller models (8B) for faster inference

  • Cost: Local models for cost-effective deployment

  • Availability: Hosted APIs for convenience without infrastructure setup

The default configuration uses Llama API with llama3.1-70b-instruct for a balance of performance and efficiency.
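
With these defaults in place, the abbreviated commands used throughout this documentation work without any LLM connection arguments (assuming LLAMA_API_KEY is set in the environment):

# Rely on the default provider and model
are-run -s scenario_find_image_file -a default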

Next Steps