LLM inference in ARE is powered by LiteLLM, providing flexible access to various language model providers and local models.
The system supports multiple inference backends to accommodate different deployment scenarios and model preferences.
Supported Providers
ARE integrates with multiple LLM providers through LiteLLM:
Llama API: Meta's hosted Llama models (recommended)
Local models: self-hosted models served at an endpoint you provide
Hugging Face providers: hosted inference through providers such as Together AI and Fireworks AI
In most CLI examples throughout this documentation, we omit the LLM connection arguments (--provider, -m, --endpoint) for brevity.
You can choose any provider and model combination that suits your needs by adding the appropriate arguments shown below.
Using Llama API (Recommended):
# Run with Llama 4 Maverick via Llama API
are-run -s scenario_find_image_file -a default --provider llama-api -m Llama-4-Maverick-17B-128E-Instruct-FP8
# Benchmark with Llama API
are-benchmark -s scenario_find_image_file -a default --provider llama-api -m Llama-4-Maverick-17B-128E-Instruct-FP8
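Under the hood, these arguments end up in a LiteLLM chat-completion call. The following is a minimal standalone sketch of that call, not ARE's internal code; it assumes the API key is picked up from the LLAMA_API_KEY environment variable and that "meta_llama/" is the provider prefix, following LiteLLM's conventions for the Llama API, so adjust if your setup differs.

# Illustrative LiteLLM equivalent of --provider llama-api with -m <model>
import os
from litellm import completion

# Assumption: LiteLLM reads the Llama API key from LLAMA_API_KEY
os.environ.setdefault("LLAMA_API_KEY", "<your-llama-api-key>")

response = completion(
    # Assumption: "meta_llama/" is LiteLLM's prefix for the Llama API provider
    model="meta_llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)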
Using Local Models:
# Run with local model
are-run -s scenario_find_image_file -a default --provider local -m llama3.1-8b-instruct --endpoint http://localhost:8000
# Run with Hugging Face local deployment
are-run -s scenario_find_image_file -a default --provider huggingface -m meta-llama/Llama-3.1-8B-Instruct
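For reference, the same local endpoint can be exercised directly through LiteLLM by speaking the OpenAI-compatible protocol to it. This is a minimal sketch assuming the server behind --endpoint exposes an OpenAI-compatible /v1 API (as vLLM and similar servers do) and registers the model name shown; it is not ARE's internal code.

# Illustrative LiteLLM equivalent of --provider local with --endpoint
from litellm import completion

response = completion(
    # "openai/" routes the request over the OpenAI-compatible protocol
    model="openai/llama3.1-8b-instruct",
    api_base="http://localhost:8000/v1",  # the --endpoint value plus /v1 (assumption)
    api_key="not-needed",                 # most local servers ignore the key
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)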
Using Hugging Face Providers:
# Run with Together AI
are-run -s scenario_find_image_file -a default --provider together -m meta-llama/Llama-3.1-70B-Instruct
# Run with Fireworks AI
are-run -s scenario_find_image_file -a default --provider fireworks-ai -m accounts/fireworks/models/llama-v3p1-70b-instruct
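These hosted routes authenticate with a Hugging Face token. The sketch below is illustrative only and assumes LiteLLM's Hugging Face Inference Providers routing, i.e. a huggingface/<provider>/<model> model string with the token read from HF_TOKEN; how ARE maps --provider together or fireworks-ai onto LiteLLM may differ, so treat the names here as placeholders.

# Illustrative LiteLLM call through a Hugging Face Inference Provider
import os
from litellm import completion

# Assumption: LiteLLM reads the Hugging Face token from HF_TOKEN
os.environ.setdefault("HF_TOKEN", "<your-hf-token>")

response = completion(
    # Assumption: huggingface/<provider>/<model> routing (Together AI here)
    model="huggingface/together/meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)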