Architecture¶
Note
This section describes internal implementation details of the autoresearch engine. The architecture may change at any time without notice.
The implementation is organized into several modules with clear boundaries. The goal is to keep scheduling generic, domain logic testable, and infrastructure swappable.
Design Principles¶
Several design choices cut across the modules and are not obvious from the code alone.
The engine must remain domain-neutral. The async work engine must not grow SPDL, coding-agent, source-control, metrics, hypothesis-planning, or experiment-phase logic. If a behavior depends on what an experiment is, it belongs in the workflow adapter, the policy module, or behind a platform capability — never in the runner.
Stop criteria live in the planner, not the engine. The engine
stops when the queue and running set are both empty. Autoresearch
enforces its own stopping conditions (plateau patience, max iterations,
all best practices tried) by returning no children from the planning
step. This keeps the engine simple and avoids a domain-specific
should_stop callback.
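As a rough sketch, the planning step might express those stop conditions like this (the history entries, score field, and untried_practices argument are illustrative names, not the real API):

```python
# Illustrative only: stop criteria expressed as "return no children"
# instead of a domain-specific should_stop callback on the engine.
def plan_children(history, untried_practices, plateau_patience=3, max_iterations=50):
    """Return follow-up experiments, or [] so the engine drains its queue and stops."""
    if len(history) >= max_iterations:
        return []                                   # iteration budget exhausted
    if history:
        best = max(range(len(history)), key=lambda i: history[i].score)
        if len(history) - 1 - best >= plateau_patience:
            return []                               # no improvement for `plateau_patience` runs
    if not untried_practices:
        return []                                   # every known best practice already tried
    # In the real workflow these would be wrapped into child _WorkSpec objects.
    return list(untried_practices)
```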
Resume is phase-based. Each experiment coroutine persists its phase
(queued, preparing, running, analyzing, completed, failed) at every
meaningful boundary. On resume, the coroutine inspects the persisted
phase to skip already-completed steps: a running experiment with a
known job ID resumes polling rather than re-launching, and an
analyzing experiment skips straight to analysis.
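On resume, the coroutine can branch on that persisted phase. A simplified sketch, where the state object and helper coroutines are stand-ins for the real workflow operations:

```python
# Illustrative resume flow: skip work that the persisted phase says is done.
async def resume_experiment(spec, state, platform):
    if state.phase in ("queued", "preparing"):
        await prepare_source_and_build(spec, state)      # hypothetical helper; persists "preparing"
        state.job_id = await platform.execution.launch(spec, state.image)
        state.persist(phase="running")
    if state.phase == "running":
        # Known job ID: resume polling rather than re-launching the job.
        await poll_until_done(platform, state.job_id)    # hypothetical helper
        state.persist(phase="analyzing")
    if state.phase == "analyzing":
        # Resumed here: skip straight to metrics collection and analysis.
        await analyze_and_record(spec, state, platform)  # hypothetical helper
        state.persist(phase="completed")
```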
Domain coroutines own their cancellation behavior. The engine
cancels asyncio tasks on SIGINT/SIGTERM, but each coroutine
decides what state to persist before re-raising CancelledError.
Remote jobs are not automatically cancelled by the engine — if remote
cancellation is needed, the coroutine or adapter must do it explicitly.
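For example, a polling step might look roughly like this (the execution and state objects are assumed, not the documented API):

```python
import asyncio

# Illustrative only: the coroutine chooses what to persist on cancellation,
# then always re-raises so the engine's bookkeeping stays correct.
async def poll_job(state, execution, interval_s=30.0):
    try:
        while await execution.status(state.job_id) == "running":
            await asyncio.sleep(interval_s)
    except asyncio.CancelledError:
        state.persist(phase="running")     # remote job is left running by default
        # Remote cancellation is opt-in and must be done explicitly, e.g.:
        # await execution.cancel(state.job_id)
        raise
```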
Failures are structured domain data. Every failure path (prepare,
build, launch, poll, analyze, plan) produces a FailureRecord with a
FailureKind and FailurePhase. The runner never learns about
failure kinds. Expected failures flow through _AutoresearchError;
unexpected exceptions are caught and wrapped into structured records.
This ensures durable accounting even for phases that never reach a
remote job.
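The record shapes might look something like this; the exact fields and the set of kinds are assumptions, and only the type names come from this section:

```python
from dataclasses import dataclass
from enum import Enum


class _AutoresearchError(Exception):
    """Expected, domain-level failures (name from this section)."""


class FailurePhase(Enum):
    PREPARE = "prepare"
    BUILD = "build"
    LAUNCH = "launch"
    POLL = "poll"
    ANALYZE = "analyze"
    PLAN = "plan"


class FailureKind(Enum):
    EXPECTED = "expected"        # raised deliberately as _AutoresearchError
    UNEXPECTED = "unexpected"    # anything else, caught and wrapped


@dataclass
class FailureRecord:
    experiment_id: str
    phase: FailurePhase
    kind: FailureKind
    message: str


def to_failure_record(exc, phase, experiment_id):
    """Wrap any exception into durable, structured failure data."""
    kind = FailureKind.EXPECTED if isinstance(exc, _AutoresearchError) else FailureKind.UNEXPECTED
    return FailureRecord(experiment_id, phase, kind, str(exc))
```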
Async Work Engine¶
The generic runner (utils/runner.py) knows nothing about SPDL,
training jobs, source control, metrics, or hypothesis planning. It
operates on serializable _WorkSpec objects and a _WorkAdapter
protocol:
Maintains a priority queue of pending _WorkSpec objects.
Starts up to max_concurrency coroutines via the adapter.
Waits for the first coroutine to complete.
Passes completed _WorkResult objects (which may contain child specs) back to the adapter and re-queues children.
Checkpoints queued and running specs on cancellation.
The runner does not inspect experiment payloads. Infrastructure-specific work belongs in the platform capability layer, and domain decisions belong in the workflow adapter.
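The loop described above could be sketched roughly as follows; the real _WorkSpec, _WorkResult, and adapter signatures in utils/runner.py will differ, and spec.priority is an assumed field:

```python
import asyncio
import heapq
import itertools

# Simplified, domain-neutral sketch of the runner loop. The runner never
# inspects the payload inside a spec; it only schedules and re-queues.
async def run(adapter, initial_specs, max_concurrency, checkpoint):
    seq = itertools.count()
    queue = [(spec.priority, next(seq), spec) for spec in initial_specs]
    heapq.heapify(queue)                               # priority queue of pending specs
    running = {}                                       # asyncio.Task -> spec
    try:
        while queue or running:
            # Start coroutines via the adapter, up to max_concurrency.
            while queue and len(running) < max_concurrency:
                _, _, spec = heapq.heappop(queue)
                running[asyncio.create_task(adapter.run(spec))] = spec
            # Wait for the first coroutine to complete.
            done, _ = await asyncio.wait(running, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                running.pop(task)
                result = task.result()                 # a _WorkResult, possibly with children
                for child in adapter.handle_result(result):
                    heapq.heappush(queue, (child.priority, next(seq), child))
    except asyncio.CancelledError:
        # Checkpoint queued and running specs so a later run can resume.
        checkpoint(queued=[s for _, _, s in queue], running=list(running.values()))
        raise
```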
Workflow¶
The autoresearch workflow (utils/workflow/) is the domain side of
the boundary. It turns an experiment _WorkSpec into a coroutine
that performs the full experiment lifecycle:
Restore or prepare the source tree.
Apply code changes when the experiment requires a rebuild.
Build the image and launch the remote job.
Poll for completion and detect stalled jobs.
Collect metrics and run coding agent analysis.
Record state, master-table rows, findings, and plots.
Ask the coding agent for follow-up experiments and return them as child _WorkSpec objects.
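Put end to end, the happy path of that lifecycle looks roughly like the sketch below; every method name on the platform object is an assumption, and failure handling and phase persistence are omitted:

```python
import asyncio

# Happy-path sketch of the experiment coroutine; illustrative names only.
async def run_experiment(spec, platform, store):
    await platform.workspace.restore(spec.commit)                  # 1. restore/prepare source tree
    if spec.requires_rebuild:
        await platform.coding_agent.apply_changes(spec.change)     # 2. apply code changes
    image = await platform.artifacts.build(spec)                   # 3. build image...
    job_id = await platform.execution.launch(spec, image)          #    ...and launch remote job
    while await platform.execution.status(job_id) == "running":    # 4. poll for completion
        await asyncio.sleep(30)
    metrics = await platform.evidence.collect(job_id)              # 5. collect metrics...
    analysis = await platform.coding_agent.analyze(spec, metrics)  #    ...and run agent analysis
    store.record(spec, metrics, analysis)                          # 6. master table, findings, plots
    return await platform.coding_agent.plan(spec, analysis)        # 7. follow-ups as child specs
```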
The workflow is split into focused modules:
adapter.py – the AutoresearchAdapter that implements _WorkAdapter and orchestrates the experiment coroutine.
policy.py – deterministic decisions (planning gates, duplicate filtering, stall detection) expressed as pure functions that can be unit tested without infrastructure.
store.py – durable state persistence (master table, findings, tree visualization).
analysis_ops.py / planning_ops.py / source_ops.py – individual workflow operations that interact with the platform.
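Because the policy decisions are pure functions, they can be exercised in unit tests with plain values. Two hypothetical helpers in the spirit of policy.py (the real names and signatures may differ):

```python
# Hypothetical pure policy helpers; no jobs, SCM, or coding agent required.
def _normalize(title):
    return " ".join(title.lower().split())

def is_duplicate(candidate_title, existing_titles):
    """Duplicate filtering: drop follow-ups that repeat an existing experiment."""
    return _normalize(candidate_title) in {_normalize(t) for t in existing_titles}

def is_stalled(last_progress_ts, now, stall_after_s=1800.0):
    """Stall detection: no observed progress for `stall_after_s` seconds."""
    return last_progress_ts is not None and (now - last_progress_ts) > stall_after_s

# Unit tests are plain assertions over plain data:
assert is_duplicate("GPU NVDEC decode", ["gpu nvdec decode", "baseline"])
assert not is_stalled(last_progress_ts=100.0, now=200.0)
```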
Platform Capabilities¶
The platform layer (utils/platform/) provides a capability boundary
between the workflow and infrastructure. AutoresearchPlatform
bundles five capability objects:
_Workspace – source control operations (detect SCM, commit, restore, check for changes).
_Artifacts – image building and tagging.
_Execution – job launch, status polling, and cancellation.
_Evidence – metrics collection and system profiling.
_CodingAgent – stateless coding agent invocations (analysis, planning, code changes).
The workflow can swap local, remote, Claude, Codex, or test implementations by replacing these capability objects without changing any orchestration code.
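A sketch of the bundle is below. Only the five capability names come from this section; every protocol method is an assumption made for illustration:

```python
from dataclasses import dataclass
from typing import Any, Protocol


class _Workspace(Protocol):
    async def restore(self, commit: str) -> None: ...
    async def commit(self, message: str) -> str: ...

class _Artifacts(Protocol):
    async def build(self, spec: Any) -> str: ...

class _Execution(Protocol):
    async def launch(self, spec: Any, image: str) -> str: ...
    async def status(self, job_id: str) -> str: ...
    async def cancel(self, job_id: str) -> None: ...

class _Evidence(Protocol):
    async def collect(self, job_id: str) -> dict: ...

class _CodingAgent(Protocol):
    async def apply_changes(self, request: str) -> None: ...
    async def analyze(self, spec: Any, metrics: dict) -> str: ...
    async def plan(self, spec: Any, analysis: str) -> list: ...

@dataclass
class AutoresearchPlatform:
    workspace: _Workspace
    artifacts: _Artifacts
    execution: _Execution
    evidence: _Evidence
    coding_agent: _CodingAgent

# A test platform is just a different bundle of fakes; the workflow code
# that consumes `platform.execution.launch(...)` does not change.
```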
Stateless Agent Invocations¶
Each coding agent call is fully stateless. The workflow constructs a self-contained prompt that includes everything the agent needs: the SPDL optimization knowledge base, the full experiment history, collected metrics, and the pipeline source code. There is no persistent conversation or session state.
This design makes the system robust to interruptions. After Ctrl+C,
the engine can resume from the last persisted checkpoint without
relying on a conversation session. It also means the coding agent can
be swapped between runs (e.g., switching from Claude to Codex) with
no state migration.
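As a rough picture of what "self-contained" means here (the section names and prompt format are illustrative, not the real prompt):

```python
# Illustrative prompt assembly; every call carries the full context.
def build_analysis_prompt(knowledge_base, history, metrics, pipeline_source):
    sections = [
        ("SPDL optimization knowledge base", knowledge_base),
        ("Experiment history", history),
        ("Collected metrics", metrics),
        ("Pipeline source code", pipeline_source),
    ]
    body = "\n\n".join(f"## {title}\n{content}" for title, content in sections)
    return body + "\n\nAnalyze the latest experiment and summarize the findings."

# Because nothing depends on a prior conversation, a resumed run simply
# rebuilds the prompt, and the agent backend can change between runs.
```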
Hypothesis Tree¶
Experiments are organized in a tree structure. The seed experiments (baseline, headspace, MTP) are root nodes. Follow-up experiments proposed by the coding agent become children of the node that triggered the planning.
baseline
headspace
mtp
├── gpu_nvdec_decode
│ ├── split_demux_decode
│ │ └── nvdec_c7_optimal
│ └── nvdec_c20_oversub
├── batch_size_16
└── torch_compile
Each node tracks its status (queued, preparing, running, analyzing,
completed, failed), the source control commit it was built from, and
the analysis results. The tree is owned by the workflow store and
visualized as hypothesis_tree.png after each experiment completes.
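A minimal sketch of the per-node data the store might keep, with field names inferred from the description above rather than taken from the code:

```python
from dataclasses import dataclass, field

# Hypothetical node shape; the real store schema may differ.
@dataclass
class ExperimentNode:
    name: str
    status: str = "queued"          # queued/preparing/running/analyzing/completed/failed
    commit: str | None = None       # source control commit the experiment was built from
    analysis: dict | None = None    # analysis results recorded after completion
    children: list["ExperimentNode"] = field(default_factory=list)

def render(node, prefix="", is_last=True, is_root=True):
    """Print a text tree in the style shown above."""
    if is_root:
        print(node.name)
        child_prefix = ""
    else:
        print(prefix + ("└── " if is_last else "├── ") + node.name)
        child_prefix = prefix + ("    " if is_last else "│   ")
    for i, child in enumerate(node.children):
        render(child, child_prefix, i == len(node.children) - 1, is_root=False)
```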
The following is the hypothesis tree from the video classification example, showing 116 nodes explored across 120 experiments: