.. _autoresearch-architecture:

Architecture
============

.. note::

   This section describes internal implementation details of the autoresearch
   engine. The architecture may change at any time without notice.

The implementation is organized into several modules with clear boundaries. The
goal is to keep scheduling generic, domain logic testable, and infrastructure
swappable.

.. mermaid::

   flowchart TB
       CLI["python cli.py"]
       Supervisor["Supervisor agent
       (interactive)"]
       Engine["AsyncWorkEngine
       (runner.py)"]
       Adapter["AutoresearchAdapter
       (workflow/)"]
       Policy["policy.py"]
       Store["store.py"]
       Platform["AutoresearchPlatform
       (platform/)"]
       Agent["_CodingAgent"]
       CLI --> Supervisor
       Supervisor --> Engine
       Engine --> Adapter
       Adapter --> Policy
       Adapter --> Store
       Adapter --> Platform
       Platform --> Agent

Design Principles
------------------

Several design choices cut across the modules and are not obvious from the code
alone.

**The engine must remain domain-neutral.** The async work engine must not grow
SPDL, coding-agent, source-control, metrics, hypothesis-planning, or
experiment-phase logic. If a behavior depends on what an experiment is, it
belongs in the workflow adapter, the policy module, or behind a platform
capability — never in the runner.

**Stop criteria live in the planner, not the engine.** The engine stops when
the queue and running set are both empty. Autoresearch enforces its own
stopping conditions (plateau patience, max iterations, all best practices
tried) by returning no children from the planning step. This keeps the engine
simple and avoids a domain-specific ``should_stop`` callback.

**Resume is phase-based.** Each experiment coroutine persists its phase
(queued, preparing, running, analyzing, completed, failed) at every meaningful
boundary. On resume, the coroutine inspects the persisted phase to skip
already-completed steps: a ``running`` experiment with a known job ID resumes
polling rather than re-launching, and an ``analyzing`` experiment skips
straight to analysis.

**Domain coroutines own their cancellation behavior.** The engine cancels
asyncio tasks on ``SIGINT``/``SIGTERM``, but each coroutine decides what state
to persist before re-raising ``CancelledError``. Remote jobs are not
automatically cancelled by the engine — if remote cancellation is needed, the
coroutine or adapter must do it explicitly.

**Failures are structured domain data.** Every failure path (prepare, build,
launch, poll, analyze, plan) produces a ``FailureRecord`` with a
``FailureKind`` and ``FailurePhase``. The runner never learns about failure
kinds.
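
The failure-record shape can be sketched as follows. ``FailureRecord``,
``FailureKind``, ``FailurePhase``, and ``_AutoresearchError`` are names from
the text, but the fields, enum members, and classification logic shown here are
illustrative assumptions, not the actual implementation:

.. code-block:: python

   from dataclasses import dataclass
   from enum import Enum


   class _AutoresearchError(Exception):
       """Expected, domain-level failure (name from the text; body assumed)."""


   class FailureKind(Enum):
       EXPECTED = "expected"      # raised deliberately as _AutoresearchError
       UNEXPECTED = "unexpected"  # any other exception, caught and wrapped


   class FailurePhase(Enum):
       PREPARE = "prepare"
       BUILD = "build"
       LAUNCH = "launch"
       POLL = "poll"
       ANALYZE = "analyze"
       PLAN = "plan"


   @dataclass(frozen=True)
   class FailureRecord:
       """Structured failure data persisted by the workflow, not the runner."""
       kind: FailureKind
       phase: FailurePhase
       message: str


   def record_failure(phase: FailurePhase, exc: BaseException) -> FailureRecord:
       # The runner never sees kinds; the workflow classifies and persists them.
       kind = (FailureKind.EXPECTED if isinstance(exc, _AutoresearchError)
               else FailureKind.UNEXPECTED)
       return FailureRecord(kind=kind, phase=phase, message=str(exc))

Because every phase produces such a record, even a prepare or build failure
that never reaches a remote job is accounted for durably.
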
Expected failures flow through ``_AutoresearchError``; unexpected exceptions
are caught and wrapped into structured records. This ensures durable accounting
even for phases that never reach a remote job.

Async Work Engine
-----------------

The generic runner (``utils/runner.py``) knows nothing about SPDL, training
jobs, source control, metrics, or hypothesis planning. It operates on
serializable ``_WorkSpec`` objects and a ``_WorkAdapter`` protocol:

- Maintains a priority queue of pending ``_WorkSpec`` objects.
- Starts up to ``max_concurrency`` coroutines via the adapter.
- Waits for the first coroutine to complete.
- Passes completed ``_WorkResult`` objects (which may contain child specs) back
  to the adapter and re-queues children.
- Checkpoints queued and running specs on cancellation.

The runner does not inspect experiment payloads. Infrastructure-specific work
belongs in the platform capability layer, and domain decisions belong in the
workflow adapter.

Workflow
--------

The autoresearch workflow (``utils/workflow/``) is the domain side of the
boundary. It turns an experiment ``_WorkSpec`` into a coroutine that performs
the full experiment lifecycle:

- Restore or prepare the source tree.
- Apply code changes when the experiment requires a rebuild.
- Build the image and launch the remote job.
- Poll for completion and detect stalled jobs.
- Collect metrics and run coding agent analysis.
- Record state, master-table rows, findings, and plots.
- Ask the coding agent for follow-up experiments and return them as child
  ``_WorkSpec`` objects.

The workflow is split into focused modules:

- **adapter.py** -- the ``AutoresearchAdapter`` that implements
  ``_WorkAdapter`` and orchestrates the experiment coroutine.
- **policy.py** -- deterministic decisions (planning gates, duplicate
  filtering, stall detection) expressed as pure functions that can be unit
  tested without infrastructure.
- **store.py** -- durable state persistence (master table, findings, tree
  visualization).
- **analysis_ops.py** / **planning_ops.py** / **source_ops.py** -- individual
  workflow operations that interact with the platform.

Platform Capabilities
---------------------

The platform layer (``utils/platform/``) provides a capability boundary between
the workflow and infrastructure. ``AutoresearchPlatform`` bundles five
capability objects:

- ``_Workspace`` -- source control operations (detect SCM, commit, restore,
  check for changes).
- ``_Artifacts`` -- image building and tagging.
- ``_Execution`` -- job launch, status polling, and cancellation.
- ``_Evidence`` -- metrics collection and system profiling.
- ``_CodingAgent`` -- stateless coding agent invocations (analysis, planning,
  code changes).

The workflow can swap local, remote, Claude, Codex, or test implementations by
replacing these capability objects without changing any orchestration code.

.. mermaid::

   flowchart LR
       Workflow["AutoresearchAdapter"]
       Platform["AutoresearchPlatform"]
       Workspace["_Workspace"]
       Artifacts["_Artifacts"]
       Execution["_Execution"]
       Evidence["_Evidence"]
       Agent["_CodingAgent"]
       Workflow --> Platform
       Platform --> Workspace
       Platform --> Artifacts
       Platform --> Execution
       Platform --> Evidence
       Platform --> Agent

Stateless Agent Invocations
---------------------------

Each coding agent call is fully stateless. The workflow constructs a
self-contained prompt that includes everything the agent needs: the SPDL
optimization knowledge base, the full experiment history, collected metrics,
and the pipeline source code. There is no persistent conversation or session
state.

This design makes the system robust to interruptions. After ``Ctrl+C``, the
engine can resume from the last persisted checkpoint without relying on a
conversation session. It also means the coding agent can be swapped between
runs (e.g., switching from Claude to Codex) with no state migration.

Hypothesis Tree
---------------

Experiments are organized in a tree structure. The seed experiments (baseline,
headspace, MTP) are root nodes.
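
A minimal sketch of how such a tree might be modeled in Python. The class and
field names here are hypothetical (the real representation lives in the
workflow store); only the seed experiments and the tracked fields come from the
text:

.. code-block:: python

   from dataclasses import dataclass, field
   from typing import Optional


   @dataclass
   class TreeNode:
       """Hypothetical hypothesis-tree node; names are assumptions."""
       name: str
       status: str = "queued"  # queued | preparing | running | analyzing | completed | failed
       commit: Optional[str] = None        # source control commit it was built from
       parent: Optional["TreeNode"] = None
       children: list["TreeNode"] = field(default_factory=list)

       def add_child(self, name: str) -> "TreeNode":
           """Attach a follow-up experiment proposed during planning."""
           child = TreeNode(name=name, parent=self)
           self.children.append(child)
           return child


   # Roots are the seed experiments; planning attaches children to the
   # node whose analysis triggered the plan.
   roots = [TreeNode("baseline"), TreeNode("headspace"), TreeNode("mtp")]
   nvdec = roots[2].add_child("gpu_nvdec_decode")
   nvdec.add_child("split_demux_decode").add_child("nvdec_c7_optimal")
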
Follow-up experiments proposed by the coding agent become children of the node
that triggered the planning.

.. code-block:: text

   baseline
   headspace
   mtp
   ├── gpu_nvdec_decode
   │   ├── split_demux_decode
   │   │   └── nvdec_c7_optimal
   │   └── nvdec_c20_oversub
   ├── batch_size_16
   └── torch_compile

Each node tracks its status (queued, preparing, running, analyzing, completed,
failed), the source control commit it was built from, and the analysis results.
The tree is owned by the workflow store and visualized as
``hypothesis_tree.png`` after each experiment completes.

The following is the hypothesis tree from the
:ref:`video classification example `, showing 116 nodes explored across 120
experiments:

.. image:: /_static/data/autoresearch_video_classification_hypothesis_tree.png
   :alt: Hypothesis tree from video classification optimization
   :width: 100%
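
Taken together, the "stop criteria live in the planner" principle and the
pure-function style of ``policy.py`` suggest a planning gate along these lines.
This is a sketch under assumed names and signatures, not the actual
implementation:

.. code-block:: python

   def should_plan_children(
       metric_history: list[float],
       plateau_patience: int,
       max_iterations: int,
   ) -> bool:
       """Pure planning gate: return False to stop exploring.

       The engine has no stop callback; it simply drains the queue, so
       returning no children from planning is how a run ends.
       """
       if len(metric_history) >= max_iterations:
           return False
       best = max(metric_history, default=float("-inf"))
       # Plateau: the best result has not improved in the last
       # `plateau_patience` experiments.
       recent = metric_history[-plateau_patience:]
       if len(recent) == plateau_patience and max(recent) < best:
           return False
       return True

Because the gate is a pure function of the experiment history, it can be unit
tested without any engine, platform, or coding agent in the loop.
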