BATTER Developer Guide#

This guide provides an up-to-date overview of the internal architecture of BATTER. It targets contributors who need to understand or extend the codebase.

The project powers absolute binding (ABFE) and solvation (ASFE) free-energy workflows, supports both local execution and SLURM clusters, and packages results in a portable artifact store.

Code Layout#

batter/                     # Modern package (public API, pipelines, builders)
batter_v1/                  # Legacy BAT.py compatibility layer (frozen)
docs/                       # Sphinx sources (user + developer guides)
examples/                   # Reference YAML workflows and restraint templates
tests/                      # Pytest suite covering configs, pipelines, exec, etc.
extern/                     # Vendored dependencies (editable installs)
devtools/                   # Helper scripts + conda envs for development
scripts/                    # Misc automation helpers
README.rst                  # Project overview
TODO                        # Open engineering tasks / ideas
pyproject.toml / setup.cfg  # Build and packaging metadata
environment*.yml            # Conda environments for the main stack

The batter/ package itself is organised as:

batter/
├── api.py                 # Public entry points (run_from_yaml, FE repos, etc.)
├── cli/                   # click-based CLI commands (run, fe, fek-schedule, ...)
├── config/                # Pydantic models for run/simulation YAML + helpers
├── systems/               # System descriptors and builders (MABFE / MASFE)
├── _internal/             # Low-level build ops (create_box, restraints, sim files)
├── param/                 # Ligand parameterisation helpers
├── pipeline/              # Steps, payloads, pipeline factories
├── exec/                  # Local/SLURM backends and step handlers
├── orchestrate/           # High-level orchestration + pipeline wiring
├── runtime/               # Portable artifact store and FE repository
├── analysis/              # Post-processing & convergence utilities
└── utils/                 # Shared helpers (Amber wrappers, file ops, etc.)

Further Reading#

The following reference chapters live elsewhere in the docs but are useful when working on internal builders or pipelines:

High-Level Execution Flow#

A run triggered via batter.orchestrate.run.run_from_yaml() progresses through the stages below:

  1. Configuration – Parse the top-level YAML into a RunConfig and resolve the composed SimulationConfig.

  2. System build – Use a SystemBuilder (MABFEBuilder or MASFEBuilder) to stage shared inputs under <output_folder>/executions/<run_id>/.

  3. Ligand staging – Copy ligand files under executions/<run_id>/simulations/<LIG>/inputs.

  4. Parameterisation – Run the param_ligands step once to populate executions/<run_id>/artifacts/ligand_params.

  5. Pipeline construction – Select an ABFE/ASFE pipeline using select_pipeline().

  6. Execution – Drive each pipeline phase on the chosen backend (LocalBackend or SlurmBackend). Step handlers consume typed StepPayload objects.

  7. Result packaging – Persist window outputs and summary statistics using FEResultsRepository, enabling portable analysis.
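The on-disk locations named in stages 2 to 4 can be sketched with pathlib. The helper name execution_paths and its dictionary keys are illustrative assumptions, not part of the BATTER API; only the directory pattern is taken from the list above.

```python
from pathlib import Path


def execution_paths(output_folder: str, run_id: str, ligand: str) -> dict[str, Path]:
    """Return the directories described in the execution flow (hypothetical helper)."""
    root = Path(output_folder) / "executions" / run_id
    return {
        "root": root,                                               # stage 2: shared inputs
        "ligand_inputs": root / "simulations" / ligand / "inputs",  # stage 3: ligand staging
        "ligand_params": root / "artifacts" / "ligand_params",      # stage 4: parameterisation
    }


paths = execution_paths("work", "run1", "LIG1")
print(paths["ligand_inputs"].as_posix())
```

Keeping every location relative to one execution root is what lets a run be inspected or moved as a unit.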

Configuration Layer#

Module: batter.config

  • RunSection – Execution controls that include the artifact destination (run.output_folder) and optional builder override (run.system_type), along with backend/dry-run/failure policy knobs.

  • CreateArgs – Inputs required to stage the system (protein, topology, ligands, restraints).

  • RunConfig – Aggregates the sections, exposes helpers such as load(), model_validate_yaml(), and resolved_sim_config(), and resolves relative paths when a YAML is loaded.

  • SimulationConfig – Fully merged simulation specification produced by RunConfig.resolved_sim_config(). The user-facing configuration never includes this model directly, but the developer guide documents its available fields and the protocol-specific validations (e.g., ABFE requires z_steps*; ASFE requires y_steps*).

Systems and Builders#

Modules: batter.systems.core, batter.systems.mabfe, batter.systems.masfe

  • SimSystem – Immutable descriptor of a system on disk. Metadata is stored in SystemMeta which offers structured accessors and merge semantics for propagating ligand-specific data.

  • MABFEBuilder – Prepares shared ABFE systems and creates per-ligand children under simulations/<LIG>/.

  • MASFEBuilder – MASFE counterpart that stages ligands without a protein topology.

Binding (ABFE) components#

ABFE simulations decouple the ligand from the bound complex using the z component (restraints + decoupling in complex). Both z_steps1 and z_steps2 are required and are supplied through fe_sim.steps1/steps2. Ensure the restraints and lambdas in the run YAML align with your chosen decoupling scheme.

Solvation (ASFE) components#

ASFE simulations run two FE components:

  • y – ligand-in-solvent decoupling.

  • m – ligand-in-vacuum decoupling.

Both components require step counts in fe_sim.steps1/steps2. The orchestrator enforces that both are positive before pipeline execution.
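The step-count checks described for the ABFE and ASFE components can be sketched as a small validator. This is a minimal illustration and not the orchestrator's actual code: it assumes steps1 and steps2 are mappings from component letter to step count, and the names REQUIRED_COMPONENTS and check_step_counts are hypothetical.

```python
# Components per protocol as described in this guide. The mapping itself is
# an illustrative assumption, not BATTER's internal table.
REQUIRED_COMPONENTS = {"abfe": ("z",), "asfe": ("y", "m")}


def check_step_counts(protocol: str, steps1: dict, steps2: dict) -> None:
    """Raise ValueError unless every required component has positive step counts."""
    problems = []
    for component in REQUIRED_COMPONENTS[protocol]:
        for label, table in (("steps1", steps1), ("steps2", steps2)):
            # A missing entry is treated the same as a non-positive one.
            if int(table.get(component, 0)) <= 0:
                problems.append(f"{label}[{component!r}]")
    if problems:
        raise ValueError(f"{protocol}: missing or non-positive " + ", ".join(problems))


# ABFE with both z step counts set passes silently.
check_step_counts("abfe", {"z": 50000}, {"z": 300000})

# ASFE without an m entry in steps1 is rejected.
error_message = ""
try:
    check_step_counts("asfe", {"y": 50000}, {"y": 300000, "m": 300000})
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Collecting every problem before raising reports all missing entries in one pass instead of one per run.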

Practical constraints#

  • Water boxes require buffer_x/y/z >= 15 Å; the validator will reject smaller padding to avoid vacuum artifacts. For membranes, automatic Z padding is applied if needed.

  • Resume semantics rely on run_id plus the stored configuration signature, which covers only the create and fe_sim fields. Changing execution knobs under run will not trigger a new run_id, so bump the run_id yourself when you want a clean workspace.

  • Lambda overrides: provide a default lambdas list and override per component via component_lambdas when needed. Missing components inherit the default schedule.
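The lambda inheritance rule in the last bullet can be illustrated as follows. The function name resolve_lambdas and its argument order are assumptions made for this sketch; only the lambdas/component_lambdas semantics come from the text above.

```python
def resolve_lambdas(components, default, component_lambdas=None):
    """Return one lambda schedule per component, falling back to the default."""
    overrides = component_lambdas or {}
    # Copy each schedule so callers cannot mutate the shared default list.
    return {name: list(overrides.get(name, default)) for name in components}


schedules = resolve_lambdas(
    components=["y", "m"],
    default=[0.0, 0.5, 1.0],
    component_lambdas={"m": [0.0, 1.0]},
)
print(schedules)
```

Here y inherits the default schedule because it has no entry in component_lambdas, while m uses its own override.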

Parameterisation#

Module: batter.param.ligand

  • batch_ligand_process() – Performs ligand force-field assignment, producing a content-addressed store under artifacts/ligand_params. Used by the param_ligands handler to distribute parameter files.
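To illustrate what "content-addressed" means here, the sketch below keys each parameter directory by a hash of the ligand input and records the ligand-name mapping in index.json. The function store_params, the truncated SHA-256 key, and the exact file names are assumptions for illustration; the real layout produced by batch_ligand_process() may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def store_params(store: Path, ligand_name: str, ligand_bytes: bytes, params: bytes) -> str:
    """Write params under a directory named after the ligand content hash."""
    digest = hashlib.sha256(ligand_bytes).hexdigest()[:16]
    entry = store / digest
    if not entry.exists():  # identical ligand content is stored only once
        entry.mkdir(parents=True)
        (entry / "lig.mol2").write_bytes(params)
    index_path = store / "index.json"
    index = json.loads(index_path.read_text()) if index_path.exists() else {}
    index[ligand_name] = digest
    index_path.write_text(json.dumps(index, indent=2, sort_keys=True))
    return digest


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "ligand_params"
    first = store_params(store, "LIG1", b"ligand-bytes", b"params")
    second = store_params(store, "LIG1_copy", b"ligand-bytes", b"params")
    index = json.loads((store / "index.json").read_text())
    stored_dirs = sorted(p.name for p in store.iterdir() if p.is_dir())
```

Two ligands with identical input bytes resolve to the same directory, so parameterisation work is not repeated.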

Pipelines and Payloads#

Modules: batter.pipeline.step, batter.pipeline.pipeline, batter.pipeline.factory, batter.pipeline.payloads

  • Step – Encapsulates a DAG node with name, requires and a StepPayload. step.params remains as a compatibility alias.

  • Pipeline – Topologically orders steps and invokes the backend through run().

  • batter.pipeline.factory – Builds canonical ABFE/ASFE pipelines. Pipelines are expressed in terms of StepPayload and SystemParams.

  • batter.pipeline.payloads – Defines the typed payload and system-parameter models. See Typed Pipeline Payloads and System Metadata for details.
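The topological ordering that Pipeline performs can be sketched with the standard-library graphlib module. The Step dataclass below is a simplified stand-in for BATTER's class (it omits the payload), and the step names equil and fe are hypothetical; system_prep and param_ligands are the names used in this guide.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Step:
    """Simplified DAG node: a name plus the names of the steps it requires."""
    name: str
    requires: tuple[str, ...] = ()


def order_steps(steps):
    """Return step names so that every step follows all of its requirements."""
    graph = {step.name: set(step.requires) for step in steps}
    return list(TopologicalSorter(graph).static_order())


steps = [
    Step("fe", requires=("equil",)),
    Step("equil", requires=("param_ligands",)),
    Step("param_ligands", requires=("system_prep",)),
    Step("system_prep"),
]
order = order_steps(steps)
print(order)
```

TopologicalSorter raises graphlib.CycleError on circular requirements, which surfaces wiring mistakes before any step runs.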

Execution Backends#

Modules: batter.exec.base, batter.exec.local, batter.exec.slurm, batter.exec.handlers

  • ExecBackend – Shared protocol implemented by backends.

  • LocalBackend – Runs Python handlers directly (serial or joblib).

  • SlurmBackend – Submits SLURM jobs via SlurmJobManager.

  • Handler modules under batter/exec/handlers implement step-specific logic (system prep, parameterisation, equilibration, FE production, analysis). Each handler receives a StepPayload and a SimSystem.
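The relationship between a backend and its step handlers can be sketched with typing.Protocol. Every name below (ExecBackendSketch, InlineBackend, the handler signature) is an assumption for illustration, and plain dicts and strings stand in for StepPayload and SimSystem; the real ExecBackend interface may differ.

```python
from typing import Any, Callable, Protocol

# A handler receives a payload and a system descriptor (stand-in types here).
Handler = Callable[[dict, str], Any]


class ExecBackendSketch(Protocol):
    """Structural interface that any backend in this sketch must satisfy."""

    def run(self, step_name: str, payload: dict, system: str) -> Any: ...


class InlineBackend:
    """Runs the registered handler directly in the current process."""

    def __init__(self, handlers: dict[str, Handler]) -> None:
        self.handlers = dict(handlers)

    def run(self, step_name: str, payload: dict, system: str) -> Any:
        if step_name not in self.handlers:
            raise KeyError(f"no handler registered for step {step_name!r}")
        return self.handlers[step_name](payload, system)


def param_ligands(payload: dict, system: str) -> str:
    return f"{system}: parameterised {len(payload['ligands'])} ligand(s)"


backend: ExecBackendSketch = InlineBackend({"param_ligands": param_ligands})
result = backend.run("param_ligands", {"ligands": ["LIG1", "LIG2"]}, "demo-system")
print(result)
```

Because the protocol is structural, a cluster backend only needs a run() method with the same shape; it does not have to inherit from a shared base class.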

Orchestration#

Module: batter.orchestrate.run

run_from_yaml() wires every layer together:

  1. Load the run YAML and apply optional overrides.

  2. Instantiate a system builder inferred from the selected protocol (abfe/md → MABFE, asfe → MASFE; overrides via run.system_type remain for backward compatibility).

  3. Resolve staged ligands (supporting resume) and regenerate the system if required.

  4. Construct the ABFE/ASFE pipeline using select_pipeline().

  5. Execute parent-only steps (system_prep, param_ligands).

  6. Clone the pipeline for per-ligand execution, injecting the SLURM job manager when needed.

  7. Run phases sequentially, enforcing skip/resume semantics via batter.orchestrate.markers.

  8. Persist FE results using FEResultsRepository.

Run identifiers and config signatures#

Each execution lives under <output_folder>/executions/<run_id>/. When a run_id already exists, batter.orchestrate.run._compute_run_signature() computes a signature for the current YAML and compares it against the one stored in artifacts/config/run_config.hash. Only the simulation inputs are hashed (create and fe_sim/fe); run and override flags do not affect the signature. A normalized JSON snapshot of the hashed payload is also written to artifacts/config/run_config.normalized.json to aid debugging. If the signatures differ and run_id was requested explicitly, the orchestrator raises an error unless --allow-run-id-mismatch is set; in auto mode it picks a fresh run_id and logs a brief diff of the mismatched fields.
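A signature of this kind can be sketched by hashing a normalized JSON dump of the relevant sections. This is not the body of _compute_run_signature(); the function name run_signature, the choice of SHA-256, and the plain-dict configuration are assumptions for illustration.

```python
import hashlib
import json

HASHED_SECTIONS = ("create", "fe_sim")  # the run section is deliberately excluded


def run_signature(config: dict) -> tuple[str, str]:
    """Return (hex digest, normalized JSON) for the hashed part of a config."""
    hashed = {key: config[key] for key in HASHED_SECTIONS if key in config}
    # Sorted keys and fixed separators make the dump independent of key order.
    normalized = json.dumps(hashed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest(), normalized


base = {
    "create": {"ligands": ["LIG1"]},
    "fe_sim": {"lambdas": [0.0, 1.0]},
    "run": {"dry_run": False},
}
sig_base, snapshot = run_signature(base)
sig_run_changed, _ = run_signature({**base, "run": {"dry_run": True}})
sig_sim_changed, _ = run_signature({**base, "fe_sim": {"lambdas": [0.0, 0.5, 1.0]}})
print(sig_base == sig_run_changed, sig_base == sig_sim_changed)
```

Editing only the run section leaves the digest unchanged, whereas editing fe_sim produces a new one, which mirrors the resume behaviour described above.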

Runtime & Portability#

Modules: batter.runtime.portable, batter.runtime.fe_repo

  • ArtifactStore – Manages a relocatable manifest of files/directories produced during the run.

  • FEResultsRepository – Indexes and stores FERecord objects, capturing total ΔG, per-window data, and copies of analysis outputs.
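One way to make a manifest relocatable is to record every artifact relative to the store root and resolve it against whatever root is current. The sketch below shows that idea only; the function names and the manifest shape are assumptions and do not describe ArtifactStore's real format.

```python
from pathlib import PurePosixPath


def to_manifest(root: str, files: list[str]) -> list[str]:
    """Record artifact paths relative to the store root."""
    base = PurePosixPath(root)
    return sorted(PurePosixPath(f).relative_to(base).as_posix() for f in files)


def resolve(new_root: str, manifest: list[str]) -> list[str]:
    """Rebuild absolute paths after the store has been moved."""
    base = PurePosixPath(new_root)
    return [(base / rel).as_posix() for rel in manifest]


manifest = to_manifest(
    "/scratch/run1",
    ["/scratch/run1/fe/Results/Results.dat", "/scratch/run1/artifacts/config/sim.resolved.yaml"],
)
moved = resolve("/archive/run1", manifest)
print(manifest)
print(moved)
```

Since no absolute path is stored, copying the directory to another machine leaves the manifest valid.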

Directory Layout Example#

The structure below illustrates an ABFE execution root (<output_folder>/executions/<run_id>/):

executions/<run_id>/
├── artifacts/
│   ├── config/
│   │   ├── sim_overrides.json
│   │   └── sim.resolved.yaml
│   └── ligand_params/
│       ├── index.json
│       └── LIG1/
│           ├── lig.mol2
│           └── metadata.json
├── simulations/
│   ├── LIG1/
│   │   ├── inputs/ligand.sdf
│   │   └── fe/...
│   └── LIG2/
│       └── ...
├── batter.run.log
└── fe/Results/Results.dat     (per-ligand directories once analysis finishes)

Refer to batter.orchestrate.markers for the sentinel files used to detect completion or failure of each phase.
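Sentinel-based skip/resume logic can be sketched as below. The file names DONE and FAILED, the phase names, and both helper functions are hypothetical and chosen only for illustration; consult batter.orchestrate.markers for the names BATTER actually uses.

```python
import tempfile
from pathlib import Path

DONE, FAILED = "DONE", "FAILED"  # hypothetical sentinel file names


def phase_status(phase_dir: Path) -> str:
    """Classify a phase directory by the sentinel files it contains."""
    if (phase_dir / FAILED).exists():
        return "failed"
    if (phase_dir / DONE).exists():
        return "done"
    return "pending"


def phases_to_run(root: Path, phases: list[str]) -> list[str]:
    """Return the phases that still need to run (everything not marked done)."""
    return [name for name in phases if phase_status(root / name) != "done"]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "equil").mkdir()
    (root / "equil" / DONE).touch()    # finished earlier
    (root / "fe").mkdir()
    (root / "fe" / FAILED).touch()     # failed earlier, so it is retried
    remaining = phases_to_run(root, ["equil", "fe", "analysis"])
print(remaining)
```

Checking the failure marker first means a phase that failed after an earlier success is rerun rather than skipped.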