BATTER Developer Guide
This guide provides an up-to-date overview of the internal architecture of BATTER. It targets contributors who need to understand or extend the codebase.
The project powers absolute binding free-energy (ABFE) and absolute solvation free-energy (ASFE) workflows, supports both local execution and SLURM clusters, and packages results in a portable artifact store.
Code Layout
batter/ # Modern package (public API, pipelines, builders)
batter_v1/ # Legacy BAT.py compatibility layer (frozen)
docs/ # Sphinx sources (user + developer guides)
examples/ # Reference YAML workflows and restraint templates
tests/ # Pytest suite covering configs, pipelines, exec, etc.
extern/ # Vendored dependencies (editable installs)
devtools/ # Helper scripts + conda envs for development
scripts/ # Misc automation helpers
README.rst # Project overview
TODO # Open engineering tasks / ideas
pyproject.toml / setup.cfg # Build and packaging metadata
environment*.yml # Conda environments for the main stack
The batter/ package itself is organised as:
batter/
├── api.py # Public entry points (run_from_yaml, FE repos, etc.)
├── cli/ # click-based CLI commands (run, fe, fek-schedule, ...)
├── config/ # Pydantic models for run/simulation YAML + helpers
├── systems/ # System descriptors and builders (MABFE / MASFE)
├── _internal/ # Low-level build ops (create_box, restraints, sim files)
├── param/ # Ligand parameterisation helpers
├── pipeline/ # Steps, payloads, pipeline factories
├── exec/ # Local/SLURM backends and step handlers
├── orchestrate/ # High-level orchestration + pipeline wiring
├── runtime/ # Portable artifact store and FE repository
├── analysis/ # Post-processing & convergence utilities
└── utils/ # Shared helpers (Amber wrappers, file ops, etc.)
Further Reading
Other reference chapters in the docs are useful when working on internal builders or pipelines. For SLURM header customisation, see SLURM header templates; for REMD operational details, see REMD submission flow.
High-Level Execution Flow
A run triggered via batter.orchestrate.run.run_from_yaml() progresses through
the stages below:
1. Configuration – Parse the top-level YAML into a RunConfig and resolve the composed SimulationConfig.
2. System build – Use a SystemBuilder (MABFEBuilder or MASFEBuilder) to stage shared inputs under <output_folder>/executions/<run_id>/.
3. Ligand staging – Copy ligand files under executions/<run_id>/simulations/<LIG>/inputs.
4. Parameterisation – Run the param_ligands step once to populate executions/<run_id>/artifacts/ligand_params.
5. Pipeline construction – Select an ABFE/ASFE pipeline using select_pipeline().
6. Execution – Drive each pipeline phase on the chosen backend (LocalBackend or SlurmBackend). Step handlers consume typed StepPayload objects.
7. Result packaging – Persist window outputs and summary statistics using FEResultsRepository, enabling portable analysis.
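Every staging location in the flow above hangs off a single execution root. The sketch below derives those paths with plain pathlib joins; the ExecutionPaths class is hypothetical and only mirrors the layout described here, not batter's actual code.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ExecutionPaths:
    """Hypothetical helper: derives the per-run directories named in the flow."""

    output_folder: Path
    run_id: str

    @property
    def root(self) -> Path:
        # <output_folder>/executions/<run_id>/
        return self.output_folder / "executions" / self.run_id

    @property
    def ligand_params(self) -> Path:
        # Populated once by the param_ligands step.
        return self.root / "artifacts" / "ligand_params"

    def ligand_inputs(self, ligand: str) -> Path:
        # Per-ligand staging area: simulations/<LIG>/inputs
        return self.root / "simulations" / ligand / "inputs"


paths = ExecutionPaths(Path("out"), "run1")
print(paths.ligand_inputs("LIG1"))  # POSIX output: out/executions/run1/simulations/LIG1/inputs
```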
Configuration Layer
Module: batter.config
- RunSection – Execution controls, including the artifact destination (run.output_folder) and an optional builder override (run.system_type), along with backend, dry-run, and failure-policy knobs and notification settings (run.email_on_completion / run.email_sender).
- CreateArgs – Inputs required to stage the system (protein, topology, ligands, restraints).
- RunConfig – Aggregates the sections, exposes helpers such as load(), model_validate_yaml(), and resolved_sim_config(), and resolves relative paths when a YAML is loaded.
- SimulationConfig – Fully merged simulation specification produced by RunConfig.resolved_sim_config(). The user-facing YAML never includes this model directly, but the developer guide documents the available fields and the protocol-specific validations (e.g., ABFE requires z_steps*; ASFE requires y_steps*).
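RunConfig resolves relative paths when a YAML is loaded. A common way to do this is to anchor each relative path at the directory containing the YAML file, and the sketch below shows that under this assumption. resolve_relative_paths is a hypothetical function, and batter's real models are Pydantic classes rather than a plain mapping.

```python
from __future__ import annotations

from pathlib import Path
from typing import Mapping


def resolve_relative_paths(fields: Mapping[str, str], yaml_path: Path) -> dict[str, Path]:
    """Hypothetical sketch: anchor relative paths at the YAML file's directory."""
    base = yaml_path.resolve().parent
    resolved: dict[str, Path] = {}
    for name, value in fields.items():
        p = Path(value).expanduser()
        # Absolute paths are kept as-is; relative ones are read relative to the YAML.
        resolved[name] = p if p.is_absolute() else (base / p).resolve()
    return resolved
```

This choice makes a run YAML behave the same no matter which working directory the CLI is launched from.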
Systems and Builders
Modules: batter.systems.core, batter.systems.mabfe, batter.systems.masfe
- SimSystem – Immutable descriptor of a system on disk. Metadata is stored in SystemMeta, which offers structured accessors and merge semantics for propagating ligand-specific data.
- MABFEBuilder – Prepares shared ABFE systems and creates per-ligand children under simulations/<LIG>/.
- MASFEBuilder – MASFE counterpart that stages ligands without a protein topology.
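The combination of an immutable descriptor with mergeable metadata can be sketched with frozen dataclasses. In this sketch a per-ligand child is a new object whose metadata is the parent's metadata merged with ligand-specific keys. These classes are simplified, hypothetical stand-ins that reuse the names SimSystem and SystemMeta. They are not batter's implementations, and the child() method is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from pathlib import Path
from types import MappingProxyType
from typing import Any, Mapping


@dataclass(frozen=True)
class SystemMeta:
    """Hypothetical stand-in: read-only metadata with merge semantics."""

    data: Mapping[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Copy and wrap so callers cannot mutate the stored mapping.
        object.__setattr__(self, "data", MappingProxyType(dict(self.data)))

    def get(self, key: str, default: Any = None) -> Any:
        return self.data.get(key, default)

    def merge(self, **updates: Any) -> SystemMeta:
        # Returns a new object; the parent's metadata is never modified.
        return SystemMeta({**self.data, **updates})


@dataclass(frozen=True)
class SimSystem:
    """Hypothetical stand-in: immutable descriptor of a system on disk."""

    name: str
    root: Path
    meta: SystemMeta = field(default_factory=SystemMeta)

    def child(self, ligand: str) -> SimSystem:
        # Per-ligand child under simulations/<LIG>/ that inherits parent metadata.
        return replace(
            self,
            name=ligand,
            root=self.root / "simulations" / ligand,
            meta=self.meta.merge(ligand=ligand),
        )
```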
Binding (ABFE) components
ABFE simulations decouple the ligand from the bound complex using the z component
(restraints plus decoupling in the complex). Provide the total number of production
steps via fe_sim.n_steps (or z_n_steps), and make sure the restraints and lambdas in
the run YAML align with your chosen decoupling scheme.
Solvation (ASFE) components
ASFE simulations run two FE components:
- y – ligand-in-solvent decoupling.
- m – ligand-in-vacuum decoupling.
Both components require step counts in fe_sim.n_steps (y_n_steps/m_n_steps).
The orchestrator enforces that both are positive before pipeline execution.
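The sketch below checks step counts per protocol before execution, using the component requirements described above (ABFE needs z; ASFE needs y and m). It assumes step counts arrive as a mapping keyed by component letter. That layout and the check_step_counts function are both hypothetical and do not reflect the exact fe_sim schema.

```python
from typing import Mapping

# Hypothetical mapping: protocol -> FE components that need production steps.
REQUIRED_COMPONENTS = {"abfe": ("z",), "asfe": ("y", "m")}


def check_step_counts(protocol: str, n_steps: Mapping[str, int]) -> None:
    """Raise ValueError unless every required component has a positive step count."""
    missing = [
        comp
        for comp in REQUIRED_COMPONENTS[protocol]
        if n_steps.get(comp, 0) <= 0
    ]
    if missing:
        raise ValueError(f"{protocol}: missing or non-positive steps for {missing}")
```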
Practical constraints
- Water boxes require buffer_x/y/z >= 10 Å; the validator rejects smaller padding to avoid vacuum artifacts. For membranes, automatic Z padding is applied if needed.
- Resume semantics rely on run_id plus the stored configuration signature (only create and fe_sim fields). Changing execution knobs under run will not trigger a new run_id, so bump the run_id yourself when you want a clean workspace.
- Lambda overrides: provide a default lambdas list and override it per component via component_lambdas when needed. Missing components inherit the default schedule.
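The water-box padding rule and the lambda inheritance rule above can be sketched as follows. Both functions are hypothetical illustrations. The sketch ignores the membrane Z-padding path, and it assumes component_lambdas is a plain mapping from component name to schedule.

```python
from __future__ import annotations

MIN_WATER_BUFFER = 10.0  # Å, per the constraint above


def check_water_buffers(buffer_x: float, buffer_y: float, buffer_z: float) -> None:
    """Reject padding below the minimum on any axis (water boxes only)."""
    for axis, value in zip("xyz", (buffer_x, buffer_y, buffer_z)):
        if value < MIN_WATER_BUFFER:
            raise ValueError(f"buffer_{axis}={value} Å is below {MIN_WATER_BUFFER} Å")


def resolve_lambdas(
    default: list[float],
    component_lambdas: dict[str, list[float]],
    components: list[str],
) -> dict[str, list[float]]:
    # Components without an explicit override inherit the default schedule.
    return {c: list(component_lambdas.get(c, default)) for c in components}
```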
Parameterisation
Module: batter.param.ligand
- batch_ligand_process() – Performs ligand force-field assignment, producing a content-addressed store under artifacts/ligand_params. Used by the param_ligands handler to distribute parameter files.
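In a content-addressed store, files are keyed by a digest of their contents, so identical inputs map to the same entry and are stored only once. The sketch below shows the general technique with SHA-256. It is not batter's layout: the flat <digest><suffix> naming and the function name are hypothetical, whereas the real store uses an index.json and per-ligand directories.

```python
import hashlib
import shutil
from pathlib import Path


def store_content_addressed(src: Path, store_root: Path) -> Path:
    """Hypothetical sketch: copy `src` to <store_root>/<sha256><suffix>."""
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    dest = store_root / f"{digest}{src.suffix}"
    if not dest.exists():  # identical content is copied only once
        store_root.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
    return dest
```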
Pipelines and Payloads
Modules: batter.pipeline.step, batter.pipeline.pipeline,
batter.pipeline.factory, batter.pipeline.payloads
- Step – Encapsulates a DAG node with name, requires, and a StepPayload. step.params remains as a compatibility alias.
- Pipeline – Topologically orders steps and invokes the backend through run().
- batter.pipeline.factory – Builds canonical ABFE/ASFE pipelines. Pipelines are expressed in terms of StepPayload and SystemParams.
- batter.pipeline.payloads – Defines the typed payload and system-parameter models. See Typed Pipeline Payloads and System Metadata for details.
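The Step/Pipeline relationship is a standard DAG topological sort. Here is a minimal sketch using the standard library's graphlib (Python 3.9+). These Step and Pipeline classes are simplified, hypothetical stand-ins: the payload is a plain dict rather than a StepPayload, the backend is any callable, and step names other than param_ligands are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Any, Callable


@dataclass
class Step:
    """Hypothetical, simplified Step: a DAG node with dependencies and a payload."""

    name: str
    requires: list[str] = field(default_factory=list)
    payload: dict[str, Any] = field(default_factory=dict)


class Pipeline:
    def __init__(self, steps: list[Step]) -> None:
        self.steps = {s.name: s for s in steps}

    def order(self) -> list[Step]:
        # graphlib expects {node: predecessors}; it raises CycleError on cycles.
        graph = {name: set(s.requires) for name, s in self.steps.items()}
        return [self.steps[n] for n in TopologicalSorter(graph).static_order()]

    def run(self, backend: Callable[[Step], None]) -> None:
        # Each step is handed to the backend only after all of its requirements.
        for step in self.order():
            backend(step)


pipe = Pipeline([Step("fe", ["equil"]), Step("equil", ["param_ligands"]), Step("param_ligands")])
pipe.run(lambda step: print(step.name))  # param_ligands, equil, fe
```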
Execution Backends
Modules: batter.exec.base, batter.exec.local, batter.exec.slurm,
batter.exec.handlers.
- ExecBackend – Shared protocol implemented by backends.
- LocalBackend – Runs Python handlers directly (serially or in parallel via joblib).
- SlurmBackend – Submits SLURM jobs via SlurmJobManager.

Handler modules under batter/exec/handlers implement step-specific logic (system prep, parameterisation, equilibration, FE production, analysis). Each handler receives a StepPayload and a SimSystem.
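A backend protocol plus a handler registry keyed by step name is one way to structure this layer. The sketch below makes several assumptions: the run signature, the dict-based registry, and StepPayload as a bare dataclass are all hypothetical. Only serial local execution is shown, with no joblib and no SLURM.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Dict, Protocol


@dataclass
class StepPayload:
    """Hypothetical stand-in for the typed payload handed to handlers."""

    params: Dict[str, Any]


# A handler takes (payload, system) and returns a result.
Handler = Callable[[StepPayload, Any], Any]


class ExecBackend(Protocol):
    def run(self, step_name: str, payload: StepPayload, system: Any) -> Any: ...


class LocalBackend:
    """Runs handlers in-process, looked up by step name (serial only here)."""

    def __init__(self, handlers: Dict[str, Handler]) -> None:
        self.handlers = handlers

    def run(self, step_name: str, payload: StepPayload, system: Any) -> Any:
        try:
            handler = self.handlers[step_name]
        except KeyError:
            raise ValueError(f"no handler registered for step {step_name!r}") from None
        return handler(payload, system)
```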
Orchestration
Module: batter.orchestrate.run
run_from_yaml() wires every layer together:
1. Load the run YAML and apply optional overrides.
2. Instantiate a system builder inferred from the selected protocol (abfe/md → MABFE, asfe → MASFE; overrides via run.system_type remain for backward compatibility).
3. Resolve staged ligands (supporting resume) and regenerate the system if required.
4. Construct the ABFE/ASFE pipeline using select_pipeline.
5. Execute parent-only steps (system_prep, param_ligands).
6. Clone the pipeline for per-ligand execution, injecting the SLURM job manager when needed.
7. Run phases sequentially, enforcing skip/resume semantics via batter.orchestrate.markers.
8. Persist FE results using FEResultsRepository.
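Marker-based skip/resume works like this: each phase writes a sentinel file on success, and a later invocation skips any phase whose sentinel already exists. The sketch below follows that idea. The sentinel names (.done/.failed), the per-phase directory layout, and run_phases are hypothetical; the real markers live in batter.orchestrate.markers.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# Hypothetical sentinel names, for illustration only.
DONE, FAILED = ".done", ".failed"


def run_phases(workdir: Path, phases: Iterable[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run phases in order, skipping any whose DONE marker already exists."""
    executed: List[str] = []
    for name, fn in phases:
        phase_dir = workdir / name
        phase_dir.mkdir(parents=True, exist_ok=True)
        if (phase_dir / DONE).exists():
            continue  # finished in a previous invocation: resume past it
        try:
            fn()
        except Exception:
            (phase_dir / FAILED).touch()
            raise  # stop here; later phases depend on this one
        (phase_dir / FAILED).unlink(missing_ok=True)  # clear a stale failure
        (phase_dir / DONE).touch()
        executed.append(name)
    return executed
```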
Run identifiers and config signatures
Each execution lives under <output_folder>/executions/<run_id>/. When a run_id
already exists, batter.orchestrate.run._compute_run_signature() compares the
current YAML against the stored signature under artifacts/config/run_config.hash.
Only the simulation inputs are hashed (create and fe_sim/fe); run and
override flags do not affect the signature. A normalized JSON snapshot of the hashed
payload is also written to artifacts/config/run_config.normalized.json to aid
debugging. If the signatures differ and run_id was requested explicitly, the
orchestrator raises an error unless --allow-run-id-mismatch is set; in auto mode, it
automatically picks a fresh run_id and logs a brief diff of the mismatched fields.
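The signature scheme described above can be sketched as follows: serialize only the create and fe_sim sections as normalized JSON (sorted keys, compact separators), hash the result, and report which top-level fields differ. This is not the actual _compute_run_signature(). The function names are hypothetical, and the fe to fe_sim alias handling is an assumption.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple


def _sim_inputs(config: Dict[str, Any]) -> Dict[str, Any]:
    # Only simulation inputs are hashed; `run` and override flags are ignored.
    return {"create": config.get("create"), "fe_sim": config.get("fe_sim", config.get("fe"))}


def compute_signature(config: Dict[str, Any]) -> Tuple[str, str]:
    """Return (sha256 hex digest, normalized JSON snapshot) of the hashed payload."""
    normalized = json.dumps(_sim_inputs(config), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest(), normalized


def changed_fields(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """Brief diff: top-level keys under create/fe_sim whose values differ."""
    a_all, b_all = _sim_inputs(old), _sim_inputs(new)
    diffs = []
    for section in ("create", "fe_sim"):
        a, b = a_all[section] or {}, b_all[section] or {}
        for key in sorted(set(a) | set(b)):
            if a.get(key) != b.get(key):
                diffs.append(f"{section}.{key}")
    return diffs
```

Sorting keys makes the digest independent of YAML key order, which is why the normalized snapshot is worth writing out when debugging a mismatch.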
Runtime & Portability
Modules: batter.runtime.portable, batter.runtime.fe_repo.
- ArtifactStore – Manages a relocatable manifest of files/directories produced during the run.
- FEResultsRepository – Indexes and stores FERecord objects, capturing total ΔG, per-window data, and copies of analysis outputs.
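A manifest becomes relocatable when it stores paths relative to the execution root and re-anchors them at load time. That way, moving or copying executions/<run_id>/ keeps every entry valid. ArtifactManifest is a hypothetical sketch of this idea, not the ArtifactStore API.

```python
import json
from pathlib import Path
from typing import Dict


class ArtifactManifest:
    """Hypothetical sketch: manifest entries are stored relative to a root."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.entries: Dict[str, str] = {}  # logical name -> POSIX path relative to root

    def add(self, name: str, path: Path) -> None:
        # relative_to raises ValueError if the artifact lives outside the root.
        self.entries[name] = path.relative_to(self.root).as_posix()

    def path(self, name: str) -> Path:
        return self.root / self.entries[name]

    def dumps(self) -> str:
        return json.dumps(self.entries, sort_keys=True)

    @classmethod
    def loads(cls, text: str, new_root: Path) -> "ArtifactManifest":
        # Re-anchoring at new_root is what keeps the store valid after a move.
        manifest = cls(new_root)
        manifest.entries = json.loads(text)
        return manifest
```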
Directory Layout Example
The structure below illustrates an ABFE execution root (<output_folder>/executions/<run_id>/):
executions/<run_id>/
├── artifacts/
│ ├── config/
│ │ ├── sim_overrides.json
│ │ └── sim.resolved.yaml
│ └── ligand_params/
│ ├── index.json
│ └── LIG1/
│ ├── lig.mol2
│ └── metadata.json
├── simulations/
│ ├── LIG1/
│ │ ├── inputs/ligand.sdf
│ │ └── fe/...
│ └── LIG2/
│ └── ...
├── batter.run.log
└── fe/Results/Results.dat (per-ligand directories once analysis finishes)
Refer to batter.orchestrate.markers for the sentinel files used to detect
completion or failure of each phase.