BATTER Developer Guide#
This guide provides an up-to-date overview of the internal architecture of BATTER. It targets contributors who need to understand or extend the codebase.
The project powers absolute binding (ABFE), relative binding (RBFE), and solvation (ASFE) free-energy workflows, supports both local execution and SLURM clusters, and packages results in a portable artifact store.
Code Layout#
batter/ # Modern package (public API, pipelines, builders)
batter_v1/ # Legacy BAT.py compatibility layer (frozen)
docs/ # Sphinx sources (user + developer guides)
examples/ # Reference YAML workflows and restraint templates
tests/ # Pytest suite covering configs, pipelines, exec, etc.
extern/ # Vendored dependencies (editable installs)
devtools/ # Helper scripts + conda envs for development
scripts/ # Misc automation helpers
README.rst # Project overview
TODO # Open engineering tasks / ideas
pyproject.toml / setup.cfg # Build and packaging metadata
environment*.yml # Conda environments for the main stack
The batter/ package itself is organised as:
batter/
├── api.py # Public entry points (run_from_yaml, FE repos, etc.)
├── cli/ # click-based CLI commands (run, fe, fek-schedule, ...)
├── config/ # Pydantic models for run/simulation YAML + helpers
├── systems/ # System descriptors and builders (MABFE / MASFE)
├── _internal/ # Low-level build ops (create_box, restraints, sim files)
├── param/ # Ligand parameterisation helpers
├── pipeline/ # Steps, payloads, pipeline factories
├── exec/ # Local/SLURM backends and step handlers
├── orchestrate/ # High-level orchestration + pipeline wiring
├── runtime/ # Portable artifact store and FE repository
├── analysis/ # Post-processing & convergence utilities
└── utils/ # Shared helpers (Amber wrappers, file ops, etc.)
Further Reading#
The following reference chapters live elsewhere in the docs but are useful when working on internal builders or pipelines:
For SLURM header customisation, see SLURM header templates. For REMD operational details, see REMD submission flow.
High-Level Execution Flow#
A run triggered via batter.orchestrate.run.run_from_yaml() progresses through
the stages below:
1. Configuration – Parse the top-level YAML into a RunConfig and resolve the composed SimulationConfig.
2. System build – Use a SystemBuilder (MABFEBuilder or MASFEBuilder) to stage shared inputs under <output_folder>/executions/<run_id>/.
3. Ligand staging – Copy ligand files under executions/<run_id>/simulations/<LIG>/inputs.
4. Parameterisation – Run the param_ligands step once to populate executions/<run_id>/artifacts/ligand_params.
5. Pipeline construction – Select an ABFE/ASFE pipeline using select_pipeline().
6. Execution – Drive each pipeline phase on the chosen backend (LocalBackend or SlurmBackend). Step handlers consume typed StepPayload objects.
7. Result packaging – Persist window outputs and summary statistics using FEResultsRepository, enabling portable analysis.
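The on-disk locations named in the stages above can be sketched with pathlib. The helper name execution_paths is hypothetical and is not part of BATTER; only the directory names are taken from the text.

```python
from pathlib import Path


def execution_paths(output_folder, run_id, ligand):
    """Return the locations the stages above write to (illustrative helper)."""
    root = Path(output_folder) / "executions" / run_id
    return {
        "root": root,
        # Stage 3: per-ligand input files.
        "ligand_inputs": root / "simulations" / ligand / "inputs",
        # Stage 4: shared parameter store populated by param_ligands.
        "ligand_params": root / "artifacts" / "ligand_params",
    }


paths = execution_paths("out", "run1", "LIG1")
```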
Configuration Layer#
Module: batter.config
RunSection – Execution controls, including the artifact destination (run.output_folder) and the optional builder override (run.system_type), along with backend, dry-run, and failure-policy knobs and notification settings (run.email_on_completion / run.email_sender).
CreateArgs – Inputs required to stage the system (protein, topology, ligands, restraints).
RunConfig – Aggregates the sections, exposes helpers such as load(), model_validate_yaml(), and resolved_sim_config(), and resolves relative paths when a YAML is loaded.
SimulationConfig – Fully merged simulation specification produced by RunConfig.resolved_sim_config(). The user-facing configuration never includes this model directly, but the developer guide documents the available fields and the protocol-specific validations (e.g., ABFE requires z_steps*; ASFE requires y_steps*).
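As an illustration of relative-path resolution at load time, the sketch below anchors relative entries at the directory containing the YAML file. That anchoring rule is an assumption made for this example, and resolve_relative is a hypothetical helper rather than the RunConfig implementation.

```python
from pathlib import Path


def resolve_relative(value, yaml_path):
    """Resolve a path from a run YAML (assumed rule: relative to the YAML's folder)."""
    path = Path(value)
    if path.is_absolute():
        # Absolute paths are returned unchanged.
        return path
    return Path(yaml_path).parent / path


resolved = resolve_relative("inputs/ligand.sdf", "work/run.yaml")
absolute = resolve_relative(Path.cwd() / "protein.pdb", "work/run.yaml")
```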
Systems and Builders#
Modules: batter.systems.core, batter.systems.mabfe, batter.systems.masfe
SimSystem – Immutable descriptor of a system on disk. Metadata is stored in SystemMeta, which offers structured accessors and merge semantics for propagating ligand-specific data.
MABFEBuilder – Prepares shared ABFE systems and creates per-ligand children under simulations/<LIG>/.
MASFEBuilder – MASFE counterpart that stages ligands without a protein topology.
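The combination of an immutable descriptor and merge-style metadata can be pictured with frozen dataclasses. This is a minimal sketch only: SketchSystem and SketchMeta are stand-ins with assumed fields and do not mirror the real SimSystem or SystemMeta attributes.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class SketchMeta:
    values: dict = field(default_factory=dict)

    def merge(self, **updates):
        # Return a new object; the original mapping is left untouched.
        return SketchMeta({**self.values, **updates})


@dataclass(frozen=True)
class SketchSystem:
    name: str
    root: str
    meta: SketchMeta = field(default_factory=SketchMeta)


parent = SketchSystem(name="shared", root="executions/run1")
# A per-ligand child is derived by copying the parent with new fields.
child = replace(
    parent,
    name="LIG1",
    root="executions/run1/simulations/LIG1",
    meta=parent.meta.merge(ligand="LIG1"),
)
```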
Binding (ABFE) components#
ABFE simulations decouple the ligand from the bound complex using the z component
(restraints + decoupling in complex). Provide total production steps via
fe_sim.n_steps (or z_n_steps). Ensure the restraints and lambdas in the run YAML
align with your chosen decoupling scheme.
Solvation (ASFE) components#
ASFE simulations run two FE components:
y – ligand-in-solvent decoupling.
m – ligand-in-vacuum decoupling.
Both components require step counts in fe_sim.n_steps (y_n_steps/m_n_steps).
The orchestrator enforces that both are positive before pipeline execution.
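A check of this kind could look like the sketch below. It assumes the step counts arrive as a plain mapping keyed by y_n_steps and m_n_steps; the function names are hypothetical and the real orchestrator check may be structured differently.

```python
def missing_asfe_components(n_steps):
    """List ASFE components whose step count is absent or not positive."""
    return [
        comp
        for comp in ("y", "m")
        if int(n_steps.get(f"{comp}_n_steps", 0)) <= 0
    ]


def check_asfe_steps(n_steps):
    """Raise before pipeline execution if any component has no steps."""
    missing = missing_asfe_components(n_steps)
    if missing:
        raise ValueError(f"ASFE needs positive step counts for: {missing}")


ok = missing_asfe_components({"y_n_steps": 1000, "m_n_steps": 500})
bad = missing_asfe_components({"y_n_steps": 1000, "m_n_steps": 0})
```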
Practical constraints#
Water boxes require buffer_x/y/z >= 10 Å; the validator rejects smaller padding to avoid vacuum artifacts. For membranes, automatic Z padding is applied if needed.
Resume semantics rely on run_id plus the stored configuration signature (only create and fe_sim fields). Changing execution knobs under run will not trigger a new run_id, so bump the run_id yourself when you want a clean workspace.
Lambda overrides: provide a default lambdas list and override per component via component_lambdas when needed. Missing components inherit the default schedule.
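The lambda inheritance rule can be written in a few lines. The sketch assumes component_lambdas is a mapping from component letter to a list of lambda values; lambdas_for is a hypothetical helper name.

```python
def lambdas_for(component, default_lambdas, component_lambdas=None):
    """Return the schedule for one component, falling back to the default."""
    overrides = component_lambdas or {}
    # Copy so callers cannot mutate the shared default list.
    return list(overrides.get(component, default_lambdas))


default = [0.0, 0.5, 1.0]
overrides = {"m": [0.0, 1.0]}
y_schedule = lambdas_for("y", default, overrides)
m_schedule = lambdas_for("m", default, overrides)
```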
Parameterisation#
Module: batter.param.ligand
batch_ligand_process() – Performs ligand force-field assignment, producing a content-addressed store under artifacts/ligand_params. Used by the param_ligands handler to distribute parameter files.
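The idea behind a content-addressed store is that identical ligand content maps to the same entry, so parameterisation is not repeated. The sketch below keeps an in-memory index keyed by a SHA-256 digest; the choice of hash, the index shape, and the register_ligand name are assumptions for illustration, not the behaviour of batch_ligand_process().

```python
import hashlib


def register_ligand(index, name, ligand_bytes):
    """Return the store entry for this content, creating it on first sight."""
    digest = hashlib.sha256(ligand_bytes).hexdigest()
    # Reuse the existing entry when identical content was already registered.
    return index.setdefault(digest, name)


index = {}
first = register_ligand(index, "LIG1", b"ligand-content")
duplicate = register_ligand(index, "LIG1_copy", b"ligand-content")
other = register_ligand(index, "LIG2", b"different-content")
```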
Pipelines and Payloads#
Modules: batter.pipeline.step, batter.pipeline.pipeline,
batter.pipeline.factory, batter.pipeline.payloads
Step – Encapsulates a DAG node with name, requires and a StepPayload. step.params remains as a compatibility alias.
Pipeline – Topologically orders steps and invokes the backend through run().
batter.pipeline.factory – Builds canonical ABFE/ASFE pipelines. Pipelines are expressed in terms of StepPayload and SystemParams.
batter.pipeline.payloads – Defines the typed payload and system-parameter models. See Typed Pipeline Payloads and System Metadata for details.
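Topological ordering of steps by their requires lists can be demonstrated with the standard-library graphlib module (Python 3.9+). SketchStep is a stand-in for Step, and the step names other than system_prep and param_ligands are made up for the example; the real Pipeline may order steps by other means.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class SketchStep:
    name: str
    requires: tuple = ()


def order_steps(steps):
    """Return step names so that every step follows its requirements."""
    graph = {step.name: set(step.requires) for step in steps}
    return list(TopologicalSorter(graph).static_order())


steps = [
    SketchStep("analyze", requires=("fe",)),
    SketchStep("fe", requires=("equil",)),
    SketchStep("equil", requires=("param_ligands",)),
    SketchStep("param_ligands", requires=("system_prep",)),
    SketchStep("system_prep"),
]
ordered = order_steps(steps)
```

TopologicalSorter raises graphlib.CycleError when the requirements form a cycle, which is a convenient way to reject malformed pipelines early.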
Execution Backends#
Modules: batter.exec.base, batter.exec.local, batter.exec.slurm,
batter.exec.handlers.
ExecBackend – Shared protocol implemented by backends.
LocalBackend – Runs Python handlers directly (serial or joblib).
SlurmBackend – Submits SLURM jobs via SlurmJobManager.
Handler modules under batter/exec/handlers implement step-specific logic (system prep, parameterisation, equilibration, FE production, analysis). Each handler receives a StepPayload and a SimSystem.
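The relationship between a shared backend protocol and step handlers can be sketched with typing.Protocol. Every name and signature below is hypothetical: the real ExecBackend interface is not reproduced here, and plain dicts and strings stand in for StepPayload and SimSystem.

```python
from typing import Any, Callable, Dict, Protocol


class SketchBackend(Protocol):
    def run(self, step_name: str, payload: Dict[str, Any], system: str) -> Any:
        ...


class SketchLocalBackend:
    """Runs registered Python handlers in-process, one step at a time."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Dict[str, Any], str], Any]] = {}

    def register(self, step_name: str, handler: Callable[[Dict[str, Any], str], Any]) -> None:
        self._handlers[step_name] = handler

    def run(self, step_name: str, payload: Dict[str, Any], system: str) -> Any:
        # Each handler receives the step payload and the system descriptor.
        return self._handlers[step_name](payload, system)


def equil_handler(payload, system):
    return f"equilibrated {system} for {payload['n_steps']} steps"


backend: SketchBackend = SketchLocalBackend()
backend.register("equil", equil_handler)
result = backend.run("equil", {"n_steps": 1000}, "LIG1")
```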
Orchestration#
Module: batter.orchestrate.run
run_from_yaml() wires every layer together:
1. Load the run YAML and apply optional overrides.
2. Instantiate a system builder inferred from the selected protocol (abfe/rbfe/md → MABFE, asfe → MASFE; overrides via run.system_type remain for backward compatibility).
3. Resolve staged ligands (supporting resume) and regenerate the system if required.
4. Construct the ABFE/ASFE pipeline using select_pipeline.
5. Execute parent-only steps (system_prep, param_ligands).
6. Clone the pipeline for per-ligand execution, injecting the SLURM job manager when needed.
7. Run phases sequentially, enforcing skip/resume semantics via batter.orchestrate.markers.
8. Persist FE results using FEResultsRepository.
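The builder inference in the list above amounts to a small lookup with an optional override. The sketch below encodes only the mapping stated in the text; infer_system_type is a hypothetical helper, and how the real code treats unknown protocols is an assumption.

```python
PROTOCOL_TO_SYSTEM_TYPE = {
    "abfe": "MABFE",
    "rbfe": "MABFE",
    "md": "MABFE",
    "asfe": "MASFE",
}


def infer_system_type(protocol, override=None):
    """Pick the builder family; an explicit run.system_type wins if given."""
    if override:
        return override
    try:
        return PROTOCOL_TO_SYSTEM_TYPE[protocol.lower()]
    except KeyError:
        # Assumed behaviour for this sketch: fail loudly on unknown protocols.
        raise ValueError(f"unknown protocol: {protocol!r}") from None


abfe_type = infer_system_type("abfe")
asfe_type = infer_system_type("ASFE")
forced = infer_system_type("abfe", override="MASFE")
```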
Run identifiers and config signatures#
Each execution lives under <output_folder>/executions/<run_id>/. When a run ID
already exists, batter.orchestrate.run_support.compute_run_signature() compares
the current YAML against the stored signature under artifacts/config/run_config.hash.
Only the simulation inputs are hashed (create and fe_sim/fe); run and
override flags do not affect the signature. A normalized JSON snapshot of the hashed
payload is also written to artifacts/config/run_config.normalized.json to aid
debugging. If the signatures differ and run_id was requested explicitly, the
orchestrator raises an error unless --allow-run-id-mismatch is set; in auto mode it
picks a fresh run_id and logs a brief diff of the mismatched fields.
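A signature restricted to the simulation inputs can be sketched by hashing a normalized JSON dump of the create and fe_sim sections. The use of SHA-256, the JSON normalization settings, and the sketch_signature name are assumptions; compute_run_signature() may differ in all three.

```python
import hashlib
import json

HASHED_SECTIONS = ("create", "fe_sim")


def sketch_signature(config):
    """Return (hex digest, normalized JSON) for the hashed sections only."""
    payload = {key: config[key] for key in HASHED_SECTIONS if key in config}
    # Sorted keys and fixed separators make the dump order-independent.
    normalized = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest(), normalized


base = {
    "create": {"ligands": ["LIG1"]},
    "fe_sim": {"lambdas": [0.0, 1.0]},
    "run": {"output_folder": "out", "dry_run": False},
}
run_changed = {**base, "run": {"output_folder": "elsewhere", "dry_run": True}}
sim_changed = {**base, "fe_sim": {"lambdas": [0.0, 0.5, 1.0]}}

sig_base, snapshot = sketch_signature(base)
sig_run, _ = sketch_signature(run_changed)
sig_sim, _ = sketch_signature(sim_changed)
```

In this sketch, editing only the run section leaves the digest unchanged, while editing fe_sim produces a different one, which matches the resume behaviour described above.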
Runtime & Portability#
Modules: batter.runtime.portable, batter.runtime.fe_repo.
ArtifactStore – Manages a relocatable manifest of files/directories produced during the run.
FEResultsRepository – Indexes and stores FERecord objects, capturing total ΔG, per-window data, and copies of analysis outputs.
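One way to make a manifest relocatable is to record every entry relative to the store root, so the whole directory can be moved and re-opened elsewhere. The class below is a minimal sketch of that idea with hypothetical names; it is not the ArtifactStore API.

```python
from pathlib import Path


class SketchManifest:
    """Maps logical names to paths stored relative to a root directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.entries = {}

    def add(self, name, path):
        # relative_to raises ValueError if the path lies outside the root.
        self.entries[name] = Path(path).relative_to(self.root).as_posix()

    def resolve(self, name, root=None):
        # Pass a new root after the store has been copied or moved.
        base = Path(root) if root is not None else self.root
        return base / self.entries[name]


manifest = SketchManifest("runs/a")
manifest.add("results", "runs/a/fe/Results/Results.dat")
original = manifest.resolve("results")
relocated = manifest.resolve("results", root="archive/b")
```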
Directory Layout Example#
The structure below illustrates an ABFE execution root (<output_folder>/executions/<run_id>/):
executions/<run_id>/
├── artifacts/
│   ├── config/
│   │   ├── sim_overrides.json
│   │   └── sim.resolved.yaml
│   └── ligand_params/
│       ├── index.json
│       └── LIG1/
│           ├── lig.mol2
│           └── metadata.json
├── simulations/
│   ├── LIG1/
│   │   ├── inputs/ligand.sdf
│   │   └── fe/...
│   └── LIG2/
│       └── ...
├── batter.run.log
└── fe/Results/Results.dat (per-ligand directories once analysis finishes)
Refer to batter.orchestrate.markers for the sentinel files used to detect
completion or failure of each phase.
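Sentinel-based phase detection generally reduces to checking for marker files in a phase directory. The sketch below uses the placeholder file names FINISHED and FAILED, which are assumptions for illustration; consult batter.orchestrate.markers for the names actually used.

```python
import tempfile
from pathlib import Path


def phase_status(phase_dir, done_name="FINISHED", failed_name="FAILED"):
    """Classify a phase directory from its sentinel files (placeholder names)."""
    phase_dir = Path(phase_dir)
    if (phase_dir / failed_name).exists():
        return "failed"
    if (phase_dir / done_name).exists():
        return "done"
    return "pending"


with tempfile.TemporaryDirectory() as tmp:
    before = phase_status(tmp)
    (Path(tmp) / "FINISHED").touch()
    after = phase_status(tmp)
    (Path(tmp) / "FAILED").touch()
    # In this sketch a failure marker takes precedence over completion.
    failed = phase_status(tmp)
```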