Ligand Parameterisation
BATTER ships a lightweight parameterisation toolkit that converts staged ligand
inputs into a content-addressed store of AMBER or OpenFF artefacts. The main
entry point is batter.param.ligand.batch_ligand_process(), which produces
GAFF/GAFF2 mol2/frcmod/lib bundles or OpenFF prmtop files that
can be reused across simulations.
Typical usage
from batter.param.ligand import batch_ligand_process

hashes, metadata = batch_ligand_process(
    ligand_paths={
        "ligA": "ligands/adp.sdf",
        "ligB": "ligands/amp.mol2",
    },
    output_path="cache/ligands",
    ligand_ff="gaff2",
    charge_method="am1bcc",
)
print("Prepared hashes:", hashes)
print("Canonical SMILES:", metadata["ligands/adp.sdf"][1])
API Reference
Ligand parameterisation helpers for GAFF/GAFF2 and OpenFF workflows.
- class batter.param.ligand.LigandFactory
Bases: object
Factory that chooses the appropriate loader/processor by file extension.
- create_ligand(ligand_file: str | Path, index: int, output_dir: str | Path, ligand_name: str | None = None, charge: str = 'am1bcc', retain_lig_prot: bool = True, ligand_ff: str = 'gaff2', unique_mol_names: List[str] | None = None) -> LigandProcessing
Instantiate a concrete LigandProcessing subclass.
- Parameters:
ligand_file, index, output_dir, ligand_name, charge, retain_lig_prot,
ligand_ff, unique_mol_names – Forwarded to the underlying processor.
- Returns:
Processor configured for the detected file type.
- Return type:
LigandProcessing
- Raises:
ValueError – If the file extension is unsupported.
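Choosing a processor by file extension can be sketched as a small lookup table. This is only an illustration of the dispatch pattern, not BATTER's code: the three placeholder classes, the `pick_processor` name, and the case-insensitive suffix matching are all assumptions.

```python
from pathlib import Path


# Hypothetical stand-ins for the real processor classes, which live in
# batter.param.ligand and take more constructor arguments.
class SDFProcessor: ...
class MOL2Processor: ...
class PDBProcessor: ...


# Map each supported suffix to the class that handles it.
_PROCESSORS = {
    ".sdf": SDFProcessor,
    ".mol2": MOL2Processor,
    ".pdb": PDBProcessor,
}


def pick_processor(ligand_file):
    """Return the processor class for a ligand file, keyed on its suffix."""
    # Assumption: matching ignores case, so "x.SDF" and "x.sdf" are equivalent.
    suffix = Path(ligand_file).suffix.lower()
    try:
        return _PROCESSORS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported ligand file extension: {suffix!r}") from None
```

Keeping the mapping in one dictionary means that supporting another format only requires one new entry, and unsupported inputs fail early with a ValueError as the reference above describes.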
- class batter.param.ligand.LigandProcessing(ligand_file: str | Path, index: int, output_dir: str | Path, ligand_name: str | None = None, charge: str = 'am1bcc', retain_lig_prot: bool = True, ligand_ff: str = 'gaff2', unique_mol_names: List[str] | None = None)
Bases: ABC
Base class for ligand processing and parameterization.
It loads a ligand, determines a unique residue/name, estimates the charge, and generates AMBER/OpenFF parameters.
- Parameters:
ligand_file – Input ligand path (SDF/MOL2/PDB depending on subclass).
index – 1-based index used for stable name generation.
output_dir – Output folder for generated files.
ligand_name – Optional preferred name; will be uniquified to 3 chars.
charge – Charge method for OpenFF pre-charge or quick estimate (e.g., "am1bcc").
retain_lig_prot – If True, keep hydrogen atoms from input.
ligand_ff – One of "gaff" or "gaff2", or an OpenFF release like "openff-2.2.0".
unique_mol_names – Existing names to avoid collisions.
- Variables:
ligand_object (SmallMoleculeComponent)
openff_molecule (Molecule)
ligand_charge (float) – Estimated total charge (an integer value stored as a float).
atomnames (list[str]) – Atom names extracted from generated PDB (AMBER path).
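The text says a preferred name is "uniquified to 3 chars" against existing names but does not spell out the rule. The sketch below shows one way such a scheme could work. The function name, the lower-casing, the padding character, the "lig" fallback, and the collision strategy are all assumptions, and the sketch does not model how index feeds into name generation.

```python
import itertools
import string


def unique_residue_name(preferred, existing, length=3):
    """Return a `length`-character name that is not present in `existing`."""
    taken = {name.lower() for name in existing}
    # Start from the preferred name, truncated or padded to the target length.
    base = (preferred or "lig").lower()[:length].ljust(length, "x")
    if base not in taken:
        return base
    # On collision, keep the first character and vary the remaining ones.
    alphabet = string.ascii_lowercase + string.digits
    for tail in itertools.product(alphabet, repeat=length - 1):
        candidate = base[0] + "".join(tail)
        if candidate not in taken:
            return candidate
    raise ValueError("No free name left for this first character")
```

The search order is deterministic, so the same inputs always yield the same name, which is the property a stable naming scheme needs.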
- fetch_from_existing_db(database: str | Path) -> bool
Search and copy ligand artifacts from a local database.
- Parameters:
database – Directory containing <name>.(frcmod|lib|prmtop|inpcrd|mol2|pdb|json|sdf).
- Returns:
True if a full, matching entry was found and copied.
- Return type:
bool
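The "copy only if the entry is full" behaviour can be sketched with the standard library. This is a simplified illustration rather than the method's actual logic. The `copy_if_complete` helper and the `REQUIRED_SUFFIXES` tuple are hypothetical, since the text does not say which files make an entry complete, and the sketch checks only completeness, not whether the stored ligand matches.

```python
import shutil
from pathlib import Path

# Assumption: which suffixes are required is not stated in the text;
# this tuple is illustrative only.
REQUIRED_SUFFIXES = ("frcmod", "lib", "mol2")


def copy_if_complete(database, name, output_dir, required=REQUIRED_SUFFIXES):
    """Copy <name>.<suffix> files to output_dir only if every one exists."""
    database, output_dir = Path(database), Path(output_dir)
    sources = [database / f"{name}.{suffix}" for suffix in required]
    if not all(src.is_file() for src in sources):
        return False  # incomplete entry: copy nothing
    output_dir.mkdir(parents=True, exist_ok=True)
    for src in sources:
        shutil.copy2(src, output_dir / src.name)
    return True
```

Checking every file before copying any of them avoids leaving a half-populated output folder when the database entry is incomplete.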
- property ligand_sdf_path: str
Path to the canonicalised SDF stored on disk.
- Type:
str
- property name: str
Three-character residue name used for generated artifacts.
- Type:
str
- prepare_ligand_parameters() -> None
Generate parameters using either the AMBER (GAFF/GAFF2) path or the OpenFF path.
Notes
The OpenFF path first creates AMBER artifacts for the tleap-based system build.
Writes a <name>.json metadata file to the output folder.
- prepare_ligand_parameters_amberff(charge_method: str = 'bcc') -> None
Prepare ligand parameters using AMBER (GAFF/GAFF2): mol2/frcmod/lib/prmtop.
- Parameters:
charge_method – Antechamber charge method (e.g., "bcc" or "gas").
- prepare_ligand_parameters_openff() -> None
Prepare ligand parameters using the OpenFF toolkit (with an AMBER bootstrap).
Behavior
Runs a fast AMBER bootstrap (GAFF2 + gas charges) so tleap artifacts exist.
Generates an OpenFF prmtop for downstream use with OpenMM/OpenFF.
- property smiles: str
Canonical SMILES with explicit hydrogens.
- Type:
str
- class batter.param.ligand.MOL2_LigandProcessing(ligand_file: str | Path, index: int, output_dir: str | Path, ligand_name: str | None = None, charge: str = 'am1bcc', retain_lig_prot: bool = True, ligand_ff: str = 'gaff2', unique_mol_names: List[str] | None = None)
Bases: LigandProcessing
- class batter.param.ligand.PDB_LigandProcessing(ligand_file: str | Path, index: int, output_dir: str | Path, ligand_name: str | None = None, charge: str = 'am1bcc', retain_lig_prot: bool = True, ligand_ff: str = 'gaff2', unique_mol_names: List[str] | None = None)
Bases: LigandProcessing
- class batter.param.ligand.SDF_LigandProcessing(ligand_file: str | Path, index: int, output_dir: str | Path, ligand_name: str | None = None, charge: str = 'am1bcc', retain_lig_prot: bool = True, ligand_ff: str = 'gaff2', unique_mol_names: List[str] | None = None)
Bases: LigandProcessing
- batter.param.ligand.batch_ligand_process(ligand_paths: Sequence[str | Path] | Mapping[str, str | Path], output_path: str | Path, retain_lig_prot: bool = True, ligand_ph: float = 7.0, ligand_ff: str = 'gaff2', charge_method: str = 'am1bcc', overwrite: bool = False, run_with_slurm: bool = False, max_slurm_jobs: int = 50, run_with_slurm_kwargs: Dict[str, Any] | None = None, job_extra_directives: List[str] | None = None, on_failure: Literal['prune', 'retry', 'raise'] | None = None) -> Tuple[List[str], Dict[str, Tuple[str, str]]]
Parameterise ligands into a content-addressed store.
Artifacts for each ligand are written under:
<output_path>/<hash_id>/*
where hash_id = sha256(canonical_smiles + ligand_ff + retain).hexdigest()[:12].
- Parameters:
ligand_paths – List of file paths or mapping {alias: path}. Aliases do not affect hashing; only the ligand read from each path does.
output_path – Output directory for the content-addressed store.
retain_lig_prot – Whether to retain hydrogens from inputs.
ligand_ph – Target protonation pH (reserved for future use).
ligand_ff – Force field ('gaff'/'gaff2' or a valid OpenFF release name).
charge_method – Charge method for ligand.
overwrite – If True, re-parameterize even if <hash_id> already exists.
run_with_slurm – If True, distribute parameterisation with Dask+SLURM.
max_slurm_jobs, run_with_slurm_kwargs, job_extra_directives – SLURM/Dask configuration.
- Returns:
list of str – Hash identifiers in processing order (duplicates preserved).
dict – Mapping from the provided input path to (hash_id, canonical_smiles).
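The hash_id formula above can be reproduced with hashlib. The sketch below is a reading of that formula under stated assumptions, not the library's code. The text does not say how the three fields are serialised before hashing, so joining them as plain strings and encoding as UTF-8 is a guess, and `ligand_hash_id` is a hypothetical name.

```python
import hashlib


def ligand_hash_id(canonical_smiles, ligand_ff, retain_lig_prot):
    """Return a 12-character identifier derived from the ligand identity."""
    # Assumption: fields are concatenated as plain strings and UTF-8 encoded.
    # BATTER may serialise them differently, so these IDs need not match its own.
    payload = f"{canonical_smiles}{ligand_ff}{retain_lig_prot}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Illustrative call; "CCO" stands in for a canonical SMILES string.
example_id = ligand_hash_id("CCO", "gaff2", True)
```

Because the identifier depends only on these three values, two input files that describe the same molecule under the same settings resolve to the same folder.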
Caching and validation
Ligand artifacts are content-addressed: the canonical SMILES, the force field, and the
hydrogen-retention flag are hashed (see hash_id in the API reference), so rerunning
batch_ligand_process with the same inputs reuses cached mol2/frcmod/lib bundles instead
of recomputing them, unless overwrite=True. Charge assignment errors and missing
protonation states are raised as exceptions; callers should propagate those errors up
the pipeline rather than silently skipping ligands. Validation also canonicalises SMILES
so cache keys stay stable across input formats.
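The reuse decision can be sketched as a check on the hash-named folder. The `needs_parameterisation` helper is hypothetical, and treating any non-empty folder as a cache hit is an assumption; the library may validate the folder contents more strictly.

```python
from pathlib import Path


def needs_parameterisation(output_path, hash_id, overwrite=False):
    """Return True when the ligand must be (re)parameterised."""
    entry = Path(output_path) / hash_id
    # Assumption: a non-empty <output_path>/<hash_id>/ folder is a cache hit.
    has_artifacts = entry.is_dir() and any(entry.iterdir())
    return overwrite or not has_artifacts
```

An empty folder is treated as a miss here, so an interrupted run that created the directory but wrote nothing does not block a later attempt.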
Output layout
Outputs land under the provided output_path in per-ligand folders named by hash
(<output_path>/<hash_id>/). Metadata (the hash and canonical SMILES of each ligand)
is returned to the caller so builders can record it in SystemMeta and reuse the
same parameter set across runs.
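Because the returned hash list preserves duplicates, a builder that wants one record per parameter set may collapse it first. The sketch below uses made-up values shaped like the documented return types; the hash strings, file names, SMILES placeholders, and the `group_inputs_by_hash` helper are all illustrative.

```python
from collections import defaultdict

# Made-up values shaped like the documented return types, not real outputs.
hashes = ["aaaaaaaaaaaa", "bbbbbbbbbbbb", "aaaaaaaaaaaa"]
metadata = {
    "ligands/a.sdf": ("aaaaaaaaaaaa", "SMILES_A"),
    "ligands/b.mol2": ("bbbbbbbbbbbb", "SMILES_B"),
    "ligands/a_copy.sdf": ("aaaaaaaaaaaa", "SMILES_A"),
}


def group_inputs_by_hash(metadata):
    """Map each hash_id to the input paths that resolved to it."""
    groups = defaultdict(list)
    for input_path, (hash_id, _smiles) in metadata.items():
        groups[hash_id].append(str(input_path))
    return dict(groups)


# dict.fromkeys drops repeats while keeping first-seen order.
unique_hashes = list(dict.fromkeys(hashes))
groups = group_inputs_by_hash(metadata)
```

Each key in `groups` corresponds to one folder in the store, so inputs listed under the same key share a single parameter set.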