Boltz Workflow: Batch Ligand Ranking


Batch Ligand Ranking is a guided Biosimulant Boltz workflow for comparing a small set of ligands, supplied as a CSV, against one protein target. It runs Boltz-2 once per ligand, extracts the binder probability, affinity-like value, confidence metrics, and top-structure artifacts, then ranks the completed candidates into a report-ready table.

This is the first workflow here that is more than a repackaged single Boltz run. It adds CSV intake, repeated execution, result aggregation, ranking, flags, and a batch-specific visualisation table. It is still designed for learning, small-set comparison, and early biological hypothesis generation, not validated drug discovery or final compound selection.

Workflow Status

This lab validates locally, exports as a portable .bsilab package, and has completed a private pre-publication GPU-backed run. It is published on Biosimulant Hub and uses a multi-stage Compose graph that separates source-backed context, input assembly, Boltz-2 prediction, conservative interpretation, and visual reporting.

Publication checklist:

  • manifest validation passes: complete
  • strict package export passes: complete
  • entrypoints import successfully: complete
  • unit tests pass: complete
  • at least one real GPU run completes: complete
  • run results include structure, affinity, confidence, metadata, and visuals: complete
  • screenshots/assets are captured from the real run: complete
  • Hub workflow card is public and points at the published lab id: complete

Pre-Publication Run Evidence

The current assets and metrics are derived from this private staged remote run:

  • run id: c180d5ab-7d17-4037-8f4a-7b28aaf35a22
  • staged lab id: 3c6200af-bc4a-4041-9714-121954af1520
  • run lab commit: f851d8490e3b725fdb685b1b998d1e96c0f6311191bef82067d60cfdacc55c89
  • remote size: GPU A10G
  • status: completed
  • duration: 873.2 seconds
  • credits settled: 70

Key outputs from the run:

  • completed ligands: 3
  • failed ligands: 0
  • top ligand: Dasatinib
  • top affinity_probability_binary: 0.954538881778717
  • top affinity_pred_value: -2.822425127029419
  • top confidence_score: 0.945051610469818
  • top complex_plddt: 0.9359697103500366
  • top ligand_iptm: 0.981379210948944

Ranked table evidence:

  • rank 1: Dasatinib - binder probability 0.954538881778717, affinity-like value -2.822425127029419, confidence 0.945051610469818
  • rank 2: Nilotinib - binder probability 0.7802920341491699, affinity-like value -0.9997102618217468, confidence 0.9546635746955872
  • rank 3: Imatinib - binder probability 0.6947591304779053, affinity-like value -0.9419443607330322, confidence 0.9453792572021484

Curated Known Example

The bundled known-example mode starts with:

  • target: Human ABL1 kinase domain
  • target source: RCSB PDB 2HYY sequence FASTA
  • ligand CSV examples: Imatinib, Dasatinib, and Nilotinib
  • ligand sources: PubChem CID 5291, 3062316, and 644241
  • protein sequence length: 273 amino acids
  • default maximum ligands per run: 3
  • MSA server usage enabled for the default example

The example is a kinase-inhibitor teaching set. It does not predict patient response, clinical efficacy, resistance, dosing, safety, or therapeutic suitability.

Compose Workflow Graph

The published workflow is intentionally split into separate Biosimulant modules:

  1. batch_target_context emits source-backed target, ligand, disease/use-case, provenance, and caveat context.
  2. ligand_library_loader resolves the public protein, ligand, MSA, and run-option inputs into the exact Boltz request.
  3. boltz_batch_ligand_ranker runs the unchanged Boltz-2 scientific wrapper.
  4. ranking_interpreter converts raw Boltz outputs into conservative evidence fields without adding new biological claims.
  5. visualisation renders the 3D structure, confidence/affinity summaries, source context, request traceability, and Q/A caveat cards.

This makes the Compose view match the workflow promise while keeping Boltz-2 as the only predictive scientific model. The surrounding modules are provenance, request assembly, interpretation, and presentation stages.
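The stage order above can be recovered from the wiring in the manifest further down this page. The following is a minimal sketch, not the Biosimulant scheduler: it reduces each wire to "module X consumes an output of module Y" and asks the standard-library graphlib for a valid execution order. The `consumes` dictionary is transcribed by hand from the wiring entries, and the variable names are illustrative.

```python
from graphlib import TopologicalSorter

# For each module alias, the set of module aliases whose outputs it consumes.
# Transcribed by hand from the "wiring" section of the manifest (assumption:
# only the alias before the first "." matters for ordering).
consumes = {
    "batch_target_context": set(),
    "ligand_library_loader": {"batch_target_context"},
    "boltz_batch_ligand_ranker": {"ligand_library_loader"},
    "ranking_interpreter": {
        "batch_target_context",
        "ligand_library_loader",
        "boltz_batch_ligand_ranker",
    },
    "visualisation": {
        "batch_target_context",
        "ligand_library_loader",
        "boltz_batch_ligand_ranker",
        "ranking_interpreter",
    },
}

# static_order() yields each module only after everything it depends on.
execution_order = list(TopologicalSorter(consumes).static_order())
print(execution_order)
```

Because every module depends on the one before it, the order is a single chain that matches the numbered list: context, input assembly, prediction, interpretation, visualisation.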

Inputs

  • protein_sequence: amino-acid sequence for the shared target protein. If omitted, the bundled ABL1 kinase-domain example is used.
  • ligand_csv: CSV text with name and smiles columns. Optional metadata columns are retained in the run context.
  • msa_path: optional path to a precomputed .a3m MSA file.
  • run_options: optional record for workflow/runtime options.

The known-example mode works because lab.yaml defines the target and ligand CSV defaults directly on the ligand_library_loader model (default_protein_sequence and default_ligand_csv in the manifest). A new user can click Run without knowing YAML, SMILES formatting details, or Boltz CLI arguments.
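As an illustration of the ligand_csv contract described above (required name and smiles columns, optional metadata columns retained), here is a minimal parsing sketch using only the Python standard library. It is not the lab's actual loader. The function name `parse_ligand_csv`, the choice to skip incomplete rows, and the choice to raise when the row count exceeds the limit are all assumptions made for the example.

```python
import csv
import io

REQUIRED_COLUMNS = ("name", "smiles")

# The bundled default, copied from default_ligand_csv in the manifest.
DEFAULT_LIGAND_CSV = (
    "name,smiles,source\n"
    "Imatinib,CC1=C(C=C(C=C1)NC(=O)C2=CC=C(C=C2)CN3CCN(CC3)C)NC4=NC=CC(=N4)C5=CN=CC=C5,PubChem CID 5291\n"
    "Dasatinib,CC1=C(C(=CC=C1)Cl)NC(=O)C2=CN=C(S2)NC3=CC(=NC(=N3)C)N4CCN(CC4)CCO,PubChem CID 3062316\n"
    "Nilotinib,CC1=C(C=C(C=C1)C(=O)NC2=CC(=CC(=C2)C(F)(F)F)N3C=C(N=C3)C)NC4=NC=CC(=N4)C5=CN=CC=C5,PubChem CID 644241\n"
)


def parse_ligand_csv(ligand_csv: str, max_ligands: int = 3) -> list[dict]:
    """Parse ligand CSV text, keeping extra columns as per-ligand metadata."""
    reader = csv.DictReader(io.StringIO(ligand_csv))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"ligand_csv is missing required columns: {missing}")

    ligands = []
    for row in reader:
        name = (row.get("name") or "").strip()
        smiles = (row.get("smiles") or "").strip()
        if not name or not smiles:
            continue  # assumption: incomplete rows are skipped, not fatal
        metadata = {
            key: value
            for key, value in row.items()
            if key is not None and key not in REQUIRED_COLUMNS
        }
        ligands.append({"name": name, "smiles": smiles, "metadata": metadata})

    # Assumption: exceeding the limit is an error rather than a silent truncation.
    if len(ligands) > max_ligands:
        raise ValueError(f"{len(ligands)} ligands supplied, limit is {max_ligands}")
    return ligands


ligands = parse_ligand_csv(DEFAULT_LIGAND_CSV)
for ligand in ligands:
    print(ligand["name"], ligand["metadata"])
```

The default of 3 for `max_ligands` mirrors the max_ligands parameter in the manifest. How the published lab handles oversized or malformed CSVs is not documented here.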

Outputs

  • batch_summary: ranked ligand rows with binder probability, affinity-like value, confidence, status, and flags.
  • structure_artifacts: paths to the top-ranked ligand predicted complex structure files, usually mmCIF.
  • affinity_summary: affinity-style outputs for the top-ranked completed ligand.
  • confidence_summary: model confidence outputs for the top-ranked completed ligand.
  • run_metadata: batch execution metadata, per-ligand statuses, output paths, captured logs, and status.
  • scenario_context: source-backed target, ligand, use-case, provenance, and caveat context.
  • assembled_boltz_request: traceable summary of the protein, ligand, MSA, and run options prepared for Boltz.
  • ranking_evidence: interpreted Boltz output evidence with request traceability and explicit scientific caveats.

The visualisation model turns these records into standard Biosimulant run visuals, including a top-ranked structure viewer and a batch ranking table.

Ranking Semantics

The ranking table sorts by affinity_probability_binary descending, then affinity_pred_value descending when available. This maps to Boltz-2's distinction between binder-vs-decoy probability and affinity-like prediction.
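The sort order just described can be sketched in a few lines. This is an illustration of the stated rule, not the code in the batch ranker. The row shape, the "status" field name, and the decision to place rows without an affinity_pred_value last among ties are assumptions. The numbers are the three rows from the pre-publication run reported above, listed in CSV order.

```python
import math

rows = [
    {"name": "Imatinib", "status": "completed",
     "affinity_probability_binary": 0.6947591304779053,
     "affinity_pred_value": -0.9419443607330322},
    {"name": "Dasatinib", "status": "completed",
     "affinity_probability_binary": 0.954538881778717,
     "affinity_pred_value": -2.822425127029419},
    {"name": "Nilotinib", "status": "completed",
     "affinity_probability_binary": 0.7802920341491699,
     "affinity_pred_value": -0.9997102618217468},
]


def rank_ligands(rows):
    """Rank completed rows by binder probability, then affinity-like value."""
    completed = [row for row in rows if row.get("status") == "completed"]

    def sort_key(row):
        value = row.get("affinity_pred_value")
        # Negate both fields so an ascending sort yields descending order.
        # Assumption: a missing value sorts after any available value.
        tie_break = -value if value is not None else math.inf
        return (-row["affinity_probability_binary"], tie_break)

    ranked = sorted(completed, key=sort_key)
    return [dict(row, rank=index) for index, row in enumerate(ranked, start=1)]


ranked = rank_ligands(rows)
for row in ranked:
    print(row["rank"], row["name"], row["affinity_probability_binary"])
```

In this run the binder probabilities are all distinct, so the affinity-like tie-breaker never comes into play.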

affinity_probability_binary is most useful as a binder-vs-decoy style signal. In product language, it is the binder probability.

affinity_pred_value is intended for ligand-optimization style use cases. In product language, it is an affinity-like value. It should be used cautiously and comparatively, not as a direct experimental measurement.

Flags are conservative reminders, not decisions. A row marked "review pose before follow-up" still needs expert inspection. A low-confidence row should not be promoted based only on its score.

Safe Use Cases

  • Compare a small known ligand set against one target.
  • Learn how Boltz-2 affinity outputs behave across ligands.
  • Generate early candidate lists for deeper review.
  • Prepare reproducible computational biology reports.
  • Teach structure-based screening concepts.

Do Not Claim

  • Validated drug discovery.
  • Clinical prediction.
  • Medical diagnosis.
  • Final compound selection.
  • Wet-lab replacement.
  • FEP replacement.
  • Experimentally guaranteed binding-affinity certainty.

Assets

The current screenshots were captured from the successful private pre-publication GPU run above, using its persisted mmCIF structure artifact, ligand-ranking output, and parsed run metrics.

Boltz-2 predicted protein-ligand complex structure

Boltz-2 affinity and confidence summary metrics

Implementation Notes

This workflow contains a batch-specific core model alongside a reused single-ligand runner and a batch-aware visualisation model:

  • models/core/src/boltz2_batch_ligand_ranker.py: CSV parsing, repeated Boltz execution, ranking, flags, top-ligand output selection.
  • models/core/src/boltz2_affinity_predictor.py: the reused single-ligand Boltz-2 runner.
  • models/visualisation/src/docking_visualisation.py: batch-aware top-structure and ranked-table visualisation.
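To make "repeated Boltz execution" concrete, the sketch below shows what one per-ligand request could look like. The source files listed above are not reproduced on this page, so this is an assumption-laden illustration rather than the lab's wrapper. The input layout (sequences plus an affinity property naming the ligand chain) and the CLI flags follow the public Boltz-2 prediction documentation as understood here. The helper names, chain ids "A" and "B", and file paths are hypothetical. Nothing is executed; the command is only assembled.

```python
def build_boltz_input(protein_sequence, smiles, msa_path=None):
    """Build one protein + one ligand input with an affinity request."""
    protein = {"id": "A", "sequence": protein_sequence}
    if msa_path:
        protein["msa"] = msa_path  # optional precomputed .a3m file
    return {
        "version": 1,
        "sequences": [
            {"protein": protein},
            {"ligand": {"id": "B", "smiles": smiles}},
        ],
        # Ask Boltz-2 for affinity outputs with the ligand chain as binder.
        "properties": [{"affinity": {"binder": "B"}}],
    }


def build_boltz_command(input_path, out_dir, params):
    """Translate the ranker's manifest parameters into a CLI argument list."""
    command = [
        "boltz", "predict", input_path,
        "--out_dir", out_dir,
        "--accelerator", params["accelerator"],
        "--output_format", params["output_format"],
        "--recycling_steps", str(params["recycling_steps"]),
        "--sampling_steps", str(params["sampling_steps"]),
        "--diffusion_samples", str(params["diffusion_samples"]),
    ]
    if params.get("use_msa_server"):
        command.append("--use_msa_server")
    if params.get("override"):
        command.append("--override")
    return command


# Parameter values copied from boltz_batch_ligand_ranker in the manifest.
ranker_params = {
    "override": True, "accelerator": "gpu", "output_format": "mmcif",
    "sampling_steps": 200, "use_msa_server": True,
    "recycling_steps": 3, "diffusion_samples": 1,
}

# Shortened placeholder sequence; the real default is in the manifest.
boltz_input = build_boltz_input("VSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVW", "CCO")
command = build_boltz_command("ligand_0.yaml", "out/ligand_0", ranker_params)
print(" ".join(command))
```

In a real run the input dictionary would be serialised to YAML, one file per ligand, before the command is invoked. Whether the lab calls the CLI or a Python entry point is not stated on this page.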

Guided Boltz-2 workflow for ranking a small ligand CSV against one protein target using binder probability, affinity-like value, confidence metrics, flags, and report-ready output.

Runtime

Duration: 0.01
Comms Step: 0.01
Settle Steps: 1

Runs

Total: 0
Completed: 0
Failed: 0

Metadata

Package: boltz-workflow-batch-ligand-ranking
Created: 2026-05-23
Updated: 2026-05-23
Tags: boltz-workflow, boltz, protein-ligand, affinity, structural-biology, guided-workflow, batch-ranking, gpu, context, provenance, input-assembly, interpretation, docking, visualisation, other

Manifest

{
  "io": {
    "inputs": [
      {
        "name": "protein_sequence",
        "label": "ABL1 Kinase Sequence",
        "maps_to": "ligand_library_loader.protein_sequence",
        "description": "Amino-acid sequence for the shared ABL1 kinase-domain target; maps directly to the Boltz-2 protein input for each ligand run."
      },
      {
        "name": "ligand_csv",
        "label": "Candidate Ligand CSV",
        "maps_to": "ligand_library_loader.ligand_csv",
        "description": "CSV text with candidate ligand names and SMILES strings; each row is passed to Boltz-2 as a separate ligand input."
      },
      {
        "name": "msa_path",
        "label": "Precomputed MSA Path",
        "maps_to": "ligand_library_loader.msa_path",
        "description": "Optional path to a precomputed MSA file; leave unset when the configured Boltz MSA server is used."
      },
      {
        "name": "run_options",
        "label": "Boltz Batch Run Options",
        "maps_to": "ligand_library_loader.run_options",
        "description": "Optional structured controls for the batch workflow, including source metadata, sampling settings, and interpretation scope."
      }
    ],
    "outputs": [
      {
        "name": "scenario_context",
        "label": "Workflow Scenario Context",
        "maps_to": "batch_target_context.scenario_context",
        "description": "Source-backed target, ligand, use-case, provenance, and caveat context for this Boltz workflow."
      },
      {
        "name": "assembled_boltz_request",
        "label": "Assembled Boltz Request",
        "maps_to": "ligand_library_loader.assembled_boltz_request",
        "description": "Traceable summary of the protein, ligand, MSA, and run options prepared for Boltz."
      },
      {
        "name": "batch_summary",
        "label": "Ranked Ligand Summary",
        "maps_to": "boltz_batch_ligand_ranker.batch_summary",
        "description": "Ranked ligand table with binder probability, affinity-like value, confidence, status, and caveat flags."
      },
      {
        "name": "affinity_summary",
        "label": "Top Ligand Affinity-Style Summary",
        "maps_to": "boltz_batch_ligand_ranker.affinity_summary",
        "description": "Parsed Boltz-2 binder probability and affinity-like outputs for the top-ranked completed ligand."
      },
      {
        "name": "confidence_summary",
        "label": "Top Ligand Confidence Summary",
        "maps_to": "boltz_batch_ligand_ranker.confidence_summary",
        "description": "Parsed Boltz-2 confidence outputs for the top-ranked predicted complex."
      },
      {
        "name": "structure_artifacts",
        "label": "Top Ligand Structure Artifacts",
        "maps_to": "boltz_batch_ligand_ranker.structure_artifacts",
        "description": "File-backed predicted complex structures for the top-ranked completed ligand."
      },
      {
        "name": "run_metadata",
        "label": "Boltz Batch Run Metadata",
        "maps_to": "boltz_batch_ligand_ranker.run_metadata",
        "description": "Runtime status, per-ligand statuses, command metadata, logs, and caveats for the latest batch invocation."
      },
      {
        "name": "ranking_evidence",
        "label": "Conservative Ranking Evidence",
        "maps_to": "ranking_interpreter.prediction_evidence",
        "description": "Interpreted Boltz output evidence with request traceability and explicit scientific caveats."
      }
    ]
  },
  "tags": [
    "boltz-workflow",
    "boltz",
    "protein-ligand",
    "affinity",
    "structural-biology",
    "guided-workflow",
    "batch-ranking",
    "gpu"
  ],
  "title": "Boltz Workflow: Batch Ligand Ranking",
  "models": [
    {
      "path": "owned/models/batch_target_context",
      "alias": "batch_target_context",
      "parameters": {
        "scenario": {
          "caveat": "Boltz-2 outputs are computational structure and affinity-style predictions for hypothesis generation; they are not experimental binding, potency, selectivity, clinical, or efficacy evidence.",
          "source_pdb": "2HYY",
          "ligand_role": "candidate ligand set",
          "target_name": "Human ABL1 kinase domain",
          "disease_area": "small ligand ranking",
          "target_family": "ABL1 kinase domain",
          "workflow_name": "Batch Ligand Ranking",
          "workflow_context": "Guided Boltz-2 batch ligand ranking workflow using ABL1 kinase-domain examples",
          "workflow_question": "How does a small ligand library rank against the shared ABL1 kinase-domain target?",
          "interpretation_scope": "Learning, small-set ranking, and early hypothesis generation only",
          "protein_sequence_length": 273
        },
        "integration_step": 0.01
      },
      "provenance": {
        "owned_path": "owned/models/batch_target_context"
      }
    },
    {
      "path": "owned/models/ligand_library_loader",
      "alias": "ligand_library_loader",
      "parameters": {
        "workflow_kind": "batch",
        "workflow_name": "Batch Ligand Ranking",
        "integration_step": 0.01,
        "default_ligand_csv": "name,smiles,source\nImatinib,CC1=C(C=C(C=C1)NC(=O)C2=CC=C(C=C2)CN3CCN(CC3)C)NC4=NC=CC(=N4)C5=CN=CC=C5,PubChem CID 5291\nDasatinib,CC1=C(C(=CC=C1)Cl)NC(=O)C2=CN=C(S2)NC3=CC(=NC(=N3)C)N4CCN(CC4)CCO,PubChem CID 3062316\nNilotinib,CC1=C(C=C(C=C1)C(=O)NC2=CC(=CC(=C2)C(F)(F)F)N3C=C(N=C3)C)NC4=NC=CC(=N4)C5=CN=CC=C5,PubChem CID 644241\n",
        "default_run_options": {
          "source_pdb": "2HYY",
          "target_name": "Human ABL1 kinase domain",
          "workflow_name": "Batch Ligand Ranking",
          "ligand_examples": [
            "Imatinib",
            "Dasatinib",
            "Nilotinib"
          ],
          "workflow_context": "Guided Boltz-2 batch ligand ranking workflow using ABL1 kinase-domain examples",
          "interpretation_scope": "Learning, small-set ranking, and early hypothesis generation only"
        },
        "default_protein_sequence": "VSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLLYMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQES"
      },
      "provenance": {
        "owned_path": "owned/models/ligand_library_loader"
      }
    },
    {
      "path": "owned/models/boltz_batch_ligand_ranker",
      "alias": "boltz_batch_ligand_ranker",
      "parameters": {
        "override": true,
        "accelerator": "gpu",
        "max_ligands": 3,
        "runtime_mode": "managed",
        "output_format": "mmcif",
        "sampling_steps": 200,
        "use_msa_server": true,
        "recycling_steps": 3,
        "diffusion_samples": 1
      },
      "provenance": {
        "owned_path": "owned/models/boltz_batch_ligand_ranker"
      }
    },
    {
      "path": "owned/models/ranking_interpreter",
      "alias": "ranking_interpreter",
      "parameters": {
        "mode": "batch",
        "caveat": "Boltz-2 outputs are computational structure and affinity-style predictions for hypothesis generation; they are not experimental binding, potency, selectivity, clinical, or efficacy evidence.",
        "core_alias": "boltz_batch_ligand_ranker",
        "workflow_name": "Batch Ligand Ranking",
        "integration_step": 0.01
      },
      "provenance": {
        "owned_path": "owned/models/ranking_interpreter"
      }
    },
    {
      "path": "owned/models/visualisation",
      "alias": "visualisation",
      "parameters": {
        "mode": "boltz_batch",
        "lab_title": "Boltz Workflow: Batch Ligand Ranking",
        "source_alias": "boltz_batch_ligand_ranker",
        "context_alias": "batch_target_context",
        "assembler_alias": "ligand_library_loader",
        "integration_step": 0.01,
        "interpreter_alias": "ranking_interpreter"
      },
      "provenance": {
        "owned_path": "owned/models/visualisation"
      }
    }
  ],
  "wiring": [
    {
      "to": [
        "ligand_library_loader.scenario_context",
        "ranking_interpreter.scenario_context",
        "visualisation.batch_target_context_scenario_context"
      ],
      "from": "batch_target_context.scenario_context"
    },
    {
      "to": [
        "boltz_batch_ligand_ranker.protein_sequence"
      ],
      "from": "ligand_library_loader.protein_sequence"
    },
    {
      "to": [
        "boltz_batch_ligand_ranker.ligand_csv"
      ],
      "from": "ligand_library_loader.ligand_csv"
    },
    {
      "to": [
        "boltz_batch_ligand_ranker.msa_path"
      ],
      "from": "ligand_library_loader.msa_path"
    },
    {
      "to": [
        "boltz_batch_ligand_ranker.run_options"
      ],
      "from": "ligand_library_loader.run_options"
    },
    {
      "to": [
        "ranking_interpreter.assembled_boltz_request",
        "visualisation.ligand_library_loader_assembled_boltz_request"
      ],
      "from": "ligand_library_loader.assembled_boltz_request"
    },
    {
      "to": [
        "visualisation.boltz_batch_ligand_ranker_batch_summary",
        "ranking_interpreter.boltz_batch_ligand_ranker_batch_summary"
      ],
      "from": "boltz_batch_ligand_ranker.batch_summary"
    },
    {
      "to": [
        "visualisation.boltz_batch_ligand_ranker_affinity_summary",
        "ranking_interpreter.boltz_batch_ligand_ranker_affinity_summary"
      ],
      "from": "boltz_batch_ligand_ranker.affinity_summary"
    },
    {
      "to": [
        "visualisation.boltz_batch_ligand_ranker_confidence_summary",
        "ranking_interpreter.boltz_batch_ligand_ranker_confidence_summary"
      ],
      "from": "boltz_batch_ligand_ranker.confidence_summary"
    },
    {
      "to": [
        "visualisation.boltz_batch_ligand_ranker_structure_artifacts",
        "ranking_interpreter.boltz_batch_ligand_ranker_structure_artifacts"
      ],
      "from": "boltz_batch_ligand_ranker.structure_artifacts"
    },
    {
      "to": [
        "visualisation.boltz_batch_ligand_ranker_run_metadata",
        "ranking_interpreter.boltz_batch_ligand_ranker_run_metadata"
      ],
      "from": "boltz_batch_ligand_ranker.run_metadata"
    },
    {
      "to": [
        "visualisation.ranking_interpreter_prediction_evidence"
      ],
      "from": "ranking_interpreter.prediction_evidence"
    }
  ],
  "runtime": {
    "duration": 0.01,
    "settle_steps": 1,
    "initial_inputs": {},
    "communication_step": 0.01
  },
  "description": "Guided Boltz-2 workflow for ranking a small ligand CSV against one protein target using binder probability, affinity-like value, confidence metrics, flags, and report-ready output.",
  "schema_version": "2.0"
}
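A manifest like the one above can be sanity-checked mechanically. The sketch below is an illustrative consistency check, not Biosimulant's own manifest validation, whose rules are not shown on this page. It confirms that every maps_to target and every wiring endpoint refers to a declared model alias. The excerpt is abbreviated by hand to keep the example short, and the function name `undeclared_aliases` is hypothetical.

```python
import json

# Hand-abbreviated excerpt of the manifest above (two models, one wire).
manifest_excerpt = json.loads("""
{
  "io": {
    "inputs": [
      {"name": "ligand_csv", "maps_to": "ligand_library_loader.ligand_csv"}
    ],
    "outputs": [
      {"name": "batch_summary", "maps_to": "boltz_batch_ligand_ranker.batch_summary"}
    ]
  },
  "models": [
    {"alias": "ligand_library_loader"},
    {"alias": "boltz_batch_ligand_ranker"}
  ],
  "wiring": [
    {
      "from": "ligand_library_loader.ligand_csv",
      "to": ["boltz_batch_ligand_ranker.ligand_csv"]
    }
  ]
}
""")


def alias_of(port_reference):
    """Return the model alias, i.e. the part before the first dot."""
    return port_reference.split(".", 1)[0]


def undeclared_aliases(manifest):
    """List aliases used in io or wiring that no model declares."""
    declared = {model["alias"] for model in manifest["models"]}
    referenced = set()
    for port in manifest["io"]["inputs"] + manifest["io"]["outputs"]:
        referenced.add(alias_of(port["maps_to"]))
    for wire in manifest["wiring"]:
        referenced.add(alias_of(wire["from"]))
        referenced.update(alias_of(target) for target in wire["to"])
    return sorted(referenced - declared)


problems = undeclared_aliases(manifest_excerpt)
print(problems or "all referenced aliases are declared")
```

Running the same function over the full manifest would catch a renamed alias that was updated in models but missed in wiring or io.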
