UBio-MolFM-V1: Universal Bio-Molecular Foundation Model

UBio-MolFM is a foundation model suite for molecular modeling, specifically designed for bio-systems. This release, UBio-MolFM-V1 (Stage 3), is built on the E2Former-V2 linear-scaling equivariant transformer architecture. Refer to the technical report for more details: UBio-MolFM (arXiv:2602.17709).

Model Details

  • Model Type: Equivariant Transformer (E2Former-V2)
  • Training Stage: Stage 3 (Final stage of curriculum learning)
  • Parameters: Included in molfm-v1-stage-3.pt
  • Architecture: Linear-scaling equivariant attention with linear activation memory.
  • Related Papers: UBio-MolFM (arXiv:2602.17709), E2Former-V2 (arXiv:2601.16622)
  • Capabilities:
    • Predicts single-point energy and atomic forces.
    • Supports large-scale simulations (up to 1,500 atoms with high fidelity, and up to 100,000 atoms on a single GPU).
    • Optimized for bio-specific molecular systems.

Files

  • molfm-v1-stage-3.pt: Pretrained model checkpoint.
  • config.yaml: Model and inference configuration.

Usage

To use this model, you need to install the molfm codebase. Please refer to the official repository for installation instructions.

Single-Point Energy and Force Prediction

from ase.build import molecule
from molfm.interface.ase.calculator.e2former_calculator import E2FormerCalculator

# 1. Setup atoms
atoms = molecule("H2O")
atoms.set_cell([10, 10, 10])
atoms.pbc = [True, True, True]

# 2. Load the model using the provided checkpoint and config
calc = E2FormerCalculator(
    checkpoint_path="path/to/molfm-v1-stage-3.pt",
    config_name="path/to/config.yaml", # Or local config name if in search path
    head_name="omol25",
    device="cuda",
    use_tf32=True,
    use_compile=True,
)

# 3. Perform calculation
atoms.calc = calc
energy = atoms.get_potential_energy()
forces = atoms.get_forces()

print(f"Energy: {energy} eV")
print(f"Forces:\n{forces}")
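
As a sanity check that is independent of any particular model, predicted forces should equal the negative gradient of the predicted energy. The sketch below illustrates this relationship with a finite-difference check on a toy harmonic potential; the potential, spring constant, and positions are illustrative only, not the UBio-MolFM energy surface:

```python
import numpy as np

K = 2.0  # spring constant of the toy potential, eV/A^2 (illustrative)

def energy(pos):
    # Toy harmonic energy: E = 0.5 * K * sum(|r_i|^2)
    return 0.5 * K * np.sum(pos ** 2)

def analytic_forces(pos):
    # Analytic forces: F = -dE/dr = -K * r
    return -K * pos

def fd_forces(pos, h=1e-5):
    # Central finite differences of the energy, one coordinate at a time
    out = np.zeros_like(pos)
    for idx in np.ndindex(pos.shape):
        plus, minus = pos.copy(), pos.copy()
        plus[idx] += h
        minus[idx] -= h
        out[idx] = -(energy(plus) - energy(minus)) / (2 * h)
    return out

positions = np.array([[0.1, -0.2, 0.3], [0.0, 0.5, -0.4]])
assert np.allclose(analytic_forces(positions), fd_forces(positions), atol=1e-6)
```

The same check can be applied to any ASE calculator by wrapping `atoms.get_potential_energy()` in place of the toy `energy` function.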

Molecular Dynamics with ASE

from ase import units
from ase.md.langevin import Langevin
from ase.md.velocitydistribution import MaxwellBoltzmannDistribution

# Continues from the previous example: `atoms` already has the calculator attached

# Initialize velocities
MaxwellBoltzmannDistribution(atoms, temperature_K=300)

# Setup Langevin integrator
dyn = Langevin(atoms, 1 * units.fs, temperature_K=300, friction=0.01)

# Run MD
dyn.run(100)
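
During an MD run it is common to monitor the instantaneous temperature implied by the kinetic energy. The helper below is a hypothetical sketch of the equipartition relation (not part of the molfm API); in practice, a function like it can be hooked into the run via `dyn.attach(...)` or replaced by ASE's built-in `atoms.get_temperature()`:

```python
K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def temperature_from_kinetic(ekin, n_atoms):
    # Equipartition: E_kin = (3/2) * N * k_B * T  ->  T = 2 * E_kin / (3 * N * k_B)
    return 2.0 * ekin / (3.0 * n_atoms * K_B)

# Kinetic energy of 3 atoms equilibrated at 300 K
ekin = 1.5 * 3 * K_B * 300.0
print(temperature_from_kinetic(ekin, 3))  # ~ 300.0
```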

Performance Notes

  • TensorFloat-32: Set use_tf32=True to enable TF32 on supported NVIDIA GPUs for higher throughput.
  • Torch Compile: Set use_compile=True to enable torch.compile for faster execution.
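
For reference, TF32 is normally controlled through PyTorch's global backend switches. The snippet below shows those standard flags; it is an assumption that `use_tf32=True` toggles the same settings internally, so check the molfm source for the authoritative behavior:

```python
import torch

# Standard PyTorch TF32 switches (no-ops on CPU-only builds, but safe to set)
torch.backends.cuda.matmul.allow_tf32 = True   # TF32 for matmuls on Ampere+ GPUs
torch.backends.cudnn.allow_tf32 = True         # TF32 inside cuDNN convolutions

print(torch.backends.cuda.matmul.allow_tf32)  # True
```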

Training Data

The model was trained using a three-stage curriculum learning strategy on a combination of datasets:

  • UBio-Mol26: A bio-specific molecular dataset of 17 million structures. A high-precision subset, UBio-Protein26 (5 million protein DFT data points), has also been released.
  • OMol25: Large-scale molecular dataset.

Citation

If you use UBio-MolFM-V1 in your research, please cite:

@misc{huang2026ubiomolfm,
      title={UBio-MolFM: A Universal Molecular Foundation Model for Bio-Systems}, 
      author={Lin Huang and Arthur Jiang and XiaoLi Liu and Zion Wang and Jason Zhao and Chu Wang and HaoCheng Lu and ChengXiang Huang and JiaJun Cheng and YiYue Du and Jia Zhang},
      year={2026},
      eprint={2602.17709},
      url={https://arxiv.org/abs/2602.17709},
      archivePrefix={arXiv},
      primaryClass={physics.chem-ph}
}

@misc{huang2026e2formerv2,
      title={E2Former-V2: On-the-Fly Equivariant Attention with Linear Activation Memory}, 
      author={Lin Huang and Chengxiang Huang and Ziang Wang and Yiyue Du and Chu Wang and Haocheng Lu and Yunyang Li and Xiaoli Liu and Arthur Jiang and Jia Zhang},
      year={2026},
      eprint={2601.16622},
      url={https://arxiv.org/abs/2601.16622},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

License

This model and the associated code are released under the MIT License.
