EMNLP 2025 main
"Bridging the Gap Between Molecule and Textual Descriptions via Substructure-aware Alignment"
This model is trained on the ChEBI-20 dataset.
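
Example usage for caption-to-SMILES (C2S) generation: the MolT5 tokenizer is reused for input encoding, and the description is prefixed with a generation prompt before beam-search decoding.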
```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# The MolT5 tokenizer is reused for input encoding.
tokenizer = AutoTokenizer.from_pretrained("laituan245/molt5-base", model_max_length=512)
model = T5ForConditionalGeneration.from_pretrained("PhTae/MolBridge-Gen-Base-C2S")

caption = ("The molecule is a monoterpene that is bicyclo[2.2.1]heptane substituted by "
           "methyl groups at positions 1, 3 and 3. It is a monoterpene, a terpenoid "
           "fundamental parent and a carbobicyclic compound.")
# Prepend the caption-to-SMILES generation prompt.
caption = "Provide a molecule based on this description: " + caption

tokens = tokenizer(caption, return_tensors="pt", padding="longest", truncation=True)
gen_results = model.generate(
    input_ids=tokens["input_ids"],
    attention_mask=tokens["attention_mask"],
    num_beams=5,
    max_new_tokens=512,
)
smiles = tokenizer.decode(gen_results[0], skip_special_tokens=True)
print(smiles)
```
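
Since decoding is free-form, the generated string is not guaranteed to be a valid SMILES. A minimal post-processing sketch, assuming RDKit is available (it is not a stated dependency of this model), to check validity and canonicalize the output:

```python
from rdkit import Chem

def canonicalize(smiles: str):
    """Return the canonical SMILES string, or None if the input does not parse."""
    mol = Chem.MolFromSmiles(smiles)  # returns None for invalid SMILES
    return Chem.MolToSmiles(mol) if mol is not None else None

# e.g. validate the generation from the snippet above
print(canonicalize(smiles))
```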