Fine-tuned LoRA adapters that simplify medical discharge summaries to a 6th-grade reading level for patient comprehension.
| Model | Base Model | ROUGE-L | SARI | BERTScore | FK-Grade | Improvement |
|---|---|---|---|---|---|---|
| openbiollm_8b_lora (best) | Llama3-OpenBioLLM-8B | 0.6749 | 74.64 | 0.9498 | 7.16 | +157.3% |
| mistral_7b_lora | Mistral-7B-Instruct-v0.2 | 0.6491 | 73.79 | 0.9464 | 6.91 | +65.9% |
| biomistral_7b_dare_lora | BioMistral-7B-DARE | 0.6318 | 73.01 | 0.9439 | 6.95 | +53.3% |
Key Achievement: roughly 50% reduction in Flesch-Kincaid grade level (FK 14.5 → ~7.0), matching ground truth quality.
├── openbiollm_8b_lora/            # Best overall quality
├── mistral_7b_lora/               # Best readability (FK 6.91)
├── biomistral_7b_dare_lora/       # Medical domain baseline
└── checkpoints/
    ├── full_training_llama3/      # Epochs 1-3 (checkpoint-500/1000/1500)
    ├── full_training_mistral/     # Epochs 1-3
    └── full_training_biomistral/  # Epochs 1-3
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model + adapter (example: OpenBioLLM - best performer)
base_model = AutoModelForCausalLM.from_pretrained(
    "aaditya/Llama3-OpenBioLLM-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Each adapter lives in a subfolder of the repo, so pass it via `subfolder`
model = PeftModel.from_pretrained(
    base_model,
    "GuyDor007/MediSimplifier-LoRA-Adapters",
    subfolder="openbiollm_8b_lora",
)
tokenizer = AutoTokenizer.from_pretrained("aaditya/Llama3-OpenBioLLM-8B")
# Inference
SYSTEM_MESSAGE = "You are a helpful medical assistant that simplifies complex medical text for patients."
TASK_INSTRUCTION = """Simplify the following medical discharge summary in plain language for patients with no medical background.
Guidelines:
- Replace medical jargon with everyday words (e.g., "hypertension" → "high blood pressure")
- Keep all important information (diagnoses, medications, follow-up instructions)
- Use short, clear sentences (aim for 15-20 words per sentence)
- Aim for a 6th-grade reading level
- Maintain the same structure as the original
- Do not add or omit information"""
# ChatML format for OpenBioLLM
# complex_medical_text must already hold the discharge summary (a str) to simplify
prompt = f"""<|im_start|>system
{SYSTEM_MESSAGE}<|im_end|>
<|im_start|>user
{TASK_INSTRUCTION}
{complex_medical_text}<|im_end|>
<|im_start|>assistant
"""
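inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the echoed prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
simplified = tokenizer.decode(generated, skip_special_tokens=True)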
# For mistral_7b_lora or biomistral_7b_dare_lora
base_model = AutoModelForCausalLM.from_pretrained(
"mistralai/Mistral-7B-Instruct-v0.2", # or "BioMistral/BioMistral-7B-DARE"
torch_dtype=torch.bfloat16,
device_map="auto"
)
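model = PeftModel.from_pretrained(
    base_model,
    "GuyDor007/MediSimplifier-LoRA-Adapters",
    subfolder="mistral_7b_lora",  # or "biomistral_7b_dare_lora"
)
# Use the tokenizer that matches the chosen base model
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")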
# Mistral format
prompt = f"""[INST] <<SYS>>
{SYSTEM_MESSAGE}
<</SYS>>
{TASK_INSTRUCTION}
{complex_medical_text} [/INST]"""
| Parameter | Value |
|---|---|
| LoRA Rank (r) | 32 |
| LoRA Alpha (α) | 64 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| rsLoRA | True |
| Dropout | 0.05 |
| Epochs | 3 |
| Training Samples | 7,999 |
| Trainable Parameters | 27.3M (0.38%) |
| Learning Rate | 2e-4 (cosine, 3% warmup) |
| Batch Size | 4 (grad accum: 4, effective: 16) |
| Precision | BF16 |
| Max Sequence Length | 2048 |
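The trainable-parameter figure in the table can be sanity-checked from the rank and the target modules. The sketch below is an illustration, not the authors' script: it assumes Mistral-7B layer shapes (hidden size 4096, 32 decoder layers, and 1024-wide key/value projections from grouped-query attention), none of which are stated in this card.

```python
# LoRA adds two low-rank matrices per adapted linear layer:
# A with shape (r, d_in) and B with shape (d_out, r).
def lora_params(d_in: int, d_out: int, r: int) -> int:
    return r * (d_in + d_out)

R = 32             # LoRA rank from the table above
HIDDEN = 4096      # assumed hidden size (Mistral-7B)
KV_DIM = 1024      # assumed k/v projection width (8 KV heads x 128)
NUM_LAYERS = 32    # assumed number of decoder layers

per_layer = (
    lora_params(HIDDEN, HIDDEN, R)    # q_proj
    + lora_params(HIDDEN, KV_DIM, R)  # k_proj
    + lora_params(HIDDEN, KV_DIM, R)  # v_proj
    + lora_params(HIDDEN, HIDDEN, R)  # o_proj
)
total = per_layer * NUM_LAYERS
print(f"{total:,} trainable parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the count rounds to the 27.3M reported above. The percentage depends on the base model's total parameter count, so it will differ slightly between the 7B and 8B bases.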
| Phase | Research Question | Finding |
|---|---|---|
| Rank | Optimal LoRA rank? | r=32 best (contradicts Hu et al. 2021 claim of r=4-8) |
| Modules | Best target modules? | all_attn (q,k,v,o) despite 2x params |
| Data Size | Data efficiency? | More data = better (+5.5-6.6% ROUGE-L from 2K→8K) |
| rsLoRA | Scaling method? | Adopted based on literature (Kalajdzievski 2023) |
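To make the rsLoRA row concrete: standard LoRA scales the adapter update by `alpha / r`, while rank-stabilized LoRA scales it by `alpha / sqrt(r)`. The short sketch below only compares the two factors for the rank and alpha listed in this card; the variable names are illustrative.

```python
import math

r = 32       # LoRA rank used in this card
alpha = 64   # LoRA alpha used in this card

standard_scale = alpha / r            # classic LoRA scaling
rslora_scale = alpha / math.sqrt(r)   # rank-stabilized scaling

print(f"standard LoRA scale: {standard_scale:.2f}")
print(f"rsLoRA scale:        {rslora_scale:.2f}")
```

With rsLoRA the factor shrinks more slowly as the rank grows, which is the usual motivation for enabling it at higher ranks such as r=32.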
| Split | Samples |
|---|---|
| Train | 7,999 |
| Validation | 999 |
| Test | 1,001 |
| Metric | Target | Achieved |
|---|---|---|
| ROUGE-L | Higher | 0.63-0.67 |
| SARI | ≥40 | 73-75 ✓ |
| BERTScore | Higher | 0.94-0.95 |
| Flesch-Kincaid | ≤6 | 6.9-7.2 (close) |
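For readers unfamiliar with the readability metric, the Flesch-Kincaid grade level is `0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59`. The sketch below is a rough illustration only: the syllable counter is a simple vowel-group heuristic chosen for this example, and the card does not say which tool produced the reported scores, so exact values will differ from a dedicated readability library.

```python
import re

def count_syllables(word: str) -> int:
    # Heuristic (assumption): one syllable per run of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )

complex_text = (
    "The patient presented with hypertension and was administered "
    "antihypertensive medication."
)
simple_text = "You had high blood pressure. We gave you medicine to lower it."

print(f"complex: {fk_grade(complex_text):.1f}")
print(f"simple:  {fk_grade(simple_text):.1f}")
```

Shorter sentences and shorter words both lower the score, which is why the prompt guidelines ask for plain vocabulary and short sentences.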
Training checkpoints are saved at the end of each epoch for reproducibility and analysis:
| Checkpoint | Step | Epoch |
|---|---|---|
| checkpoint-500 | 500 | 1 |
| checkpoint-1000 | 1000 | 2 |
| checkpoint-1500 | 1500 | 3 (final) |
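The checkpoint steps follow from the training set size and the effective batch size. A minimal sketch of that arithmetic, assuming a single device and that the final partial batch still counts as an optimizer step:

```python
import math

train_samples = 7999    # from the dataset split table
per_device_batch = 4    # from the training configuration
grad_accum = 4
epochs = 3

effective_batch = per_device_batch * grad_accum              # 16
steps_per_epoch = math.ceil(train_samples / effective_batch)

# One checkpoint at the end of each epoch
checkpoints = [steps_per_epoch * e for e in range(1, epochs + 1)]
for epoch, step in enumerate(checkpoints, start=1):
    print(f"epoch {epoch}: checkpoint-{step}")
```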
@misc{medisimplifier2026,
author = {Dor, Guy and Avraham, Shmulik},
title = {MediSimplifier: LoRA Fine-Tuning for Medical Discharge Summary Simplification},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/GuyDor007/MediSimplifier-LoRA-Adapters}}
}
Apache 2.0