# GLM-4.5-Air-HS-LoRA-CurriculumLearning
A LoRA adapter for GLM-4.5-Air (a 106B-parameter MoE model), fine-tuned on the Hyperswitch codebase using Phased Curriculum Learning.
## Model Description
This model is trained specifically to understand and assist with the Hyperswitch payment orchestration codebase. It was trained with a 3-phase curriculum learning approach on a multi-node H200 cluster using PyTorch FSDP.
## Key Features
- 🎯 **Domain-Specific**: Trained exclusively on the Hyperswitch Rust codebase
- 📚 **Curriculum Learning**: 3-phase progressive training (Foundation → Evolution → PR Mastery)
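A minimal inference sketch for trying the adapter with PEFT. The base-model id (`zai-org/GLM-4.5-Air`), the adapter id derived from this repo's URL, and the generation settings are assumptions for illustration, not part of this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "zai-org/GLM-4.5-Air"  # assumed Hugging Face id for the base model
adapter_id = "AdityaNarayan/GLM-4.5-Air-HS-LoRA-CurriculumLearning"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 training precision
    device_map="auto",           # a 106B MoE model needs multiple GPUs
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach this LoRA adapter

prompt = "Explain how payment routing works in Hyperswitch."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```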
## Training Details
### Hardware Configuration

| Component | Specification |
|---|---|
| GPUs | 16× NVIDIA H200 (141 GB each) |
| Nodes | 2 nodes × 8 GPUs |
| Distributed Strategy | PyTorch FSDP (Full Shard) |
| Precision | BF16 Mixed Precision |
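As an illustration of this setup, a minimal FSDP full-shard + BF16 sketch, launched with `torchrun` across 2 nodes × 8 GPUs. The placeholder model below stands in for GLM-4.5-Air with LoRA attached; this is not the actual training script:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision, ShardingStrategy

# Launched via torchrun across 2 nodes x 8 GPUs (16 ranks total).
dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# BF16 mixed precision, matching the "Precision" row above.
bf16 = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

# Placeholder module; the real run wraps GLM-4.5-Air with LoRA adapters.
model = torch.nn.Linear(1024, 1024).cuda()

model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,  # "Full Shard" row above
    mixed_precision=bf16,
    device_id=torch.cuda.current_device(),
)
```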
### LoRA Configuration

| Parameter | Value |
|---|---|
| LoRA Rank (r) | 128 |
| LoRA Alpha | 256 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Trainable LoRA Tensors | 368 |
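Expressed as a PEFT `LoraConfig`, the table above corresponds to roughly the following (a sketch; the actual training code may construct it differently). The 368 trainable tensors are consistent with 4 target modules × 2 LoRA matrices (A and B) per module × 46 transformer layers:

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=128,                 # LoRA rank
    lora_alpha=256,        # scaling factor; alpha / r = 2.0
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

# model = get_peft_model(base_model, lora_config)  # attach adapters to the base model
```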
### Training Hyperparameters

| Parameter | Value |
|---|---|
| Effective Batch Size | 32 (1 per device × 2 grad accum × 16 GPUs) |
| Sequence Length | 16,384 tokens |
| Chunk Overlap | 2,048 tokens |
| LR Scheduler | Cosine |
| Weight Decay | 0.01 |
| Max Grad Norm | 1.0 |
| Precision | BF16 |
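The sequence-length and chunk-overlap rows imply a sliding-window split of long files into overlapping 16,384-token chunks. A sketch of that idea (illustrative only, not the author's preprocessing code):

```python
def chunk_tokens(token_ids, seq_len=16_384, overlap=2_048):
    """Yield overlapping windows so context at chunk boundaries is not lost."""
    stride = seq_len - overlap
    for start in range(0, max(len(token_ids) - overlap, 1), stride):
        yield token_ids[start:start + seq_len]

chunks = list(chunk_tokens(list(range(40_000))))
print([len(c) for c in chunks])  # [16384, 16384, 11328]; adjacent chunks share 2,048 tokens
```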
## Curriculum Learning Phases

The model was trained with a 3-phase curriculum in which each phase builds on the previous one; a schematic of the resulting training loop follows the phase tables below.
### Phase 1: Foundation (2 epochs)

| Metric | Value |
|---|---|
| Dataset | Codebase structure and file patterns |
| Samples | 9,293 train / 512 eval |
| Learning Rate | 2.5e-5 |
| Warmup Ratio | 0.15 |
| Training Time | 12.7 hours |
| Final Eval Loss | 0.365 |
| Final Eval Accuracy | 88.8% |
### Phase 2: Evolution (2 epochs)

| Metric | Value |
|---|---|
| Dataset | Commit patterns and code changes |
| Samples | 16,622 train / 1,545 eval |
| Learning Rate | 2.0e-5 |
| Warmup Ratio | 0.10 |
| Training Time | 24.7 hours |
| Final Eval Loss | 2.55 |
| Final Eval Accuracy | 40.8% |
> **Note:** The higher loss in Phase 2 is expected due to the complexity of diff/commit patterns.
### Phase 3: PR Mastery (1 epoch)

| Metric | Value |
|---|---|
| Dataset | Pull request and review patterns |
| Samples | 9,797 train / 509 eval |
| Learning Rate | 1.5e-5 |
| Warmup Ratio | 0.05 |
| Training Time | 6.9 hours |
| Final Eval Loss | 0.501 |
| Final Eval Accuracy | 90.2% |
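Putting the three phases together, training amounts to a loop of the following shape. `train_one_phase` and `load_phase_dataset` are hypothetical stand-ins for the actual training code:

```python
# Schematic of the 3-phase curriculum: each phase continues from the
# previous phase's adapter weights with its own learning rate and warmup.
PHASES = [
    # (name, epochs, learning_rate, warmup_ratio)
    ("foundation", 2, 2.5e-5, 0.15),   # codebase structure and file patterns
    ("evolution",  2, 2.0e-5, 0.10),   # commit patterns and code changes
    ("pr_mastery", 1, 1.5e-5, 0.05),   # pull request and review patterns
]

for name, epochs, lr, warmup in PHASES:
    train_one_phase(                        # hypothetical trainer wrapper
        dataset=load_phase_dataset(name),   # hypothetical dataset loader
        num_train_epochs=epochs,
        learning_rate=lr,
        warmup_ratio=warmup,
        lr_scheduler_type="cosine",   # shared values from the
        weight_decay=0.01,            # Training Hyperparameters table
        max_grad_norm=1.0,
        resume_adapter=True,          # phase N starts from phase N-1's LoRA weights
    )
```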
## Training Summary

| Metric | Value |
|---|---|
| Total Training Time | 44.9 hours |
| Total Steps | 1,926 |
| Total Epochs | 5 (2 + 2 + 1) |
| Initial Train Loss | 0.592 |
| Final Train Loss | 0.495 |
| Final Perplexity | 1.65 |
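For reference, perplexity is the exponential of the cross-entropy loss: exp(0.501) ≈ 1.65, consistent with the Phase 3 final eval loss.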
## Citation

If you use this model, please cite:

```bibtex
@misc{glm45air-hs-lora-curriculum,
  title     = {GLM-4.5-Air-HS-LoRA-CurriculumLearning},
  author    = {Aditya Narayan},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/AdityaNarayan/GLM-4.5-Air-HS-LoRA-CurriculumLearning}
}
```
## Acknowledgments
- Base model: GLM-4.5-Air by Zhipu AI
- Training framework: PyTorch FSDP + PEFT
- Dataset: Hyperswitch open-source repository by Juspay