GLM-4.5-Air-HS-LoRA-CurriculumLearning

A LoRA fine-tuned version of GLM-4.5-Air (106B-parameter MoE) trained on the Hyperswitch codebase using Phased Curriculum Learning.

Model Description

This model is specifically trained to understand and assist with the Hyperswitch payment orchestration codebase. It was trained using a 3-phase curriculum learning approach on multi-node H200 GPUs with PyTorch FSDP.
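
For completeness, here is a minimal inference sketch using 🤗 Transformers + PEFT. The base-model repository ID and the generation settings are assumptions for illustration, not part of this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "zai-org/GLM-4.5-Air"  # assumed base checkpoint ID
adapter_id = "AdityaNarayan/GLM-4.5-Air-HS-LoRA-CurriculumLearning"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA adapter

prompt = "Explain how connector integrations are organized in Hyperswitch."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```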

Key Features

  • 🎯 Domain-Specific: Trained exclusively on Hyperswitch Rust codebase
  • 📚 Curriculum Learning: 3-phase progressive training (Foundation → Evolution → PR Mastery)

Training Details

Hardware Configuration

| Component | Specification |
|---|---|
| GPUs | 16× NVIDIA H200 (141 GB each) |
| Nodes | 2 nodes × 8 GPUs |
| Distributed Strategy | PyTorch FSDP (Full Shard) |
| Precision | BF16 mixed precision |
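
As a rough illustration of this layout, the following is a minimal FSDP full-shard + BF16 setup, run once per process under torchrun (the stand-in model is a placeholder; the actual training script is not published here):

```python
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision, ShardingStrategy

# torchrun (2 nodes × 8 GPUs → world size 16) sets RANK/LOCAL_RANK/WORLD_SIZE.
dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = torch.nn.Linear(4096, 4096)  # stand-in for the PEFT-wrapped GLM-4.5-Air

bf16 = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)
model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,  # "Full Shard" from the table
    mixed_precision=bf16,
    device_id=torch.cuda.current_device(),
)
```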

LoRA Configuration

| Parameter | Value |
|---|---|
| LoRA Rank (r) | 128 |
| LoRA Alpha | 256 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Trainable LoRA Tensors | 368 |
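
In peft terms, the configuration above corresponds roughly to the following (an untested reconstruction from the table, not the author's exact script):

```python
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=128,                       # LoRA rank
    lora_alpha=256,              # scaling = alpha / r = 2.0
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# model = get_peft_model(base_model, lora_config)
# model.print_trainable_parameters()  # reports the trainable LoRA weights
```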

Training Hyperparameters

| Parameter | Value |
|---|---|
| Effective Batch Size | 32 (1 per device × 2 grad accum × 16 GPUs) |
| Sequence Length | 16,384 tokens |
| Chunk Overlap | 2,048 tokens |
| LR Scheduler | Cosine |
| Weight Decay | 0.01 |
| Max Grad Norm | 1.0 |
| Precision | BF16 |
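
The 16,384-token window combined with a 2,048-token overlap implies sliding-window chunking of long source files. A minimal sketch of that scheme (the helper name and exact windowing are assumptions; the card does not publish the preprocessing code):

```python
def chunk_with_overlap(token_ids, max_len=16_384, overlap=2_048):
    """Split one long token sequence into windows that share `overlap` tokens."""
    stride = max_len - overlap  # each window advances by 14,336 tokens
    return [
        token_ids[start:start + max_len]
        for start in range(0, max(len(token_ids) - overlap, 1), stride)
    ]
```

With these defaults, consecutive chunks share 2,048 tokens of context, so no chunk boundary falls on a hard cut without surrounding context.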

Curriculum Learning Phases

The model was trained using a 3-phase curriculum, where each phase builds upon the previous (a minimal code sketch of the phased loop follows the Phase 3 table):

Phase 1: Foundation (2 epochs)

| Metric | Value |
|---|---|
| Dataset | Codebase structure and file patterns |
| Samples | 9,293 train / 512 eval |
| Learning Rate | 2.5e-5 |
| Warmup Ratio | 0.15 |
| Training Time | 12.7 hours |
| Final Eval Loss | 0.365 |
| Final Eval Accuracy | 88.8% |

Phase 2: Evolution (2 epochs)

| Metric | Value |
|---|---|
| Dataset | Commit patterns and code changes |
| Samples | 16,622 train / 1,545 eval |
| Learning Rate | 2.0e-5 |
| Warmup Ratio | 0.10 |
| Training Time | 24.7 hours |
| Final Eval Loss | 2.55 |
| Final Eval Accuracy | 40.8% |

Note: Higher loss in Phase 2 is expected due to the complexity of diff/commit patterns.

Phase 3: PR Mastery (1 epoch)

| Metric | Value |
|---|---|
| Dataset | Pull request and review patterns |
| Samples | 9,797 train / 509 eval |
| Learning Rate | 1.5e-5 |
| Warmup Ratio | 0.05 |
| Training Time | 6.9 hours |
| Final Eval Loss | 0.501 |
| Final Eval Accuracy | 90.2% |
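
Stitched together, the three phases amount to a sequential loop in which each phase resumes from the previous phase's weights with its own data, learning rate, warmup, and epoch budget. A minimal sketch with the Hugging Face Trainer, assuming `model` and the per-phase datasets already exist (the loop is a reconstruction from the tables above, not the published training code):

```python
from transformers import Trainer, TrainingArguments

# (name, epochs, learning rate, warmup ratio) from the phase tables above.
phases = [
    ("foundation", 2, 2.5e-5, 0.15),
    ("evolution",  2, 2.0e-5, 0.10),
    ("pr_mastery", 1, 1.5e-5, 0.05),
]

for name, epochs, lr, warmup in phases:
    args = TrainingArguments(
        output_dir=f"checkpoints/{name}",
        num_train_epochs=epochs,
        learning_rate=lr,
        warmup_ratio=warmup,
        lr_scheduler_type="cosine",
        weight_decay=0.01,
        max_grad_norm=1.0,
        bf16=True,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=2,  # × 16 GPUs → effective batch size 32
    )
    trainer = Trainer(
        model=model,                         # carries weights across phases
        args=args,
        train_dataset=train_datasets[name],  # assumed per-phase datasets
        eval_dataset=eval_datasets[name],
    )
    trainer.train()
```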

Training Summary

| Metric | Value |
|---|---|
| Total Training Time | 44.9 hours |
| Total Steps | 1,926 |
| Total Epochs | 5 (2 + 2 + 1) |
| Initial Train Loss | 0.592 |
| Final Train Loss | 0.495 |
| Final Perplexity | 1.65 |

Citation

If you use this model, please cite:

```bibtex
@misc{glm45air-hs-lora-curriculum,
  title     = {GLM-4.5-Air-HS-LoRA-CurriculumLearning},
  author    = {Aditya Narayan},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/AdityaNarayan/GLM-4.5-Air-HS-LoRA-CurriculumLearning}
}
```

Acknowledgments

  • Base model: GLM-4.5-Air by Zhipu AI
  • Training framework: PyTorch FSDP + PEFT
  • Dataset: Hyperswitch open-source repository by Juspay