Qwen3-4B Telugu Transliteration
Model Summary
pavanmantha/qwen3-4b-telugu-transliteration is a full fine-tune of Qwen3-4B-Instruct for the task of Telugu-to-Roman (Latin script) transliteration. Given a sentence written in Telugu script, the model outputs its phonetically accurate Roman-alphabet equivalent. It was trained for 1 epoch on ~35k instruction-formatted Telugu–Romanized pairs using DDP across 3× NVIDIA L40S GPUs.
Model Details
Model Description
- Developed by: Pavan Kumar Mantha (pavanmantha)
- Model type: Causal Language Model — full fine-tune (no LoRA/PEFT)
- Language(s): Telugu (`te`) → Roman/Latin (`en` script)
- License: Apache 2.0 (inherited from the Qwen3 base)
- Finetuned from: Qwen/Qwen3-4B-Instruct-2507
- Total parameters: 4.02B (all trainable)
- Model dtype: bfloat16
Model Sources
- Repository: pavanmantha/qwen3-4b-telugu-transliteration
- Training dataset: pavanmantha/telugu_transliteration_40k
Uses
Direct Use
This model is intended for converting Telugu script text into its Roman (Latin alphabet) transliteration. Useful for:
- Search and indexing pipelines that need phonetic normalization
- Text-to-speech preprocessing
- Keyboard input systems for Telugu speakers
- Cross-script NLP research
Downstream Use
Can be integrated into larger NLP pipelines as a preprocessing step — for example, feeding transliterated output into downstream models that work better with Latin-script text.
Out-of-Scope Use
- This model is not a translation model. It performs transliteration (phonetic conversion), not semantic translation from Telugu to English.
- Not suited for zero-shot tasks unrelated to transliteration.
- May produce degraded output for heavily domain-specific or rare vocabulary not represented in the training data.
How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "pavanmantha/qwen3-4b-telugu-transliteration"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

def transliterate(telugu_text: str) -> str:
    messages = [
        {
            "role": "system",
            "content": "You are a Telugu transliteration expert. Convert the given Telugu text written in Telugu script into its Roman (Latin alphabet) transliteration accurately.",
        },
        {
            "role": "user",
            "content": telugu_text,
        },
    ]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=128,
            do_sample=False,
            temperature=None,
            top_p=None,
        )
    generated = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)

# Example
text = "మానవ పరిణామ ప్రక్రియ యొక్క అవలోకనాన్ని అందించండి."
print(transliterate(text))
# → "manava parinama prakriya yokka avalokananni andinchandi."
```
Training Details
Training Data
- Dataset: pavanmantha/telugu_transliteration_40k
- Total samples: 43,614
- Train split: 34,891 samples (80%)
- Validation split: 8,723 samples (20%), stratified with `seed=42`
- Columns: `text` (Telugu script), `text_transliterated` (Roman script)
Sample:
| text | text_transliterated |
|---|---|
| మానవ పరిణామ ప్రక్రియ యొక్క అవలోకనాన్ని అందించండి. | manava parinama prakriya yokka avalokananni andinchandi. |
Training Procedure
Each sample was formatted as a 3-turn chat using Qwen3's chat template:
System : "You are a Telugu transliteration expert..."
User : <Telugu script text>
Assistant: <Roman transliteration>
Labels were masked to -100 for the system and user prompt tokens so that the cross-entropy loss is computed only on the assistant response (the transliterated output). On average, 20.06% of tokens per sample were unmasked (e.g., 63 of 314 tokens for a representative sample).
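The masking step can be illustrated with plain token-ID lists (a sketch, not the exact training code): given the length of the tokenized system + user prompt, the labels are copied from the input IDs and the prompt portion is overwritten with -100 so it contributes nothing to the loss.

```python
IGNORE_INDEX = -100

def build_labels(input_ids: list[int], prompt_len: int) -> list[int]:
    """Mask prompt tokens so loss is computed only on the assistant response."""
    labels = list(input_ids)
    labels[:prompt_len] = [IGNORE_INDEX] * prompt_len
    return labels

# Example: 5 prompt tokens (system + user), 3 response tokens (assistant)
ids = [101, 102, 103, 104, 105, 201, 202, 203]
labels = build_labels(ids, prompt_len=5)
# labels → [-100, -100, -100, -100, -100, 201, 202, 203]
```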
Preprocessing
- Tokenizer: Qwen3 BPE tokenizer, `padding_side="right"`
- Max sequence length: 512 tokens
- Padding: dynamic, padded to a multiple of 8
- Label padding: `-100`
- Tokenization workers: 4
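The padding behavior above can be sketched in plain Python (illustrative, not the exact collator): each batch is padded to its longest sequence rounded up to a multiple of 8, with input padding using the pad token ID and label padding using -100.

```python
def pad_batch(sequences, pad_id, multiple=8):
    """Pad every sequence to the longest length in the batch, rounded up to a multiple."""
    longest = max(len(s) for s in sequences)
    target = ((longest + multiple - 1) // multiple) * multiple  # round up to multiple of 8
    return [s + [pad_id] * (target - len(s)) for s in sequences]

input_ids = [[5, 6, 7], [5, 6, 7, 8, 9]]
labels    = [[-100, 6, 7], [-100, -100, 7, 8, 9]]

padded_inputs = pad_batch(input_ids, pad_id=0)    # both padded to length 8
padded_labels = pad_batch(labels, pad_id=-100)    # label padding uses -100
```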
Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Training regime | bf16 mixed precision |
| Learning rate | 2e-6 |
| LR scheduler | Linear with warmup |
| Warmup steps | 50 |
| Per-device batch size | 4 |
| Gradient accumulation steps | 8 |
| Effective batch size | 96 (4 × 8 × 3 GPUs) |
| Epochs | 1 |
| Steps per epoch | 363 |
| Total steps | 364 |
| Weight decay | 0.01 |
| Max grad norm | 1.0 |
| Gradient checkpointing | ✅ (use_reentrant=False) |
| Optimizer | AdamW (HF Trainer default) |
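The effective batch size and step count in the table follow directly from the per-device settings and the training-split size; a quick arithmetic check:

```python
import math

per_device_batch = 4
grad_accum = 8
num_gpus = 3
train_samples = 34_891

effective_batch = per_device_batch * grad_accum * num_gpus  # 96
total_steps = math.ceil(train_samples / effective_batch)    # 364
print(effective_batch, total_steps)
```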
Training Metrics
All values extracted from the TensorBoard event log.
| Step | Train Loss | Grad Norm | Learning Rate | Epoch |
|---|---|---|---|---|
| 50 | 2.4207 | 4.3750 | 2.00e-6 | 0.138 |
| 100 | 1.1400 | 1.2578 | 2.00e-6 | 0.275 |
| 150 | 0.7542 | 0.8164 | 1.50e-6 | 0.413 |
| 200 | 0.6720 | 0.7148 | 1.50e-6 | 0.550 |
| 250 | 0.6414 | 0.6563 | 1.00e-6 | 0.688 |
| 300 | 0.6273 | 0.5820 | 5.00e-7 | 0.825 |
| 350 | 0.6220 | 0.6758 | ~0 | 0.963 |
| 364 (final) | — | — | — | 1.000 |
Final training summary (step 364):
| Metric | Value |
|---|---|
| Train loss (epoch avg) | 0.9693 |
| Train runtime | 1,433.7 s (~23.9 min) |
| Samples/second | 24.34 |
| Steps/second | 0.254 |
| Total FLOPs | 2.57 × 10¹⁷ |
The loss dropped rapidly from 2.42 → 0.62 in the first 350 steps, demonstrating fast and stable convergence for this transliteration task.
Evaluation
Testing Data
The validation set consists of 8,723 held-out samples from the same pavanmantha/telugu_transliteration_40k dataset, split with seed=42.
Metrics
The primary training objective is cross-entropy loss (on unmasked assistant tokens only). Evaluation loss was tracked every 500 steps via eval_strategy="steps". Additional task-specific metrics (CER, WER, exact match) can be computed post-hoc using the inference snippet above.
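Character error rate, for example, can be computed with a standard edit-distance routine over model outputs and references. The helpers below are a minimal stdlib sketch (hypothetical names, not shipped with the model):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (single-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits needed, normalized by reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)
```

For example, `cer("manava", "manava")` is 0.0 for an exact match, and WER follows the same pattern applied to whitespace-split word lists.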
Bias, Risks, and Limitations
- The model is trained on a single dataset and may not generalize well to specialized domains (legal, medical, technical Telugu).
- Transliteration conventions vary across regions and systems (e.g., ISO 15919 vs. informal conventions); the model reflects the conventions present in the training data.
- Rare or archaic Telugu vocabulary may be poorly handled.
- The model carries any biases inherent in the underlying Qwen3-4B-Instruct base model.
Recommendations
Users should validate outputs against a known-good reference for high-stakes applications. For best results, inputs should be clean, well-formed Telugu script sentences within the general domain of the training data.
Environmental Impact
- Hardware: 3× NVIDIA L40S (46 GB each)
- Training time: ~23.9 minutes (1,433.7 seconds)
- Cloud provider: Private compute cluster
- Compute region: India
- Estimated CO₂: Minimal — single short epoch on 3 GPUs
Carbon emissions can be estimated using the Machine Learning Impact calculator.
Technical Specifications
Model Architecture
- Architecture: Qwen3ForCausalLM (decoder-only transformer)
- Parameters: 4.02B
- Dtype: bfloat16
- Attention: Standard scaled dot-product (Flash Attention 2 not used in this run)
- Training strategy: Full fine-tune with DDP (DistributedDataParallel) across 3 GPUs
- `use_cache`: disabled during training
Compute Infrastructure
- GPUs: 3× NVIDIA L40S, 46 GB VRAM each
- Driver: 550.127.05 | CUDA 12.8
- Framework: PyTorch + HuggingFace Transformers + Accelerate
- Launcher: `torchrun --nproc_per_node=3`
Software Versions
| Library | Role |
|---|---|
| `transformers` | Model, Trainer, TrainingArguments |
| `torch` | DDP, autocast, gradient checkpointing |
| `accelerate` | Distributed backend |
| `datasets` | Data loading and tokenization |
| `tokenizers` | Qwen3 BPE tokenizer |
Citation
If you use this model, please cite the base model and dataset:
@misc{qwen3-4b-telugu-transliteration,
author = {Pavan Kumar Mantha},
title = {Qwen3-4B Telugu Transliteration},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/pavanmantha/qwen3-4b-telugu-transliteration}
}
Model Card Authors
Pavan Kumar Mantha — Distinguished AI Architect, PhD researcher in Generative AI (IIITDM Kurnool), MTech Data Science (BITS Pilani).
Model Card Contact