prithivida/grammar_error_correcter_v1 – ONNX INT8 Quantized
This is an ONNX INT8 dynamically quantized version of prithivida/grammar_error_correcter_v1 for grammatical error correction in English.
This is a derivative work. All credit for the original model goes to Prithiviraj Damodaran (@prithivida). This repository only provides a quantized ONNX conversion for easier deployment. We do not claim ownership of the model architecture or weights.
Original Model
| Field | Value |
|---|---|
| Original Repository | prithivida/grammar_error_correcter_v1 |
| Original Author | Prithiviraj Damodaran (@prithivida) |
| Architecture | T5-base (encoder-decoder) |
| Parameters | 222.9M |
| Training Data | WikiEdits + C4 synthetic + PIE synthetic |
| License | No license specified in the original repository |
What's in this repo?
- `encoder_model.onnx` – Encoder graph (INT8 quantized)
- `decoder_model.onnx` – Decoder graph (INT8 quantized)
- `decoder_with_past_model.onnx` – Decoder with KV cache (INT8 quantized)
- `config.json`, `generation_config.json` – Model configuration
- `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json` – Tokenizer files
Quantization Details
| Step | Detail |
|---|---|
| Export | PyTorch → ONNX FP32 via Hugging Face Optimum (`ORTModelForSeq2SeqLM`) |
| Quantization | onnxruntime.quantization.quantize_dynamic with QuantType.QInt8 |
| Validation | Verified ONNX FP32 output matches PyTorch FP32 exactly (0 delta) before quantizing |
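The export-and-quantize pipeline above can be sketched in a few lines. This is a minimal sketch, not the exact script used for this repo: it assumes `optimum` and `onnxruntime` are installed (imported lazily so the snippet stays importable without them), and the directory names are illustrative.

```python
from pathlib import Path

def export_and_quantize(model_id, fp32_dir="onnx-fp32", int8_dir="onnx-int8"):
    """Export a seq2seq model to ONNX FP32, then INT8-quantize every graph."""
    # Heavy dependencies are imported lazily so the sketch stays importable.
    from optimum.onnxruntime import ORTModelForSeq2SeqLM
    from onnxruntime.quantization import quantize_dynamic, QuantType

    # Step 1: PyTorch -> ONNX FP32 (encoder, decoder, decoder_with_past graphs).
    model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)
    model.save_pretrained(fp32_dir)

    # Step 2: dynamic INT8 quantization of each exported graph.
    Path(int8_dir).mkdir(parents=True, exist_ok=True)
    for onnx_file in sorted(Path(fp32_dir).glob("*.onnx")):
        quantize_dynamic(
            model_input=str(onnx_file),
            model_output=str(Path(int8_dir) / onnx_file.name),
            weight_type=QuantType.QInt8,
        )

# export_and_quantize("prithivida/grammar_error_correcter_v1")
```

Dynamic quantization converts weights to INT8 ahead of time and quantizes activations on the fly at inference, so no calibration dataset is needed.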
Benchmark Results – This Model
Benchmarked on a 50-sentence English grammar correction test set (2-thread CPU, Google Colab).
Quality
| Format | BLEU | chrF++ | Exact Match | Δ BLEU | Δ chrF++ |
|---|---|---|---|---|---|
| PyTorch FP32 (baseline) | 76.88 | 86.40 | 58% | – | – |
| ONNX INT8 (this repo) | 74.44 | 84.79 | 54% | -2.44 | -1.61 |
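The Exact Match column is straightforward to reproduce; BLEU and chrF++ come from a scoring library such as `sacrebleu` (an assumption — the card does not name the scorer). The sentences below are illustrative, not taken from the 50-sentence test set:

```python
def exact_match_rate(hypotheses, references):
    """Fraction of model outputs identical to the reference correction."""
    pairs = list(zip(hypotheses, references))
    return sum(h.strip() == r.strip() for h, r in pairs) / len(pairs)

# Illustrative sentences (not the actual test set).
refs = ["She went to school yesterday.", "He likes apples."]
fp32 = ["She went to school yesterday.", "He likes apples."]
int8 = ["She went to school yesterday.", "He like apples."]

print(exact_match_rate(fp32, refs))  # 1.0
print(exact_match_rate(int8, refs))  # 0.5
```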
Speed & Size
| Metric | PyTorch FP32 | ONNX INT8 (this repo) |
|---|---|---|
| P50 Latency | 602 ms | 222 ms |
| P95 Latency | 801 ms | 455 ms |
| Disk Size | 850 MB | 426 MB |
| Throughput | – | 16.96 sentences/sec |
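P50/P95 latency figures like those above are typically obtained by timing repeated single-sentence runs and taking percentiles of the wall-clock times. A minimal sketch (the lambda is a stand-in workload; in practice `fn` would wrap a `model.generate` call):

```python
import time

def measure_latency_ms(fn, runs=100):
    """Run fn repeatedly and return (P50, P95) wall-clock latency in ms."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1000)
    times.sort()
    return times[runs // 2], times[int(runs * 0.95) - 1]

# Stand-in workload; replace with a real inference call when benchmarking.
p50, p95 = measure_latency_ms(lambda: sum(range(10_000)))
print(f"P50 {p50:.2f} ms, P95 {p95:.2f} ms")
```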
Quantization Stability
| Metric | Value |
|---|---|
| Sentences with changed output (INT8 vs FP32) | 6/50 |
| Deployment Scorecard Verdict | NO-GO |
This is the largest model in the comparison (T5-base, 222.9M parameters). INT8 quantization shows a moderate quality drop (-2.44 BLEU) that exceeds the -2.0 BLEU regression threshold, hence the NO-GO verdict. The original model is part of the Gramformer library.
Cross-Model Comparison (7 Models Benchmarked)
This model was benchmarked alongside 6 other grammar correction models. All models were evaluated on the same 50-sentence test set under identical conditions.
PyTorch FP32 Baseline (all models)
| Model | Params | BLEU | chrF++ | P50 (ms) | Size (MB) | Exact Match |
|---|---|---|---|---|---|---|
| prithivida (this model) | 222.9M | 76.88 | 86.40 | 602 | 850 | 58% |
| coedit-small | 60.5M | 77.42 | 86.84 | 226 | 231 | 60% |
| vennify-t5-base | 222.9M | 78.57 | 87.87 | 539 | 850 | 58% |
| aventiq-t5-small | 60.5M | 34.43 | 45.88 | 213 | 231 | 16% |
| visheratin-mini | 31.2M | 66.84 | 81.77 | 133 | 119 | 46% |
| visheratin-tiny | 15.6M | 66.08 | 82.69 | 85 | 59 | 48% |
| pszemraj-small | 77.0M | 57.11 | 77.13 | 363 | 294 | 28% |
ONNX INT8 Quantized (all models)
| Model | Params | BLEU (FP32) | BLEU (INT8) | chrF++ (INT8) | P50 (ms) | Disk (MB) | Exact Match | Scorecard |
|---|---|---|---|---|---|---|---|---|
| prithivida (this model) | 222.9M | 76.88 | 74.44 | 84.79 | 222 | 426 | 54% | NO-GO |
| coedit-small | 60.5M | 77.42 | 77.60 | 86.69 | 117 | 152 | 60% | GO |
| vennify-t5-base | 222.9M | 78.57 | 77.24 | 86.74 | 224 | 426 | 56% | GO |
| aventiq-t5-small | 60.5M | 34.43 | 40.66 | 51.95 | 120 | 152 | 22% | GO* |
| visheratin-mini | 31.2M | 66.84 | 66.65 | 82.67 | 78 | 93 | 46% | GO |
| visheratin-tiny | 15.6M | 66.08 | 64.51 | 82.28 | 60 | 55 | 44% | GO |
| pszemraj-small | 77.0M | 57.11 | 3.15 | 18.72 | 138 | 153 | 0% | NO-GO |
Statistical Significance (Bootstrap, 1000 resamples)
Key cross-model comparisons (INT8 variants):
| Model A | Model B | chrF++ A | chrF++ B | p-value | Significant? |
|---|---|---|---|---|---|
| coedit-small | vennify-t5-base | 86.69 | 86.74 | 1.000 | No |
| coedit-small | visheratin-mini | 86.69 | 82.67 | 0.038 | Yes |
| coedit-small | visheratin-tiny | 86.69 | 82.28 | 0.010 | Yes |
| coedit-small | prithivida | 86.69 | 84.79 | 0.324 | No |
| visheratin-mini | visheratin-tiny | 82.67 | 82.28 | 0.824 | No |
| prithivida | vennify-t5-base | 84.79 | 86.74 | 0.324 | No |
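The p-values above come from a paired bootstrap over sentence-level scores (1000 resamples). A minimal sketch of the procedure, using illustrative per-sentence scores rather than the real data: resample the per-sentence score differences with replacement and count how often the observed winner loses.

```python
import random

def paired_bootstrap_p(scores_a, scores_b, resamples=1000, seed=0):
    """One-sided paired bootstrap: fraction of resamples where the
    observed winner fails to win."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sign = 1 if sum(diffs) >= 0 else -1
    n = len(diffs)
    flips = sum(
        1 for _ in range(resamples)
        if sign * sum(diffs[rng.randrange(n)] for _ in range(n)) <= 0
    )
    return flips / resamples

# Illustrative per-sentence chrF++ scores for two systems (not real data).
a = [80 + (i % 7) for i in range(50)]  # consistently stronger system
b = [70 + (i % 5) for i in range(50)]
print(paired_bootstrap_p(a, b))  # 0.0: clearly significant
```

With only 50 sentences, fairly large chrF++ gaps (e.g. prithivida vs vennify-t5-base) can still fail to reach significance, which is why several rows above report "No".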
Recommendation
| Use Case | Recommended Model | Why |
|---|---|---|
| Best overall (desktop) | coedit-small INT8 (152 MB) | Highest INT8 quality (chrF++ 86.69), 117ms latency, only 1 regression |
| Smallest (mobile/edge) | visheratin-tiny INT8 (55 MB) | 60ms latency, 55 MB disk, acceptable quality (chrF++ 82.28) |
| Highest baseline quality | vennify-t5-base INT8 (426 MB) | chrF++ 86.74, but 3x larger than coedit-small for no significant quality gain |
Usage
```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

model_id = "TonyRaju/prithivida-grammar-correcter-onnx-int8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForSeq2SeqLM.from_pretrained(model_id)

# The model expects the "gec:" task prefix on every input.
text = "gec: She go to school yesterday"
inputs = tokenizer([text], return_tensors="pt", max_length=128, truncation=True)
outputs = model.generate(**inputs, max_new_tokens=128, num_beams=1, repetition_penalty=1.3)
corrected = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(corrected)
```

Input prefix: `gec:`
Acknowledgments
All credit for the original model goes to Prithiviraj Damodaran (@prithivida). This repository only provides an ONNX INT8 quantized conversion to make the model easier to deploy in production environments (mobile, edge, browser).
Quantization and benchmarking performed as part of the Smart Desktop Keyboard Grammar Engine project.
Citation
If you use this model, please cite the original authors:
```bibtex
@misc{prithivida_onnx_int8,
  title = {prithivida/grammar_error_correcter_v1 -- ONNX INT8 Quantized},
  note  = {Quantized version of prithivida/grammar_error_correcter_v1},
  url   = {https://huggingface.co/prithivida/grammar_error_correcter_v1},
}
```