prithivida/grammar_error_correcter_v1 – ONNX INT8 Quantized

This is an ONNX INT8 dynamically quantized version of prithivida/grammar_error_correcter_v1 for grammatical error correction in English.

This is a derivative work. All credit for the original model goes to Prithiviraj Damodaran (@prithivida). This repository only provides a quantized ONNX conversion for easier deployment. We do not claim ownership of the model architecture or weights.

Original Model

| Field | Value |
|---|---|
| Original Repository | prithivida/grammar_error_correcter_v1 |
| Original Author | Prithiviraj Damodaran (@prithivida) |
| Architecture | T5-base (encoder-decoder) |
| Parameters | 222.9M |
| Training Data | WikiEdits + C4 synthetic + PIE synthetic |
| License | No license specified in the original repo |

What's in this repo?

  • encoder_model.onnx – Encoder graph (INT8 quantized)
  • decoder_model.onnx – Decoder graph (INT8 quantized)
  • decoder_with_past_model.onnx – Decoder with KV-cache (INT8 quantized)
  • config.json, generation_config.json – Model configuration
  • tokenizer.json, tokenizer_config.json, special_tokens_map.json – Tokenizer files

Quantization Details

| Step | Detail |
|---|---|
| Export | PyTorch → ONNX FP32 via Hugging Face Optimum (`ORTModelForSeq2SeqLM`) |
| Quantization | `onnxruntime.quantization.quantize_dynamic` with `QuantType.QInt8` |
| Validation | ONNX FP32 output verified to match PyTorch FP32 exactly (zero delta) before quantizing |

Benchmark Results – This Model

Benchmarked on a 50-sentence English grammar correction test set (2-thread CPU, Google Colab).

Quality

| Format | BLEU | chrF++ | Exact Match | Δ BLEU | Δ chrF++ |
|---|---|---|---|---|---|
| PyTorch FP32 (baseline) | 76.88 | 86.40 | 58% | – | – |
| ONNX INT8 (this repo) | 74.44 | 84.79 | 54% | -2.44 | -1.61 |

Speed & Size

| Metric | PyTorch FP32 | ONNX INT8 (this repo) |
|---|---|---|
| P50 Latency | 602 ms | 222 ms |
| P95 Latency | 801 ms | 455 ms |
| Disk Size | 850 MB | 426 MB |
| Throughput | – | 16.96 sentences/sec |
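The latency figures above are percentiles over per-sentence wall-clock timings. A dependency-free sketch of that measurement (the warmup count and nearest-rank percentile method are assumptions, not the card's exact harness):

```python
import time


def percentile(samples, pct):
    # nearest-rank percentile over the sorted samples
    s = sorted(samples)
    return s[min(len(s) - 1, max(0, round(pct / 100 * (len(s) - 1))))]


def benchmark(fn, warmup=3, runs=50):
    """Return (P50, P95) latency of fn() in milliseconds."""
    for _ in range(warmup):  # warmup runs are excluded from the samples
        fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return percentile(samples, 50), percentile(samples, 95)
```

Reporting P50 and P95 rather than the mean keeps the numbers robust to the occasional slow run on shared hardware such as Colab.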

Quantization Stability

| Metric | Value |
|---|---|
| Sentences with changed output (INT8 vs FP32) | 6/50 |
| Deployment Scorecard Verdict | NO-GO |

This is one of the largest models benchmarked (222.9M parameters, T5-base). INT8 quantization shows a moderate quality drop (-2.44 BLEU) that exceeds the -2.0 BLEU regression threshold, hence the NO-GO verdict. The original model is part of the Gramformer library.
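The changed-output count is a plain string comparison between the two systems' decoded outputs over the test set; as a sketch:

```python
def changed_outputs(fp32_outputs, int8_outputs):
    """Count test sentences whose INT8 output differs from the FP32 reference."""
    assert len(fp32_outputs) == len(int8_outputs)
    return sum(a != b for a, b in zip(fp32_outputs, int8_outputs))
```

A low change rate (here 6 of 50) means quantization leaves most outputs byte-identical, so the quality delta is concentrated in a handful of sentences.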


Cross-Model Comparison (7 Models Benchmarked)

This model was benchmarked alongside 6 other grammar correction models. All models were evaluated on the same 50-sentence test set under identical conditions.

PyTorch FP32 Baseline (all models)

| Model | Params | BLEU | chrF++ | P50 (ms) | Size (MB) | Exact Match |
|---|---|---|---|---|---|---|
| prithivida (this model) | 222.9M | 76.88 | 86.40 | 602 | 850 | 58% |
| coedit-small | 60.5M | 77.42 | 86.84 | 226 | 231 | 60% |
| vennify-t5-base | 222.9M | 78.57 | 87.87 | 539 | 850 | 58% |
| aventiq-t5-small | 60.5M | 34.43 | 45.88 | 213 | 231 | 16% |
| visheratin-mini | 31.2M | 66.84 | 81.77 | 133 | 119 | 46% |
| visheratin-tiny | 15.6M | 66.08 | 82.69 | 85 | 59 | 48% |
| pszemraj-small | 77.0M | 57.11 | 77.13 | 363 | 294 | 28% |

ONNX INT8 Quantized (all models)

| Model | Params | BLEU (FP32) | BLEU (INT8) | chrF++ (INT8) | P50 (ms) | Disk (MB) | Exact Match | Scorecard |
|---|---|---|---|---|---|---|---|---|
| prithivida (this model) | 222.9M | 76.88 | 74.44 | 84.79 | 222 | 426 | 54% | NO-GO |
| coedit-small | 60.5M | 77.42 | 77.60 | 86.69 | 117 | 152 | 60% | GO |
| vennify-t5-base | 222.9M | 78.57 | 77.24 | 86.74 | 224 | 426 | 56% | GO |
| aventiq-t5-small | 60.5M | 34.43 | 40.66 | 51.95 | 120 | 152 | 22% | GO* |
| visheratin-mini | 31.2M | 66.84 | 66.65 | 82.67 | 78 | 93 | 46% | GO |
| visheratin-tiny | 15.6M | 66.08 | 64.51 | 82.28 | 60 | 55 | 44% | GO |
| pszemraj-small | 77.0M | 57.11 | 3.15 | 18.72 | 138 | 153 | 0% | NO-GO |

Statistical Significance (Bootstrap, 1000 resamples)

Key cross-model comparisons (INT8 variants):

| Model A | Model B | chrF++ A | chrF++ B | p-value | Significant? |
|---|---|---|---|---|---|
| coedit-small | vennify-t5-base | 86.69 | 86.74 | 1.000 | No (identical) |
| coedit-small | visheratin-mini | 86.69 | 82.67 | 0.038 | Yes |
| coedit-small | visheratin-tiny | 86.69 | 82.28 | 0.010 | Yes |
| coedit-small | prithivida | 86.69 | 84.79 | 0.324 | No |
| visheratin-mini | visheratin-tiny | 82.67 | 82.28 | 0.824 | No |
| prithivida | vennify-t5-base | 84.79 | 86.74 | 0.324 | No |
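A paired bootstrap test of this kind can be sketched in plain Python. The inputs are per-sentence scores (e.g. chrF++) for the two systems on the same test set; the exact resampling scheme behind the table above is an assumption.

```python
import random


def bootstrap_pvalue(scores_a, scores_b, resamples=1000, seed=0):
    """Two-sided paired bootstrap p-value for mean(scores_a) == mean(scores_b)."""
    rng = random.Random(seed)
    n = len(scores_a)
    observed = sum(scores_a) / n - sum(scores_b) / n
    flips = 0
    for _ in range(resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample sentence indices with replacement
        delta = sum(scores_a[i] - scores_b[i] for i in idx) / n
        # a resample "flips" when its difference loses the sign of the observed difference
        if delta == 0 or (delta > 0) != (observed > 0):
            flips += 1
    return min(1.0, 2 * flips / resamples)
```

With only 50 sentences, gaps of ~2 chrF++ points (e.g. prithivida vs coedit-small) do not reach significance; only the ~4-point gaps do.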

Recommendation

| Use Case | Recommended Model | Why |
|---|---|---|
| Best overall (desktop) | coedit-small INT8 (152 MB) | Highest INT8 quality (chrF++ 86.69), 117 ms latency, only 1 regression |
| Smallest (mobile/edge) | visheratin-tiny INT8 (55 MB) | 60 ms latency, 55 MB disk, acceptable quality (chrF++ 82.28) |
| Highest baseline quality | vennify-t5-base INT8 (426 MB) | chrF++ 86.74, but 3x larger than coedit-small for no significant quality gain |

Usage

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

model_id = "YOUR_USERNAME/prithivida-grammar-correcter-onnx-int8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForSeq2SeqLM.from_pretrained(model_id)

text = "gec: She go to school yesterday"
inputs = tokenizer([text], return_tensors="pt", max_length=128, truncation=True)
outputs = model.generate(**inputs, max_new_tokens=128, num_beams=1, repetition_penalty=1.3)
corrected = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(corrected)
```

The model expects the `gec:` prefix on every input sentence, as in the example above.

Acknowledgments

All credit for the original model goes to Prithiviraj Damodaran (@prithivida). This repository only provides an ONNX INT8 quantized conversion to make the model easier to deploy in production environments (mobile, edge, browser).

Quantization and benchmarking performed as part of the Smart Desktop Keyboard Grammar Engine project.

Citation

If you use this model, please cite the original authors:

```bibtex
@misc{prithivida_onnx_int8,
  title = {prithivida/grammar_error_correcter_v1 -- ONNX INT8 Quantized},
  note = {Quantized version of prithivida/grammar_error_correcter_v1},
  url = {https://huggingface.co/prithivida/grammar_error_correcter_v1},
}
```