Dataset: Helsinki-NLP/opus-100
A neural machine translation model for English to Polish translation, trained entirely from scratch using the MarianMT architecture.
Note: This model was not fine-tuned from any existing pre-trained model. Both the model weights and the SentencePiece tokenizer were trained from scratch on the parallel corpus.
| Component | Configuration |
|---|---|
| d_model | 768 |
| Encoder layers | 8 |
| Decoder layers | 8 |
| Attention heads | 12 |
| FFN dimension | 3072 |
| Vocabulary size | 32,000 |
| Max position embeddings | 512 |
| Activation function | GELU |
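The dimensions in the table can be turned into a rough parameter count. The sketch below relies on several assumptions that the model card does not confirm. It assumes post-norm Transformer layers with biased linear projections and one LayerNorm per sub-layer. It also assumes a single embedding matrix shared by the encoder input, the decoder input and the output projection, and sinusoidal position embeddings that add no trainable weights. Treat the result as an approximation, not the checkpoint's exact size.

```python
# Rough parameter estimate from the configuration table.
# Assumptions (not confirmed by the model card): post-norm layers, biased
# linear projections, one LayerNorm per sub-layer, a single tied embedding
# matrix, and sinusoidal (non-learned) position embeddings.

D_MODEL = 768
ENCODER_LAYERS = 8
DECODER_LAYERS = 8
FFN_DIM = 3072
VOCAB_SIZE = 32_000


def attention_params(d: int) -> int:
    # Q, K, V and output projections: each a d x d weight plus a d bias.
    return 4 * (d * d + d)


def ffn_params(d: int, ffn: int) -> int:
    # Two linear layers, d -> ffn and ffn -> d, with biases.
    return (d * ffn + ffn) + (ffn * d + d)


def layer_norm_params(d: int) -> int:
    # Learned gain and bias vectors.
    return 2 * d


def encoder_layer_params(d: int, ffn: int) -> int:
    # Self-attention and feed-forward, each followed by a LayerNorm.
    return attention_params(d) + ffn_params(d, ffn) + 2 * layer_norm_params(d)


def decoder_layer_params(d: int, ffn: int) -> int:
    # Self-attention, cross-attention and feed-forward, each followed by a LayerNorm.
    return 2 * attention_params(d) + ffn_params(d, ffn) + 3 * layer_norm_params(d)


def estimate_params() -> int:
    embeddings = VOCAB_SIZE * D_MODEL  # one shared (tied) embedding matrix
    encoder = ENCODER_LAYERS * encoder_layer_params(D_MODEL, FFN_DIM)
    decoder = DECODER_LAYERS * decoder_layer_params(D_MODEL, FFN_DIM)
    return embeddings + encoder + decoder


if __name__ == "__main__":
    total = estimate_params()
    print(f"Estimated parameters: {total:,} (~{total / 1e6:.1f}M)")
    # prints: Estimated parameters: 156,893,184 (~156.9M)
```

Under these assumptions the embedding matrix accounts for about 24.6M weights. If the checkpoint does not tie its output projection to the embeddings, add another `VOCAB_SIZE * D_MODEL`. A count taken from the loaded model, such as `sum(p.numel() for p in model.parameters())`, may differ slightly.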
The model was trained on high-quality parallel corpora.
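A custom SentencePiece tokenizer (unigram model) was trained on the parallel corpus; its configuration includes the `>>pl<<` language token.

Translate with `MarianMTModel` directly:

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "pumad/pumatic-en-pl"
# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

text = "Hello, how are you today?"
# Tokenize the English input into PyTorch tensors
inputs = tokenizer(text, return_tensors="pt", padding=True)
# Generate the token IDs of the Polish translation
translated = model.generate(**inputs)
# Decode the first (and only) output sequence back to text
output = tokenizer.decode(translated[0], skip_special_tokens=True)
print(output)
```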
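Or use the high-level `pipeline` API:

```python
from transformers import pipeline

# Wraps tokenization, generation and decoding in a single call
translator = pipeline("translation", model="pumad/pumatic-en-pl")
result = translator("The quick brown fox jumps over the lazy dog.")
print(result[0]["translation_text"])
```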
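The model has 512 position embeddings, so long documents may need to be split before they are passed to the tokenizer or the translation pipeline. The minimal sketch below assumes the text is already split into sentences. It uses a hypothetical helper, `pack_sentences`, that greedily packs consecutive sentences into chunks under a token budget. `count_tokens` can be any callable; with the tokenizer loaded above, it could be `lambda s: len(tokenizer(s)["input_ids"])`. The default budget of 500 is an arbitrary margin below 512, not a value taken from the model card.

```python
from typing import Callable, List


def pack_sentences(
    sentences: List[str],
    count_tokens: Callable[[str], int],
    max_tokens: int = 500,
) -> List[str]:
    """Greedily group consecutive sentences into chunks of at most max_tokens.

    Per-sentence counts are summed, which only approximates the token count of
    the joined chunk. A single sentence longer than max_tokens is kept as its
    own chunk and will still exceed the limit.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current chunk if adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def word_count(s: str) -> int:
    # Whitespace word count, a stand-in for a real tokenizer in this example.
    return len(s.split())


if __name__ == "__main__":
    print(pack_sentences(["a b", "c d e", "f"], word_count, max_tokens=4))
    # prints: ['a b', 'c d e f']
```

Each chunk can then be translated separately, for example with `translator(chunks)`, and the outputs joined back together.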
Try the model live at pumatic.eu.
API documentation is available at pumatic.eu/docs.
License: Apache 2.0
If you use this model, please cite:
```bibtex
@misc{pumatic-en-pl,
  author    = {pumad},
  title     = {Pumatic English-Polish Translation Model},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/pumad/pumatic-en-pl}
}
```