Sequence to Sequence Learning with Neural Networks
Paper: arXiv:1409.3215
A sequence-to-sequence neural machine translation model that translates German text into English, built in PyTorch with an LSTM encoder-decoder architecture.
This model implements the classic seq2seq architecture from Sutskever et al. (2014) for German-English translation:
German Input → Embedding → LSTM Encoder → Context Vector → LSTM Decoder → Embedding → English Output
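The layer sizes are not recorded on this card, so the sketch below shows the two halves of the pipeline in PyTorch with illustrative dimensions; emb_dim, hid_dim, and n_layers are assumptions, not the trained model's settings.

import torch.nn as nn

class Encoder(nn.Module):
    # Embeds German token IDs and runs them through an LSTM; the final
    # (hidden, cell) pair serves as the fixed-size context vector.
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, n_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, n_layers, batch_first=True)

    def forward(self, src):                      # src: (batch, src_len)
        embedded = self.embedding(src)           # (batch, src_len, emb_dim)
        _, (hidden, cell) = self.lstm(embedded)  # context vector
        return hidden, cell

class Decoder(nn.Module):
    # Consumes one English token at a time, conditioned on the context.
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, n_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, n_layers, batch_first=True)
        self.fc_out = nn.Linear(hid_dim, vocab_size)

    def forward(self, token, hidden, cell):      # token: (batch, 1)
        embedded = self.embedding(token)
        output, (hidden, cell) = self.lstm(embedded, (hidden, cell))
        return self.fc_out(output), hidden, cell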
Hyperparameters:
The vocabularies include the special tokens <PAD>, <UNK>, <START>, and <END>. The model was trained for 5 epochs.
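Only the special tokens are documented here; the real ID assignments live in the pickled tokenizers. As an illustration of the convention those tokens imply, a minimal vocabulary sketch (IDs and the example word list are assumptions):

# Illustrative only: actual IDs are defined by the pickled tokenizers.
special_tokens = ["<PAD>", "<UNK>", "<START>", "<END>"]
words = ["das", "ist", "ein", "gutes", "buch"]
vocab = {tok: i for i, tok in enumerate(special_tokens + words)}

def encode(sentence):
    # Unknown words fall back to <UNK>; <START> and <END> frame the sequence.
    ids = [vocab.get(w, vocab["<UNK>"]) for w in sentence.lower().split()]
    return [vocab["<START>"]] + ids + [vocab["<END>"]]

print(encode("das ist ein gutes buch"))  # [2, 4, 5, 6, 7, 8, 3]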
# This is a custom PyTorch model, not a Transformers model
# Download the files and use with the provided inference script
import requests
from pathlib import Path
# Download model files
base_url = "https://huggingface.co/sumitdotml/seq2seq-de-en/resolve/main"
files = ["best_model.pt", "german_tokenizer.pkl", "english_tokenizer.pkl"]
for file in files:
    response = requests.get(f"{base_url}/{file}")
    response.raise_for_status()  # fail loudly on a bad download
    Path(file).write_bytes(response.content)
    print(f"Downloaded {file}")
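Once downloaded, the checkpoint can be sanity-checked with torch.load. What exactly is inside (a bare state dict or a wrapper with extra metadata) depends on how the training script saved it, so this is just a hedged peek:

import torch

# weights_only=False is needed on newer PyTorch if the checkpoint pickles
# custom classes; only do this for files from a source you trust.
checkpoint = torch.load("best_model.pt", map_location="cpu", weights_only=False)
print(type(checkpoint))
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys())[:10])  # peek at the top-level keys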
# Interactive mode
python inference.py --interactive
# Single translation
python inference.py --sentence "Hallo, wie geht es dir?" --verbose
# Demo mode
python inference.py
Example Translations:
"Das ist ein gutes Buch." β "this is a good idea.""Wo ist der Bahnhof?" β "where is the <UNK>""Ich liebe Deutschland." β "i share."best_model.pt: PyTorch model checkpoint (trained weights + architecture)german_tokenizer.pkl: German vocabulary and tokenization logicenglish_tokenizer.pkl: English vocabulary and tokenization logicClone the repository:
Clone the repository:
git clone https://github.com/sumitdotml/seq2seq
cd seq2seq
Set up environment:
uv venv && source .venv/bin/activate # or python -m venv .venv
uv pip install torch requests tqdm # or pip install torch requests tqdm
Download model:
python scripts/download_pretrained.py
Start translating:
python scripts/inference.py --interactive
The model uses a custom implementation with these components:
Encoder (src/models/encoder.py): LSTM-based encoder with an embedding layer
Decoder (src/models/decoder.py): LSTM-based decoder with an attention-free architecture
Seq2Seq (src/models/seq2seq.py): main model combining the encoder and decoder with generation logic (sketched below)
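The repository's generation code is not reproduced here; the sketch below is a generic greedy decoder in the same spirit, reusing the illustrative Encoder and Decoder classes from earlier (start_id and end_id correspond to <START> and <END>):

import torch

def greedy_translate(encoder, decoder, src_ids, start_id, end_id, max_len=50):
    # Encode the whole German sentence into the context vector, then feed
    # the decoder its own previous prediction until <END> or max_len.
    hidden, cell = encoder(src_ids)            # src_ids: (1, src_len)
    token = torch.tensor([[start_id]])
    output_ids = []
    for _ in range(max_len):
        logits, hidden, cell = decoder(token, hidden, cell)
        token = logits.argmax(dim=-1)          # (1, 1) id of the best next token
        if token.item() == end_id:
            break
        output_ids.append(token.item())
    return output_ids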
Environment:
Optimization:
# Full training pipeline
python scripts/data_preparation.py # Download WMT19 data
python src/data/tokenization.py # Build vocabularies
python scripts/train.py # Train model
# For full dataset training, modify data_preparation.py:
# use_full_dataset = True # Line 133-134
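The optimizer, learning rate, and teacher-forcing scheme used by scripts/train.py are not documented on this card; the step below is a generic seq2seq training sketch under assumed choices (cross-entropy that ignores <PAD>, full teacher forcing):

import torch.nn as nn

def train_step(encoder, decoder, optimizer, src, tgt, pad_id):
    # tgt: (batch, tgt_len), framed by <START> ... <END> and padded with <PAD>.
    criterion = nn.CrossEntropyLoss(ignore_index=pad_id)
    optimizer.zero_grad()
    hidden, cell = encoder(src)
    loss = 0.0
    for t in range(tgt.size(1) - 1):
        token = tgt[:, t:t + 1]                # teacher forcing: gold token in
        logits, hidden, cell = decoder(token, hidden, cell)
        loss = loss + criterion(logits.squeeze(1), tgt[:, t + 1])
    loss.backward()
    optimizer.step()
    return loss.item() / (tgt.size(1) - 1)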
If you use this model, please cite:
@misc{seq2seq-de-en,
  author = {sumitdotml},
  title = {German-English Seq2Seq Translation Model},
  year = {2025},
  url = {https://huggingface.co/sumitdotml/seq2seq-de-en},
  note = {PyTorch implementation of sequence-to-sequence translation}
}
MIT License - See repository for full license text.
For questions about this model or training code, please open an issue in the GitHub repository.