Neural Machine Translation



This repository provides an implementation of the paper Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2015). The scripts are designed to handle any language pair for translation tasks, with support for distributed GPU training using Lightning AI.
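
The core idea of the paper is an additive (Bahdanau) attention mechanism that lets the decoder look at every encoder state when producing each output token. Below is a minimal PyTorch sketch of that mechanism; the class name, shapes, and interface are illustrative assumptions, not the repository's exact implementation.

import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    # Bahdanau-style additive attention: score(s, h) = v^T tanh(W_s s + W_h h)
    def __init__(self, hidden_size):
        super().__init__()
        self.w_query = nn.Linear(hidden_size, hidden_size, bias=False)
        self.w_keys = nn.Linear(hidden_size, hidden_size, bias=False)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, query, keys):
        # query: (batch, hidden) decoder state; keys: (batch, src_len, hidden) encoder outputs
        scores = self.v(torch.tanh(self.w_query(query).unsqueeze(1) + self.w_keys(keys)))
        weights = torch.softmax(scores.squeeze(-1), dim=-1)         # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), keys).squeeze(1)  # (batch, hidden)
        return context, weights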

Usage

1. Install Dependencies

sudo apt-get install -y fonts-noto-core fonts-noto-unhinted fonts-noto-hinted libcairo2 libcairo2-dev libpango1.0-dev libglib2.0-dev
pip install -r requirements.txt

2. Prepare the Dataset

Note

The dataset script expects the data file in TSV format.

For an English–Nepali dataset, you can download the data from here and obtain the Nepali font from here.
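
As an illustration, each line of the TSV file holds one sentence pair separated by a tab (shown as <TAB> below; the column order is an assumption, so check it against your dataset):

थुलियम एक पेरियोडीक टेबलमा पर्ने रासायनिक तत्त्व हो ।<TAB>thulum is a chemical element on a periodic table .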

3. Train the Model

Before starting, add your Comet ML API key and project name to the .env file.
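
For reference, the .env file could look like the following; the exact variable names are an assumption, so match them to what train.py reads:

COMET_API_KEY=your_comet_api_key
COMET_PROJECT_NAME=your_project_name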

| Argument | Description | Default |
| --- | --- | --- |
| -d, --device | Device for training | cuda |
| -g, --gpus | Number of GPUs per node | 1 |
| -w, --num_workers | Number of data loading workers | 8 |
| -db, --dist_backend | Distributed backend for multi-GPU training | ddp |
| --train_path | Path to the training dataset (TSV format) | None (required) |
| --valid_path | Path to the validation dataset (TSV format) | None (required) |
| --font_path | Path to the font file (.ttf), useful for Devanagari scripts | None (required) |
| --input_lang | Source language | np |
| --output_lang | Target language | en |
| --reverse | Reverse source and target languages | False |
| --max_len | Maximum sequence length | 12 |
| --min_len | Minimum sequence length | 2 |
| -mt, --model_type | Model type: lstm or gru | lstm |
| -hs, --hidden_size | Model hidden size | 128 |
| -nl, --num_layers | Number of model layers | 2 |
| -bd, --bidirection | Use a bidirectional encoder | False |
| -at, --attention | Use the attention mechanism | False |
| --epochs | Number of training epochs | 100 |
| --batch_size | Training batch size | 64 |
| -lr, --learning_rate | Initial learning rate | 4e-4 |
| -lrf, --lr_factor | Factor for learning rate decay | 0.5 |
| -lrp, --lr_patience | Learning rate patience for decay | 1 |
| -mlt, --min_lr_threshold | Minimum learning rate threshold | 1e-2 |
| --precision | Precision mode | 32-true |
| --checkpoint_path | Path to a checkpoint file for resuming training | None |
| -gc, --grad_clip | Gradient norm clipping value | 1.0 |
| -ag, --acc_grad | Number of batches for gradient accumulation | 2 |

Example:

python3 train.py \
  -d cuda -w 4 -g 2 -db ddp \
  --input_lang ne --output_lang en \
  --train_path train.tsv --valid_path valid.tsv --font_path custom_font.ttf \
  --batch_size 64 -ag 2 --epochs 50 --max_len 16 --min_len 4 -lr 4e-3 \
  -hs 128 -nl 2 -mt lstm --attention --bidirection

Note

  • Use the --reverse flag to train the model with the source and target languages swapped.
  • Pass the checkpoint path with the --checkpoint_path flag to resume training from saved weights, as in the example below. Note that max_len and min_len must match the original run to keep the input and output parameters consistent.
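
For example, resuming the run above from a saved checkpoint (the checkpoint path here is illustrative); max_len and min_len match the original run:

python3 train.py \
  -d cuda -w 4 -g 2 -db ddp \
  --input_lang ne --output_lang en \
  --train_path train.tsv --valid_path valid.tsv --font_path custom_font.ttf \
  --batch_size 64 -ag 2 --epochs 50 --max_len 16 --min_len 4 -lr 4e-3 \
  -hs 128 -nl 2 -mt lstm --attention --bidirection \
  --checkpoint_path checkpoints/last.ckpt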

Experiment Results

| Model | Language Pair | Max Length | Min Length | Train Set | Validation Set | GPUs | Effective Batch Size |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Base Models | General | 16 | 4 | ~500k | ~90k | 2×T4 | 2×128 |
| Final Model | Nepali → English | 24 | 4 | ~760k | ~135k | 2×T4 | 2×128 |
| Final Model | English → Nepali | 20 | 5 | ~685k | ~120k | 2×T4 | 2×64×2 |

| Model | Encoder | Decoder | Attention |
| --- | --- | --- | --- |
| Base Model 1 | GRU | GRU | |
| Base Model 2 | LSTM | LSTM | |
| Base Model 3 | BiLSTM | LSTM | |
| Final Model | BiLSTM | LSTM | |

Note

  • The final model for English → Nepali translation was trained on data filtered with max_len=20 and min_len=5 due to GPU memory constraints, leaving ~685k training pairs and ~120k validation pairs.
    • Batch size: 64
    • Gradient accumulation: 2 (resulting in an effective batch size of 128)
  • The SacreBLEU score (4-gram) was used as the primary evaluation metric; a scoring sketch follows this note.
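
For reference, corpus-level SacreBLEU can be computed with the sacrebleu package. This is a minimal sketch, not necessarily the repository's exact evaluation code, using a hypothesis/reference pair from the results below:

import sacrebleu

# Model outputs and matching references (one reference stream).
hypotheses = ["the new government will handle the private sector for economic development in the country dahal said ."]
references = [["the new government will work hand in hand with the private sector for economic development in the country dahal said ."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)  # 4-gram BLEU by default
print(f"BLEU: {bleu.score:.2f}")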

Important

  • Teacher forcing was applied during training and validation to stabilize learning and prevent error propagation, since cross-entropy loss is hard to apply consistently to free-running, variable-length outputs (see the sketch below).
  • For the final test evaluation, inference was performed without teacher forcing to simulate real-world performance.
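
For illustration, here is a minimal PyTorch sketch of a decoder loop that toggles teacher forcing; the decoder interface and names are assumptions, not the repository's exact code:

import torch

def decode(decoder, hidden, target, teacher_forcing=True):
    # target: (batch, seq_len) token ids, starting with the <sos> token
    batch_size, seq_len = target.shape
    inp = target[:, 0]                         # first input is <sos>
    step_outputs = []
    for t in range(1, seq_len):
        logits, hidden = decoder(inp, hidden)  # logits: (batch, vocab)
        step_outputs.append(logits)
        if teacher_forcing:
            inp = target[:, t]                 # feed the ground-truth token
        else:
            inp = logits.argmax(dim=-1)        # feed the model's own prediction
    return torch.stack(step_outputs, dim=1)    # (batch, seq_len - 1, vocab)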

Nepali → English

(BLEU score plot for the Nepali → English models)

Test Inference Results

| Source | Ground Truth | Translated | BLEU Score |
| --- | --- | --- | --- |
| नयाँ सरकारले देशमा आर्थिक विकासका लागि निजी क्षेत्रसँग हातेमालो गर्ने छ 'दाहालले भने । | the new government will work hand in hand with the private sector for economic development in the country dahal said . | the new government will handle the private sector for economic development in the country dahal said . | 65.46 |
| थुलियम एक पेरियोडीक टेबलमा पर्ने रासायनिक तत्त्व हो । | thulum is a chemical element on a periodic table . | the is a chemical element in a periodic table . | 52.54 |
| आर्थिक समस्या पर्दैमा आफ्ना छोराछोरी वा परिवारजनलाई बेच्न सक्दैनौं । | we cannot sell our children or family if there is financial problems . | we cannot sell their children or family if the financial problem is . | 30.58 |
| बजेट तथा कार्यक्रम तर्जुमा गर्दा अपनाउनु पर्ने अन्य कार्यविधि तोकिए बमोजिम | the other procedures to be followed in the preparation of budget and programme shall be as prescribed . | other procedures relating to the implementation of the budget and programmes shall be as prescribed . | 28.39 |

Attention Plots

(Attention heatmaps for four sample translations from the BiLSTM–LSTM model with attention.)

English → Nepali

(BLEU score plot for the English → Nepali models)

Test Inference Results

| Source | Ground Truth | Translated | BLEU Score |
| --- | --- | --- | --- |
| the only person to die was the bomber . | मर्ने व्यक्ति मात्र बमबर थियो । | मर्ने व्यक्ति मात्र बमबरको थियो । | 0 |
| the daughter was found alive but the son was not yet untouched | छोरी त जिउँदै भेटिएकी थिई तर छोराको अझै अत्तोपत्तो थिएन | छोरी जिवित भेटियो तर छोरा अझै अछुतो थिएन | 0 |
| it was a total loss said ken smith whose company sells second hand motorboats . | यो कुल नोक्सान थियो केन स्मिथले भन्नुभयो जसको कम्पनीले दोस्रो हाते मोटरबोट बिक्री गर्दछ । | यो कुल नोक्सान थियो केन स्मिथले दोस्रो हाते सेयरहरू गुमाउँछ । | 37.18 |
| subsidies are also necessary to introduce new technology but this function is poorly incorporated in our system . | नयाँ प्रविधि परिचय गर्न सहायक पनि आवश्यक छ तर यो कार्य हाम्रो प्रणालीमा गरिब रूपमा समावेश गरिएको छ । | नयाँ प्रविधि परिचय गर्न पनि आवश्यक छ तर यो कार्य हाम्रो प्रणालीमा गरिब रूपमा छ । | 65.13 |

Attention Plots

(Attention heatmaps for four sample translations from the BiLSTM–LSTM model with attention.)

Citation

@inproceedings{bahdanau2015neural,
  title={Neural Machine Translation by Jointly Learning to Align and Translate},
  author={Bahdanau, Dzmitry and Cho, Kyunghyun and Bengio, Yoshua},
  booktitle={3rd International Conference on Learning Representations, {ICLR} 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings},
  year={2015},
  archivePrefix={arXiv},
  eprint={1409.0473},
  primaryClass={cs.CL},
  url={https://doi.org/10.48550/arXiv.1409.0473}
}
