Statistical Machine Translator

As part of the Information Retrieval course (BITS CS F469), we built a lexical cross-language translator. It uses a statistical machine translation model heavily inspired by IBM Model 1. Statistical machine translation is an empirical technique in which translations are generated from statistical models trained on bilingual text corpora. Our model can translate a document between Dutch and English.
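
To make "lexical" translation concrete, here is a minimal sketch of word-by-word translation with a learned probability table. The function name `translate_lexically`, the table layout, and the toy probabilities are assumptions made for illustration; they are not taken from the repository's code.

```python
def translate_lexically(sentence, table):
    """Translate word by word, choosing the most probable target word.

    `table` maps a source word to a dict {target_word: probability}.
    This layout is an assumption for illustration only.
    """
    output = []
    for word in sentence.lower().split():
        candidates = table.get(word)
        if candidates:
            # Pick the target word with the highest translation probability.
            output.append(max(candidates, key=candidates.get))
        else:
            # Unknown source word: pass it through unchanged.
            output.append(word)
    return " ".join(output)


# Hypothetical Dutch-to-English probabilities, not real model output.
toy_table = {
    "het": {"the": 0.7, "it": 0.3},
    "huis": {"house": 0.9, "home": 0.1},
    "is": {"is": 1.0},
    "klein": {"small": 0.8, "little": 0.2},
}

print(translate_lexically("Het huis is klein", toy_table))  # the house is small
```

Because each word is translated independently, a model of this kind does not reorder words or handle multi-word expressions.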

To translate a document and test the model's performance

Download the repository
Open the terminal/command prompt and cd to the downloaded repository

Run the Python script "testing.py" (use Python 3):

    python testing.py

The interactive command line will then give you further instructions.
On completion, it reports:
- Total number of word pairs
- Cosine similarity
- Jaccard coefficient
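
The README does not say exactly how the two similarity scores are computed, so the following is only a sketch under a common assumption: cosine similarity over bag-of-words term counts, and the Jaccard coefficient over the sets of distinct words. The function names and the example sentences are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a_tokens, b_tokens):
    """Cosine of the angle between two term-frequency vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as sharing nothing
    return dot / (norm_a * norm_b)


def jaccard_coefficient(a_tokens, b_tokens):
    """Size of the word-set intersection divided by size of the union."""
    a, b = set(a_tokens), set(b_tokens)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical model output compared with a reference translation.
candidate = "the house is small".split()
reference = "the house is little".split()

print(cosine_similarity(candidate, reference))    # 0.75
print(jaccard_coefficient(candidate, reference))  # 0.6
```

Both scores range from 0 (no shared words) to 1 (identical word content), and neither takes word order into account.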

Improvement on the IBM Model

A traditional IBM model creates a dictionary of all possible pairs of English and Dutch words, which consumes a lot of space and time. For reference, the traditional model created 1,39,54,090 (about 13.95 million) word pairs when trained on only 1,000 lines, whereas our optimised model creates only 1,10,15,619 (about 11.02 million) even when trained on 1,00,000 (100,000) lines.

We consider only those English-Dutch word pairs that occur together in some sentence pair. This does not affect the accuracy of the model, because the eliminated word pairs would have had a translation probability of 0 anyway: a pair that never co-occurs in any sentence pair never receives any expected count during training.
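
The idea can be sketched as a small IBM Model 1 style EM loop whose table holds only co-occurring pairs. This is an illustrative sketch, not the code in training.ipynb: the function name `train_ibm1_sparse`, the toy corpus, the iteration count, and the omission of a NULL word are all assumptions.

```python
from collections import defaultdict


def train_ibm1_sparse(sentence_pairs, iterations=10):
    """Estimate t(e | f) with EM, storing only co-occurring (e, f) pairs.

    `sentence_pairs` is a list of (english_tokens, dutch_tokens).
    No NULL word is used, which is a simplification for this sketch.
    """
    # Keep only pairs that appear together in at least one sentence pair.
    cooccurring = set()
    for en, nl in sentence_pairs:
        for e in en:
            for f in nl:
                cooccurring.add((e, f))

    # Uniform initialisation over the pairs that are actually stored.
    en_vocab = {e for e, _ in cooccurring}
    t = {pair: 1.0 / len(en_vocab) for pair in cooccurring}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (e, f) pair
        total = defaultdict(float)  # expected count of each Dutch word f
        # E-step: distribute each English word over the Dutch words
        # of its own sentence pair, in proportion to the current t.
        for en, nl in sentence_pairs:
            for e in en:
                z = sum(t[(e, f)] for f in nl)
                for f in nl:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalise so that t(. | f) sums to 1 for every f.
        t = {(e, f): count[(e, f)] / total[f] for (e, f) in cooccurring}
    return t


# Hypothetical three-sentence corpus, for illustration only.
toy_corpus = [
    ("the house".split(), "het huis".split()),
    ("the book".split(), "het boek".split()),
    ("a book".split(), "een boek".split()),
]
table = train_ibm1_sparse(toy_corpus)

# 4 English words x 4 Dutch words would give 16 pairs; only 10 co-occur.
print(len(table))  # 10
```

A pair such as ("house", "boek") is never stored, because those two words share no sentence pair and would receive no expected count in any iteration.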

MananAgarwal/Statistical-Machine-Translator
