Part-of-Speech Tagging using Hidden Markov Models

Developed a Hidden Markov Model part-of-speech tagger for Italian, Japanese and Urdu.

HMM Learning

hmmlearn.py contains the logic for computing the transition, emission, and initial probability matrices for tags and words.
The command-line argument is a single file containing the training data; the program learns the HMM and writes the model parameters to a file called hmmmodel.txt.
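
The counting step can be sketched in Python as follows. This is a minimal sketch, not the code in hmmlearn.py. It assumes the training file has one sentence per line, with tokens written as word/TAG. The add-one smoothing of the initial and transition probabilities is also an assumption, and so is the JSON layout used for hmmmodel.txt.

```python
import json
from collections import Counter, defaultdict


def parse_line(line):
    # Assumed format: whitespace-separated tokens, each written as word/TAG.
    # rsplit on the last "/" so words that themselves contain "/" still parse.
    return [tuple(token.rsplit("/", 1)) for token in line.split()]


def learn(lines):
    initial = Counter()                # tag counts at sentence start
    transition = defaultdict(Counter)  # transition[prev_tag][cur_tag]
    emission = defaultdict(Counter)    # emission[tag][word]
    n_sentences = 0
    for line in lines:
        pairs = parse_line(line)
        if not pairs:
            continue
        n_sentences += 1
        initial[pairs[0][1]] += 1
        for word, tag in pairs:
            emission[tag][word] += 1
        for (_, prev), (_, cur) in zip(pairs, pairs[1:]):
            transition[prev][cur] += 1

    tags = sorted(emission)
    n_tags = len(tags)
    # Assumed add-one (Laplace) smoothing, so unseen tag sequences keep a
    # small nonzero probability instead of being ruled out entirely.
    init_p = {t: (initial[t] + 1) / (n_sentences + n_tags) for t in tags}
    trans_p = {}
    for prev in tags:
        total = sum(transition[prev].values())
        trans_p[prev] = {
            cur: (transition[prev][cur] + 1) / (total + n_tags) for cur in tags
        }
    # Emission probabilities are plain relative frequencies (unsmoothed).
    emit_p = {}
    for tag in tags:
        total = sum(emission[tag].values())
        emit_p[tag] = {w: c / total for w, c in emission[tag].items()}
    return {"tags": tags, "initial": init_p,
            "transition": trans_p, "emission": emit_p}


def write_model(model, path="hmmmodel.txt"):
    # JSON is an assumed serialization; the real file layout is not documented.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f, ensure_ascii=False)
```

Under these assumptions, `with open(path, encoding="utf-8") as f: write_model(learn(f))` would train on a file and write the model. Each row of the transition matrix sums to 1 because smoothing adds one count per possible next tag.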

HMM Decoding

hmmdecode.py contains the logic for Viterbi decoding, with an open/closed class distinction, to perform POS tagging.
It uses the model parameters (transition, emission, and initial probability matrices) generated by hmmlearn.py.
The command-line argument is a single file containing the test data; the program reads the HMM parameters from hmmmodel.txt, tags each word in the test data, and writes the results to a text file called hmmoutput.txt in the same format as the training data.
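
The following is a minimal sketch of log-space Viterbi decoding with an open/closed class split. It is not the code in hmmdecode.py, and several parts are assumptions:

- The model is stored as JSON with "tags", "initial", "transition" and "emission" keys.
- A tag counts as open class if it emitted at least `min_types` distinct words in training. This threshold is hypothetical.
- Unknown words may only receive open-class tags, and their emission term is dropped so that transitions decide.
- The initial and transition probabilities are smoothed and therefore nonzero.
- `TOY_MODEL` holds made-up numbers for illustration only.

```python
import json
import math

# Hypothetical toy model with made-up probabilities, for illustration only.
TOY_MODEL = {
    "tags": ["DT", "NN", "VB"],
    "initial": {"DT": 0.6, "NN": 0.2, "VB": 0.2},
    "transition": {
        "DT": {"DT": 0.2, "NN": 0.6, "VB": 0.2},
        "NN": {"DT": 0.25, "NN": 0.25, "VB": 0.5},
        "VB": {"DT": 0.4, "NN": 0.3, "VB": 0.3},
    },
    "emission": {
        "DT": {"the": 1.0},
        "NN": {"cat": 0.5, "dog": 0.5},
        "VB": {"sleeps": 0.5, "runs": 0.5},
    },
}


def load_model(path="hmmmodel.txt"):
    # Assumes hmmmodel.txt holds JSON in the same layout as TOY_MODEL.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def open_class_tags(model, min_types=2):
    # Hypothetical heuristic: a tag is open class if it emitted at least
    # min_types distinct word types during training.
    return [t for t in model["tags"] if len(model["emission"][t]) >= min_types]


def candidates(word, model, open_tags):
    # Known words keep only the tags they were seen with; unknown words may
    # only take open-class tags (falling back to all tags if there are none).
    known = [t for t in model["tags"] if word in model["emission"][t]]
    return known or open_tags or model["tags"]


def log_emit(word, tag, model):
    # Unknown words contribute log(1) = 0, so transitions decide their tag.
    p = model["emission"][tag].get(word)
    return math.log(p) if p is not None else 0.0


def viterbi(words, model, open_tags):
    if not words:
        return []
    init, trans = model["initial"], model["transition"]
    # best[i][tag] = (log prob of best path ending in tag at i, previous tag)
    best = [{}]
    for tag in candidates(words[0], model, open_tags):
        best[0][tag] = (math.log(init[tag]) + log_emit(words[0], tag, model), None)
    for i in range(1, len(words)):
        best.append({})
        for tag in candidates(words[i], model, open_tags):
            score, prev = max(
                (best[i - 1][p][0] + math.log(trans[p][tag]), p) for p in best[i - 1]
            )
            best[i][tag] = (score + log_emit(words[i], tag, model), prev)
    # Trace back from the highest-scoring final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))


def tag_sentence(line, model, open_tags):
    # Emit word/TAG tokens, mirroring the assumed training-data format.
    words = line.split()
    tags = viterbi(words, model, open_tags)
    return " ".join(f"{w}/{t}" for w, t in zip(words, tags))
```

With the toy model, the unseen word in "the zebra" can only be NN or VB. It comes out as NN because the DT→NN transition (0.6) beats DT→VB (0.2).

Working in log space avoids floating-point underflow on long sentences. Restricting unknown words to open-class tags keeps the tagger from assigning closed-class tags, such as determiners, to new vocabulary.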

The learning program will be invoked in the following way:

python hmmlearn.py /path/to/input

The tagging program will be invoked in the following way:

python hmmdecode.py /path/to/input
