Part-of-Speech Tagging using Hidden Markov Models
Developed a Hidden Markov Model part-of-speech tagger for Italian, Japanese and Urdu.
- hmmlearn.py contains the logic for calculating the transition, emission, and initial probability matrices for tags and words.
- Its command-line argument is a single file containing the training data; the program learns the HMM and writes the model parameters to a file called hmmmodel.txt.
- hmmdecode.py contains the logic for Viterbi decoding, with an open/closed class distinction, to perform POS tagging.
- Utilizes the model weights (transition, emission, and initial probability matrices) generated by hmmlearn.py.
- The command-line argument is a single file containing the test data; the program will read the parameters of the HMM from the file hmmmodel.txt, tag each word in the test data, and write the results to a text file called hmmoutput.txt in the same format as the training data.
The learning program will be invoked in the following way:
python hmmlearn.py /path/to/input
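For illustration, the learning step can be sketched roughly as follows. This is a minimal sketch, not the actual contents of hmmlearn.py: it assumes each training line is one sentence of space-separated `word/TAG` tokens, it applies add-one smoothing to the transitions, and the names `parse_line` and `learn_hmm` are hypothetical. Writing the result to hmmmodel.txt is left out because the file's layout is not described here.

```python
from collections import Counter, defaultdict


def parse_line(line):
    """Split one sentence into (word, tag) pairs.

    Assumes the hypothetical 'word/TAG word/TAG ...' layout.
    """
    pairs = []
    for token in line.split():
        # Split on the last slash only, so words such as "1/2" survive.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def learn_hmm(sentences):
    """Estimate initial, transition, and emission probabilities from counts."""
    initial = Counter()
    transition = defaultdict(Counter)
    emission = defaultdict(Counter)

    for sentence in sentences:
        prev = None
        for word, tag in sentence:
            if prev is None:
                initial[tag] += 1           # tag starts a sentence
            else:
                transition[prev][tag] += 1  # tag follows the previous tag
            emission[tag][word] += 1        # tag emits this word
            prev = tag

    tags = sorted(emission)
    n_starts = sum(initial.values()) or 1   # guard against an empty corpus
    init_p = {t: initial[t] / n_starts for t in tags}

    # Add-one smoothing (an assumption of this sketch) keeps every
    # tag-to-tag transition possible, even if unseen in training.
    trans_p = {}
    for prev in tags:
        total = sum(transition[prev].values()) + len(tags)
        trans_p[prev] = {t: (transition[prev][t] + 1) / total for t in tags}

    # Emissions are plain relative frequencies per tag.
    emit_p = {}
    for tag in tags:
        total = sum(emission[tag].values())
        emit_p[tag] = {w: c / total for w, c in emission[tag].items()}

    return init_p, trans_p, emit_p


# Toy corpus, made up for this example.
corpus = ["the/DT dog/NN barks/VB", "the/DT cat/NN"]
init_p, trans_p, emit_p = learn_hmm([parse_line(line) for line in corpus])
print(init_p["DT"], emit_p["NN"]["dog"])
```

Storing the matrices as nested dictionaries keeps the sketch short; a real model file would need these values serialized in whatever format the decoder expects.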
The tagging program will be invoked in the following way:
python hmmdecode.py /path/to/input
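The Viterbi decoding step with an open/closed class distinction might look like the sketch below. It is an illustration under stated assumptions rather than the code in hmmdecode.py: the function name `viterbi`, the `open_tags` parameter, the toy model, and the rule that an unseen word may only receive an open-class tag (with its emission term ignored) are all hypothetical choices made for this example.

```python
import math


def viterbi(words, tags, init_p, trans_p, emit_p, open_tags):
    """Return the most likely tag sequence for words, scored in log space."""
    if not words:
        return []

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    known = {w for t in tags for w in emit_p.get(t, {})}

    def emit_score(tag, word):
        if word in known:
            return log(emit_p.get(tag, {}).get(word, 0.0))
        # Unseen word: skip the emission term, but allow only open-class
        # tags (an assumption of this sketch).
        return 0.0 if tag in open_tags else float("-inf")

    # best[i][t]: score of the best path ending in tag t at position i.
    best = [{t: log(init_p.get(t, 0.0)) + emit_score(t, words[0]) for t in tags}]
    back = [{}]

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log(trans_p[p].get(t, 0.0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + emit_score(t, words[i])
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


# Toy model, made up for this example.
tags = ["DT", "NN", "VB"]
init_p = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans_p = {
    "DT": {"DT": 0.1, "NN": 0.8, "VB": 0.1},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}
emit_p = {"DT": {"the": 1.0}, "NN": {"dog": 0.6, "cat": 0.4}, "VB": {"barks": 1.0}}
open_tags = {"NN", "VB"}  # DT is treated as a closed class here

print(viterbi(["the", "zebra", "barks"], tags, init_p, trans_p, emit_p, open_tags))
# prints ['DT', 'NN', 'VB']
```

Working in log space avoids numeric underflow on long sentences, and restricting unseen words to open-class tags prevents them from being labeled with closed classes such as determiners.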