This is mainly useful when reading novels, stories, or any other text made up of long paragraphs. The reader takes a picture of a paragraph and passes it to the model, which returns a summary of that paragraph. In short, the model makes reading easier and saves the reader time.
- Optical Character Recognition (OCR)
- Natural Language preprocessing
$ virtualenv venv --python=python3.6
$ source venv/bin/activate
- Pillow
pip3 install Pillow
- Pytesseract
pip3 install pytesseract
- OpenCV
pip3 install opencv-python
- NLTK
pip3 install nltk
python3 ocr.py --image images/story1.jpg > story.txt
The output file story.txt contains all the text extracted from the image story1.jpg using OCR with pytesseract.
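For illustration, the OCR step could be sketched roughly as follows. This is a minimal sketch, not the actual contents of ocr.py: the function names `build_parser`, `extract_text`, and `main` are hypothetical, and it assumes the Tesseract binary is installed on the system in addition to the pytesseract package. Any OpenCV preprocessing is left out.

```python
import argparse


def build_parser():
    """Build a command-line parser with a single required --image option."""
    parser = argparse.ArgumentParser(
        description="Extract text from an image with Tesseract OCR."
    )
    parser.add_argument("--image", required=True, help="path to the input image")
    return parser


def extract_text(image_path):
    """Return the text that Tesseract recognizes in the given image file."""
    # Imported here so that argument parsing still works on a machine
    # where the OCR dependencies are not installed yet.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)


def main(argv=None):
    """Parse the arguments and print the recognized text to stdout."""
    args = build_parser().parse_args(argv)
    print(extract_text(args.image))
```

Calling `main()` from a script and redirecting stdout to a file would mirror the `> story.txt` usage shown above.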
In this file we used Python's NLTK to remove stop words and punctuation, together with the word and sentence tokenizers from the NLTK library.
python3 summarize.py story.txt > summary.txt
The output file summary.txt contains the summary of the text file story.txt.
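To show how stop-word removal and tokenization can feed a summary, here is a minimal sketch of a frequency-based extractive summarizer. It is an assumption about the general approach, not the actual contents of summarize.py. To stay self-contained it swaps NLTK out for regular expressions and a tiny hand-written stop-word set; in a real script, NLTK's `sent_tokenize`, `word_tokenize`, and `stopwords.words("english")` would take their place. All function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (an assumption, far from complete).
STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "at",
    "by", "for", "with", "as", "is", "are", "was", "were", "be", "it",
    "this", "that", "he", "she", "they", "his", "her",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def content_words(sentence):
    """Lowercased words with punctuation and stop words removed."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Keep the highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole text, ignoring stop words.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        # A sentence scores the sum of the frequencies of its content words.
        return sum(freq[w] for w in content_words(sentence))

    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(sentences[i]),
        reverse=True,
    )
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

Because the score is a plain sum, longer sentences tend to win; dividing by sentence length would be one way to reduce that bias.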