
Using PERO OCR for layout and text recognition

Stefan Weil edited this page Oct 22, 2022 · 3 revisions

Examples

These examples run layout recognition and text recognition on all images in the directory images.

The first example only creates PAGE XML output:

source ~/src/github/DCGM/venv3.9/bin/activate
python -u ~/src/github/DCGM/pero-ocr/user_scripts/parse_folder.py -c ~/src/github/DCGM/pero-ocr/demo/config.ini -i images --output-xml-path output

The second example creates PAGE XML, ALTO XML, plain-text transcriptions, and additional output (rendered images, cropped lines, logits):

source ~/src/github/DCGM/venv3.9/bin/activate
python -u ~/src/github/DCGM/pero-ocr/user_scripts/parse_folder.py -c ~/src/github/DCGM/pero-ocr/demo/config.ini -i images --output-xml-path out/page --output-alto-path out/alto --output-transcriptions-file-path out/text --output-render-path out/render --output-line-path out/line --output-logit-path out/logit
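For batch scripting it can be handy to assemble the same invocation programmatically. The following is a minimal Python sketch, not part of pero-ocr itself; the paths are the placeholder paths from the examples above, and the command is only printed, not executed:

```python
import os
import shlex

# Checkout location from the examples above; adjust to your setup.
PERO = os.path.expanduser("~/src/github/DCGM/pero-ocr")

cmd = [
    "python", "-u", f"{PERO}/user_scripts/parse_folder.py",
    "-c", f"{PERO}/demo/config.ini",
    "-i", "images",
    "--output-xml-path", "out/page",
    "--output-alto-path", "out/alto",
    "--output-transcriptions-file-path", "out/text",
]

# Print the shell-quoted command; with the virtualenv activated,
# subprocess.run(cmd, check=True) would run it instead.
print(shlex.join(cmd))
```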

Working example of config.ini

The file ~/src/github/DCGM/pero-ocr/demo/config.ini contains the settings for the OCR process. The directory ~/src/github/DCGM/pero-ocr/demo, which contains config.ini, also contains the required models.

# OCR config
[PAGE_PARSER]
RUN_LAYOUT_PARSER = yes
RUN_LINE_PARSER = yes
RUN_LINE_CROPPER = yes
RUN_OCR = yes
RUN_DECODER = no

[PARSE_FOLDER]
#INPUT_IMAGE_PATH =
#INPUT_XML_PATH =
#OUTPUT_XML_PATH =
#OUTPUT_LOGIT_PATH =

# Layout detection can be specified in multiple stages
# [LAYOUT_PARSER_X] where X specifies the order of processing.
[LAYOUT_PARSER_1]
# This method uses a neural network to detect lines and paragraphs.
METHOD = LAYOUT_CNN
# Should the method detect lines.
DETECT_LINES = yes
# Should the method detect text regions. This option can be set to "no" when text regions are defined in input PAGE XML files.
DETECT_REGIONS = yes
# Optionally merges lines with similar horizontal positions inside a text region. This is usually not needed.
MERGE_LINES = no
# Adjust height of existing lines. This can be used only when text lines are specified in input Page XML files.
ADJUST_HEIGHTS = no
# Path to a PyTorch network which processes images and detects lines and paragraphs.
MODEL_PATH = ./ParseNet.pb
# Maximum resolution of image which can be processed. The resolution is dynamic and adapts to text size in an image. This option effectively limits processing scale. 5 MPx fits into 5GB of GPU memory.
MAX_MEGAPIXELS = 5
# Fraction of GPU memory which should be allocated to this processing step.
GPU_FRACTION = 0.5
# Set this option to yes if you want to avoid using GPU. CPU processing can be 2-5x slower depending on the CPU type, GPU type and page size.
USE_CPU = no
# Initial image downsampling factor for processing. 4 is generally a good option for most documents; if it is not optimal, the engine adapts this downsampling factor to the text size. If your text is very large, you can increase this number. If your text is tiny, you can try to decrease it.
DOWNSAMPLE = 4
# Do not change this value
PAD = 52
# Higher values result in more detected text lines. Lower values result in fewer detected text lines or possibly in broken text lines. Generally, there should be no need to adjust this parameter.
DETECTION_THRESHOLD = 0.2

[LAYOUT_PARSER_2]
# This method orders paragraphs based on simple rules.
METHOD = REGION_SORTER_SMART

[LINE_CROPPER]
# Interpolation order used when cropping lines from the page image.
INTERP = 2
# Scale factor applied to the detected line height when cropping.
LINE_SCALE = 1.25
# Height in pixels of the cropped line images passed to the OCR stage.
LINE_HEIGHT = 40

[OCR]
# This stage reads text from each cropped text line.
# This is the only supported method at the moment.
METHOD = pytorch_ocr
# This is a path to an OCR configuration file. The content of the OCR configuration file should not be changed.
OCR_JSON = ./ocr_engine.json
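The switches in [PAGE_PARSER] follow standard INI boolean conventions, so they can be read with Python's configparser and getboolean(). A minimal sketch, using only the section shown above embedded as a string for illustration:

```python
import configparser

# The [PAGE_PARSER] section from the config above, embedded for illustration.
CONFIG = """
[PAGE_PARSER]
RUN_LAYOUT_PARSER = yes
RUN_LINE_PARSER = yes
RUN_LINE_CROPPER = yes
RUN_OCR = yes
RUN_DECODER = no
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG)

# getboolean() understands yes/no, on/off, true/false, 1/0.
for key in parser["PAGE_PARSER"]:
    print(key, parser["PAGE_PARSER"].getboolean(key))
```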