
OCR

This repository implements the OCR branch of the method introduced in the paper E2E-MLT: An Unconstrained End-to-End Method for Multi-Language Scene Text by Busta et al. (paper). We wanted to see how this OCR branch performs on traffic signs.

Pre-training

We noticed that pre-training the network on synthetic data improves overall performance and speeds up training. For this purpose, we used a subset of the synthetic word dataset of the Visual Geometry Group at the University of Oxford (website). Our subset contains roughly 20K training and 2.5K validation images. Here are some samples of these words: (sample word images)

Training

As mentioned above, we want to recognize the writing on traffic signs. We therefore use images collected from test drives to generate our training data. Such an image can look like this: (example drive image) We then crop the writings and use them to fine-tune our pre-trained network. For example, the crops from the image above are: (cropped word images)

Note that E2E-MLT can learn to recognize multiple languages at the same time; we trained it only on English and German characters.
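A multi-language setup like this boils down to a shared character alphabet that the OCR head predicts over. The sketch below is a minimal illustration of such a label codec for English plus German characters; the exact character set and the blank-index convention are assumptions for illustration, not the repository's actual codec.

```python
# Hypothetical shared alphabet: digits, English letters, German umlauts/ß,
# and a few punctuation characters commonly seen on signs.
ALPHABET = (
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "äöüÄÖÜß"
    " .,-"
)

# Index 0 is reserved for the CTC blank symbol (an assumed convention here).
char_to_idx = {c: i + 1 for i, c in enumerate(ALPHABET)}
idx_to_char = {i: c for c, i in char_to_idx.items()}

def encode(text: str) -> list[int]:
    """Map a transcription to label indices, skipping unknown characters."""
    return [char_to_idx[c] for c in text if c in char_to_idx]

def decode(indices: list[int]) -> str:
    """Map label indices back to text, ignoring the blank index."""
    return "".join(idx_to_char[i] for i in indices if i != 0)

print(decode(encode("Straße 30")))  # Straße 30
```

Extending the network to another script would then mainly be a matter of growing this alphabet and retraining the output layer.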

All details regarding the hyperparameters, loss function, etc. can be found in the paper mentioned at the very beginning.

However, the detection was done separately beforehand, i.e. our labels include the contour points of each traffic sign within the full image as well as the ground-truth transcriptions.
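Given such contour points, extracting a crop for the OCR branch amounts to taking the contour's bounding box and slicing the image. The following is a minimal NumPy sketch of that step under the assumption that contours are given as (x, y) point arrays; the function name and padding parameter are illustrative, not from the repository.

```python
import numpy as np

def crop_from_contour(image: np.ndarray, contour: np.ndarray, pad: int = 2) -> np.ndarray:
    """Crop the axis-aligned bounding box of a contour, clipped to the image."""
    xs, ys = contour[:, 0], contour[:, 1]
    h, w = image.shape[:2]
    x0, x1 = max(int(xs.min()) - pad, 0), min(int(xs.max()) + pad, w)
    y0, y1 = max(int(ys.min()) - pad, 0), min(int(ys.max()) + pad, h)
    return image[y0:y1, x0:x1]

# Toy example: a 100x100 grayscale "image" and a rectangular contour.
img = np.arange(100 * 100).reshape(100, 100)
contour = np.array([[10, 20], [40, 20], [40, 35], [10, 35]])  # (x, y) points
crop = crop_from_contour(img, contour, pad=0)
print(crop.shape)  # (15, 30)
```

For non-rectangular contours, a perspective warp of the quadrilateral would be the natural refinement, but the bounding-box crop already yields usable training snippets.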

Evaluation

After training/fine-tuning the network for 10 epochs on our data, we use it for prediction. For the sake of continuity, we use the same image, i.e. the same crops, to give a feeling of what the network does. The figure below summarizes all the steps discussed so far. (pipeline overview figure)
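At prediction time, a CTC-trained OCR head emits one label per time step, which must be collapsed into a transcription. The snippet below sketches the standard greedy CTC decoding rule (merge repeats, then drop blanks); it is a generic illustration of that rule, not code from this repository.

```python
def ctc_greedy_decode(frame_indices, blank=0):
    """Collapse per-frame argmax labels: merge consecutive repeats, drop blanks."""
    out = []
    prev = None
    for idx in frame_indices:
        # Keep a label only if it is not blank and differs from the previous frame.
        if idx != blank and idx != prev:
            out.append(idx)
        prev = idx
    return out

# Frames: blank, 3, 3 (repeat), blank, 3 (new after blank), 5, 5 (repeat), blank
print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
```

The blank symbol between the two 3s is what lets CTC represent genuinely doubled characters, which matters for words like "Stoppstelle".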
