using pyrnn.gz in clstm #111

srika91 · 2016-11-15T06:47:42Z

How can I use the pyrnn.gz models created in ocropy for prediction in clstm? clstm prediction seems faster than ocropy's prediction.

jbaiter · 2016-11-15T07:37:47Z

I don't think it's possible, since pyrnn and clstm use different model definitions:

https://github.com/tmbdev/clstm/blob/master/clstm.proto
https://github.com/mittagessen/kraken/blob/master/proto/pyrnn.proto

Maybe there's a way to convert between the two, but I wouldn't know how :/

zuphilip · 2016-11-15T08:12:03Z

I seem to remember that @tmbdev mentioned somewhere that one has to train the models for CLSTM again from the GT, i.e. they might not really be convertible.

kba · 2016-11-15T09:56:54Z

Have not tried it but there is https://github.com/naptha/ocracy/blob/master/ocropy/pyrnn2clstm.py

jbaiter · 2016-11-15T10:59:41Z

That script converts to the old HDF5-based format, not the new Protobuf-based one, unfortunately :-/
I just had a look at two protobuf models from clstm and from kraken (the fraktur one, which was converted from a pyrnn model). It looks like the ocropy-model has more parameters/weights in the LSTM layers than the clstm-model: They share wci, wgi, wgf, wgo, but the ocropus model has wip, wfp, wop in addition.
I doubt that just putting the four matching weight matrices for each layer into a clstm protobuf file would work, since those weights were conditioned on different architectures, but I'd love to be proven wrong :-)
Also, iirc clstm uses a different line normalization algorithm than ocropus, i.e. for identical line images the two models were conditioned on different inputs, though I don't know how much the difference matters in practice.

amitdo · 2016-11-15T11:44:21Z

> It looks like the ocropy-model has more parameters/weights in the LSTM layers than the clstm-model: They share wci, wgi, wgf, wgo, but the ocropus model has wip, wfp, wop in addition.

In clstm the peephole optimization code was dropped (see #17 (comment)).
In ocropy it's still present.

mittagessen · 2016-11-15T17:58:07Z

They are for all intents and purposes completely different networks because of the peephole connections (so not really convertible). The code linked above only reserializes pickled pyrnn models into HDF5 or protobuf files, as these are vastly smaller (~1000 times without compression), faster to parse, and not an inherent security risk. An HDF5 or pronn model is still not a CLSTM model but an ocropy one with some benefits.
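To illustrate why the shared matrices alone are not enough, here is a minimal single-cell sketch of an LSTM step with and without peephole connections. It assumes the standard peephole formulation and borrows the weight names mentioned in this thread (wgi, wgf, wgo, wci, wip, wfp, wop); the exact tensor layout in ocropy or clstm is not reproduced here, and the weight values are made up for the example.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def lstm_step(x, h_prev, c_prev, w, peepholes):
    """One time step of a single-cell LSTM (illustrative only).

    wgi/wgf/wgo/wci are weight vectors over [bias, inputs..., h_prev];
    wip/wfp/wop are scalar peephole weights on the cell state.
    """
    src = [1.0] + list(x) + [h_prev]
    # Peepholes let the input and forget gates see the previous cell state.
    pi = w["wip"] * c_prev if peepholes else 0.0
    pf = w["wfp"] * c_prev if peepholes else 0.0
    gi = sigmoid(dot(w["wgi"], src) + pi)
    gf = sigmoid(dot(w["wgf"], src) + pf)
    ci = math.tanh(dot(w["wci"], src))
    c = gi * ci + gf * c_prev
    # The output gate peephole sees the *updated* cell state.
    po = w["wop"] * c if peepholes else 0.0
    go = sigmoid(dot(w["wgo"], src) + po)
    h = go * math.tanh(c)
    return h, c

# Made-up weights, purely for demonstration.
weights = {
    "wgi": [0.1, 0.5, -0.3],
    "wgf": [0.2, -0.4, 0.6],
    "wgo": [0.0, 0.3, 0.3],
    "wci": [-0.1, 0.8, 0.2],
    "wip": 0.7, "wfp": -0.5, "wop": 0.9,
}

def run(seq, peepholes):
    h, c = 0.0, 0.0
    outputs = []
    for x in seq:
        h, c = lstm_step([x], h, c, weights, peepholes)
        outputs.append(h)
    return outputs

with_peep = run([1.0, 0.5, -1.0], peepholes=True)
without_peep = run([1.0, 0.5, -1.0], peepholes=False)
print(with_peep)
print(without_peep)
```

Even with identical wgi, wgf, wgo and wci, dropping the peephole terms changes the gate activations at every step, so weights trained with peepholes would produce different outputs in a network that lacks them.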

The line normalization and preprocessing is the same for both types of models.

amitdo · 2016-11-16T08:38:57Z

> The line normalization and preprocessing is the same for both types of models.

From the ocropy README.md:

> CLSTM vs OCRopy
>
> ....
>
> Python and C++ models can not be interchanged, both because the save file formats are different and because the text line normalization is slightly different.

mittagessen · 2016-11-16T14:20:42Z

The line image normalization is identical, the text line normalization is not. Ocropy normalizes output to NFKC(/D?), clstm doesn't normalize output to any Unicode normalization form.
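As a rough illustration of what Unicode normalization of the transcription text does, here is a small sketch using Python's standard `unicodedata` module. The helper name `normalize_gt` and the sample string are made up for the example, and NFKC is used only because it is the form named above; which form ocropy actually applies is not confirmed here.

```python
import unicodedata

def normalize_gt(text, form="NFKC"):
    """Normalize a transcription string to the given Unicode form."""
    return unicodedata.normalize(form, text)

# Sample Fraktur-style text with two long s (U+017F) and an fi ligature (U+FB01).
raw = "Wa\u017f\u017fer \ufb01ndet"

nfkc = normalize_gt(raw, "NFKC")  # compatibility forms are folded
nfc = normalize_gt(raw, "NFC")    # canonical composition only

print(nfkc)
print(nfc == raw)
```

Under NFKC the long s and the ligature are folded into plain letters, so a model whose output is normalized this way works with a different set of output characters than one trained on the unnormalized text.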

DissBiscuit · 2019-11-12T13:54:23Z

@jbaiter sorry to reopen an old closed issue, but I am currently working with kraken, especially this fraktur model, and I understand you worked on it too? Is it a dead end? I'm trying to see if it does a better job than tesseract...
The output I get with kraken -i imagefilename.tif outputfilename.xml binarize segment ocr -a -m fraktur.pronn on Ubuntu with Python 2.7.15 looks like it's in the wrong format...
Thanks in advance!

zuphilip added the question label Nov 15, 2016
