Russian Tracked Handwritings

Authors: Dmitriy Yacenko, Konstantin Smirnov

Dataset Information:

We created a character dataset by collecting samples from 12 writers. Each writer contributed letters (lowercase and uppercase), digits, and words from a pangram. The pangram words were not used in our experiments, but they are included in the "extra" folder for each writer in this database. Up to 4 samples were collected for each writer/character pair, and this version of the database contains 2812 samples in total.

The classification task has 42 classes, because not every character gets its own class: each of the 33 letters forms one case-independent class, the non-zero digits form 9 additional classes, and the zero is placed in the same class as the letter "о".

Database structure:

scanner.py - the character scanning program used to collect the dataset.
convert2mnist.py - a program that converts the dataset into an MNIST-like form. It is intended for use with the test example.
example_using.py - an example of a primitive network for character recognition. It is intended only to demonstrate the consistency of the dataset; users are expected to apply their own, more advanced approaches.
data - the folder with the dataset.
w_n_m - a folder with one attempt by one writer (37 folders in total).
<char> - the main track file for the symbol: a text file with a list of coordinates of the form "x1","y1","x2","y2",...,"xN","yN".
<char>_times - a file with additional track information: a list of times, in ms, between receiving the coordinates of consecutive points.
<char>.png - an auxiliary file: a picture of the symbol as the writer saw it. This file is for reference only.
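
The track format described above can be read with a few lines of Python. This is a minimal sketch, not the code used in `convert2mnist.py` or `example_using.py`: the function names `parse_track` and `parse_times` are hypothetical, and it assumes that the `<char>_times` file uses the same quoted, comma-separated layout as the `<char>` file, which the description above does not state explicitly.

```python
import csv
import io


def _read_fields(text):
    """Split quoted, comma-separated text into a flat list of non-empty fields."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [field for row in reader for field in row if field.strip()]


def parse_track(text):
    """Parse a track of the form "x1","y1",...,"xN","yN" into (x, y) pairs."""
    values = [float(field) for field in _read_fields(text)]
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of coordinate values")
    # Pair consecutive values: (x1, y1), (x2, y2), ...
    return list(zip(values[0::2], values[1::2]))


def parse_times(text):
    """Parse a <char>_times file into a list of intervals in milliseconds.

    Assumption: the file uses the same layout as the track file.
    """
    return [float(field) for field in _read_fields(text)]


if __name__ == "__main__":
    sample = '"10","20","30","40","35","55"'  # made-up coordinates
    print(parse_track(sample))
```

To read a real sample, pass the contents of a track file to `parse_track`, for example `Path(folder, char).read_text(encoding="utf-8")`, where `folder` is one of the `w_n_m` directories.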

The handwriting samples were collected on an XP-Pen Deco 03 tablet using its stylus. Each writer completed 1 to 4 consecutive sessions. In each session, the writer was asked to write one example of each character in a fixed set that includes lowercase and uppercase letters, digits, and pangram words. The acquisition program shows a set of boxes on the screen, one for each required character, and writers are told to write only inside those boxes. Writers were monitored only while writing their first sample; every later sample is considered valid because its writer accepted it as such.

The acquisition program recorded only the X and Y coordinates and the timing information along the strokes; it did not record, for instance, pressure levels.
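
Because each track carries timing as intervals between points, simple dynamic features can be derived from the coordinates and times alone. The sketch below is illustrative only: the function names are hypothetical, and it assumes that a track with N points comes with N-1 intervals, which should be checked against the actual files.

```python
import math
from itertools import accumulate


def cumulative_times(intervals_ms):
    """Turn inter-point intervals (ms) into timestamps relative to the first point."""
    return [0.0] + list(accumulate(float(dt) for dt in intervals_ms))


def path_length(points):
    """Total length of the polyline through the points, in tablet units."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def mean_speed(points, intervals_ms):
    """Average pen speed in tablet units per millisecond (0.0 if no time elapsed)."""
    total_ms = sum(intervals_ms)
    if total_ms <= 0:
        return 0.0
    return path_length(points) / total_ms


if __name__ == "__main__":
    pts = [(0, 0), (3, 4), (3, 10)]  # made-up coordinates
    dts = [10, 12]                   # made-up intervals in ms
    print(cumulative_times(dts), path_length(pts), mean_speed(pts, dts))
```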

Class distribution in `example_using.py`:

[A] = { "а" , "А" }
[Б] = { "б" , "Б" }
[В] = { "в" , "В" }
[Г] = { "г" , "Г" }
[Д] = { "д" , "Д" }
[Е] = { "е" , "Е" }
[Ё] = { "ё" , "Ё" }
[Ж] = { "ж" , "Ж" }
[З] = { "з" , "З" }
[И] = { "и" , "И" }
[Й] = { "й" , "Й" }
[К] = { "к" , "К" }
[Л] = { "л" , "Л" }
[М] = { "м" , "М" }
[Н] = { "н" , "Н" }
[О] = { "о" , "О", "0" }
[П] = { "п" , "П" }
[Р] = { "р" , "Р" }
[С] = { "с" , "С" }
[Т] = { "т" , "Т" }
[У] = { "у" , "У" }
[Ф] = { "ф" , "Ф" }
[Х] = { "х" , "Х" }
[Ц] = { "ц" , "Ц" }
[Ч] = { "ч" , "Ч" }
[Ш] = { "ш" , "Ш" }
[Щ] = { "щ" , "Щ" }
[Ъ] = { "ъ" , "Ъ" }
[Ы] = { "ы" , "Ы" }
[Ь] = { "ь" , "Ь" }
[Э] = { "э" , "Э" }
[Ю] = { "ю" , "Ю" }
[Я] = { "я" , "Я" }
[1] = { "1" }
[2] = { "2" }
[3] = { "3" }
[4] = { "4" }
[5] = { "5" }
[6] = { "6" }
[7] = { "7" }
[8] = { "8" }
[9] = { "9" }
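
The class table above can be expressed as a small mapping from a written character to its class label. This is a sketch under assumptions: the names `CLASSES`, `char_to_class` and `class_index` are hypothetical, and the class order (letters first, then digits) is chosen for illustration and may differ from the order used in `example_using.py`.

```python
LETTERS = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"  # 33 Cyrillic letters
DIGITS = "123456789"                            # 9 non-zero digits
CLASSES = list(LETTERS) + list(DIGITS)          # 42 classes (order is an assumption)


def char_to_class(ch):
    """Map a written character to its class label, following the table above."""
    if ch == "0":
        return "О"  # zero shares a class with the Cyrillic letter "о"
    if ch in DIGITS:
        return ch
    upper = ch.upper()  # letters are case-independent
    if upper in LETTERS:
        return upper
    raise ValueError(f"character {ch!r} is not part of the dataset classes")


def class_index(ch):
    """Integer label of the class a character belongs to."""
    return CLASSES.index(char_to_class(ch))


if __name__ == "__main__":
    print(len(CLASSES), char_to_class("ё"), class_index("0"))
```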

Repository: Skvayzer/russian_handwritings_tracked
