datasets/unicode-characters

Description

Unicode character data covering scripts from around the world. The dataset is sourced from the publicly available file at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt, which aggregates information from various authoritative sources. The data is updated regularly to reflect the latest Unicode standard and character properties.

Data

"data/unicode-characters.csv" contains the list of all Unicode characters; its attributes are described in the datapackage description. The original source URL is http://www.unicode.org/Public/UNIDATA/UnicodeData.txt (a copy is stored in archive/source.csv).

Note: The script is currently run automatically using GitHub Actions.
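
To make the shape of the source data concrete, here is a minimal sketch that splits one line of UnicodeData.txt into its 15 semicolon-separated fields. The snake_case field labels and the `parse_line` helper are assumptions made for illustration; they are not necessarily the column names used in "data/unicode-characters.csv".

```python
# Labels for the 15 semicolon-separated fields of a UnicodeData.txt line.
# These snake_case names are illustrative and may differ from the CSV columns.
FIELDS = [
    "code_point", "name", "general_category", "canonical_combining_class",
    "bidi_class", "decomposition", "decimal_digit_value", "digit_value",
    "numeric_value", "bidi_mirrored", "unicode_1_name", "iso_comment",
    "simple_uppercase_mapping", "simple_lowercase_mapping",
    "simple_titlecase_mapping",
]


def parse_line(line):
    """Split one source line into a dict keyed by the labels above."""
    values = line.rstrip("\n").split(";")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


record = parse_line("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;")
# Code points are stored as hexadecimal strings without a prefix.
character = chr(int(record["code_point"], 16))
print(record["name"], character, record["general_category"])
```

Empty strings mark properties that do not apply to a character, such as the uppercase mapping of a letter that is already uppercase.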

Preparation

You will need Python 3.6 or greater to run the script.

To update the data, run the process script locally:

# To run locally you should do this
# Install using requirements
pip install -r scripts/requirements.txt
python3 scripts/process.py
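
As a rough idea of what such a conversion involves, the sketch below turns semicolon-separated source text into CSV text. It is an assumption-laden illustration, not the contents of scripts/process.py: the `source_to_csv` function is hypothetical, it writes no header row, and it works on an in-memory sample rather than the downloaded file.

```python
import csv
import io


def source_to_csv(source_text):
    """Convert semicolon-separated source text to CSV text (hypothetical helper)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in source_text.splitlines():
        if line.strip():  # skip blank lines
            writer.writerow(line.split(";"))
    return out.getvalue()


# Two sample lines in the UnicodeData.txt layout.
sample = (
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n"
    "4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;\n"
)
csv_text = source_to_csv(sample)
print(csv_text)
```

Using `csv.writer` instead of replacing ";" with "," matters because some names in the source, such as range markers like `<CJK Ideograph, First>`, contain commas and need quoting to stay in a single CSV column.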

Automation

A monthly updated 'Unicode Characters' data package can be found on datahub.io:
https://datahub.io/core/unicode-characters

License

The source specifies that the data can be used as is, without any warranty. Given the size and factual nature of the data, and its origin with a US company, we would imagine it is in the public domain, and as such we have licensed the Data Package under the Public Domain Dedication and License (PDDL).
