GitHub - datasets/unicode-characters

Description

Unicode character data from around the world. This dataset was sourced from the public domain resource at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt, which aggregates information from various authoritative sources. The data is regularly updated to reflect the latest Unicode standards and character properties.

Data

"data/unicode-characters.csv" contains the list of all Unicode characters; the attributes are described in the datapackage description. The original source URL is http://www.unicode.org/Public/UNIDATA/UnicodeData.txt (stored in archive/source.csv).
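As a rough illustration of what the upstream file looks like, the sketch below splits UnicodeData.txt records into named fields. UnicodeData.txt is a semicolon-separated file with 15 fields per line. The field names used here (`FIELDS`) and the helper `parse_unicode_data` are assumptions made for this example; they are not necessarily the column names used in data/unicode-characters.csv, nor how scripts/process.py works.

```python
# Illustrative field names for the 15 semicolon-separated fields of
# UnicodeData.txt. These names are assumptions for this sketch and may
# differ from the columns in data/unicode-characters.csv.
FIELDS = [
    "code_point", "name", "general_category", "canonical_combining_class",
    "bidi_class", "decomposition", "decimal_value", "digit_value",
    "numeric_value", "bidi_mirrored", "unicode_1_name", "iso_comment",
    "simple_uppercase", "simple_lowercase", "simple_titlecase",
]


def parse_unicode_data(text):
    """Parse UnicodeData.txt content into a list of dicts, one per record."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(";")
        if len(parts) != len(FIELDS):
            continue  # skip anything that is not a well-formed record
        rows.append(dict(zip(FIELDS, parts)))
    return rows


# A single record copied from UnicodeData.txt (U+0041).
SAMPLE = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n"

if __name__ == "__main__":
    for row in parse_unicode_data(SAMPLE):
        code_point = int(row["code_point"], 16)  # code points are hexadecimal
        print(code_point, row["name"], row["general_category"])
```

Code points are stored as hexadecimal strings, so they need `int(value, 16)` before being passed to `chr()` or compared numerically.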

Note: The scripts currently run automatically using GitHub Actions.

Preparation

You will need Python 3.6 or greater to run the script.

To update the data run the process script locally:

# To run locally you should do this
# Install using requirements
pip install -r scripts/requirements.txt
python3 scripts/process.py
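After the script has run, a quick sanity check of the output can be useful. The following is a minimal sketch that reports the header and row count of the generated CSV without assuming any particular column names. The helper `summarize_csv` is hypothetical and is not part of this repository; the path is the one given in the Data section.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (header, row_count) for a CSV file without assuming column names."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # first row is treated as the header
        row_count = sum(1 for _ in reader)  # remaining rows are data rows
    return header, row_count


if __name__ == "__main__":
    csv_path = Path("data/unicode-characters.csv")
    if csv_path.exists():
        header, rows = summarize_csv(csv_path)
        print(f"{len(header)} columns, {rows} data rows")
        print("columns:", ", ".join(header))
    else:
        print(f"{csv_path} not found; run scripts/process.py first")
```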

Automation

The monthly updated 'Unicode Characters' datapackage can be found on datahub.io:
https://datahub.io/core/unicode-characters

License

The source specifies that the data can be used as is, without any warranty. Given the size and factual nature of the data and its source from a US company, we would imagine this is public domain, and as such have licensed the Data Package under the Public Domain Dedication and License (PDDL).

