From 2043b85eb0c12d164c973258090017930d2f356a Mon Sep 17 00:00:00 2001
From: lostfictions
Date: Mon, 14 Aug 2023 00:00:03 -0400
Subject: [PATCH] readme

---
 README.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..18add11
--- /dev/null
+++ b/README.md
@@ -0,0 +1,13 @@
+# conceptnet-trim
+
+trim [conceptnet](https://conceptnet.io/)'s ~34,000,000 assertions (about 10gb of
+tsv) into a tidy ~3,400,000 english-language assertions (in json format).
+
+1. clone this repo
+2. [download the latest version of
+   conceptnet](https://github.com/commonsense/conceptnet5/wiki/Downloads) (5.7.0
+   at the time of writing)
+3. extract it to `/data/assertions.csv`
+4. run `cargo run -r` to run in release mode. the trimmed assertions will be
+   written to `/data/trimmed.json`
+