KWJA is a Japanese language analyzer based on pre-trained language models. KWJA performs many language analysis tasks, including:
- Typo correction
- Tokenization
- Morphological analysis
- Named entity recognition
- Dependency parsing
- Predicate-argument structure (PAS) analysis
- Coreference resolution
- Discourse relation analysis
- etc.
Requirements:
- Python: 3.9+
- Dependencies: See pyproject.toml.
Install KWJA with pip:
$ pip install kwja
Perform language analysis with the kwja command (the result is in the KNP format):
# Analyze a text
$ kwja --text "月が綺麗ですね。死んでもいいわ。"
# Analyze a text file
$ kwja --file path/to/file.txt
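When the command needs to be driven from a script, the two invocations above can be wrapped with Python's standard subprocess module. The following is a minimal sketch, not part of KWJA: the helper names build_kwja_command and run_kwja are hypothetical, only the --text and --file options shown above are used, and it assumes that kwja writes its KNP-format result to standard output.

```python
import subprocess


def build_kwja_command(input_text=None, input_file=None):
    """Build the argument list for `kwja --text ...` or `kwja --file ...`."""
    # Exactly one of the two inputs must be given.
    if (input_text is None) == (input_file is None):
        raise ValueError("pass exactly one of input_text or input_file")
    if input_text is not None:
        return ["kwja", "--text", input_text]
    return ["kwja", "--file", str(input_file)]


def run_kwja(input_text=None, input_file=None):
    """Run kwja and return whatever it printed to standard output.

    Assumption: the analysis result is written to standard output.
    Raises FileNotFoundError if the kwja command is not installed and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    completed = subprocess.run(
        build_kwja_command(input_text=input_text, input_file=input_file),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

Calling run_kwja(input_text="月が綺麗ですね。死んでもいいわ。") would then correspond to the first shell command above.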
To use KWJA from Python, first make sure the kwja command is on your PATH:
$ which kwja
/path/to/kwja
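The same check can be made from inside Python, which helps when the interpreter runs in a different environment from your shell. This is a small sketch using only the standard library; the function name locate_command is hypothetical.

```python
import shutil


def locate_command(name="kwja"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


path = locate_command()
if path is None:
    print("kwja was not found on PATH; check that `pip install kwja` ran in this environment")
else:
    print(f"kwja found at {path}")
```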
Install rhoknp:
$ pip install rhoknp
Perform language analysis with the kwja instance:
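from rhoknp import KWJA

# Create the analyzer; it runs the kwja command found on your PATH
kwja = KWJA()
# Analyze the text; the result is returned as a rhoknp document object
analyzed_document = kwja.apply("月が綺麗ですね。死んでもいいわ。")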
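Once a document has been analyzed, its contents can be read through rhoknp's object model. The sketch below lists the morphemes of a document. It assumes that the document exposes a morphemes sequence and that each morpheme has text, reading, lemma, and pos attributes; check these names against the rhoknp documentation for your installed version. The helper names morpheme_rows and analyze_and_print are hypothetical.

```python
def morpheme_rows(document):
    """Return (surface form, reading, lemma, part of speech) per morpheme.

    Assumption: `document.morphemes` yields objects with `text`,
    `reading`, `lemma`, and `pos` attributes, as in rhoknp.
    """
    return [(m.text, m.reading, m.lemma, m.pos) for m in document.morphemes]


def analyze_and_print(text):
    """Analyze `text` with KWJA and print one tab-separated row per morpheme."""
    # Imported here so that this module still loads where rhoknp is absent.
    from rhoknp import KWJA

    kwja = KWJA()
    analyzed_document = kwja.apply(text)
    for row in morpheme_rows(analyzed_document):
        print("\t".join(row))
```

Calling analyze_and_print("月が綺麗ですね。死んでもいいわ。") requires both rhoknp and the kwja command to be installed.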
@InProceedings{植田2022,
author = {植田 暢大 and 大村 和正 and 児玉 貴志 and 清丸 寛一 and 村脇 有吾 and 河原 大輔 and 黒橋 禎夫},
title = {KWJA:汎用言語モデルに基づく日本語解析器},
booktitle = {第253回自然言語処理研究会},
year = {2022},
address = {京都},
}