A unified language analyzer for Japanese

KWJA: Kyoto-Waseda Japanese Analyzer

KWJA is a Japanese language analyzer based on pre-trained language models. KWJA performs many language analysis tasks, including:

  • Typo correction
  • Tokenization
  • Morphological analysis
  • Named entity recognition
  • Dependency parsing
  • Predicate-argument structure (PAS) analysis
  • Coreference resolution
  • Discourse relation analysis
  • etc.

Requirements

Getting Started

Install KWJA with pip:

$ pip install kwja

Perform language analysis with the kwja command (the result is in the KNP format):

# Analyze a text
$ kwja --text "月が綺麗ですね。死んでもいいわ。"

# Analyze a text file
$ kwja --file path/to/file.txt
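
When the command needs to be driven from a script, the two invocations above can be wrapped in a small helper. The following is a minimal sketch, not part of KWJA itself: the function names `build_kwja_command` and `run_kwja` are hypothetical, only the `--text` and `--file` options shown above are used, and it assumes the KNP-format result is written to standard output.

```python
import shutil
import subprocess
from typing import List, Optional


def build_kwja_command(text: Optional[str] = None, path: Optional[str] = None) -> List[str]:
    """Build the argument list for one of the two invocations shown above."""
    # Exactly one input source must be given: a raw text or a file path.
    if (text is None) == (path is None):
        raise ValueError("pass exactly one of `text` or `path`")
    if text is not None:
        return ["kwja", "--text", text]
    return ["kwja", "--file", path]


def run_kwja(text: Optional[str] = None, path: Optional[str] = None) -> str:
    """Run the kwja command and return whatever it prints.

    Assumption: the KNP-format result goes to standard output.
    """
    if shutil.which("kwja") is None:
        raise RuntimeError("kwja was not found on PATH; try `pip install kwja`")
    completed = subprocess.run(
        build_kwja_command(text=text, path=path),
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if kwja exits with an error
    )
    return completed.stdout
```

Calling `run_kwja(text="月が綺麗ですね。死んでもいいわ。")` would then return the analysis as a string. Passing the arguments as a list avoids shell quoting problems with Japanese text.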

Usage from Python

Make sure the kwja command is on your PATH:

$ which kwja
/path/to/kwja

Install rhoknp:

$ pip install rhoknp

Perform language analysis with a KWJA instance:

from rhoknp import KWJA
kwja = KWJA()
analyzed_document = kwja.apply("月が綺麗ですね。死んでもいいわ。")
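
The returned document can then be walked sentence by sentence. The sketch below is an illustration under stated assumptions rather than an official recipe: the helper names `morpheme_rows` and `analyze` are hypothetical, and it relies on the rhoknp attributes `sentences`, `morphemes`, `text`, `reading`, and `pos`, which should be checked against the installed rhoknp version.

```python
from typing import Any, List, Tuple


def morpheme_rows(document: Any) -> List[Tuple[str, str, str]]:
    """Collect (surface form, reading, part of speech) for every morpheme.

    `document` is expected to behave like the object returned by
    `KWJA().apply(...)`: it exposes `sentences`, and each sentence
    exposes `morphemes`.
    """
    rows = []
    for sentence in document.sentences:
        for morpheme in sentence.morphemes:
            rows.append((morpheme.text, morpheme.reading, morpheme.pos))
    return rows


def analyze(text: str) -> List[Tuple[str, str, str]]:
    """Analyze `text` with KWJA and return its morpheme rows."""
    # Imported lazily so that morpheme_rows can be used without rhoknp installed.
    from rhoknp import KWJA

    kwja = KWJA()
    return morpheme_rows(kwja.apply(text))
```

For example, `for row in analyze("月が綺麗ですね。死んでもいいわ。"): print(row)` prints one tuple per morpheme. Keeping the traversal in `morpheme_rows` separate from the analyzer call makes it easy to reuse on documents that were analyzed earlier.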

Citation

@InProceedings{植田2022,
  author    = {植田 暢大 and 大村 和正 and 児玉 貴志 and 清丸 寛一 and 村脇 有吾 and 河原 大輔 and 黒橋 禎夫},
  title     = {KWJA:汎用言語モデルに基づく日本語解析器},
  booktitle = {第253回自然言語処理研究会},
  year      = {2022},
  address   = {京都},
}
