Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 17 changed files with 503 additions and 42 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
on: | ||
  push: | ||
    # Sequence of patterns matched against refs/tags | ||
    tags: | ||
      - 'v*' # Push events to matching v*, e.g. v1.0, v20.15.10 | ||
|
||
name: Create Release | ||
|
||
jobs: | ||
  build: | ||
    name: Create Release | ||
    runs-on: ubuntu-latest | ||
    steps: | ||
      - name: Checkout code | ||
        uses: actions/checkout@v2 | ||
      - name: Create Release | ||
        id: create_release | ||
        uses: actions/create-release@v1 | ||
        env: | ||
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
        with: | ||
          tag_name: ${{ github.ref }} | ||
          release_name: Release ${{ github.ref }} | ||
          draft: false | ||
          prerelease: false |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,94 @@ | ||
![Tests](https://github.com/asar-studio/natural-abh/workflows/Tests/badge.svg?branch=develop) ![Release Package to npm](https://github.com/asar-studio/natural-abh/workflows/Release%20Package%20to%20npm/badge.svg) | ||
# natural-abh | ||
|
||
![Tests](https://github.com/asar-studio/natural-abh/workflows/Tests/badge.svg?branch=develop) | ||
![Release Package to npm](https://github.com/asar-studio/natural-abh/workflows/Release%20Package%20to%20npm/badge.svg) | ||
[![NPM version](https://img.shields.io/npm/v/natural-abh.svg)](https://www.npmjs.com/package/natural-abh) | ||
|
||
"natural-abh" is a general natural language facility for Node.js. Tokenizing, normalizing and N-grams are currently supported. | ||
|
||
It's still in the early stages, so we're very interested in bug reports, contributions and the like. | ||
|
||
### TABLE OF CONTENTS | ||
|
||
- [Installation](#installation) | ||
- [Tokenizers](#tokenizers) | ||
- [Normalizer](#normalizer) | ||
- [N-Grams](#n-grams) | ||
|
||
## Installation | ||
|
||
You can install natural-abh via NPM like so: | ||
|
||
npm install natural-abh | ||
|
||
or using yarn: | ||
|
||
yarn add natural-abh | ||
|
||
If you're interested in contributing to natural-abh, or just hacking on it, then by all means fork away! | ||
|
||
## Tokenizers | ||
|
||
Word and RegExp tokenizers are provided for breaking text up into arrays of tokens: | ||
|
||
```javascript | ||
const nabh = require('natural-abh'); | ||
// `let` because the examples below reassign `tokenizer` | ||
let tokenizer = new nabh.WordTokenizer(); | ||
console.log(tokenizer.tokenize('Аԥсны Аҳәынҭқарра Ашьаустә закәанеидкыла')); | ||
// [ 'Аԥсны', 'Аҳәынҭқарра', 'Ашьаустә', 'закәанеидкыла' ] | ||
``` | ||
|
||
The other tokenizers follow a similar pattern: | ||
|
||
```javascript | ||
tokenizer = new nabh.AggressiveTokenizer(); | ||
console.log(tokenizer.tokenize('Аԥсны Аҳәынҭқарра Ашьаустә закәанеидкыла')); | ||
// [ 'Аԥсны', 'Аҳәынҭқарра', 'Ашьаустә', 'закәанеидкыла' ] | ||
|
||
tokenizer = new nabh.RegexpTokenizer({ pattern: /\s+/ }); // split on whitespace; the sample contains no hyphens | ||
console.log(tokenizer.tokenize('Аԥсны Аҳәынҭқарра Ашьаустә закәанеидкыла')); | ||
// [ 'Аԥсны', 'Аҳәынҭқарра', 'Ашьаустә', 'закәанеидкыла' ] | ||
|
||
tokenizer = new nabh.WordPunctTokenizer(); | ||
console.log(tokenizer.tokenize('Аԥсны Аҳәынҭқарра Ашьаустә закәанеидкыла')); | ||
// [ 'Аԥсны', 'Аҳәынҭқарра', 'Ашьаустә', 'закәанеидкыла' ] | ||
``` | ||
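To make the two tokenizing styles above concrete, here is a minimal, self-contained sketch. The helper names `sketchWordTokenize` and `sketchRegexpTokenize` are hypothetical and are not part of the natural-abh API; the library's real tokenizers may treat punctuation and edge cases differently. The first helper collects runs of letters, the second splits on a delimiter pattern.

```javascript
// Hypothetical helpers for illustration only; not natural-abh's implementation.

// Match-based: collect runs of Unicode letters, combining marks and digits.
function sketchWordTokenize(text) {
  return text.match(/[\p{L}\p{M}\p{N}]+/gu) || [];
}

// Gap-based: treat `pattern` as the delimiter and drop empty pieces.
function sketchRegexpTokenize(text, pattern) {
  return text.split(pattern).filter((piece) => piece.length > 0);
}

const sample = 'Аԥсны Аҳәынҭқарра Ашьаустә закәанеидкыла';

console.log(sketchWordTokenize(sample));
// [ 'Аԥсны', 'Аҳәынҭқарра', 'Ашьаустә', 'закәанеидкыла' ]

console.log(sketchRegexpTokenize(sample, /\s+/));
// [ 'Аԥсны', 'Аҳәынҭқарра', 'Ашьаустә', 'закәанеидкыла' ]
```

The `u` flag matters here: without Unicode property escapes, a `\w`-based pattern would not match Cyrillic letters at all.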
|
||
## Normalizer | ||
|
||
Replaces obsolete characters in a string with modern counterparts: | ||
|
||
```javascript | ||
const { normalize } = require('natural-abh'); | ||
console.log(normalize('Аҧсны Аҳәынҭқарра Ашьаустә закәанеидкыла')); | ||
// "Аԥсны Аҳәынҭқарра Ашьаустә закәанеидкыла" | ||
``` | ||
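As a rough illustration of what such a replacement could look like, the sketch below maps the obsolete letter Ҧ/ҧ (U+04A6/U+04A7) to its modern form Ԥ/ԥ (U+0524/U+0525), which is the substitution visible in the example above. The name `sketchNormalize` and the mapping table are assumptions; the real normalizer may handle additional characters.

```javascript
// Hypothetical sketch; only the one substitution shown in the README example
// is covered. The real natural-abh normalizer may replace more characters.
const OBSOLETE_TO_MODERN = {
  '\u04A6': '\u0524', // Ҧ -> Ԥ
  '\u04A7': '\u0525', // ҧ -> ԥ
};

function sketchNormalize(text) {
  // Replace every obsolete letter with its modern counterpart.
  return text.replace(/[\u04A6\u04A7]/g, (ch) => OBSOLETE_TO_MODERN[ch]);
}

console.log(sketchNormalize('А\u04A7сны'));
// Аԥсны
```

Text that contains no obsolete letters passes through unchanged, so it should be safe to normalize input before tokenizing it.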
|
||
## N-Grams | ||
|
||
N-grams can be obtained from strings (which will be tokenized for you): | ||
|
||
```javascript | ||
const { bigrams, trigrams, ngrams } = nabh; | ||
``` | ||
|
||
### bigrams | ||
|
||
```javascript | ||
console.log(bigrams('Аҧсны Аҳәынҭқарра Ашьаустә закәанеидкыла')); | ||
console.log(ngrams('Аҧсны Аҳәынҭқарра Ашьаустә закәанеидкыла', 2)); | ||
// [ [ 'Аҧсны', 'Аҳәынҭқарра' ], [ 'Аҳәынҭқарра', 'Ашьаустә' ], [ 'Ашьаустә', 'закәанеидкыла' ] ] | ||
``` | ||
|
||
### trigrams | ||
|
||
```javascript | ||
console.log(trigrams('Аҧсны Аҳәынҭқарра Ашьаустә закәанеидкыла')); | ||
console.log(ngrams('Аҧсны Аҳәынҭқарра Ашьаустә закәанеидкыла', 3)); | ||
// [ [ 'Аҧсны', 'Аҳәынҭқарра', 'Ашьаустә' ], [ 'Аҳәынҭқарра', 'Ашьаустә', 'закәанеидкыла' ] ] | ||
``` | ||
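The outputs above are consistent with a sliding window of size `n` over the token list. A minimal sketch of that idea follows; `sketchNgrams` is a hypothetical name, it takes an already tokenized array rather than a string, and it is not the library's implementation.

```javascript
// Hypothetical sketch of n-gram extraction over an array of tokens.
function sketchNgrams(tokens, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('n must be a positive integer');
  }
  const result = [];
  // Slide a window of size n; stop when the window would run past the end.
  for (let i = 0; i + n <= tokens.length; i += 1) {
    result.push(tokens.slice(i, i + n));
  }
  return result;
}

const tokens = ['a', 'b', 'c', 'd'];

console.log(sketchNgrams(tokens, 2));
// [ [ 'a', 'b' ], [ 'b', 'c' ], [ 'c', 'd' ] ]

console.log(sketchNgrams(tokens, 3));
// [ [ 'a', 'b', 'c' ], [ 'b', 'c', 'd' ] ]
```

A list of `k` tokens yields `k - n + 1` n-grams, and none at all when `n` exceeds `k`.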
|
||
You can find more use cases in the tests. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,94 @@ | ||
# natural-abh | ||
|
||
[![NPM version](https://img.shields.io/npm/v/natural-abh.svg)](https://www.npmjs.com/package/natural-abh) | ||
![Tests](https://github.com/asar-studio/natural-abh/workflows/Tests/badge.svg?branch=develop) | ||
![Release Package to npm](https://github.com/asar-studio/natural-abh/workflows/Release%20Package%20to%20npm/badge.svg) | ||
|
||
"natural-abh" is a general natural language facility for Node.js. Currently supported: tokenization, normalization, and N-gram counting (bigrams, trigrams and multigrams). | ||
|
||
The library is still in its early stages, so we are very interested in bug reports, help implementing features, and the like. | ||
|
||
### Table of Contents | ||
|
||
- [Installation](#installation) | ||
- [Tokenizer](#tokenizer) | ||
- [Normalizer](#normalizer) | ||
- [N-grams](#n-grams) | ||
|
||
## Installation | ||
|
||
You can install natural-abh via NPM as follows: | ||
|
||
npm install natural-abh | ||
|
||
Or using yarn: | ||
|
||
yarn add natural-abh | ||
|
||
If you are interested in contributing to natural-abh, fork the repository, add your functionality, and open a pull request for discussion! | ||
|
||
## Tokenizer | ||
|
||
Word and RegExp tokenizers are provided for breaking text up into arrays of tokens: | ||
|
||
```javascript | ||
const nabh = require('natural-abh'); | ||
// `let` because the examples below reassign `tokenizer` | ||
let tokenizer = new nabh.WordTokenizer(); | ||
console.log(tokenizer.tokenize('Аԥсны Аҳәынҭқарра Ашьаустә закәанеидкыла')); | ||
// [ 'Аԥсны', 'Аҳәынҭқарра', 'Ашьаустә', 'закәанеидкыла' ] | ||
``` | ||
|
||
The other tokenizers follow a similar pattern: | ||
|
||
```javascript | ||
tokenizer = new nabh.AggressiveTokenizer(); | ||
console.log(tokenizer.tokenize('Аԥсны Аҳәынҭқарра Ашьаустә закәанеидкыла')); | ||
// [ 'Аԥсны', 'Аҳәынҭқарра', 'Ашьаустә', 'закәанеидкыла' ] | ||
|
||
tokenizer = new nabh.RegexpTokenizer({ pattern: /\s+/ }); // split on whitespace; the sample contains no hyphens | ||
console.log(tokenizer.tokenize('Аԥсны Аҳәынҭқарра Ашьаустә закәанеидкыла')); | ||
// [ 'Аԥсны', 'Аҳәынҭқарра', 'Ашьаустә', 'закәанеидкыла' ] | ||
|
||
tokenizer = new nabh.WordPunctTokenizer(); | ||
console.log(tokenizer.tokenize('Аԥсны Аҳәынҭқарра Ашьаустә закәанеидкыла')); | ||
// [ 'Аԥсны', 'Аҳәынҭқарра', 'Ашьаустә', 'закәанеидкыла' ] | ||
``` | ||
|
||
## Normalizer | ||
|
||
Replaces obsolete characters in a string with their modern counterparts: | ||
|
||
```javascript | ||
const { normalize } = require('natural-abh'); | ||
console.log(normalize('Аҧсны Аҳәынҭқарра Ашьаустә закәанеидкыла')); | ||
// "Аԥсны Аҳәынҭқарра Ашьаустә закәанеидкыла" | ||
``` | ||
|
||
## N-grams | ||
|
||
N-grams can be obtained from strings (which will be tokenized for you): | ||
|
||
```javascript | ||
const { bigrams, trigrams, ngrams } = nabh; | ||
``` | ||
|
||
### bigrams | ||
|
||
```javascript | ||
console.log(bigrams('Аҧсны Аҳәынҭқарра Ашьаустә закәанеидкыла')); | ||
console.log(ngrams('Аҧсны Аҳәынҭқарра Ашьаустә закәанеидкыла', 2)); | ||
// [ [ 'Аҧсны', 'Аҳәынҭқарра' ], [ 'Аҳәынҭқарра', 'Ашьаустә' ], [ 'Ашьаустә', 'закәанеидкыла' ] ] | ||
``` | ||
|
||
### trigrams | ||
|
||
```javascript | ||
console.log(trigrams('Аҧсны Аҳәынҭқарра Ашьаустә закәанеидкыла')); | ||
console.log(ngrams('Аҧсны Аҳәынҭқарра Ашьаустә закәанеидкыла', 3)); | ||
// [ [ 'Аҧсны', 'Аҳәынҭқарра', 'Ашьаустә' ], [ 'Аҳәынҭқарра', 'Ашьаустә', 'закәанеидкыла' ] ] | ||
``` | ||
|
||
|
||
For more detailed usage examples, see the tests. |
2 changes: 1 addition & 1 deletion
tests/normalizers/normalizer.spec.ts → src/normalizers/normalizer.spec.ts
This file was deleted.