
crew102/rapidrake-java


rapidrake

A fast version of the Rapid Automatic Keyword Extraction (RAKE) algorithm


Installation

Assuming you're using Maven, follow these two steps to use rapidrake in your Java project:

  1. Include a dependency on rapidrake in your POM:
<dependency>
    <groupId>io.github.crew102</groupId>
    <artifactId>rapidrake</artifactId>
    <version>0.1.4</version>
</dependency>
  2. Download the trained opennlp models for sentence detection and part-of-speech (POS) tagging. Both models are available for several languages on opennlp's model page. For example, you could use the English versions of the sentence detection and POS-tagger models. You'll pass the file paths to these models when you instantiate a RakeAlgorithm object (see below for an example).

Basic usage

import io.github.crew102.rapidrake.RakeAlgorithm;
import io.github.crew102.rapidrake.data.SmartWords;
import io.github.crew102.rapidrake.model.RakeParams;
import io.github.crew102.rapidrake.model.Result;

public class Example {

  public static void main(String[] args) throws java.io.IOException {
    
    // Create an object to hold algorithm parameters
    String[] stopWords = new SmartWords().getSmartWords(); 
    String[] stopPOS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}; 
    int minWordChar = 1;
    boolean shouldStem = true;
    String phraseDelims = "[-,.?():;\"!/]"; 
    RakeParams params = new RakeParams(stopWords, stopPOS, minWordChar, shouldStem, phraseDelims);
    
    // Create a RakeAlgorithm object
    // You can use the RakeAlgorithm(RakeParams, POSTaggerME, SentenceDetectorME)
    // constructor instead of the one shown below if you want to pass in 
    // pre-initialized opennlp models.  
    String POStaggerURL = "model-bin/en-pos-maxent.bin"; // The path to your POS tagging model
    String SentDetectURL = "model-bin/en-sent.bin"; // The path to your sentence detection model
    RakeAlgorithm rakeAlg = new RakeAlgorithm(params, POStaggerURL, SentDetectURL);
    
    // Call the rake method
    String txt = "dogs are great, don't you agree? I love dogs, especially big dogs";
    Result result = rakeAlg.rake(txt);
    
    // Print the resulting keywords (not stemmed)
    System.out.println(result.distinct());
    
  }
}

// [dogs (1.33), great (1), big dogs (3.33)]
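
To see where the scores in that output come from, here is a minimal sketch of the standard RAKE scoring rule: each word is scored as its degree divided by its frequency, and a phrase's score is the sum of its words' scores. This is an illustration, not rapidrake's actual implementation. The class name RakeScoreSketch, the method scorePhrases, and the hand-written candidate phrase list are assumptions made for this example; in rapidrake the candidate phrases come from the stop word, POS, and delimiter settings in RakeParams.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, not part of rapidrake.
public class RakeScoreSketch {

  // Scores each distinct phrase as the sum of degree(word) / frequency(word)
  // over the words it contains.
  static Map<String, Double> scorePhrases(List<String> phrases) {
    Map<String, Integer> freq = new HashMap<>();
    Map<String, Integer> degree = new HashMap<>();

    for (String phrase : phrases) {
      String[] words = phrase.split(" ");
      for (String word : words) {
        // Frequency: number of times the word occurs across all candidate phrases
        freq.merge(word, 1, Integer::sum);
        // Degree: summed length of every candidate phrase the word occurs in
        degree.merge(word, words.length, Integer::sum);
      }
    }

    // LinkedHashMap keeps first-seen order and collapses duplicate phrases
    Map<String, Double> scores = new LinkedHashMap<>();
    for (String phrase : phrases) {
      double score = 0.0;
      for (String word : phrase.split(" ")) {
        score += (double) degree.get(word) / freq.get(word);
      }
      scores.put(phrase, score);
    }
    return scores;
  }

  public static void main(String[] args) {
    // Assumed candidate phrases for the example sentence, i.e. what remains
    // after stop words, verbs, and delimiters have been removed.
    List<String> phrases = Arrays.asList("dogs", "great", "dogs", "big dogs");
    scorePhrases(phrases).forEach(
        (phrase, score) -> System.out.printf(Locale.ROOT, "%s (%.2f)%n", phrase, score));
  }
}
```

Under these assumptions, "dogs" occurs three times with a total degree of 4 (1 + 1 + 2), giving 4/3 ≈ 1.33; "big" scores 2/1 = 2, so "big dogs" scores about 3.33; and "great" scores 1.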

Learning more

You can learn more about how RAKE works and the various parameters you can set by visiting slowraker's website.