ad

The ad command applies the Random Cut Forest (RCF) algorithm in the ml-commons plugin to the search result returned by a PPL command. Depending on the input, the command uses one of two RCF variants: fixed-in-time RCF for time-series data, or batch RCF for non-time-series data.

Syntax for time-series data (fixed-in-time RCF):

ad <number_of_trees> <shingle_size> <sample_size> <output_after> <time_decay> <anomaly_rate> <time_field> <date_format> <time_zone>

  • number_of_trees(integer): optional. Number of trees in the forest. The default value is 30.
  • shingle_size(integer): optional. A shingle is a consecutive sequence of the most recent records. The default value is 8.
  • sample_size(integer): optional. The sample size used by stream samplers in this forest. The default value is 256.
  • output_after(integer): optional. The number of points required by stream samplers before results are returned. The default value is 32.
  • time_decay(double): optional. The decay factor used by stream samplers in this forest. The default value is 0.0001.
  • anomaly_rate(double): optional. The anomaly rate. The default value is 0.005.
  • time_field(string): mandatory. The time field that RCF uses as time-series data.
  • date_format(string): optional. The format used for the time_field field. The default format is "yyyy-MM-dd HH:mm:ss".
  • time_zone(string): optional. The time zone used for the time_field field. The default time zone is UTC.
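
The shingle_size description above can be illustrated with a small sketch. This is not the ml-commons implementation; the helper name `shingles` and the sample values are hypothetical, and the code only shows what "a consecutive sequence of the most recent records" looks like when a series is cut into overlapping windows.

```python
def shingles(values, shingle_size=8):
    """Return every window of `shingle_size` consecutive values, oldest first.

    Hypothetical helper for illustration only; the RCF library may build
    its shingles differently internally.
    """
    return [
        tuple(values[i : i + shingle_size])
        for i in range(len(values) - shingle_size + 1)
    ]


series = [10, 12, 11, 50, 13]
print(shingles(series, shingle_size=3))
# [(10, 12, 11), (12, 11, 50), (11, 50, 13)]
```

A series shorter than shingle_size yields no complete window in this sketch.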

Syntax for non-time-series data (batch RCF):

ad <number_of_trees> <sample_size> <output_after> <training_data_size> <anomaly_score_threshold>

  • number_of_trees(integer): optional. Number of trees in the forest. The default value is 30.
  • sample_size(integer): optional. Number of random samples given to each tree from the training data set. The default value is 256.
  • output_after(integer): optional. The number of points required by stream samplers before results are returned. The default value is 32.
  • training_data_size(integer): optional. The default value is the size of your training data set.
  • anomaly_score_threshold(double): optional. The threshold of anomaly score. The default value is 1.0.
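
As a rough illustration of anomaly_score_threshold, the sketch below labels a score as anomalous when it exceeds the threshold. The strict "greater than" comparison and the function name `flag_anomalies` are assumptions made for this example; this document does not state the exact rule that ml-commons applies.

```python
def flag_anomalies(scores, anomaly_score_threshold=1.0):
    """Label each score as anomalous when it exceeds the threshold.

    Assumed rule for illustration: anomalous means score > threshold.
    """
    return [score > anomaly_score_threshold for score in scores]


print(flag_anomalies([0.0, 0.8, 1.7]))
# [False, False, True]
```

Under this assumption, a score of 0.0 with the default threshold of 1.0 is labeled as not anomalous.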

This example trains an RCF model and uses it to detect anomalies in the time-series ridership data.

PPL query:

os> source=nyc_taxi | fields value, timestamp | AD time_field='timestamp' | where value=10844.0
+----------+---------------+-------+---------------+
| value    | timestamp     | score | anomaly_grade |
|----------+---------------+-------+---------------|
| 10844.0  | 1404172800000 | 0.0   |  0.0          |
+----------+---------------+-------+---------------+
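
The timestamp value in the result above appears to be expressed in epoch milliseconds. A minimal sketch for reading it, assuming that interpretation and using the default UTC time zone and a Python equivalent of the "yyyy-MM-dd HH:mm:ss" pattern (the helper name `format_epoch_millis` is hypothetical):

```python
from datetime import datetime, timezone


def format_epoch_millis(millis):
    """Render epoch milliseconds as a UTC string in yyyy-MM-dd HH:mm:ss form."""
    moment = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


print(format_epoch_millis(1404172800000))
# 2014-07-01 00:00:00
```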

This example trains an RCF model and uses it to detect anomalies in the non-time-series ridership data.

PPL query:

os> source=nyc_taxi | fields value | AD | where value=10844.0
+----------+--------+-----------+
| value    | score  | anomalous |
|----------+--------+-----------|
| 10844.0  | 0.0    | false     |
+----------+--------+-----------+