[Blog] Reducing AD False Positives Through Algorithmic Improvements #3524

@@ -0,0 +1,238 @@
---
layout: post
title: "Enhancing OpenSearch Anomaly Detection: Reducing False Positives Through Algorithmic Improvements"
authors:
- kaituo
- kolchfa
date: 2024-12-26
categories:
- technical-posts
has_math: true
has_science_table: true
meta_keywords: anomaly detection, false positives, algorithmic improvements, OpenSearch, nab benchmark
meta_description: Explore how recent algorithmic improvements in RCF reduce false positives in OpenSearch Anomaly Detection, illustrated using the NAB benchmark.
---

The Anomaly Detection (AD) plugin in OpenSearch is powered by the [Random Cut Forest (RCF)](https://github.com/aws/random-cut-forest-by-aws/) algorithm. In OpenSearch 2.17, the AD team introduced four major enhancements to RCF that significantly reduce false positives. In this blog post, we'll explore these four algorithmic improvements, which resulted in a 94.3% reduction in false positives in our benchmark tests while maintaining high detection accuracy. We'll also demonstrate the improvements through real-world case studies and compare the results with those of OpenSearch 2.9.

<style>

table {
font-size: 16px;
}

h3 {
font-size: 22px;
}
h4 {
font-size: 20px;
}

td {
text-align: right;
}

td:first-child {
text-align: left;
}

th {
background-color: #f5f7f7;
}

</style>

## Improvements in the RCF algorithm

OpenSearch 2.17 introduced the following improvements in the RCF algorithm.

### Adaptive learning for data drift, level shifts, and recurring periodic spikes

The RCF algorithm employs statistical tracking and adaptive thresholds to handle gradual changes in data distribution (data drift), sudden baseline shifts (level shifts), and recurring periodic spikes. Instead of updating global statistics for all data, the algorithm focuses on a small set of instances that appear unusual: the candidate anomalies identified by high RCF scores. By incrementally updating the running mean and variance of these candidates using exponential decay, the algorithm can quickly recalibrate its definition of normal behavior. This targeted approach gives recent observations greater influence, allowing the system to adapt quickly to new patterns. As a result, false positives from periodic spikes are reduced, and older data points gradually lose relevance.

When multiple consecutive anomalies occur, the algorithm examines whether they represent a new stable pattern. It compares the following two sets of values (actual and expected) for the affected dimensions:

- **Actual observations and their running mean**: The algorithm compares the actual current observations against their running mean to ensure that they stay within acceptable margins. Because these statistics are updated based on candidate anomalies, the algorithm can quickly adapt to shifts or recurring patterns in the data.

- **Expected observations and their running mean**: The algorithm examines _expected_ observations (calculated values representing the algorithm's understanding of normal behavior) to ensure that they remain stable relative to their running mean. This continuous refinement helps maintain an accurate baseline as conditions evolve.

If both actual and expected values remain within their updated acceptable ranges after multiple anomalies, the algorithm adjusts its baseline and stops flagging these as anomalies.
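
To make the idea concrete, the following Python sketch shows one way to keep an exponentially decayed running mean and variance and to test whether a value stays within a margin of that mean. It is a simplified illustration and not the RCF library's code: the `DecayedStats` class, the `decay` value, and the three-standard-deviation margin are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class DecayedStats:
    """Exponentially decayed running mean and variance (illustrative only)."""

    decay: float = 0.1     # hypothetical weight given to each new value
    mean: float = 0.0
    variance: float = 0.0
    count: int = 0

    def update(self, value: float) -> None:
        if self.count == 0:
            # The first value seeds the mean; the variance starts at zero.
            self.mean = value
        else:
            delta = value - self.mean
            self.mean += self.decay * delta
            # Standard exponentially weighted variance update.
            self.variance = (1 - self.decay) * (self.variance + self.decay * delta * delta)
        self.count += 1

    def within_margin(self, value: float, num_std: float = 3.0) -> bool:
        """Return True if value lies within num_std standard deviations of the mean."""
        return abs(value - self.mean) <= num_std * self.variance ** 0.5


def is_new_stable_pattern(actual, expected, actual_stats, expected_stats):
    """Both the actual and the expected value must stay inside their own margins."""
    return actual_stats.within_margin(actual) and expected_stats.within_margin(expected)
```

In this sketch, only candidate anomalies would be passed to `update`, so a run of similar candidates pulls the mean toward the new level, and later values at that level fall back inside the margin.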

### Grouped alerting

If the system is configured to ignore multiple successive anomalies, the algorithm supports grouped alerting. When multiple unusual events occur close together (within one time window, or _shingle_), you'll receive just one alert instead of many. This prevents alert fatigue by avoiding multiple notifications about the same issue.
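
As a rough illustration of grouping, the following Python sketch collapses anomalies that occur within one shingle of the previous anomaly into a single alert. The function name and the chaining rule are assumptions made for the example; the plugin's actual grouping logic may differ.

```python
def group_alerts(anomaly_indices, shingle_size=8):
    """Collapse anomalies that fall within one shingle of the previous anomaly.

    Returns a list of (first_index, last_index) tuples, one per alert.
    """
    groups = []
    for idx in sorted(anomaly_indices):
        if groups and idx - groups[-1][1] < shingle_size:
            # Close enough to the previous anomaly: extend the current group.
            groups[-1] = (groups[-1][0], idx)
        else:
            # Too far from the previous anomaly: start a new alert.
            groups.append((idx, idx))
    return groups


print(group_alerts([10, 11, 13, 40, 45, 100]))
# [(10, 13), (40, 45), (100, 100)]
```

Here six anomalous data points produce three alerts instead of six.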

### Using approximate nearest neighbors for anomaly detection

RCF now uses approximate nearest-neighbor computations to determine if a data point is _normal_ (not anomalous). The algorithm calculates the average distance between a queried point and its neighbors across multiple trees in the forest. For each tree, it starts from the root node and follows the cut dimensions downward until it identifies the node closest to the queried point; this node is then considered a neighbor. If at least 10% of these neighbors fall within the computed average Euclidean distance, the queried point is deemed normal.
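
The final 10% check can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the per-tree neighbors and the average distance are taken as inputs (how RCF derives them is not reproduced here), and the function name `is_normal` is hypothetical.

```python
import math


def is_normal(query, neighbors, average_distance, min_fraction=0.10):
    """Return True if enough per-tree neighbors lie within the average distance.

    query and each neighbor are equal-length sequences of floats;
    average_distance is assumed to be computed elsewhere.
    """
    if not neighbors:
        return False
    close = sum(1 for n in neighbors if math.dist(query, n) <= average_distance)
    return close / len(neighbors) >= min_fraction


# One of ten neighbors is close to the query, which meets the 10% minimum.
neighbors = [(1.0, 0.0)] + [(10.0, 0.0)] * 9
print(is_normal((0.0, 0.0), neighbors, average_distance=2.0))  # True
```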

### Rescoring using expected value vectors

The algorithm starts by constructing an expected value vector for the candidate anomaly. This serves as a _normal_ counterpart to the anomalous point. To construct this vector, the model identifies the elements in the candidate that deviate most from historical norms and replaces them with values drawn from historical data. These replacements are based on values that are statistically likely to co-occur with the candidate's remaining attributes. The result is a reconstructed version of the candidate that aligns more closely with historical patterns.

After forming the expected value vector, the algorithm recalculates the RCF score. It then compares the original anomaly score with this new expected score. If the absolute difference between these scores is below a scaled threshold (adjusted for both the prior anomaly's score and the current threshold), the algorithm determines that the candidate is not sufficiently distinct to warrant attention. This process helps the algorithm avoid repeatedly flagging similar anomalies, reducing redundant alerts and improving efficiency.
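
The two steps can be sketched in Python as follows. This is a deliberately simplified stand-in, not the RCF implementation: it replaces the most deviating elements with a per-dimension historical mean rather than with values that are likely to co-occur with the remaining attributes, and it takes the scaled threshold as an input because the exact scaling is not described here. All function and parameter names are hypothetical.

```python
def build_expected_vector(candidate, hist_mean, hist_std, num_replace=1):
    """Replace the most deviating elements of candidate with historical means.

    Simplification: the per-dimension mean stands in for a value drawn
    from historical data.
    """
    # Deviation of each element from its historical norm, in standard deviations.
    z = [abs(c - m) / s if s > 0 else 0.0
         for c, m, s in zip(candidate, hist_mean, hist_std)]
    worst = sorted(range(len(candidate)), key=lambda i: z[i], reverse=True)[:num_replace]
    expected = list(candidate)
    for i in worst:
        expected[i] = hist_mean[i]
    return expected


def is_redundant(original_score, expected_score, scaled_threshold):
    """Return True if the candidate is not distinct enough to be flagged."""
    return abs(original_score - expected_score) < scaled_threshold


print(build_expected_vector([1.0, 50.0, 2.0], [1.0, 5.0, 2.0], [0.5, 1.0, 0.5]))
# [1.0, 5.0, 2.0]
```

In a full pipeline, both the candidate and its expected value vector would be scored by the forest, and `is_redundant` would decide whether to suppress the alert.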

## Evaluating RCF using the Numenta Anomaly Benchmark

To test these improvements, the AD team evaluated RCF using labeled AWS CloudWatch metrics from the [Numenta Anomaly Benchmark (NAB)](https://github.com/numenta/NAB). The datasets include AWS server performance metrics, such as CPU utilization, incoming network traffic (measured in bytes), and disk write activity (measured in bytes). These datasets also contain verified anomaly labels provided by Numenta, which serve as a reliable reference for identifying abnormal patterns.

### Test configuration

We used the following RCF configuration for testing:

- 50 trees, each trained on 256 samples
- A shingle size of 8
- A warm-up period of 40 data points
- An anomaly grade threshold of 0.5 to filter out lower-severity anomalies
- A constraint requiring actual values to deviate by at least 20% from expected values

These settings match the default RCF parameters in the OpenSearch AD plugin.

### Results

The following sections provide test results for various metrics.

#### CPU utilization

- **Dataset**: NAB AWS `ec2_cpu_utilization_24ae8d`
- **Total records**: 4,031
- **Labeled anomalies**:
  - 2014-02-26 22:05:00 (Detected)
  - 2014-02-27 17:15:00 (Detected)
- **Total RCF anomalies**: 3

![CPU utilization](/assets/media/blog-images/2024-12-26-Reducing-false-positives-through-algorithmic-improvements/cpu.png){:class="img-centered"}

#### Incoming network traffic

- **Dataset**: NAB AWS `ec2_network_in_257a54`
- **Total records**: 4,031
- **Labeled anomalies**:
  - 2014-04-15 16:44:00 (Detected)
- **Total RCF anomalies**: 4

![network in](/assets/media/blog-images/2024-12-26-Reducing-false-positives-through-algorithmic-improvements/network.png){:class="img-centered"}

#### Disk write activity

- **Dataset**: NAB AWS `ec2_disk_write_bytes_1ef3de`
- **Total records**: 4,718
- **Labeled anomalies**:
  - 2014-03-10 21:09:00 (Detected)
- **Total RCF anomalies**: 3

![disk write bytes](/assets/media/blog-images/2024-12-26-Reducing-false-positives-through-algorithmic-improvements/disk.png){:class="img-centered"}

#### RDS CPU utilization (`rds_cpu_utilization_e47b3b`)

- **Dataset**: NAB AWS `rds_cpu_utilization_e47b3b`
- **Total records**: 4,031
- **Labeled anomalies**:
  - 2014-04-13 06:52:00 (Detected)
  - 2014-04-18 23:27:00 (Detected)
- **Total RCF anomalies**: 2

![RDS CPU utilization](/assets/media/blog-images/2024-12-26-Reducing-false-positives-through-algorithmic-improvements/rds.png){:class="img-centered"}

#### RDS CPU utilization (`rds_cpu_utilization_cc0c53`)

- **Dataset**: NAB AWS `rds_cpu_utilization_cc0c53`
- **Total records**: 4,031
- **Labeled anomalies**:
  - 2014-02-25 07:15:00 (Detected)
  - 2014-02-27 00:50:00 (Undetected)
- **Total RCF anomalies**: 1

![rds_cc0c53](/assets/media/blog-images/2024-12-26-Reducing-false-positives-through-algorithmic-improvements/rds_cc.png){:class="img-centered"}

## Precision and recall

Before we present precision and recall results, let's define two key concepts in machine learning evaluation:

- **Precision**: The proportion of detected anomalies that are correct: of all the anomalies the algorithm detected, how many were true anomalies? High precision means fewer false positives. For example, if the algorithm detects 10 anomalies and 8 are actually anomalies, then the precision is 8 / 10 = 0.8, or 80%.
- **Recall**: The proportion of true anomalies that are detected: of all true anomalies, how many did the algorithm detect? High recall means fewer missed anomalies. For example, if there are 10 true anomalies and the algorithm detects 8 of them, then the recall is 8 / 10 = 0.8, or 80%.
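
These definitions translate directly into code. The following Python sketch computes both metrics from a set of detected events and a set of labeled (true) events. Matching events by exact identifier is a simplifying assumption made for illustration; the function name is hypothetical.

```python
def precision_recall(detected, labeled):
    """Compute (precision, recall) from detected and labeled event identifiers."""
    detected, labeled = set(detected), set(labeled)
    true_positives = len(detected & labeled)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(labeled) if labeled else 0.0
    return precision, recall


# 10 detections, 8 of which are among the 10 true anomalies.
detected = range(10)                 # events 0 through 9
labeled = list(range(8)) + [100, 101]
print(precision_recall(detected, labeled))  # (0.8, 0.8)
```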

### Results

The following table summarizes RCF's precision and recall performance using the NAB CloudWatch benchmarks.

| Dataset | Precision | Recall |
|-----|-----------|--------|
| AWS `ec2_cpu_utilization_24ae8d` | 0.67 | 1 |
| AWS `ec2_network_in_257a54` | 0.25 | 1 |
| AWS `ec2_disk_write_bytes_1ef3de` | 0.33 | 1 |
| AWS `rds_cpu_utilization_e47b3b` | 1 | 1 |
| AWS `rds_cpu_utilization_cc0c53` | 1 | 0.5 |

Overall, RCF demonstrated strong recall, correctly detecting 7 out of 8 anomalies across the datasets. Notably, it achieved perfect recall in four of the five datasets.

RCF also maintains both high precision and computational efficiency. Most false positives occur early in the time series, before the model has accumulated sufficient historical data for accurate predictions. For example, in the following graph, the anomaly detected around April 12 at 3:14 for the `ec2_network_in_257a54` dataset is particularly noteworthy. It deviates from the previously observed pattern of uniform double spikes. Earlier patterns show the second spike declining within approximately five minutes, whereas this anomalous spike exhibits an extended decline lasting around ten minutes.

![network_zoom_in](/assets/media/blog-images/2024-12-26-Reducing-false-positives-through-algorithmic-improvements/network_zoom_in_1.png){:class="img-centered"}

When the first 20% of the data is excluded as a probation period, precision improves on three of the five datasets and is unchanged on the other two, where it was already 1. This adjustment highlights how RCF becomes more accurate after observing enough historical data. The results for the remaining 80% of each dataset are summarized in the following table.

| Dataset | Precision | Recall |
|----------|-----------|--------|
| AWS `ec2_cpu_utilization_24ae8d` | 1 | 1 |
| AWS `ec2_network_in_257a54` | 0.5 | 1 |
| AWS `ec2_disk_write_bytes_1ef3de` | 1 | 1 |
| AWS `rds_cpu_utilization_e47b3b` | 1 | 1 |
| AWS `rds_cpu_utilization_cc0c53` | 1 | 0.5 |
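
Excluding a probation period amounts to dropping detections whose record index falls in the first 20% of the series before computing precision. The following Python sketch shows this filtering step; the function name and the detection indexes in the example are hypothetical and are not taken from the benchmark runs.

```python
def drop_probation(detected_indices, total_records, probation_fraction=0.2):
    """Keep only detections that occur after the probation period."""
    cutoff = int(total_records * probation_fraction)
    return [i for i in detected_indices if i >= cutoff]


# With 4,031 records, the first 806 records (indexes 0 through 805) are excluded.
print(drop_probation([100, 805, 806, 3000], total_records=4031))
# [806, 3000]
```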

Another type of false positive occurs when detected anomalies seem valid based on the data but are in fact not labeled as anomalies. For example, in the following graph, the anomaly detected on April 15 at approximately 3:34 for the `ec2_network_in_257a54` dataset deviates from the typical pattern of paired spikes. Instead of the usual two consecutive peaks, this event shows a single spike. While this represents a significant departure from the established pattern, it is not labeled as an anomaly.

![network_zoom_in_3](/assets/media/blog-images/2024-12-26-Reducing-false-positives-through-algorithmic-improvements/network_zoom_in_3.png){:class="img-centered"}

## OpenSearch 2.17 and 2.9 compared

Let's examine how OpenSearch 2.17 has improved its anomaly detection performance relative to OpenSearch 2.9. Before we present the results, let's define the following key concepts:

- **True positive (correct detection)**: A truly anomalous event correctly flagged as anomalous by the algorithm.
- **False positive (false alarm)**: A non-anomalous event incorrectly flagged as anomalous by the algorithm.
- **False negative (missed anomaly)**: An anomalous event that the system fails to flag as anomalous.

### Results

The following table compares anomaly detection in OpenSearch versions 2.9 and 2.17.

| Dataset | 2.9 anomalies detected | 2.9 true positives | 2.17 anomalies detected | 2.17 true positives | Actual anomalies |
|--------------------------------------|-------------:|-------------------:|--------------:|--------------------:|-------------:|
| AWS `ec2_cpu_utilization_24ae8d` | 17 | 1 | 3 | 2 | 2 |
| AWS `ec2_network_in_257a54` | 33 | 1 | 4 | 1 | 1 |
| AWS `ec2_disk_write_bytes_1ef3de` | 16 | 1 | 3 | 1 | 1 |
| AWS `rds_cpu_utilization_e47b3b` | 23 | 2 | 2 | 2 | 2 |
| AWS `rds_cpu_utilization_cc0c53` | 23 | 1 | 1 | 1 | 2 |
| **Totals** | 112 | 6 | 13 | 7 | 8 |

### Discussion

Version 2.17 showed an overall reduction in false alerts and an increase in true alerts compared to version 2.9.

OpenSearch 2.9 flagged 112 anomalies, but only 6 were true anomalies, so most of its alerts were false alarms. After the algorithmic improvements in OpenSearch 2.17, only 13 anomalies were flagged, 7 of which were true anomalies. Precision increased from 5.4% (6/112) to 53.8% (7/13), and recall increased from 75% (6/8) to 87.5% (7/8).

In comparison, OpenSearch 2.9 produced 106 false positives, while OpenSearch 2.17 produced just 6, representing a **94.3% reduction in false positives**. This percent reduction is calculated as follows:

$$
\text{Percent Reduction in FP}
= \frac{\text{Old FP} - \text{New FP}}{\text{Old FP}} \times 100\%
= \frac{106 - 6}{106} \times 100\% = 94.3\%
$$

Similarly, OpenSearch 2.9 missed 2 real anomalies, whereas OpenSearch 2.17 missed just 1, representing a **50% reduction in false negatives**, computed as follows:

$$
\text{Percent Reduction in FN}
= \frac{\text{Old FN} - \text{New FN}}{\text{Old FN}} \times 100\%
= \frac{2 - 1}{2} \times 100\% = 50\%
$$

Overall, OpenSearch 2.17 significantly decreased both false positives and false negatives compared to version 2.9.
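
The same arithmetic can be reproduced from the totals row of the comparison table. The following Python sketch derives the false positive and false negative counts and then applies the percent reduction formula; the variable and function names are chosen for illustration.

```python
def percent_reduction(old, new):
    """Percent reduction from old to new."""
    return (old - new) / old * 100


# Totals from the comparison table: anomalies detected and true positives.
detected_29, tp_29 = 112, 6
detected_217, tp_217 = 13, 7
actual = 8  # labeled anomalies across all five datasets

# False positives are detections that are not true positives;
# false negatives are labeled anomalies that were not detected.
fp_29, fp_217 = detected_29 - tp_29, detected_217 - tp_217
fn_29, fn_217 = actual - tp_29, actual - tp_217

print(round(percent_reduction(fp_29, fp_217), 1))  # 94.3
print(round(percent_reduction(fn_29, fn_217), 1))  # 50.0
```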






## Conclusions

The enhancements to the RCF algorithm in OpenSearch 2.17 improve anomaly detection by greatly reducing false positives. By tracking the history of candidate anomalies, adapting to changing data patterns, implementing grouped alerting, and refining scores through expected value comparisons, the updated approach addresses real-world challenges like data drift, level shifts, and periodic spikes, while maintaining high recall. Empirical tests using the NAB CloudWatch benchmarks validate its effectiveness, showing a 94.3% reduction in false positives and a 50% reduction in false negatives.