Enhance per bucket, and per document monitor notification message ctx. #1450

Merged 17 commits on Mar 14, 2024

AWSHurneyt
Collaborator

@AWSHurneyt AWSHurneyt commented Mar 5, 2024

Issue #, if available:
#1300
#1396
#1401

Description of changes:

  1. Added support for returning sample documents for bucket level monitors, and document level monitors.
  2. Added support for printing query/rule info in notification messages for document level monitors.

CheckList:

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


@AWSHurneyt AWSHurneyt changed the title 3.0 issue1401 Enhance per bucket, and per document monitor notification message ctx. Mar 5, 2024

Signed-off-by: AWSHurneyt <[email protected]>
… document level monitors.

Signed-off-by: AWSHurneyt <[email protected]>
@eirsep
Member

eirsep commented Mar 11, 2024

Typically we create smaller PRs. One PR per issue is a practice we should try to follow.

…t instead of inheriting/extending it in common utils.

Signed-off-by: AWSHurneyt <[email protected]>
Signed-off-by: AWSHurneyt <[email protected]>
Signed-off-by: AWSHurneyt <[email protected]>
Signed-off-by: AWSHurneyt <[email protected]>
@AWSHurneyt
Collaborator Author

The nonsecurity-related test failures seem related to the fixes in PR #1464

The security-related test failures have been failing for some time, so they're unrelated to these changes.
https://github.com/opensearch-project/alerting/actions/workflows/security-test-workflow.yml?query=branch%3Amain

@AWSHurneyt
Collaborator Author

> Typically we create smaller PRs. One PR per issue is a practice we should try to follow.

@eirsep agreed. Would you say that's blocking for this PR? All of the changes will require the AlertContext class, so the second PR couldn't be raised until the first is merged.

…or sorting sample docs based on metric aggregations.

Signed-off-by: AWSHurneyt <[email protected]>
…r sample docs.

Signed-off-by: AWSHurneyt <[email protected]>
@AWSHurneyt AWSHurneyt requested a review from jowg-amazon as a code owner March 12, 2024 23:57
@AWSHurneyt AWSHurneyt merged commit 5dc690c into opensearch-project:main Mar 14, 2024
15 of 18 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/alerting/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/alerting/backport-2.x
# Create a new branch
git switch --create backport-1450-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 5dc690caf1d4b9935f9aeb946c104be8d4861a77
# Push it to GitHub
git push --set-upstream origin backport-1450-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/alerting/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport-1450-to-2.x.

AWSHurneyt added a commit to AWSHurneyt/OpenSearch-Alerting that referenced this pull request Mar 14, 2024
opensearch-project#1450)

* Adding dev logs.

Signed-off-by: AWSHurneyt <[email protected]>

* Added support for returning sample documents for bucket level monitors.

Signed-off-by: AWSHurneyt <[email protected]>

* Added support for printing query/rule info in notification messages.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted out helper function.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted out helper function.

Signed-off-by: AWSHurneyt <[email protected]>

* Added support for printing document data in notification messages for document level monitors.

Signed-off-by: AWSHurneyt <[email protected]>

* Refactored logic after making AlertContext a separate class from Alert instead of inheriting/extending it in common utils.

Signed-off-by: AWSHurneyt <[email protected]>

* Moved AlertContext data model from common utils to alerting plugin.

Signed-off-by: AWSHurneyt <[email protected]>

* Fixed ktlint errors.

Signed-off-by: AWSHurneyt <[email protected]>

* Added additional unit tests.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted sample doc aggs logic into helper function. Added support for sorting sample docs based on metric aggregations.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted get sample doc logic into helper function. Added sorting for sample docs.

Signed-off-by: AWSHurneyt <[email protected]>

* Removed dev code.

Signed-off-by: AWSHurneyt <[email protected]>

* Fixed ktlint errors.

Signed-off-by: AWSHurneyt <[email protected]>

* Added comments based on PR feedback.

Signed-off-by: AWSHurneyt <[email protected]>

* Added logic to make mGet calls in batches.

Signed-off-by: AWSHurneyt <[email protected]>

---------

Signed-off-by: AWSHurneyt <[email protected]>

(cherry picked from commit 5dc690c)
Signed-off-by: AWSHurneyt <[email protected]>
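
The last item in the commit list above mentions making mGet calls in batches. As a rough illustration of that idea only, here is a minimal Kotlin sketch. The `BatchFetcher` type, the `fetchInBatches` helper, and the batch size are hypothetical names chosen for this example; they are not taken from the alerting plugin, and the real code calls the OpenSearch client rather than a lambda.

```kotlin
// Hypothetical fetcher: takes one batch of document IDs, returns id -> source.
// In the plugin this role would be played by a multi-get request (assumption).
typealias BatchFetcher = (List<String>) -> Map<String, String>

// Splits the IDs into fixed-size batches and merges the per-batch results,
// so no single request carries an unbounded number of IDs.
fun fetchInBatches(ids: List<String>, batchSize: Int, fetch: BatchFetcher): Map<String, String> {
    require(batchSize > 0) { "batchSize must be positive" }
    val results = LinkedHashMap<String, String>()
    for (batch in ids.chunked(batchSize)) {
        results.putAll(fetch(batch))
    }
    return results
}

fun main() {
    val ids = List(25) { "doc-$it" }
    var calls = 0
    val docs = fetchInBatches(ids, batchSize = 10) { batch ->
        calls++
        batch.associateWith { "source of $it" }
    }
    println("fetched ${docs.size} docs in $calls calls") // fetched 25 docs in 3 calls
}
```

With 25 IDs and a batch size of 10, the fetcher is invoked three times (10, 10, and 5 IDs).
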
AWSHurneyt added a commit that referenced this pull request Mar 15, 2024
…ion message ctx. (#1450) (#1477)

* Enhance per bucket, and per document monitor notification message ctx. (#1450)

* Adding dev logs.

Signed-off-by: AWSHurneyt <[email protected]>

* Added support for returning sample documents for bucket level monitors.

Signed-off-by: AWSHurneyt <[email protected]>

* Added support for printing query/rule info in notification messages.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted out helper function.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted out helper function.

Signed-off-by: AWSHurneyt <[email protected]>

* Added support for printing document data in notification messages for document level monitors.

Signed-off-by: AWSHurneyt <[email protected]>

* Refactored logic after making AlertContext a separate class from Alert instead of inheriting/extending it in common utils.

Signed-off-by: AWSHurneyt <[email protected]>

* Moved AlertContext data model from common utils to alerting plugin.

Signed-off-by: AWSHurneyt <[email protected]>

* Fixed ktlint errors.

Signed-off-by: AWSHurneyt <[email protected]>

* Added additional unit tests.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted sample doc aggs logic into helper function. Added support for sorting sample docs based on metric aggregations.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted get sample doc logic into helper function. Added sorting for sample docs.

Signed-off-by: AWSHurneyt <[email protected]>

* Removed dev code.

Signed-off-by: AWSHurneyt <[email protected]>

* Fixed ktlint errors.

Signed-off-by: AWSHurneyt <[email protected]>

* Added comments based on PR feedback.

Signed-off-by: AWSHurneyt <[email protected]>

* Added logic to make mGet calls in batches.

Signed-off-by: AWSHurneyt <[email protected]>

---------

Signed-off-by: AWSHurneyt <[email protected]>

(cherry picked from commit 5dc690c)
Signed-off-by: AWSHurneyt <[email protected]>

* Fixed imports.

Signed-off-by: AWSHurneyt <[email protected]>

---------

Signed-off-by: AWSHurneyt <[email protected]>
@opensearch-trigger-bot
Contributor

The backport to 2.11 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/alerting/backport-2.11 2.11
# Navigate to the new working tree
pushd ../.worktrees/alerting/backport-2.11
# Create a new branch
git switch --create backport-1450-to-2.11
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 5dc690caf1d4b9935f9aeb946c104be8d4861a77
# Push it to GitHub
git push --set-upstream origin backport-1450-to-2.11
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/alerting/backport-2.11

Then, create a pull request where the base branch is 2.11 and the compare/head branch is backport-1450-to-2.11.

AWSHurneyt added a commit to AWSHurneyt/OpenSearch-Alerting that referenced this pull request Mar 15, 2024
…ion message ctx. (opensearch-project#1450) (opensearch-project#1477)

* Enhance per bucket, and per document monitor notification message ctx. (opensearch-project#1450)

* Adding dev logs.

Signed-off-by: AWSHurneyt <[email protected]>

* Added support for returning sample documents for bucket level monitors.

Signed-off-by: AWSHurneyt <[email protected]>

* Added support for printing query/rule info in notification messages.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted out helper function.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted out helper function.

Signed-off-by: AWSHurneyt <[email protected]>

* Added support for printing document data in notification messages for document level monitors.

Signed-off-by: AWSHurneyt <[email protected]>

* Refactored logic after making AlertContext a separate class from Alert instead of inheriting/extending it in common utils.

Signed-off-by: AWSHurneyt <[email protected]>

* Moved AlertContext data model from common utils to alerting plugin.

Signed-off-by: AWSHurneyt <[email protected]>

* Fixed ktlint errors.

Signed-off-by: AWSHurneyt <[email protected]>

* Added additional unit tests.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted sample doc aggs logic into helper function. Added support for sorting sample docs based on metric aggregations.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted get sample doc logic into helper function. Added sorting for sample docs.

Signed-off-by: AWSHurneyt <[email protected]>

* Removed dev code.

Signed-off-by: AWSHurneyt <[email protected]>

* Fixed ktlint errors.

Signed-off-by: AWSHurneyt <[email protected]>

* Added comments based on PR feedback.

Signed-off-by: AWSHurneyt <[email protected]>

* Added logic to make mGet calls in batches.

Signed-off-by: AWSHurneyt <[email protected]>

---------

Signed-off-by: AWSHurneyt <[email protected]>

(cherry picked from commit 5dc690c)
Signed-off-by: AWSHurneyt <[email protected]>

* Fixed imports.

Signed-off-by: AWSHurneyt <[email protected]>

---------

Signed-off-by: AWSHurneyt <[email protected]>
val isTriggered = !nextAlerts[trigger.id]?.get(AlertCategory.NEW).isNullOrEmpty()
if (isTriggered && printsSampleDocData(trigger)) {
    try {
        val searchRequest = monitorCtx.inputService!!.getSearchRequest(
Collaborator Author


This current implementation executes the monitor query multiple times; the first time on line 124 to collect the data for trigger evaluation, and then subsequent searches are executed for each triggered trigger in order to collect sample documents.

Ideally, we want to collect the sample documents in the call to collectInputResults on line 124 so we can avoid multiple queries as that will improve performance.

Collaborator Author


Created issue #1481 to track this follow-up item.
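
To make the cost described in the review comment above concrete, here is a minimal Kotlin sketch comparing the two approaches. Everything in it is an assumption for illustration: `CountingSearcher`, `runWithRequery`, and `runWithSingleQuery` are hypothetical names, the search is a stub that only counts invocations, and the sketch ignores that each trigger may need different sample documents.

```kotlin
// Hypothetical stand-in for the monitor's search call; it only counts invocations.
class CountingSearcher(private val docs: List<Map<String, Any>>) {
    var calls = 0
        private set

    fun search(): List<Map<String, Any>> {
        calls++
        return docs
    }
}

// Approach described in the comment: one search for trigger evaluation,
// then one more search per triggered trigger to fetch sample documents.
fun runWithRequery(
    searcher: CountingSearcher,
    triggered: List<String>
): Map<String, List<Map<String, Any>>> {
    searcher.search() // initial query used for trigger evaluation
    return triggered.associateWith { searcher.search().take(10) }
}

// Suggested follow-up: collect the hits once and reuse them for every trigger.
fun runWithSingleQuery(
    searcher: CountingSearcher,
    triggered: List<String>
): Map<String, List<Map<String, Any>>> {
    val hits = searcher.search()
    return triggered.associateWith { hits.take(10) }
}

fun main() {
    val docs = List(25) { mapOf<String, Any>("_id" to "doc-$it") }
    val triggered = listOf("trigger-a", "trigger-b", "trigger-c")

    val requery = CountingSearcher(docs)
    runWithRequery(requery, triggered)
    println("re-query approach: ${requery.calls} searches") // 4

    val single = CountingSearcher(docs)
    runWithSingleQuery(single, triggered)
    println("single-query approach: ${single.calls} search") // 1
}
```

With three triggered triggers, the first approach issues 1 + 3 = 4 searches, while the second issues one regardless of how many triggers fire.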

AWSHurneyt added a commit that referenced this pull request Mar 15, 2024
…tion message ctx. (#1450) (#1480)

* [Backport 2.x] Enhance per bucket, and per document monitor notification message ctx. (#1450) (#1477)

* Enhance per bucket, and per document monitor notification message ctx. (#1450)

* Adding dev logs.

Signed-off-by: AWSHurneyt <[email protected]>

* Added support for returning sample documents for bucket level monitors.

Signed-off-by: AWSHurneyt <[email protected]>

* Added support for printing query/rule info in notification messages.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted out helper function.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted out helper function.

Signed-off-by: AWSHurneyt <[email protected]>

* Added support for printing document data in notification messages for document level monitors.

Signed-off-by: AWSHurneyt <[email protected]>

* Refactored logic after making AlertContext a separate class from Alert instead of inheriting/extending it in common utils.

Signed-off-by: AWSHurneyt <[email protected]>

* Moved AlertContext data model from common utils to alerting plugin.

Signed-off-by: AWSHurneyt <[email protected]>

* Fixed ktlint errors.

Signed-off-by: AWSHurneyt <[email protected]>

* Added additional unit tests.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted sample doc aggs logic into helper function. Added support for sorting sample docs based on metric aggregations.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted get sample doc logic into helper function. Added sorting for sample docs.

Signed-off-by: AWSHurneyt <[email protected]>

* Removed dev code.

Signed-off-by: AWSHurneyt <[email protected]>

* Fixed ktlint errors.

Signed-off-by: AWSHurneyt <[email protected]>

* Added comments based on PR feedback.

Signed-off-by: AWSHurneyt <[email protected]>

* Added logic to make mGet calls in batches.

Signed-off-by: AWSHurneyt <[email protected]>

---------

Signed-off-by: AWSHurneyt <[email protected]>

(cherry picked from commit 5dc690c)
Signed-off-by: AWSHurneyt <[email protected]>

* Fixed imports.

Signed-off-by: AWSHurneyt <[email protected]>

---------

Signed-off-by: AWSHurneyt <[email protected]>

* [Backport 2.11] Backport #1427 and #1464 to 2.11 (#1479)

* Feature findings enhancemnt (#1427) (#1457)

* added support for  param in Finding API



* added detectionType as param for Findings API enhancements



* added searchString param in FIndingsAPI



* adding addiional params findingIds, startTime and endTime



---------


(cherry picked from commit 2420c2c)

Signed-off-by: Riya Saxena <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Findings API Enhancements changes and integ tests fix (#1464) (#1474)

* solution to fix integ tests

Signed-off-by: Riya Saxena <[email protected]>

* fix flaky DocumentMonitor Runner tests

Signed-off-by: Riya Saxena <[email protected]>

* fix findings API enhancemnts

Signed-off-by: Riya Saxena <[email protected]>

---------

Signed-off-by: Riya Saxena <[email protected]>
(cherry picked from commit ba84d04)

* fix integ tests

Signed-off-by: Joanne Wang <[email protected]>

---------

Signed-off-by: Riya Saxena <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Joanne Wang <[email protected]>
Co-authored-by: opensearch-trigger-bot[bot] <98922864+opensearch-trigger-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Riya <[email protected]>

* [Backport 2.x] Enhance per bucket, and per document monitor notification message ctx. (#1450) (#1477)

* Enhance per bucket, and per document monitor notification message ctx. (#1450)

* Adding dev logs.

Signed-off-by: AWSHurneyt <[email protected]>

* Added support for returning sample documents for bucket level monitors.

Signed-off-by: AWSHurneyt <[email protected]>

* Added support for printing query/rule info in notification messages.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted out helper function.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted out helper function.

Signed-off-by: AWSHurneyt <[email protected]>

* Added support for printing document data in notification messages for document level monitors.

Signed-off-by: AWSHurneyt <[email protected]>

* Refactored logic after making AlertContext a separate class from Alert instead of inheriting/extending it in common utils.

Signed-off-by: AWSHurneyt <[email protected]>

* Moved AlertContext data model from common utils to alerting plugin.

Signed-off-by: AWSHurneyt <[email protected]>

* Fixed ktlint errors.

Signed-off-by: AWSHurneyt <[email protected]>

* Added additional unit tests.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted sample doc aggs logic into helper function. Added support for sorting sample docs based on metric aggregations.

Signed-off-by: AWSHurneyt <[email protected]>

* Extracted get sample doc logic into helper function. Added sorting for sample docs.

Signed-off-by: AWSHurneyt <[email protected]>

* Removed dev code.

Signed-off-by: AWSHurneyt <[email protected]>

* Fixed ktlint errors.

Signed-off-by: AWSHurneyt <[email protected]>

* Added comments based on PR feedback.

Signed-off-by: AWSHurneyt <[email protected]>

* Added logic to make mGet calls in batches.

Signed-off-by: AWSHurneyt <[email protected]>

---------

Signed-off-by: AWSHurneyt <[email protected]>

(cherry picked from commit 5dc690c)
Signed-off-by: AWSHurneyt <[email protected]>

* Fixed imports.

Signed-off-by: AWSHurneyt <[email protected]>

---------

Signed-off-by: AWSHurneyt <[email protected]>

* Fixed test.

Signed-off-by: AWSHurneyt <[email protected]>

* Fixed ktlint error.

Signed-off-by: AWSHurneyt <[email protected]>

---------

Signed-off-by: AWSHurneyt <[email protected]>
Signed-off-by: Riya Saxena <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Joanne Wang <[email protected]>
Co-authored-by: Joanne Wang <[email protected]>
Co-authored-by: opensearch-trigger-bot[bot] <98922864+opensearch-trigger-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Riya <[email protected]>
@chiarch84

Dear @AWSHurneyt, it is not clear to me whether this change should fix the fact that {{ctx.results.0.hits.hits.0._source}} doesn't print anything. I tried OpenSearch version 2.16.0 but it seems not to work as in previous versions. Do I need to use a different syntax to make it work?

@AWSHurneyt
Collaborator Author

Hi @chiarch84!

This is the documentation, and an example for these enhancements.
https://opensearch.org/docs/latest/observing-your-data/alerting/triggers/#mustache-template-example-1

A concern for us when making these enhancements was that the number of hits could easily be several thousand, depending on the use case. Printing a single _source from that list of thousands probably wouldn't be an issue, but Mustache templates support iterating through lists and printing values from each entry. Printing data for thousands of hits could make the notification message exceed the size limit accepted by the destination webhook, so the message would fail to send. We therefore added the sample_documents variable to the ctx to help manage the size of the notification message. We currently return a maximum of 10 sample docs, but we want to make that a configurable cluster setting (no timeline for that just yet, though).
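
As an illustration of the size concern described above, here is a minimal Kotlin sketch of capping the documents exposed to a template. The `Hit` type, `sampleDocuments`, `renderMessage`, and the `MAX_SAMPLE_DOCS` constant are hypothetical names for this example and are not the plugin's actual data model or API; only the limit of 10 comes from the comment above.

```kotlin
// Assumed cap, mirroring the "max of 10 sample docs" mentioned above.
const val MAX_SAMPLE_DOCS = 10

// Hypothetical, simplified hit type; not the plugin's data model.
data class Hit(val index: String, val id: String, val source: Map<String, Any?>)

// Returns at most `limit` hits so a template that loops over them stays bounded.
fun sampleDocuments(hits: List<Hit>, limit: Int = MAX_SAMPLE_DOCS): List<Hit> =
    hits.take(limit)

// Renders one line per sample document, similar in spirit to a
// {{#sample_documents}} ... {{/sample_documents}} loop in a message template.
fun renderMessage(hits: List<Hit>): String =
    sampleDocuments(hits).joinToString("\n") { "Index: ${it.index}, Document ID: ${it.id}" }

fun main() {
    val hits = List(5000) { Hit("logs", "doc-$it", mapOf("n" to it)) }
    val message = renderMessage(hits)
    println(message.lines().size) // 10
}
```

Even with 5,000 hits, the rendered message contains only 10 lines, which keeps its size predictable for the destination.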

@chiarch84

Thank you for your answer @AWSHurneyt
I tried your suggestion in my OpenSearch version 2.16.0 but I still get an empty message.
Here are the details:
Content of the message:

{{ctx.monitor.name}} just entered alert status. 
Please investigate the issue.

  - Severity: {{ctx.trigger.severity}}
  - Period start: {{ctx.periodStart}}
  - Period end: {{ctx.periodEnd}}
  - Hits:  {{ctx.results.0.hits.total.value}}
  - Alerts Info: {{ctx.results.0.hits}}

Alerts
{{#ctx.alerts}}
    Sample documents:
    {{#sample_documents}}
        Index: {{_index}}
        Document ID: {{_id}}
       
        Exc TYPE: {{_source.bdap_exception_type}}
        -----------------
    {{/sample_documents}}
    {{#associated_queries}}
        Name: {{name}}
        Id: {{id}}
        Tags: {{tags}}
    ------------------------
    {{/associated_queries}}
{{/ctx.alerts}}

Previewed message obtained:

Monitor BDAP data download - per QUERY just entered alert status. 
Please investigate the issue.

  - Severity: 3
  - Period start: 2024-11-26T12:10:52+01:00
  - Period end: 2024-11-26T12:20:52+01:00
  - Hits:  80
  - Alerts Info: [object Object]

Alerts

So even though there are 80 hits, I get nothing in the details part of Alerts.
Thank you for your help!

@AWSHurneyt
Collaborator Author

@chiarch84 Just to clarify, are you only seeing the issue in the preview of the message, or is the variable missing when an actual message is received after a monitor execution?

I just tested using the 2.16.0 docker image, and was able to successfully print sample docs in the message when my per bucket monitor executed; but it does look like the preview isn't showing the samples correctly (I created issue opensearch-project/alerting-dashboards-plugin#1175 to track that visual bug).

Also, based on the message template you shared, it looks like the monitor type is per query? Unfortunately, the sample docs variable is only available for per bucket, and per document monitors currently.
https://opensearch.org/docs/latest/observing-your-data/alerting/triggers/#:~:text=Only%20available%20with%20bucket%2D%20and%20document%2Dlevel%20monitors.

@chiarch84

@AWSHurneyt In fact, my monitor is per query, so I will try with a per bucket monitor and let you know.
I was not seeing the result in either the preview or the email, but that is probably because it is not the correct type of monitor.
