Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctness Testing 2.0 #12

Closed
wants to merge 1 commit into from
Closed

Conversation

fredjoonpark
Copy link

@fredjoonpark fredjoonpark commented Dec 19, 2024

Removed our deprecated correctness testing suite in place of a new approach for correctness testing.

  • Added a new class Mockmetheus in ./correctness. It's designed to mock remote operations from Prometheus using snappy-encoded payloads.
  • Previously, we had "test" cases that included functions and certain operators that include client-side logic (that stretch beyond remote operations). It now focuses exclusively on the correctness of remote read and write operations.
  • Added an optional flag FRESH_TSDB for running tests against a freshly new TSDB. The tests are designed to work either way, but defaults to false for convenience.

Running tests

  1. Go to ./correctness and bring up containers with docker compose up -d
  2. Run tests with docker exec -it mockmetheus pytest

Introduced test cases for easy reference:

  • Remote read on initialization
  • Remote read with a DNE metric
  • Remote write with an empty timeseries
  • Remote write with no labels
  • Remote write with no samples
  • Remote write & read with valid data
  • Remote write & read with multiple metrics
  • Remote write & read with multiple samples
  • Remote write & read with label matchers

Additional notes:

  • generate_test_run_id() - helps achieve test case independence by using an extra label in our read/write requests.
  • the stub (prom_pb2.py) is generated using: protoc --python_out=. prom.proto

Mockmetheus (pic for reference):
screenshot_2024-12-18_at_4 55 50___pm_720

"""
Parses a Prometheus duration string and returns the duration in milliseconds.
"""
pattern = r'(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Durations can be negative, for example, -5h. This wouldn't match the entirety of a negative duration.

Prometheus shows a negative time duration as one possible value in their documentation:

1s # Equivalent to 1.
2m # Equivalent to 120.
1ms # Equivalent to 0.001.
-2h # Equivalent to -7200.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correctness/mockmetheus.py Show resolved Hide resolved
assert(write_response == 200)

# ensures data is ingested into Timestream
time.sleep(1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this and the other sleeps, can network latency have any impact on whether data is finished ingesting into Timestream?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically yes, but I think it should be fair given what we can expect from their docs ("write-to-read latency is in the sub-second range."). Cross-region requests might add additional latency, perhaps we can make the network latency sleep configurable?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making it configurable is a good idea. I think using a named constant, like SLEEP_TIME, would be good enough.

Copy link

@trevorbonas trevorbonas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me once negative durations are supported and the test sleep time can be set with a named constant.

Dockerfile Show resolved Hide resolved
@@ -0,0 +1,289 @@
import os

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add License header for new files

1. Bring up containers with `docker compose up -d`
2. Run tests with `docker exec -it mockmetheus pytest`

### Notes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a cleanup section.

1. Run the following command to execute the correctness tests:
`go test -v ./correctness`
1. Bring up containers with `docker compose up -d`
2. Run tests with `docker exec -it mockmetheus pytest`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we run individual tests?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep! pytest test_correctness.py::test_write_no_data

working_dir: /app
command: ["sleep", "infinity"]
environment:
AWS_ACCESS_KEY_ID: "XXXXXXXXX"
Copy link

@forestmvey forestmvey Dec 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values should be read from a env_file that is added to gitignore. Storing credentials in tracked files is an easy way to commit secrets.

2. Execute the following command to save the docker image as a compressed file and update the `version` appropriately:
`docker save timestream-prometheus-connector-docker | gzip > timestream-prometheus-connector-docker-image-<version>.tar.gz`
1. Ensure your docker version is >= `20.10.0`, or you have `docker-compose` installed
2. Update `docker-compose.yml` in this directory with your AWS credentials

## How to execute tests

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a section how to run without docker?

It uses the same protobuf definitions as Prometheus to construct snappy-encoded
payloads for both remote-read and remote-write requests and responses.
"""
class Mockmetheus:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation for using Python rather than Go. We are doing a lot of regex and custom parsing here for Prometheus requests rather than using Prometheus libraries.

For example we can use the github.com/prometheus/prometheus/promql/parser library for parsing Prometheus query strings. Using Prometheus libraries would also have the added benefit of being able to upgrade the dependencies for functional changes / bugs, and not mixing Go and Python without warrant.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are doing a lot of regex and custom parsing here for Prometheus requests rather than using Prometheus libraries.

The primary goal of Mockmetheus is to control remote/read operations that a Prometheus client would typically perform. While our regex may not exactly match what Prometheus uses, we can ensure its accuracy through our test setups and the controlled environment we maintain. Our current regex and parsing approach is straightforward and should suffice unless there are specific edge cases or additional pre/post-processing steps that the Prometheus parser handles. If you think there are test cases we might be missing (cases that their parser can handle) please let me know and I will incorporate those tests to verify compatibility with our existing setup.

not mixing Go and Python without warrant.

What's wrong with having Go and Python in one repo? The separation is clear - our connector simply exposes two REST API endpoints, and how we test against these endpoints should not matter as long as we also have the capability of verification using these same endpoints. IOW the choice of language for correctness testing does not impact the functionality of the connector itself. As long as we can effectively test and verify the endpoints, the underlying language used for tests should be flexible. However, if consistency is a priority or if there are specific reasons to standardize on Go, I'm open to rewriting the tests in Go.

Copy link

@forestmvey forestmvey Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although we can use any language for this implementation, we need some reason for language interop. If there is no benefit or clear purpose here for using Python, I would prefer we implement in Go. There is no need to write the parsing logic when we have the native parsing libraries available. Here is a small Go program which can parse a Prometheus query string, rather than introducing the required logic ourselves:

package main

import (
	"fmt"
	"log"

	"github.com/prometheus/prometheus/promql/parser"
)

func main() {
	query := `scrape_duration_seconds{}[5d]`
	node, err := parser.ParseExpr(query)
	if err != nil {
		log.Fatalf("Error parsing query: %v", err)
	}
	PrintQuery(node)
}

func PrintQuery(node parser.Node) {
	switch n := node.(type) {
	case *parser.MatrixSelector:
    fmt.Println("Metric name:", n.VectorSelector)
		fmt.Println("Range:", n.Range)
	default:
		fmt.Printf("Unknown query node type: %T\n", n)
	}
}

Copy link

@forestmvey forestmvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to resolve best approach for using Go vs Python.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants