Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Take entity data into account for payload size #1483

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

varunch77
Copy link
Member

@varunch77 varunch77 commented Jan 2, 2025

Description of the issue

Due to the addition of the new EntityMetricData field to the metrics, the current calculation of the payload size is not accurate. The agent is undercounting the payload size by not accounting for entity data, which leads to instances where the data appears to be under the 1mb limit but is actually over it. As a result, the metrics are not batched properly and cannot be sent to CloudWatch.

Description of changes

  • Update the overheads and existing metric sizes to account for the entity data
  • Add a function to calculate entity size
  • Update existing unit tests to account for increase in metrics sizes

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

I added a test that replicates the original scenario that was resulting in a RequestEntityTooLarge error:

  • the test creates two identical sets of metrics except one set has entity data and other doesn't
  • the set of metrics without entity data fall just below the 1mb threshold
  • the set of metrics with entity data is pushed over the 1mb limit
  • the test checks that the set of metrics with entity data takes 2 API calls and the one without takes 1 API call

Requirements

Before commit the code, please do the following steps.

  1. Run make fmt and make fmt-sh
  2. Run make lint

@varunch77 varunch77 requested a review from a team as a code owner January 2, 2025 19:30
@varunch77 varunch77 changed the title Update estimated metric sizes Take entity data into account for payload size Jan 2, 2025
Copy link
Contributor

@chadpatel chadpatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more complicated than this.

MetricData / EntityData are not directly related with regards to the PMD request structure
https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html#API_PutMetricData_RequestParameters

We have a new top level item EntityMetricData

The current splitting code is here

c.metricDatumBatch.Size += payload(datums[i])

It looks like it increments the payload size based on the MetricDatum and then calls isFull to determine if the batch is full

The entity information is appended on line 179

It looks like we only send the datum in to the payload calculation on line 180.

I think we need to either update payload to also process the entityStr/Partition or += additional Size

Lisa's commit might help. Looks like Partition changed from a list to a map
9849658

I am not 100% sure where the serialization happens

Maybe BuildMetricDatum or WriteToCloudWatch. We need to estimate the payload size for the entity that is getting serialized

@chadpatel
Copy link
Contributor

I was expected testing beyond unit testing. Maybe testing end to end or re-producing the current >1MB payload scenario and demonstrating the payload is batched appropriately. maybe adding something to cloudwatch-agent-tests

plugins/outputs/cloudwatch/util.go Outdated Show resolved Hide resolved
plugins/outputs/cloudwatch/util.go Outdated Show resolved Hide resolved
plugins/outputs/cloudwatch/util.go Outdated Show resolved Hide resolved
plugins/outputs/cloudwatch/cloudwatch_test.go Outdated Show resolved Hide resolved
@@ -482,7 +485,7 @@ func TestPublish(t *testing.T) {
interval := 60 * time.Second
// The buffer holds 50 batches of 1,000 metrics. So choose 5x.
numMetrics := 5 * datumBatchChanBufferSize * defaultMaxDatumsPerCall
expectedCalls := numMetrics / defaultMaxDatumsPerCall
expectedCalls := 388 // Updated to match the observed number of calls
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to write a math instead of number to understand the math behind size calculations?

plugins/outputs/cloudwatch/util.go Outdated Show resolved Hide resolved
@lisguo
Copy link
Contributor

lisguo commented Jan 14, 2025

I would recommend getting consensus on the expected payload size from CW front end for a given metric. Let's make sure our assumptions are correct before merging this.

Also pr builds seem to be failing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants