-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add logs agent integ test #433
base: main
Are you sure you want to change the base?
Conversation
configPathAutoRemoval = "resources/config_auto_removal.json" | ||
standardLogGroupClass = "STANDARD" | ||
infrequentAccessLogGroupClass = "INFREQUENT_ACCESS" | ||
cwlPerfEndpoint = "https://logs.us-west-2.amazonaws.com" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This relies on the fact that we are running the test in us-west-2...
I know we only run our tests in us-west-2 but can we detect the region on the host and then populate the endpoint accordingly so that in the future we can support other regions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may already be constrained by the region being hardcoded elsewhere: https://github.com/aws/amazon-cloudwatch-agent-test/blob/main/util/awsservice/constant.go#L58C1-L60C78. We would need to make edits there but I'm a little wary of what the blast radius of that would look like for our testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to hardcode the endpoint here. The code I wrote originally had to use a custom endpoint because it was a beta endpoint but this is the public us-west-2 endpoint which the client will automatically resolve
@@ -70,6 +88,9 @@ var ( | |||
logGroupClass: types.LogGroupClassInfrequentAccess, | |||
}, | |||
} | |||
rnf *types.ResourceNotFoundException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: just spell out the entire name
rnf *types.ResourceNotFoundException | |
resourceNotFoundException *types.ResourceNotFoundException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in commit 84cc3d3
) | ||
if err != nil { | ||
fmt.Println("There was an error trying to load default config: ", err) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this fail the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't, but I changed the behavior so that it will fail now if the default config fails in commit 84cc3d3
@@ -119,9 +161,9 @@ func TestWriteLogsToCloudWatch(t *testing.T) { | |||
|
|||
// ensure that there is enough time from the "start" time and the first log line, | |||
// so we don't miss it in the GetLogEvents call | |||
time.Sleep(sleepForFlush) | |||
time.Sleep(sleepForExtendedFlush) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just increase sleepForFlush
? Do we use sleepForFlush
anywhere else in the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we use sleepForFlush
in other places too. For the other test cases, waiting for 180 seconds instead of 20 seconds would be overkill and sometimes results in a timeout as it can cause all the test cases in the file take over an hour to run. This is because we call sleepForFlush
before and after the agent runs and some of the test cases are set to loop multiple times so this extra 160 seconds adds up quickly. We only want the tests to wait for 180 seconds in two places, which is why I just created another variable.
|
||
ValidateEntity(t, instanceId, instanceId, &end, testCase.expectedEntity) | ||
|
||
f.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feel like we can f.Close()
after we are done writing on line 285
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deferred both of these calls
log.Printf("Validating entity for log group: %s, stream: %s", logGroup, logStream) | ||
|
||
logGroupInfo, err := getLogGroup() | ||
for _, lg := range logGroupInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no way to get a specific log group when you provide the name?
Seems expensive to get all log groups in an account. We have a ton in our test account..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's a way to just search for one log group (why not is beyond me..) but I updated this so it passes in the log group as a prefix to search by (commit 006a057)
} | ||
} | ||
assert.NoError(t, err) | ||
begin := end.Add(-sleepForExtendedFlush * 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of trying to calculate the begin time, can we just pass in the begin time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this for ValidateEntity()
(commit 7950725)
func getLogQueryId(logGroup string, since, until *time.Time) (*string, error) { | ||
var queryId *string | ||
params := &cloudwatchlogs.StartQueryInput{ | ||
QueryString: aws.String("fields @message, @entity.KeyAttributes.Type, @entity.KeyAttributes.Name, @entity.KeyAttributes.Environment, @entity.Attributes.PlatformType, @entity.Attributes.EC2.InstanceId"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: make this a const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in commit 84cc3d3
if !assert.NotZero(t, len(result)) { | ||
return | ||
} | ||
requiredEntityFields := map[string]bool{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand why we need this...when we pass in the query:
fields @message, @entity.KeyAttributes.Type, @entity.KeyAttributes.Name, @entity.KeyAttributes.Environment, @entity.Attributes.PlatformType, @entity.Attributes.EC2.InstanceId
Doesn't the fact that there are results mean that the logs all have the required fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 510 isn't checking the content of the logs but instead checking to see that our query actually returned logs at all. If our query came back empty handed, we want to fail immediately instead of attempting to validate it and check that its fields match our expectations.
configPathAutoRemoval = "resources/config_auto_removal.json" | ||
standardLogGroupClass = "STANDARD" | ||
infrequentAccessLogGroupClass = "INFREQUENT_ACCESS" | ||
cwlPerfEndpoint = "https://logs.us-west-2.amazonaws.com" | ||
pdxRegionalCode = "us-west-2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
region hardcoding is not needed since entity feature is GA
log.Fatalf("Failed to load default config: %v", err) | ||
} | ||
|
||
cwlClient = cloudwatchlogs.NewFromConfig(awsCfg, func(o *cloudwatchlogs.Options) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to create a new client. we have globally available client that already uses us-west-2 as the region" https://github.com/aws/amazon-cloudwatch-agent-test/blob/main/util/awsservice/constant.go#L74
cwlClient = cloudwatchlogs.NewFromConfig(awsCfg, func(o *cloudwatchlogs.Options) { | ||
o.BaseEndpoint = aws.String(cwlPerfEndpoint) | ||
}) | ||
ec2Client = ec2.NewFromConfig(awsCfg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
func ValidateEntity(t *testing.T, logGroup, logStream string, begin, end *time.Time, expectedEntity expectedEntity) { | ||
log.Printf("Validating entity for log group: %s, stream: %s", logGroup, logStream) | ||
|
||
logGroupInfo, err := getLogGroup(logGroup) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just use IsLogGroupExists
from our common package: https://github.com/aws/amazon-cloudwatch-agent-test/blob/main/util/awsservice/cloudwatchlogs.go#L137
The helper functions that I wrote was because we had to create a new client in us-east-1 for beta purpose but they are no longer needed
queryId, err := getLogQueryId(logGroup, begin, end) | ||
assert.NoError(t, err) | ||
log.Printf("queryId is " + *queryId) | ||
result, err := getQueryResult(queryId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getLogQueryId
and getQueryResult
are not needed anymore since we can just use default client now. There's a helper function here: https://github.com/aws/amazon-cloudwatch-agent-test/blob/main/util/awsservice/cloudwatchlogs.go#L202
@@ -332,3 +497,154 @@ func checkData(t *testing.T, start time.Time, lineCount int) { | |||
) | |||
assert.NoError(t, err) | |||
} | |||
|
|||
func ValidateEntity(t *testing.T, logGroup, logStream string, begin, end *time.Time, expectedEntity expectedEntity) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nonblocking but recommended: This function has a lot of overlaps with this function: https://github.com/aws/amazon-cloudwatch-agent-test/blob/main/test/entity/entity_test.go#L241
Maybe good idea to move ValidateEntity
to a common package and just use for any entity validation tests in the future.
Addressed comments above, including merging New run: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/12283029637 |
Description of the issue
We are currently missing tests that ensure that the logs agent is properly sending an entity when sending application logs to CloudWatch Logs.
Description of changes
TestWriteLogsWithEntityInfo()
writes logs to CloudWatch and validates that the logs are associated with entities and that all values are translated properly. The three test cases areIAMRole
,ServiceInConfig
(which uses the addedconfig_log_service_name.json
configuration), andEC2Tags
. The changes also include a suite of helper functions likeValidateEntity()
to validate the logs and check that the expected entity information is present after querying the published logs.License
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
I ran all tests in the
publish_logs_test.go
file on a newly created EC2 instance with the most recent build of the cloudwatch agent and the end-to-end testing IAM role attached. I also verified thatTestWriteLogsWithEntityInfo
passed for every running order combination of the IAM, EC2, and Service test cases.Test run: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/12283029637