Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logs agent integ test #433

Open
wants to merge 30 commits into
base: main
Choose a base branch
from
Open

Add logs agent integ test #433

wants to merge 30 commits into from

Conversation

varunch77
Copy link
Member

@varunch77 varunch77 commented Nov 19, 2024

Description of the issue

We are currently missing tests that ensure that the logs agent is properly sending an entity when sending application logs to CloudWatch Logs.

Description of changes

TestWriteLogsWithEntityInfo() writes logs to CloudWatch and validates that the logs are associated with entities and that all values are translated properly. The three test cases are IAMRole, ServiceInConfig (which uses the added config_log_service_name.json configuration), and EC2Tags. The changes also include a suite of helper functions like ValidateEntity() to validate the logs and check that the expected entity information is present after querying the published logs.

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

I ran all tests in the publish_logs_test.go file on a newly created EC2 instance with the most recent build of the cloudwatch agent and the end-to-end testing IAM role attached. I also verified that TestWriteLogsWithEntityInfo passed for every running order combination of the IAM, EC2, and Service test cases.

Test run: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/12283029637

@varunch77 varunch77 requested a review from a team as a code owner November 19, 2024 14:54
configPathAutoRemoval = "resources/config_auto_removal.json"
standardLogGroupClass = "STANDARD"
infrequentAccessLogGroupClass = "INFREQUENT_ACCESS"
cwlPerfEndpoint = "https://logs.us-west-2.amazonaws.com"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relies on the fact that we are running the test in us-west-2...

I know we only run our tests in us-west-2 but can we detect the region on the host and then populate the endpoint accordingly so that in the future we can support other regions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may already be constrained by the region being hardcoded elsewhere: https://github.com/aws/amazon-cloudwatch-agent-test/blob/main/util/awsservice/constant.go#L58C1-L60C78. We would need to make edits there but I'm a little wary of what the blast radius of that would look like for our testing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to hardcode the endpoint here. The code I wrote originally had to use a custom endpoint because it was a beta endpoint but this is the public us-west-2 endpoint which the client will automatically resolve

@@ -70,6 +88,9 @@ var (
logGroupClass: types.LogGroupClassInfrequentAccess,
},
}
rnf *types.ResourceNotFoundException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just spell out the entire name

Suggested change
rnf *types.ResourceNotFoundException
resourceNotFoundException *types.ResourceNotFoundException

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in commit 84cc3d3

)
if err != nil {
fmt.Println("There was an error trying to load default config: ", err)
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this fail the test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't, but I changed the behavior so that it will fail now if the default config fails in commit 84cc3d3

@@ -119,9 +161,9 @@ func TestWriteLogsToCloudWatch(t *testing.T) {

// ensure that there is enough time from the "start" time and the first log line,
// so we don't miss it in the GetLogEvents call
time.Sleep(sleepForFlush)
time.Sleep(sleepForExtendedFlush)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just increase sleepForFlush? Do we use sleepForFlush anywhere else in the code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we use sleepForFlush in other places too. For the other test cases, waiting for 180 seconds instead of 20 seconds would be overkill and sometimes results in a timeout as it can cause all the test cases in the file take over an hour to run. This is because we call sleepForFlush before and after the agent runs and some of the test cases are set to loop multiple times so this extra 160 seconds adds up quickly. We only want the tests to wait for 180 seconds in two places, which is why I just created another variable.


ValidateEntity(t, instanceId, instanceId, &end, testCase.expectedEntity)

f.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel like we can f.Close() after we are done writing on line 285

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deferred both of these calls

log.Printf("Validating entity for log group: %s, stream: %s", logGroup, logStream)

logGroupInfo, err := getLogGroup()
for _, lg := range logGroupInfo {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there no way to get a specific log group when you provide the name?

Seems expensive to get all log groups in an account. We have a ton in our test account..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a way to just search for one log group (why not is beyond me..) but I updated this so it passes in the log group as a prefix to search by (commit 006a057)

}
}
assert.NoError(t, err)
begin := end.Add(-sleepForExtendedFlush * 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of trying to calculate the begin time, can we just pass in the begin time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this for ValidateEntity() (commit 7950725)

func getLogQueryId(logGroup string, since, until *time.Time) (*string, error) {
var queryId *string
params := &cloudwatchlogs.StartQueryInput{
QueryString: aws.String("fields @message, @entity.KeyAttributes.Type, @entity.KeyAttributes.Name, @entity.KeyAttributes.Environment, @entity.Attributes.PlatformType, @entity.Attributes.EC2.InstanceId"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: make this a const

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in commit 84cc3d3

if !assert.NotZero(t, len(result)) {
return
}
requiredEntityFields := map[string]bool{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand why we need this...when we pass in the query:

fields @message, @entity.KeyAttributes.Type, @entity.KeyAttributes.Name, @entity.KeyAttributes.Environment, @entity.Attributes.PlatformType, @entity.Attributes.EC2.InstanceId

Doesn't the fact that there are results mean that the logs all have the required fields?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 510 isn't checking the content of the logs but instead checking to see that our query actually returned logs at all. If our query came back empty handed, we want to fail immediately instead of attempting to validate it and check that its fields match our expectations.

@varunch77
Copy link
Member Author

configPathAutoRemoval = "resources/config_auto_removal.json"
standardLogGroupClass = "STANDARD"
infrequentAccessLogGroupClass = "INFREQUENT_ACCESS"
cwlPerfEndpoint = "https://logs.us-west-2.amazonaws.com"
pdxRegionalCode = "us-west-2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

region hardcoding is not needed since entity feature is GA

log.Fatalf("Failed to load default config: %v", err)
}

cwlClient = cloudwatchlogs.NewFromConfig(awsCfg, func(o *cloudwatchlogs.Options) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to create a new client. we have globally available client that already uses us-west-2 as the region" https://github.com/aws/amazon-cloudwatch-agent-test/blob/main/util/awsservice/constant.go#L74

cwlClient = cloudwatchlogs.NewFromConfig(awsCfg, func(o *cloudwatchlogs.Options) {
o.BaseEndpoint = aws.String(cwlPerfEndpoint)
})
ec2Client = ec2.NewFromConfig(awsCfg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

func ValidateEntity(t *testing.T, logGroup, logStream string, begin, end *time.Time, expectedEntity expectedEntity) {
log.Printf("Validating entity for log group: %s, stream: %s", logGroup, logStream)

logGroupInfo, err := getLogGroup(logGroup)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just use IsLogGroupExists from our common package: https://github.com/aws/amazon-cloudwatch-agent-test/blob/main/util/awsservice/cloudwatchlogs.go#L137

The helper functions that I wrote was because we had to create a new client in us-east-1 for beta purpose but they are no longer needed

queryId, err := getLogQueryId(logGroup, begin, end)
assert.NoError(t, err)
log.Printf("queryId is " + *queryId)
result, err := getQueryResult(queryId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getLogQueryId and getQueryResult are not needed anymore since we can just use default client now. There's a helper function here: https://github.com/aws/amazon-cloudwatch-agent-test/blob/main/util/awsservice/cloudwatchlogs.go#L202

@@ -332,3 +497,154 @@ func checkData(t *testing.T, start time.Time, lineCount int) {
)
assert.NoError(t, err)
}

func ValidateEntity(t *testing.T, logGroup, logStream string, begin, end *time.Time, expectedEntity expectedEntity) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nonblocking but recommended: This function has a lot of overlaps with this function: https://github.com/aws/amazon-cloudwatch-agent-test/blob/main/test/entity/entity_test.go#L241

Maybe good idea to move ValidateEntity to a common package and just use for any entity validation tests in the future.

@varunch77
Copy link
Member Author

varunch77 commented Dec 5, 2024

Addressed comments above, including merging ValidateEntity with ValidateLogEntity and moving the resulting function to the util package so it can be used in multiple places for future entity validation tests.

New run: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/12283029637

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants