-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
container: fix resourceManagerTags tests #12728
base: main
Are you sure you want to change the base?
container: fix resourceManagerTags tests #12728
Conversation
Hello! I am a robot. Tests will require approval from a repository maintainer to run. @ScottSuarez, a repository maintainer, has been assigned to review your changes. If you have not received review feedback within 2 business days, please leave a comment on this PR asking them to take a look. You can help make sure that review is quick by doing a self-review and by running impacted tests locally. |
Hi there, I'm the Modular magician. I've detected the following information about your changes: Diff reportYour PR generated some diffs in downstreams - here they are.
|
Tests analyticsTotal tests: 219 Click here to see the affected service packages
Action takenFound 2 affected test(s) by replaying old test recordings. Starting RECORDING based on the most recent commit. Click here to see the affected tests
|
🟢 Tests passed during RECORDING mode: 🟢 No issues found for passed tests after REPLAYING rerun. 🔴 Tests failed during RECORDING mode: 🔴 Errors occurred during RECORDING mode. Please fix them to complete your PR. |
@ScottSuarez thanks for running these. Can you see what the error is - if it just needs a re-run after bootstrapping (though I'd have thought they'd already be in place from the other runs), or if you see a different error? The autopilot test worked for me, but I'll try again as well. |
sure !
|
e1022bf
to
447b0c4
Compare
@ScottSuarez I was able to reproduce a similar failure locally on another run. I think the first pass maybe didn't go far enough, so made some more adjustments. I don't think we can use I rebased, made some adjustments, and will push up the fix if that seems to help, but you may want to run a couple of times in CI, even if it works. There's an outside chance that it could still be a little flaky, but hopefully this helps. I ran into some issues when running both tests at the same time, but I assume / hope that won't be an issue with the way these run in the actual test suite (is there a chance that the two tests run in parallel against the same project at any point)? The autopilot is a pain to test because it takes a while, and is kind of finnicky. With the updated code, I'm getting these failures semi-consistently on the autopilot one, occasionally (on the same step, I am thinking 2nd step, based on the requested tags and since it's 3/6 and there are two runs per step):
Under the hood, this is what's happening (heavily snipped, but I think it gives a flavor of what's happening):
If I'm reading this right, the cluster thinks it's in ready state, but the underlying GCE instances aren't actually ready to take the update yet? Not sure if this expected behavior and / or if an issuetracker bug should be filed about the cluster reporting being ready when it's not (I didn't find a public one, nor did I find any search results for this exact error message). The 120s sleep in the Terraform code won't apply after the first step, because it only applies to create actions.... next thing I'm trying is adding a sleep within step 2 of the test itself, though this will slow down test execution further, even in replaying mode (side note: I see VCR enabled tests with |
Another possibility for this failing earlier in CI is, as mentioned above, if the two tests are both recording in parallel, maybe the grant for |
@@ -3635,6 +3643,9 @@ func TestAccContainerCluster_withAutopilotResourceManagerTags(t *testing.T) { | |||
{ | |||
Config: testAccContainerCluster_withAutopilotResourceManagerTagsUpdate1(pid, clusterName, clusterNetName, clusterSubnetName, randomSuffix), | |||
Check: resource.ComposeTestCheckFunc( | |||
// Small sleep first, to avoid condition where cluster is ready but underlying GCE | |||
// resources apparently aren't. | |||
acctest.SleepInSecondsForTest(30), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other possibility is that I ran into some sort of shorter term problem while I was testing this and that it's not actually necessary. But this did reliably seem to get rid of the issue I mentioned in the PR comments.
See notes and linked issue about whether this is really needed in the case of VCR, though.
@ScottSuarez This PR has been waiting for review for 3 weekdays. Please take a look! Use the label |
Hi there, I'm the Modular magician. I've detected the following information about your changes: Diff reportYour PR generated some diffs in downstreams - here they are.
|
Tests analyticsTotal tests: 219 Click here to see the affected service packages
Action takenFound 2 affected test(s) by replaying old test recordings. Starting RECORDING based on the most recent commit. Click here to see the affected tests
|
🟢 Tests passed during RECORDING mode: 🟢 No issues found for passed tests after REPLAYING rerun. 🔴 Tests failed during RECORDING mode: 🔴 Errors occurred during RECORDING mode. Please fix them to complete your PR. |
@ScottSuarez so, assuming the error is the same as before, I think the issue may be the issue with the two parallel tests trying to manage the same permissions for Separately, we could decide whether to keep or get rid of that extra sleep. |
|
Yeah looks like that might be the case. The terraform refresh seems to suggest that we're modifying a singleton. I would say we should maintain separate values for these two tests if we can. I haven't looked into it in as much detail as you as to if that would be possible |
Is there a reason we still maintain a |
@ScottSuarez what I was getting at above is that
So from what I can see, I don't believe it's currently possible to use I think with golang not having default function parameters, and the nesting of those functions, it could get a little messy to do, maybe something like if it's nil, use I can switch the existing set of calls for 3 spearate roles from I can also try taking that one permission out entirely and see if either / both tests work without it. I'm not sure of the exact use but I assume it's actually necessary in both cases. |
447b0c4
to
f760199
Compare
Ah this makes sense. I think it would be reasonable to make the IAM function more general here and allow modification of the email. Just expose new wrapper functions. |
9a24425
to
087a798
Compare
087a798
to
9a24425
Compare
Ok, I have a pretty simple implementation that is hacky, but works (tested in replaying mode, and testing running both concurrently in recording mode now locally), and that I think is somewhat flexible while not having too many changes. The other big win here is that this creates a massive improvement in speed in replaying mode, since the Nx 120 sleeps (from the terraform code that was removed) are no longer present. I pushed it up in its own commit, and cherry-picked into #12784 - my suggestion is that if we can get fast review on this, it will be cleaner for the commit history if it's merged first, and then I can rebase this. |
Hi there, I'm the Modular magician. I've detected the following information about your changes: Diff reportYour PR generated some diffs in downstreams - here they are.
|
Also @ScottSuarez I tried backing off that sleep to 15s, and got more failures locally. It seems to me like maybe this is a bug. I'm guessing this is something that will not rank as very important, but I created https://issuetracker.google.com/issues/390456348 anyway, this way I can at least link to that in the comment in the code here. |
#12785 -- this is a potential solution to the sleep in |
In the meantime, I'll push up a fix for hashicorp/terraform-provider-google#19997 |
Tests analyticsTotal tests: 4449 Click here to see the affected service packages
Action takenFound 5 affected test(s) by replaying old test recordings. Starting RECORDING based on the most recent commit. Click here to see the affected tests
|
🟢 Tests passed during RECORDING mode: 🔴 Tests failed when rerunning REPLAYING mode: Tests failed due to non-determinism or randomness when the VCR replayed the response after the HTTP request was made. Please fix these to complete your PR. If you believe these test failures to be incorrect or unrelated to your change, or if you have any questions, please raise the concern with your reviewer. 🔴 Tests failed during RECORDING mode: 🔴 Errors occurred during RECORDING mode. Please fix them to complete your PR. |
@GoogleCloudPlatform/terraform-team @ScottSuarez This PR has been waiting for review for 1 week. Please take a look! Use the label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
waiting on IAM changes from out of band PR
…the tagAdmin IAM permission in order to update the tag.
Fixes #20252 Closes GoogleCloudPlatform#12376 Rework of GoogleCloudPlatform#12376 by @MaChenhao
4bce450
to
ecc4b37
Compare
@ScottSuarez @melinath thanks for the quick fix.... ecc4b37 includes the changes from #12796, and seems to work for me locally. Only open question is whether it's Ok to share that bootstrap function (modeled after the example in there) across the container and node pool test files, or if I should duplicate it and / or extract out to another file. Happy to change the naming of that function if you'd like. |
I don't have a strong feeling about it. Whatever @ScottSuarez thinks WFM. |
Rework of #12376 by @MaChenhao
This fixes some tests that may have started failing after my #12014.
Turned out to be a bigger can of worms than I expected, and is now updated to include autopilot and nodepool variants as well.
Fixes hashicorp/terraform-provider-google#19997
Fixes hashicorp/terraform-provider-google#20252
Closes #12376
Release Note Template for Downstream PRs (will be copied)