Error during traffic switch causes 0% traffic for all stacks #560

Open
ePaul opened this issue Apr 4, 2019 · 0 comments

Background

We had two stacks: one with 100% weight, and a broken one (from a failed deployment) with 0% weight.
After a third stack was deployed, our CD system switched traffic to it:

13:35:56.202 Running: /tools/run registry.opensource.zalan.do/stups/toolchain-stups:22 -- senza traffic purchase-orders-management.yaml 201904041320 100 --region eu-central-1
13:35:59.030 Calculating new weights.. OK
13:35:59.031 Stack Name                │Version     │Identifier                             │Old Weight%│Delta │Compensation│New Weight%│Current
13:35:59.031 purchase-orders-management              purchase-orders-management-201904031151         0.0                             0.0         
13:35:59.031 purchase-orders-management 201903281417 purchase-orders-management-201903281417       100.0 -100.0                      0.0         
13:35:59.031 purchase-orders-management 201904041320 purchase-orders-management-201904041320         0.0  100.0                    100.0 <       
13:36:01.074 Setting weights for purchase-orders-management.goodbuy.zalan.do...Validation Error: Stack:arn:aws:cloudformation:eu-central-1:383379053614:stack/purchase-orders-management-201904031151/0ecefee0-56ca-11e9-99be-026d43bbed96 is in CREATE_FAILED state and can not be updated.

So the traffic switching failed because of the broken stack. So far, so good.

Problem

But when we looked at the weights later, they looked like this:

$ senza traffic purchase-orders-management
Stack Name                │Version     │Identifier                             │Weight%
purchase-orders-management              purchase-orders-management-201904031151     0.0 
purchase-orders-management 201903281417 purchase-orders-management-201903281417     0.0 
purchase-orders-management 201904041320 purchase-orders-management-201904041320     0.0 

So now all stacks (including the broken one) had a weight of 0.0. That is definitely not correct.

Guess on what happened

Looking into the code of senza traffic, it looks like the command computes the new percentages (and displays them, as we can see), and then goes through them one-by-one, issuing the API call to change the weights. As soon as one of them fails, the whole command stops.

The effect seems to be that version 201903281417 is first set to 0, then the update of the broken stack is attempted (and fails), and setting 201904041320 to 100 is never even attempted.
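To illustrate the suspected sequence, here is a minimal simulation. It does not use senza's actual code: `set_weight`, `apply_in_order`, and the update order are assumptions made only to reproduce the observed end state.

```python
class ValidationError(Exception):
    """Stands in for the error reported for the CREATE_FAILED stack."""


def set_weight(current, broken, stack, weight):
    # Hypothetical helper: a broken stack rejects every update.
    if stack in broken:
        raise ValidationError(f"{stack} is in CREATE_FAILED state and can not be updated")
    current[stack] = weight


def apply_in_order(current, new_weights, broken):
    # Apply the updates one by one; the first failure aborts the rest.
    for stack, weight in new_weights.items():
        set_weight(current, broken, stack, weight)


# Weights before the switch, keyed by stack version.
current = {"201904031151": 0.0, "201903281417": 100.0, "201904041320": 0.0}
# Assumed update order: old stack first, then the broken one, then the new one.
new_weights = {"201903281417": 0.0, "201904031151": 0.0, "201904041320": 100.0}
broken = {"201904031151"}

try:
    apply_in_order(current, new_weights, broken)
except ValidationError as error:
    print("Validation Error:", error)

print(current)  # every stack ends up with weight 0.0
```

Under these assumptions the old stack has already lost its weight when the error occurs, and the new stack never receives its 100%.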

What should happen

When switching traffic, the weight increases for some stacks should be applied before the weights of other stacks are decreased.
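A possible way to get that ordering is sketched below. The function name `order_increases_first` and the dictionary layout are hypothetical, not taken from senza; the idea is only to sort the planned updates by their delta so that increases run first.

```python
def order_increases_first(old, new):
    """Return (stack, weight) pairs sorted by delta, largest increase first."""
    return sorted(
        new.items(),
        key=lambda item: item[1] - old.get(item[0], 0.0),
        reverse=True,
    )


# Same weights as in the failed traffic switch above, keyed by stack version.
old = {"201904031151": 0.0, "201903281417": 100.0, "201904041320": 0.0}
new = {"201904031151": 0.0, "201903281417": 0.0, "201904041320": 100.0}

plan = order_increases_first(old, new)
for stack, weight in plan:
    print(stack, weight)
```

With this order the new stack is raised to 100 before anything else is touched. If the update of the broken stack then fails, the old stack still has its weight of 100, so the failure leaves traffic being served instead of dropping every stack to 0.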
