`sam delete` gets confused about which template to delete from S3 #5254
Comments
@garretwilson Can you provide a workable example? The details you have describe what is happening, but without something I can actually validate with, it's hard to repro (I'm effectively just guessing at your template and setup).
I can give you basically everything I have. Here is essentially what my template looks like:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform:
  - AWS::LanguageExtensions
  - AWS::Serverless-2016-10-31
Description:
  Example functions.
Parameters:
  Ver:
    Description: The version of the service, such as "1.2.3".
    Type: String
    AllowedPattern: '[\w.+-]+'
    ConstraintDescription: The service version must use only word characters, dots, plus signs, and dashes.
Globals:
  Function:
    CodeUri: !Sub "s3://my-sam-bucket/foo-bar-functions-${Ver}.jar"
    Runtime: java17
    Architectures: [x86_64]
    MemorySize: 512
    Timeout: 300
Resources:
  ExampleFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: com.example.FooBar::foo
```

Here is my SAM CLI configuration:

```toml
version = 0.1
[default]
[default.global.parameters]
stack_name = "placeholder" # will be specified from the CLI
[default.build.parameters]
cached = true
parallel = true
[default.validate.parameters]
lint = true
[default.deploy.parameters]
capabilities = "CAPABILITY_IAM"
confirm_changeset = true
[default.package.parameters]
[default.sync.parameters]
watch = true
[default.local_start_api.parameters]
warm_containers = "EAGER"
[default.local_start_lambda.parameters]
warm_containers = "EAGER"
```

I'm afraid I don't have time to create a turnkey ZIP package. I'm already delayed in this project; I haven't even gotten to the point of deploying actual functionality, as I spent so much time just getting the deployment working (#5249). 😅 Really there isn't much here in my configuration and code; I can't imagine it would be hard to reproduce. If it's not happening for you, the first place I would look would be the … You might check line endings as well. I'm on a Windows machine using CRLF in the templates. If your hashing function converts to LF when doing the upload and then compares against the file in the local directory without normalizing line endings, that could lead to mismatched hashes as well. Just giving you ideas to look into.
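The line-ending hypothesis above is easy to check in isolation. This minimal Python sketch (not SAM CLI code; `md5_hex` and `normalized_md5` are hypothetical helpers) hashes the same template text with CRLF and with LF line endings, then shows that normalizing to LF before hashing makes the digests agree.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of raw bytes."""
    return hashlib.md5(data).hexdigest()


def normalized_md5(text: str) -> str:
    """Hypothetical helper: hash text after converting CRLF and lone CR to LF."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return md5_hex(normalized.encode("utf-8"))


crlf_template = "Resources:\r\n  ExampleFunction:\r\n    Type: AWS::Serverless::Function\r\n"
lf_template = crlf_template.replace("\r\n", "\n")

# The raw hashes differ only because of the line endings.
print(md5_hex(crlf_template.encode("utf-8")) == md5_hex(lf_template.encode("utf-8")))  # False
# After normalization, both versions produce the same hash.
print(normalized_md5(crlf_template) == normalized_md5(lf_template))  # True
```

Normalizing before hashing only helps if it is applied consistently on both sides: when naming the uploaded object and when recomputing the name later.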
@garretwilson Thanks, that is enough. I don't need a full example; I can fill in the details, but with nothing to base it on you are leaving us guessing, which leads to "works on my machine, can't repro". Providing clear and concise issues with examples is the key to making interactions with the team faster.
Sure, I understand. My configuration is so simple that I would guess it will be super easy to reproduce. But if you can't, just let me know and I'll think more about what else could be different between our environments. A step at a time, I can put in more work toward turnkey reproducibility if needed, but at this point I'm guessing it won't be needed. Good luck and have a good week.
I am not able to repro this on 1.85.0 using various methods. The likely place this could fail is https://github.com/aws/aws-sam-cli/blob/develop/samcli/commands/delete/delete_context.py#L273, as we try to compute the hash from the downloaded template, but without being able to repro, it's hard to say for certain.
OK. I'll put together something even more turnkey, although it might take a few days. (Fortunately this particular issue is nowhere near a blocker; I'm just reporting it to help tidy up loose ends.) In the meantime, before I put a lot of time into this, can you just confirm whether you have tested it from the command line on Windows 10?
I am not on Windows. It's possible this is Windows-specific, but I don't have direct access to a Windows machine to verify on your exact setup.
Can you at least confirm that you've tested this with a template using CRLF line endings? You should be able to do that on any platform. VS Code can even do the conversion for you if you ask it to.
This is odd:
Still, SAM CLI tries to delete a different hash-named template than the one it uploaded, brazenly showing me hashes that have nothing to do with each other. (In addition, it appears that if I remove the … I am running out of ideas for why you can't reproduce this; I haven't found a way to not reproduce it on my end. The only thing I could even possibly think of is that I am invoking SAM from a Bash script in Git Bash for Windows, but that has a very low chance of being related. I'll think of more ways I can narrow this down and provide you something to reproduce this, but it's not going to be very soon. I spent half of yesterday tracking down why Log4J (which the AWS libraries force me to use) was producing a warning. (Spoiler: it's because Log4J uses a multi-release POM, and AWS Lambda doesn't know how to deal with multi-release POMs, so Lambda was using old pre-Java 9 classes. It may impact
@garretwilson Are you using … Leaving what I was testing below so as not to lose context in the future; I could not repro even with subs and transforms.
template.yaml
commands:
Notes:
I am using
It happens for me even when I take out the
OK. Or you can just drag and drop the JAR into the S3 console, and hard-code the … Just to make sure we're on the same page (sorry if you understood this, but it wasn't clear in your last comment), the problem is not with … Thanks for looking at this, but it's not my priority at the moment, as I can live with an extra orphaned template scattered around on the server. Right now I'm writing a bug report for Eclipse 2023-06 RC1. It seems that every 15 minutes some other tool breaks. You can imagine how much real work I've managed to do in the past few days with … Wait … is today tequila Tuesday? (Or was that "taco Tuesday" …)
Hey @garretwilson, I've tried the steps you mentioned here and wasn't able to reproduce the problem. One thing caught my attention, though. Are you using a JAR file that is already uploaded to S3, or are you using a local file and letting SAM CLI upload it during the deploy flow? I ask because your example references a JAR file in S3, but you mention that SAM uploads the JAR file during the deployment process.
Could there be additional steps that you are doing before or after deploying? Just trying to see where this is coming from. Thanks!
@mndeveci thank you for looking into this.
I'm using a JAR file that my own script uploads to S3. Then I reference that S3 blob path from my SAM template, but using a … In fact, my actual solution is described in painstaking detail in a comment on that ticket, and it's working great. (Since I'm doing all the work to upload the JAR file myself, though, I'm seeing less and less of a need for SAM.) So basically there will be a
I see your point. Unfortunately, intrinsic function support in SAM CLI is very limited, since there is no library or API endpoint that we can use to resolve them inside the template. I am looking for workarounds. You might look at custom builds (where you need to provide a Makefile for your function's build process), or you might try the following options (both of them will create a template file with
Going back to your question for this issue, I've checked the … So if you are deploying through SAM CLI, then making some changes to your template and redeploying with another tool, it is possible that we can't estimate the location of the template file, and therefore that step fails during
Thank you, but this ticket was not to request workarounds to #5249. I have given up on SAM in this area, and created my own workaround, which is working splendidly. This ticket was opened for a different reason.
I am not using
👍
I've indicated previously in this ticket that my gut instinct is that SAM is somehow getting confused with the hash of the template before and after variable substitution. The whole CloudFormation template thing SAM is built on is very brittle to begin with, as illustrated by #5249, and in fact the very need to have the
I am not doing this. I am deploying the template via SAM CLI, making zero changes to the template via any means, and then using SAM CLI to delete the stack. Again, this ticket has low priority; I really don't care so much at this point. There are much bigger usability concerns, such as aws-cloudformation/cfn-language-discussion#127. But thanks again for reading it and thinking about it.
But where does the content of the "deployed template" come from? Here's a snippet from the source code you indicated:

```python
from typing import Optional


def get_uploaded_s3_object_name(
    precomputed_md5: Optional[str] = None,
    file_content: Optional[str] = None,
    file_path: Optional[str] = None,
    extension: Optional[str] = None,
) -> str:
    ...  # body omitted in this excerpt
```

Where are you getting that … So where is this "deployed template" that is byte-for-byte identical to what is in the S3 bucket, that you would expect to generate a hash of and have it match that of the original template source code?
Unfortunately we can't use what was deployed before, since you might deploy your stack in one place and might try to delete from another. So that is why we are getting the template from CFN, and then trying to estimate file location by getting its hash.
I think what we are getting is what you are seeing in CFN. We are using get_template API with
CFN doesn't have an API that returns the deployed template's location, but it looks like it might be recorded in CloudTrail (see …). Can you check those events to see where the template is stored? It might also give you some clue about why it is different from the local one.
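A simplified Python sketch of the lookup described above, assuming (as the `precomputed_md5` and `extension` parameters and the `<hash>.template` names in this issue suggest) that the S3 object name is the MD5 hex digest of the template content followed by the extension. `object_name_for` and `find_template_key` are hypothetical names, not SAM CLI's implementation. It illustrates why the approach is fragile: any byte-level difference between the uploaded file and the recomputed content yields a different key.

```python
import hashlib
from typing import Optional, Set


def object_name_for(content: str, extension: Optional[str] = "template") -> str:
    """Hypothetical: derive an S3 object name from the MD5 of the content."""
    digest = hashlib.md5(content.encode("utf-8")).hexdigest()
    return f"{digest}.{extension}" if extension else digest


def find_template_key(deployed_template: str, bucket_keys: Set[str]) -> Optional[str]:
    """Return the recomputed key if it exists in the bucket, else None."""
    candidate = object_name_for(deployed_template)
    return candidate if candidate in bucket_keys else None


uploaded = "Resources: {}\n"
bucket = {object_name_for(uploaded)}

print(find_template_key(uploaded, bucket) is not None)  # True: same bytes, same key
print(find_template_key("Resources: {}\r\n", bucket))  # None: one extra byte, different key
```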
Why go to all that trouble in such a roundabout way? Why not just add the hash as a tag to the stack? Or better yet, the bucket name and path? Sure, the user could delete the tag. They could also go into the S3 bucket and rename the staged template. The user can do anything. But there's no reason for them to muck with this tag, and if they know not to, then they probably won't, just like they don't rename the staged template in the SAM bucket. So why not do it the easy, direct way, instead of working backwards by recalculating the hash and then looking up the filename? Another question: why do you have to stage the SAM template to begin with? As far as I know, I can deploy a CF template directly without staging it. Why do we have to stage it in an S3 bucket with SAM? I'm honestly curious.
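A rough Python sketch of the tag-based alternative proposed above, assuming the staged template's S3 location had been recorded as a stack tag at deploy time. The tag key `sam:template-location` and the function `template_location_from_stack` are purely hypothetical; SAM CLI does not write such a tag. The function only parses a dictionary shaped like a CloudFormation `DescribeStacks` response (a `Stacks` list whose entries carry `Tags` as `Key`/`Value` pairs), so it runs without AWS credentials.

```python
from typing import Any, Dict, Optional

# Hypothetical tag key; not something SAM CLI actually sets.
TEMPLATE_TAG = "sam:template-location"


def template_location_from_stack(describe_stacks_response: Dict[str, Any]) -> Optional[str]:
    """Return the recorded template location from a DescribeStacks-shaped response."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        return None
    for tag in stacks[0].get("Tags", []):
        if tag.get("Key") == TEMPLATE_TAG:
            return tag.get("Value")
    return None


response = {
    "Stacks": [
        {
            "StackName": "my-stack",
            "Tags": [
                {
                    "Key": TEMPLATE_TAG,
                    "Value": "s3://my-sam-bucket/22222222222222222222222222222222.template",
                }
            ],
        }
    ]
}
print(template_location_from_stack(response))
# s3://my-sam-bucket/22222222222222222222222222222222.template
```

With boto3, `boto3.client("cloudformation").describe_stacks(StackName="my-stack")` returns a dictionary of this shape, so the parser could consume that response directly. Writing the tag at deploy time is not covered here.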
It is doable, but
With
I see. Thanks for explaining. That's good to know. I'm looking at the API and I see that there is a
I still intend to put together a reproducible test (which shouldn't be hard, because this behavior is all I ever get), but as I think about this day to day, there are a few doubts I still have:
Since most of my day has been wasted trying to do simple things and running into aws/serverless-application-model#3265 (trivial), aws/serverless-application-model#3264 (huge), #5533 (irritating and time-wasting), and aws/serverless-application-model#3261 (pales in comparison to the others), I decided I might as well spend the rest of the day getting you a reproducible test case. I started paring down my code to see what triggers this, and I ran into something else. I removed the … But then when I tried to do
Odd—no warning when deploying with no staging bucket name, but a warning when deleting without a staging bucket name. (I had thought it would create a default staging bucket; I was trying to pare down the test case to as small as possible. I guess I am not remembering correctly. I'll use a hard-coded bucket name.) Anyway, this is probably a separate bug, but I'm pretty tired from all the bug reporting today already. For now I'll just leave it in this comment. I don't want that to sidetrack us from the main issue for this ticket. |
OK, @jfuss and others, I've created a minimal test case to reproduce this issue. It puzzles me greatly that you cannot reproduce it, because I have been unable to not reproduce it. (I have a fear that, after all the time I've spent, someone is finally going to try it on Windows and find that "oh, this always happens on Windows," but let's see.) I've provided the files in a ZIP file, which I've attached and linked in the instructions below. Before going further, let me be clear about my configuration:
Note: The scripts in the ZIP file probably don't have their executable bit set. They don't need it when running from Git Bash on Windows. If you test them on *nix, I'm sure you know that you need to do a
@garretwilson thanks for working on this (even on Saturday!). Thanks to your step-by-step example, I was finally able to find the root cause. As I mentioned above, we get the deployed template and calculate its hash to find its location in S3. This works well on Linux and macOS, but it fails on Windows. The reason is that Python does some extra conversion when writing strings to a file on Windows:
So what we get from CFN is this:
What is written into the file is this (
This automatic conversion is why you get a different hash on Windows, and that leads to the original problem in this issue, where SAM CLI can't find the S3 object for deletion. I've tried the following options, and both of them have worked so far; I need to check with the team to select one or the other.
Note: this is the part that creates the problem: https://github.com/aws/aws-sam-cli/blob/develop/samcli/lib/package/local_files_utils.py#L53-L57
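A minimal Python sketch of the mechanism described above, runnable on any OS (not the actual SAM CLI code; `md5_of_file` and `write_template` are hypothetical helpers). In text mode, Python translates every `\n` written to a file according to the `newline` argument; with the default `newline=None` it becomes `os.linesep`, which is `\r\n` on Windows. The sketch passes `newline="\r\n"` to simulate that Windows default, then shows that `newline=""` (no translation) keeps the bytes, and therefore the MD5, unchanged. This is one possible fix, not necessarily the one the team chose.

```python
import hashlib
import os
import tempfile
from typing import Optional


def md5_of_file(path: str) -> str:
    """Hash the exact bytes stored on disk."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def write_template(path: str, content: str, newline: Optional[str]) -> None:
    """Write text in text mode using the given newline translation setting."""
    with open(path, "w", encoding="utf-8", newline=newline) as f:
        f.write(content)


# Example template text with LF line endings.
template = "Resources:\n  ExampleFunction:\n    Type: AWS::Serverless::Function\n"
expected = hashlib.md5(template.encode("utf-8")).hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "deployed.template")

    write_template(path, template, newline="\r\n")  # what newline=None does on Windows
    windows_hash = md5_of_file(path)

    write_template(path, template, newline="")  # no translation on any OS
    untranslated_hash = md5_of_file(path)

print(windows_hash == expected)  # False
print(untranslated_hash == expected)  # True
```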
I'm coming in late at night and it's not wise for me to be commenting extensively on this at the moment, but I believe the standard practice would probably be simply to normalize to
Good morning! I want to say thanks again for continuing to look at this, and it's wonderful to finally get to the bottom of it. Since we're all friends here, I do want to point out a couple of things, though. I filed this ticket well over a month ago. I described it concisely, precisely, and thoroughly. My first guess was that you were inconsistently hashing before/after the transform, but two days later I said:
And that was "purt near" (as the cowboys say) exactly what the problem was conceptually. All this back-and-forth and countless hours and $X were a complete waste; that is, we gained nothing of value from it, and we could have been doing something more productive with our time. It is my understanding that just a single test on Windows with any template a month ago would have reproduced this problem. The other thing I want to point out is that this probably affected hundreds if not thousands of Windows users. Most people don't file tickets. Most people just see a glitch and think, "it's part of this rough-and-tumble Wild West of the cloud; I'll just ignore it." I, on the other hand, take the time to file tickets. What I'm trying to get at is … if I ever sound grumpy when I get pushback on tickets I take the time to create, please remember this ticket, cut me a little slack, and don't hold it against me. 😁❤️ Again, my sincere thanks for continuing to follow this and look into it.
Patch is released in v1.94.0. Closing.
I'm using SAM 1.84.0 on Windows 10.
I build a JAR file with Maven and then directly deploy it using `sam deploy` like this:

As expected, SAM CLI renames the JAR file to a hash, `11111111111111111111111111111111`, and uploads it to the S3 bucket. Then it does something to the template, renames it to a hash, `22222222222222222222222222222222.template`, and uploads that to the S3 bucket as well. (These hashes are just contrived example values, of course.) And indeed, if I look in the S3 bucket, those are the exact two files that were uploaded!
Then I immediately delete the stack using `sam delete` like this:

```sh
sam delete \
  --profile $awsProfile \
  --stack-name my-stack \
  --s3-bucket my-sam-bucket
```
Oddly, SAM asks me:
Wait, what is `33333333333333333333333333333333.template`? There is no `33333333333333333333333333333333.template` on S3. SAM never uploaded a `33333333333333333333333333333333.template` file. SAM never said it uploaded a `33333333333333333333333333333333.template`. SAM told me (truthfully) that it uploaded a `22222222222222222222222222222222.template` file.

So I say, "Sure, SAM, go ahead and delete whatever you want," and SAM goes away and then comes back to say:
It's not surprising that SAM couldn't find `33333333333333333333333333333333.template`, because it never uploaded such a file. And SAM left `22222222222222222222222222222222.template` on S3.

Why did SAM get confused and try to delete the non-existent file `33333333333333333333333333333333.template`, when it had really uploaded `22222222222222222222222222222222.template`?

(I don't know if it's relevant, but I'm using `AWS::LanguageExtensions` as one of the transforms. I'll just make a wild guess here with nothing to base it on: is it possible that SAM creates the first hash based upon some sort of transformation/interpolation, but then creates the second hash based upon the raw template source file? I'm just brainstorming in case it helps you track this down.)