Lambda checkpoint not working from Docker on macOS #24
Comments
I think this happens because you're running the DinD container unprivileged; either add …
Tried all of these; none of them worked, and I get the same issue. I wonder what does …
The code checking …
I use the DinD Linux container so that I can test the CRaC checkpoint functionality, because I'm running on a Mac M1 machine and the CRaC CRIU JDK doesn't support macOS on the ARM chip. From what I understand, it only runs on Linux?
Yes, only Linux is supported at the moment. There is currently no CRIU for OS X (running on M1 doesn't matter much, though there might be aarch64-specific bugs, of course). But I wasn't sure whether you have multiple layers of Docker, or only the Linux VM (started by Docker on OS X) running a single Docker instance, and that's all you use.
So is there currently any solution for macOS? Is there any timeline or scope for a release of CRaC CRIU for OS X? Thank you.
I am not aware of anyone trying to port CRIU to OS X; it would be a tremendous effort.
I think the issue for that …
You are simply treading into uncharted territory; yes, the docs say that it would work on …
@tzvetkovg, have you tried the aarch64 build of CRaC? For example: https://www.azul.com/downloads/?version=java-17-lts&os=linux&architecture=arm-64-bit&package=jdk-crac#zulu
I don't remember seeing that error:
I can imagine the problem is caused by the underlying VM for the container. AFAIR, cross-CPU containers work less reliably than ones where the container and host CPUs match. BTW, I think we have outdated info on https://docs.azul.com/core/crac/crac-guidelines#running-crac-on-windows-or-macos. Thanks, we'll fix it.
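Since CRIU (and therefore CRaC checkpointing) only works on Linux, and the JDK build has to match the CPU architecture the JVM sees, it can help to check both up front instead of decoding a CRIU error later. A minimal sketch, assuming you only want to inspect the standard `os.name` and `os.arch` system properties; `CracPlatformCheck` and its methods are hypothetical names, and the arch mapping just mirrors the `linux-x64`/`aarch64` naming used for the JDK downloads above:

```java
import java.util.Locale;

/** Hypothetical helper: checks whether this JVM runs where CRIU-based CRaC can work. */
final class CracPlatformCheck {

    private CracPlatformCheck() {}

    /** CRIU exists only for Linux, so anything else (e.g. "Mac OS X") is unsupported. */
    static boolean isLinux(String osName) {
        return osName != null && osName.toLowerCase(Locale.ROOT).startsWith("linux");
    }

    /** Maps the os.arch values JVMs commonly report to the names used for JDK builds. */
    static String normalizeArch(String osArch) {
        switch (osArch.toLowerCase(Locale.ROOT)) {
            case "amd64":
            case "x86_64":
                return "x64";
            case "aarch64":
            case "arm64":
                return "aarch64";
            default:
                return osArch;
        }
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        String arch = normalizeArch(System.getProperty("os.arch"));
        if (!isLinux(os)) {
            System.err.println("CRaC checkpoint/restore needs Linux; this JVM reports: " + os);
        } else {
            System.out.println("Linux/" + arch + ": use a CRaC JDK built for linux-" + arch);
        }
    }
}
```

Note that inside a `--platform=linux/amd64` container on an M1 host, the JVM runs under emulation and reports `amd64`, so this check cannot tell you that the host CPU differs; it only confirms which JDK build and native libraries the process will need.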
@AntonKozlov, I've tried using the aarch64 build as you suggested. This solved the CRaC pidfd issue. However, now I'm getting a different error when checkpointing:
even though the CRaC dependency
is available in the POM, so the "no such file" error looks misleading, and it seems to be more of an amd64 vs. aarch64 issue? Does that mean the CRaC AWS client aws-lambda-runtime-interface-client is only available for AMD (x86-64) but not ARM? How do I configure the CRaC AWS Lambda client when building the Docker image with the aarch64 distribution? The suggested aarch64 JDK also seems to be different from the JDK used in the tutorial. Any suggestions? Thanks
@AntonKozlov, in addition to the above, I've tried building the POC exactly as described at https://github.com/CRaC/example-lambda with the CRaC aarch64 JDK you suggested. All steps are fine (including ./criu check) until I attempt to create the checkpoint from within the Docker Ubuntu container. When I run
it fails with:
It looks like this aarch64 JDK isn't configured the same way as the one in the Lambda example?
Oh, I see. aws-lambda-java-runtime-interface-client:1.0.0 does not support aarch64; all the native libs inside that jar are x86-64. The subsequent error about java.security may be related. We can update io.github.crac.com.amazonaws.aws-lambda-java-runtime-interface-client, although it would take some time; it's unlikely to happen before mid-January. Another option is to rework the Lambda example. With AWS API Gateway, apparently we can avoid dependencies on AWS libs, and the Lambda code would become just a simple example like https://github.com/CRaC/example-jetty, packaged in container images for AWS. We'll probably go this way, but it won't be very fast either. @tzvetkovg, if you're interested, contributions are welcome! :)
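To confirm for yourself which architectures a jar's bundled native libraries target, you can read the ELF header of each `.so` inside it: the `e_machine` field at byte offset 18 is 0x3E for x86-64 and 0xB7 for aarch64. A minimal sketch, assuming the libraries are little-endian ELF files (true for Linux x86-64 and aarch64); `JarNativeArch` is a hypothetical name, not part of any CRaC or AWS tooling:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: reports the CPU architecture of native .so libraries inside a jar. */
final class JarNativeArch {

    private JarNativeArch() {}

    /** Decodes e_machine from the first 20 bytes of an ELF header. */
    static String machineName(byte[] header) {
        boolean isElf = header.length >= 20
                && header[0] == 0x7F && header[1] == 'E' && header[2] == 'L' && header[3] == 'F';
        if (!isElf) {
            return "not ELF";
        }
        if (header[5] != 1) { // EI_DATA: 1 = little-endian
            return "big-endian ELF";
        }
        // e_machine is a 16-bit little-endian value at offset 18
        int machine = (header[18] & 0xFF) | ((header[19] & 0xFF) << 8);
        switch (machine) {
            case 0x3E: return "x86-64";
            case 0xB7: return "aarch64";
            default:   return "other (e_machine=" + machine + ")";
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarNativeArch.java <path-to-jar>");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.getName().endsWith(".so")) {
                    continue; // only native shared libraries are of interest
                }
                try (InputStream in = jar.getInputStream(entry)) {
                    System.out.println(entry.getName() + " -> " + machineName(in.readNBytes(20)));
                }
            }
        }
    }
}
```

Pointed at the runtime-interface-client jar from your local Maven repository (Java 11+ can run the single source file directly), a jar that only prints `x86-64` entries would match the situation described above, where an aarch64 JVM cannot load any of its native libs.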
@AntonKozlov Ah, I see, that's great; thank you for your efforts. I'm looking forward to this change.
Hi @AntonKozlov, thanks for merging the PR with the new libs. I've also seen the new Lambda PR CRaC/example-lambda#3, which isn't merged yet. I tried to pull it locally but noticed that the new
isn't published to Maven Central: https://mvnrepository.com/artifact/io.github.crac.com.amazonaws/aws-lambda-java-runtime-interface-client
Hi @tzvetkovg, thanks for noticing; I've published the version to Maven Central.
@AntonKozlov, thanks, I've managed to build the checkpoint POC https://github.com/crac/example-lambda following your changes! One thing I'm struggling with now is checkpointing a real Spring Boot Lambda application: when I attempt the checkpoint, I get this exception
which is clearly caused by the open AWS SDK SSM socket that my project uses to read some SSM properties. What's the best way of dealing with this and similar errors? How do I close this socket so the checkpoint can proceed, and do I need to re-open it later upon restore? I thought that if I just added
as described at https://docs.spring.io/spring-framework/reference/integration/checkpoint-restore.html, Spring would handle it automatically by closing that socket, but I guess this is only supported from Spring Boot 3.2? Also, is it possible to override some of the environment variables when restoring the checkpointed image from the cr folder? For instance, consider this:
Initially I created the checkpoint image with this property set to -Dspring.profiles.active=local; when I tested, the new Spring profile didn't seem to change anything.
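A common CRaC pattern for a socket-owning client is to close it just before the checkpoint and re-create it lazily after restore. A minimal sketch, assuming the client is `AutoCloseable` and can be rebuilt from a factory; `ReconnectingClient` is hypothetical. In a real application the two lifecycle methods would come from implementing `org.crac.Resource` (whose `beforeCheckpoint`/`afterRestore` take a `Context` argument) and the holder would be registered with `Core.getGlobalContext().register(...)`; those org.crac calls are left out so the sketch compiles without the dependency:

```java
import java.util.function.Supplier;

/**
 * Hypothetical holder for a client that owns sockets (e.g. an SSM client).
 * beforeCheckpoint/afterRestore mirror the org.crac.Resource callbacks.
 */
final class ReconnectingClient<T extends AutoCloseable> {

    private final Supplier<T> factory;
    private T client; // null while no client (and thus no socket) is open

    ReconnectingClient(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Lazily (re)creates the client, so the first call after restore opens fresh sockets. */
    synchronized T get() {
        if (client == null) {
            client = factory.get();
        }
        return client;
    }

    /** Closes the client so no socket is left open when the checkpoint is taken. */
    synchronized void beforeCheckpoint() throws Exception {
        if (client != null) {
            client.close();
            client = null;
        }
    }

    /** Nothing to do eagerly: get() reconnects on first use after restore. */
    synchronized void afterRestore() {
    }
}
```

With the AWS SDK for Java v2, whose clients are `AutoCloseable`, this could plausibly be wired as `new ReconnectingClient<>(SsmClient::create)`, with all SSM calls going through `get()`; that wiring is an untested assumption. Reconnecting lazily rather than in `afterRestore` keeps restore fast and avoids network calls before the restored process actually needs SSM.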
I would expect …
Oh, sorry: I've tested -Dspring.context.checkpoint=onRefresh with Spring Boot 3.2 and it seems to work (see the logs)
The problem is that it gets stuck on this line. I think this blocks on the main Lambda thread, so the checkpoint never really completes? When I attempt to invoke the function again with a new input, I get
This may have to do with the AWS Lambda Runtime Interface Emulator (RIE) itself not being multithreaded? I think your Lambda example invokes the checkpoint manually on a separate thread for this reason: https://github.com/CRaC/example-lambda/blob/master/src/main/java/example/Handler.java
Is there any workaround to achieve this when using -Dspring.context.checkpoint=onRefresh?
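The separate-thread idea mentioned above can be sketched as follows: the calling thread hands the blocking checkpoint request to a background thread and returns immediately, so it is not the one left waiting. This is only an illustration of the general technique, not a reproduction of the example-lambda Handler and not a fix for `onRefresh`; `BackgroundCheckpoint` is hypothetical, and the `checkpointAction` parameter stands in for a call to `org.crac.Core.checkpointRestore()` so the code compiles without the org.crac dependency:

```java
/** Hypothetical helper that runs a blocking checkpoint request off the calling thread. */
final class BackgroundCheckpoint {

    private BackgroundCheckpoint() {}

    /**
     * Starts checkpointAction on a daemon thread and returns at once,
     * so the calling (handler) thread can keep serving requests.
     */
    static Thread start(Runnable checkpointAction) {
        Thread t = new Thread(() -> {
            try {
                checkpointAction.run();
            } catch (RuntimeException e) {
                // Report the failure instead of letting the thread die silently.
                System.err.println("checkpoint failed: " + e);
            }
        }, "crac-checkpoint");
        t.setDaemon(true); // do not keep the JVM alive just for this thread
        t.start();
        return t;
    }
}
```

Because `Core.checkpointRestore()` throws checked exceptions, the real action would need wrapping, e.g. `() -> { try { Core.checkpointRestore(); } catch (Exception e) { throw new RuntimeException(e); } }`.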
I am following the Lambda Git example https://github.com/CRaC/example-lambda to create a Lambda checkpoint, following the steps exactly inside a Docker container (Ubuntu 20.04). I run it from macOS (M1 ARM chip). However, it doesn't work. My steps are:
1. docker run --privileged --platform=linux/amd64 --rm -it -v /var/run/docker.sock:/var/run/docker.sock -v $(pwd):/$(pwd) -w $(pwd) teracy/ubuntu:20.04-dind-20.10.13 bash
2. ./crac-steps.sh s00_init
3. ./crac-steps.sh dojlink openjdk-17-crac+6_linux-x64 (extracts the JDK folder fine)
4. ./crac-steps.sh s01_build (works fine)
5. ./crac-steps.sh s02_start_checkpoint (works fine)
6. ./crac-steps.sh s03_checkpoint, which fails; I get this dump.log:
The command
root@2ee49f701218:/tmp/sub/jdk/lib# ./criu check --all
produced
It looks like it attempts to create the checkpoint, but then, right at the end, I get (see the log)
In my Lambda container, the exception log says
Any ideas what's wrong? At the start of the log I can see
File /run/criu.kdat does not exist
which suggests the CRIU libs aren't on the classpath, but they are, as I can see the extracted JDK folder? Is the Docker socket the issue, or the fact that I'm on macOS rather than Linux?