cli: allow to parallelize asset build #32789

Open
2 tasks
tmokmss opened this issue Jan 8, 2025 · 2 comments
Labels
effort/small Small work item – less than a day of effort feature-request A feature should be added or improved. p2 package/tools Related to AWS CDK Tools or CLI

Comments

@tmokmss
Contributor

tmokmss commented Jan 8, 2025

Describe the feature

Currently, cdk deploy builds assets one at a time, which makes deployment slow when there are many container image assets.

The current behavior assumes that asset builds are CPU-bound and will not benefit from parallelism:

const graphConcurrency: Concurrency = {
  'stack': concurrency,
  'asset-build': 1, // This will be CPU-bound/memory bound, mostly matters for Docker builds
  'asset-publish': (options.assetParallelism ?? true) ? 8 : 1, // This will be I/O-bound, 8 in parallel seems reasonable
};

However, in our use cases the asset builds are not fully CPU-bound and do get faster when parallelized. That is why I want to be able to configure the build concurrency through a CLI argument.
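To illustrate what a per-type limit such as 'asset-build': 1 means for builds that mostly wait, here is a minimal sketch of a concurrency-limited runner. It is not the CDK work-graph implementation; runWithLimit and fakeBuild are hypothetical names, and the 100 ms timer stands in for an I/O-bound build step.

```typescript
// Hypothetical helper, not the CDK work-graph implementation.
// Runs `tasks` with at most `limit` of them in flight at any time.
async function runWithLimit<T>(tasks: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Stand-in for an I/O-bound build step (a timer instead of a real docker build).
const fakeBuild = (id: number) => () =>
  new Promise<number>((resolve) => setTimeout(() => resolve(id), 100));

async function demo(): Promise<void> {
  for (const limit of [1, 5]) {
    const started = Date.now();
    await runWithLimit([0, 1, 2, 3, 4].map(fakeBuild), limit);
    console.log(`limit=${limit}: ${Date.now() - started} ms`);
  }
}

demo();
```

With a limit of 1 the five fake builds run back to back, so the elapsed time is roughly five times the single-task time; with a limit of 5 they overlap and finish in roughly the time of one task. A genuinely CPU-bound build would not scale this way, which is the case the current default is guarding against.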

Use Case

When a CDK app contains many Docker images whose build processes are not fully CPU-bound, we can make cdk deploy faster by building the assets in parallel.

The following extreme example shows the effect:

# docker/Dockerfile
FROM nginx
ARG DUMMY_ARG
RUN echo ${DUMMY_ARG}
# simulating IO-bound image
RUN sleep 10

And this CDK code:

// stack.ts
import * as cdk from 'aws-cdk-lib';
import { DockerImageCode, DockerImageFunction } from 'aws-cdk-lib/aws-lambda';
import { Construct } from 'constructs';

export class SlowDockerParallelTestStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Create five functions, each with its own image asset (distinct build args).
    for (let i = 0; i < 5; i++) {
      new DockerImageFunction(this, `Function${i}`, {
        code: DockerImageCode.fromImageAsset('./docker', {
          buildArgs: {
            DUMMY_ARG: `${i}_v1`,
          },
        }),
      });
    }
  }
}

When you run cdk deploy, you can see the five images are built serially, requiring more than 50 seconds to finish deployment.

Currently, the only way to change the build parallelism is to edit node_modules/aws-cdk/lib/index.js directly: search for {"stack":concurrency,"asset-build":1 and replace it with {"stack":concurrency,"asset-build":5.

After updating the parameter, run cdk deploy again (make sure to change DUMMY_ARG to invalidate caches), and all the images are built concurrently. It now takes about 10 seconds.
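The 50-second and 10-second figures follow from simple arithmetic. As a rough sketch, assuming every image takes the same time to build and the builds do not compete for CPU (estimateBuildSeconds is a hypothetical helper, not part of the CDK):

```typescript
// Simplified model: equal build times, no CPU contention between builds.
function estimateBuildSeconds(imageCount: number, secondsPerImage: number, concurrency: number): number {
  if (concurrency < 1) {
    throw new Error('concurrency must be at least 1');
  }
  // With equal durations, the builds finish in ceil(n / c) back-to-back rounds.
  const rounds = Math.ceil(imageCount / concurrency);
  return rounds * secondsPerImage;
}

console.log(estimateBuildSeconds(5, 10, 1)); // 50
console.log(estimateBuildSeconds(5, 10, 5)); // 10
console.log(estimateBuildSeconds(5, 10, 2)); // 30
```

Real builds also spend time on layer pulls, caching, and publishing, so these numbers are a lower bound on the build phase rather than a prediction of the full deployment time.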

Proposed Solution

Expose a CLI argument such as asset-build-concurrency, with a default of 1, and use its value here:

const graphConcurrency: Concurrency = {
  'stack': concurrency,
  'asset-build': 1, // This will be CPU-bound/memory bound, mostly matters for Docker builds
  'asset-publish': (options.assetParallelism ?? true) ? 8 : 1, // This will be I/O-bound, 8 in parallel seems reasonable
};
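One way the option could be wired in is sketched below. This is only an illustration under assumptions: assetBuildConcurrency, DeployOptionsSketch, ConcurrencySketch, and graphConcurrencyFor are hypothetical names, the types are simplified stand-ins rather than the real CLI types, and the stack concurrency default of 1 is assumed for the sketch.

```typescript
// Simplified stand-in for the Concurrency type used in the snippet above.
type ConcurrencySketch = { 'stack': number; 'asset-build': number; 'asset-publish': number };

interface DeployOptionsSketch {
  concurrency?: number;
  assetParallelism?: boolean;
  // Hypothetical option backing a possible asset-build-concurrency CLI argument.
  assetBuildConcurrency?: number;
}

function graphConcurrencyFor(options: DeployOptionsSketch): ConcurrencySketch {
  // Default of 1 keeps today's serial behavior unless the user opts in.
  const requested = options.assetBuildConcurrency ?? 1;
  if (!Number.isInteger(requested) || requested < 1) {
    throw new Error(`asset build concurrency must be a positive integer, got ${requested}`);
  }
  return {
    'stack': options.concurrency ?? 1, // assumed default for this sketch
    'asset-build': requested,
    'asset-publish': (options.assetParallelism ?? true) ? 8 : 1,
  };
}

console.log(graphConcurrencyFor({}));
console.log(graphConcurrencyFor({ assetBuildConcurrency: 5 }));
```

Keeping the default at 1 means existing deployments behave exactly as before, and only users who know their builds are not CPU-bound raise the value.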

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CDK version used

2.174.1

Environment details (OS name and version, etc.)

macOS

@tmokmss tmokmss added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Jan 8, 2025
@github-actions github-actions bot added the package/tools Related to AWS CDK Tools or CLI label Jan 8, 2025
@khushail khushail added investigating This issue is being investigated and/or work is in progress to resolve the issue. p2 and removed needs-triage This issue or PR still needs to be triaged. labels Jan 8, 2025
@khushail khushail self-assigned this Jan 8, 2025
@khushail
Contributor

khushail commented Jan 8, 2025

Hi @tmokmss, thanks for requesting this. I see that the assetParallelism option is described in the CDK docs as:

/**
 * Build/publish assets for a single stack in parallel
 *
 * Independent of whether stacks are being done in parallel or no.
 *
 * @default true
 */
readonly assetParallelism?: boolean;

and, as you mentioned, it is implemented with the asset build concurrency fixed at 1 because builds are assumed to be CPU-bound:

const graphConcurrency: Concurrency = {
  'stack': concurrency,
  'asset-build': 1, // This will be CPU-bound/memory bound, mostly matters for Docker builds
  'asset-publish': (options.assetParallelism ?? true) ? 8 : 1, // This will be I/O-bound, 8 in parallel seems reasonable
};

So it would make perfect sense to make the concurrency configurable for asset builds as well.

However, I am requesting the Core team's input on whether this is something on their radar or actively being worked on. Marking it as P2.

@khushail khushail added effort/small Small work item – less than a day of effort and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Jan 8, 2025
@khushail khushail removed their assignment Jan 8, 2025
@iliapolo
Contributor

Hi @tmokmss - I can definitely see the use case here. I think the P2 classification is the right thing here.

Thanks for the report.

Projects
None yet
Development

No branches or pull requests

3 participants