📖 Airflow 3.0 - New Repo for DAG/Role Sync #6544

Open
5 tasks
Tracked by #6543
jhpyke opened this issue Jan 13, 2025 · 0 comments

User Story

As a maintainer of the Airflow Platform
I need a Terraform-driven repo, designed around our new DAGFactory functionality, from which users can sync their DAGs and roles
So that users are able to create new DAGs to be scheduled in the new MWAA environment in the easiest and most effective way possible.

Value / Purpose

To modernise how we manage Airflow, we need a repo that helps users understand how to create new DAGs using DAGFactory, provides them with a role built using IAM_builder that has sufficient permissions for the tasks they require, and then syncs these to our MWAA environments. To do this, the repo will need to use Terraform to create the required roles in analytical-platform-data-production, each with a role policy that allows assumption from the relevant compute environment. The desired output is already available in our existing Airflow repo, but that repo relies on Pulumi to manage and provision these roles, which is not a technology our team wishes to support.
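
To illustrate what "a role policy that allows assumption from the relevant compute environment" could look like, here is a minimal Python sketch that builds an IRSA-style trust policy document. The account ID, OIDC provider host, namespace and service account name are placeholders rather than values from our environments, and in the repo itself this document would be rendered by Terraform rather than Python.

```python
import json


def irsa_trust_policy(account_id: str, oidc_provider: str,
                      namespace: str, service_account: str) -> dict:
    """Build a trust policy letting one Kubernetes ServiceAccount assume a role.

    oidc_provider is the cluster's OIDC issuer without the https:// prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Restrict assumption to a single ServiceAccount in a single namespace.
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    policy = irsa_trust_policy(
        account_id="111111111111",
        oidc_provider="oidc.eks.eu-west-2.amazonaws.com/id/EXAMPLE",
        namespace="airflow",
        service_account="example-dag",
    )
    print(json.dumps(policy, indent=2))
```

Scoping the `sub` condition to one namespace and ServiceAccount keeps each role assumable only by the DAG it was created for.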

Useful Contacts

@jhpyke

User Types

No response

Hypothesis

If we build a repo that simplifies the process of creating a new DAG
Then Airflow as a platform will be more approachable for users.

Proposal

The repo should present users with a template DAGFactory DAG for them to fill out with their specific needs. These DAGs should then be synced to the S3 bucket of the relevant MWAA environment by Terraform or awscli. Terraform should be used to provision the roles. Where roles already exist, we should transition them to Terraform management when we are ready to migrate users from the old repo. Users who have a complex pre-existing DAG that they do not wish to rewrite in DAGFactory format should be able to provide a full Python DAG file instead, but we should prompt users to use DAGFactory unless they have a good reason not to.
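
As a rough sketch of the sync step, the Python below works out which files would be uploaded and under which S3 keys. The proposal names Terraform or awscli for the real sync, so this is only an illustration of the mapping. The file suffixes, the `dags` key prefix and the function names are assumptions, and `sync` additionally assumes boto3 and AWS credentials are available.

```python
from pathlib import Path

# Assumption: DAGFactory configs are YAML files and full DAGs are Python files.
SYNCABLE_SUFFIXES = {".yaml", ".yml", ".py"}


def plan_uploads(dag_root: Path, key_prefix: str = "dags") -> dict[Path, str]:
    """Map each syncable file under dag_root to the S3 key it would be uploaded to."""
    plan = {}
    for path in sorted(dag_root.rglob("*")):
        if path.is_file() and path.suffix in SYNCABLE_SUFFIXES:
            relative = path.relative_to(dag_root).as_posix()
            plan[path] = f"{key_prefix}/{relative}"
    return plan


def sync(dag_root: Path, bucket: str) -> None:
    """Upload every planned file to the given bucket."""
    import boto3  # imported lazily so plan_uploads works without boto3 installed

    s3 = boto3.client("s3")
    for path, key in plan_uploads(dag_root).items():
        s3.upload_file(str(path), bucket, key)
```

Separating the plan from the upload makes it possible to review or test what would be synced without touching the MWAA bucket.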

For the roles, it is worth noting that you will need a mechanism to provision a ServiceAccount in the relevant EKS cluster that can be used by the running pod for IRSA. This should be a solved problem, as we've already implemented this for the existing Airflow repo.
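
For reference, the ServiceAccount needed for IRSA is a small Kubernetes object carrying the `eks.amazonaws.com/role-arn` annotation. The sketch below builds such a manifest as a Python dict. The function name and all example values are hypothetical, and this is not necessarily how the existing Airflow repo provisions it.

```python
import json


def irsa_service_account(name: str, namespace: str, role_arn: str) -> dict:
    """Return a Kubernetes ServiceAccount manifest annotated for IRSA."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # EKS uses this annotation to give pods running under the
            # ServiceAccount web-identity credentials for the named role.
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    manifest = irsa_service_account(
        name="example-dag",
        namespace="airflow",
        role_arn="arn:aws:iam::111111111111:role/example-dag",
    )
    print(json.dumps(manifest, indent=2))
```

The ServiceAccount name and namespace must match the ones named in the role's trust policy, otherwise the pod will not be able to assume the role.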

Additional Information

No response

Definition of Done

  • Repo is created
  • Repo can store and upload DAGs to S3
  • Repo can create roles in Data-Production that can be assumed by existing Airflow EKS Clusters
  • Repo can create service accounts for IRSA
  • Repo can run validation on roles/DAGs to ensure basic standards are met
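
The "basic standards" for validation are not defined in this issue. Purely as a sketch of the shape such a check could take, the Python below validates a parsed DAG config against two hypothetical rules: an owner must be set and at least one task must exist. The `default_args`, `owner` and `tasks` keys are assumptions about the config layout, not a confirmed DAGFactory schema.

```python
def validate_dag_config(dag_id: str, config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the config passed."""
    problems = []
    # Hypothetical rule: every DAG must declare an owner.
    default_args = config.get("default_args") or {}
    if not default_args.get("owner"):
        problems.append(f"{dag_id}: default_args.owner must be set")
    # Hypothetical rule: a DAG with no tasks is rejected.
    if not config.get("tasks"):
        problems.append(f"{dag_id}: at least one task is required")
    return problems
```

Returning all problems at once, rather than failing on the first, lets a CI check report everything a user needs to fix in one run.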
Status: 👀 TODO