Re-evaluate adding compile task when using ExecutionMode.AIRFLOW_ASYNC
#1477
Labels
- `area:execution`: related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc.
- `dbt:compile`: primarily related to the dbt compile command or functionality
- `execution:virtualenv`: related to the Virtualenv execution environment
In Cosmos 1.7.0, we introduced experimental support for `ExecutionMode.AIRFLOW_ASYNC`, as discussed in this article and documentation page. A fundamental characteristic of the approach implemented in 1.7.0 is that we pre-compute the SQL in a single setup task, called `dbt_compile`, so the remaining tasks only need to run the SQL statements. This approach had some problems:
- the `ExecutionMode.AIRFLOW_ASYNC` query issue (#1260)

As part of fixing #1260 using a monkey patch in #1474, we noticed that the `dbt_compile` step did not have to happen beforehand, since we could monkey-patch each run statement. This led to a refactor that removed `dbt_compile` and runs the monkey-patched dbt command with `dbtRunner` in every task. While this is cleaner from a DAG-topology perspective, it is unclear what the best approach is moving forward, since running the patched dbt version in every run task requires:
a) dbt and Airflow being installed in the same Python environment on every worker node
b) possible memory/CPU overhead of running `dbtRunner` for every task
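The trade-off between the two topologies can be sketched with a toy model (plain Python, not Cosmos or dbt code; `compile_model` and both topology functions are illustrative stand-ins):

```python
def compile_model(name: str) -> str:
    # Illustrative stand-in for compiling one dbt model to SQL.
    return f"SELECT * FROM {name}_source"

def setup_task_topology(models: list[str]) -> list[str]:
    # Cosmos 1.7.0 style: a single upstream `dbt_compile` task pre-computes
    # all SQL; downstream run tasks only submit the stored statements and
    # never need dbt importable on the worker.
    compiled = {m: compile_model(m) for m in models}  # one dbt_compile task
    return [compiled[m] for m in models]              # run tasks read pre-computed SQL

def per_task_topology(models: list[str]) -> list[str]:
    # Post-#1474 style: each run task invokes the (patched) dbt command
    # itself, so dbt must be installed alongside Airflow on every worker
    # and each task pays the compile overhead.
    return [compile_model(m) for m in models]
```

Both variants produce the same SQL; they differ only in where the compile cost is paid, which is exactly the memory/CPU and packaging trade-off described in a) and b) above.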
An alternative approach we could consider is to keep the `dbt_compile` task, but evaluate the possibility of running it with Cosmos `ExecutionMode.VIRTUALENV`.
The advantages of this approach would be:
- the run tasks would not need to run `dbt` commands

The downsides would be:
- we would still run `dbtRunner` in the `dbt_compile` task
- we would keep the `dbt_compile` task and would have to implement the corresponding teardown task ("[async] Introduce teardown node when using `ExecutionMode.AIRFLOW_ASYNC`" #1232)

Ideally, we'd compare these two approaches with real dbt projects and evaluate the numbers before making a decision.