Add support for non-incremental materialized views #1255

Draft
wants to merge 29 commits into
base: main
from

mikealfare
Contributor

@mikealfare mikealfare commented Jun 11, 2024

resolves #672

This PR continues @bnaul's work in #1011.

This PR has become more complicated than expected and will require some discussion prior to moving forward. There are a number of items that need to be addressed:

  • align on approach for managing options directly on the MV
  • align on approach of templating each option explicitly in jinja
  • update tests to catch all options
  • update change tests to cover all reasonable change scenarios
  • update the macros that swap MVs with views/tables, as certain options do not pertain to views/tables and cause issues when they appear in the model

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

@mikealfare mikealfare changed the title from "Non incremental mat view" to "Add support for non-incremental materialized views" Jun 11, 2024
@mikealfare mikealfare marked this pull request as ready for review June 11, 2024 21:05
@mikealfare mikealfare requested a review from a team as a code owner June 11, 2024 21:05
@mikealfare
Contributor Author

Closing and re-opening to re-trigger CI post GH outage.

@mikealfare mikealfare closed this Jun 11, 2024
@mikealfare mikealfare reopened this Jun 11, 2024
@mikealfare mikealfare marked this pull request as draft June 12, 2024 13:34
@mikealfare mikealfare marked this pull request as ready for review June 13, 2024 22:19
config_change_collection.options = BigQueryOptionsConfigChange(
action=RelationConfigChangeAction.alter,
context=new_materialized_view.options,
# get an options change object with only the options that have changed
Contributor

@VersusFacit VersusFacit Jun 14, 2024

As opposed to what? Pardon, I'm just not sure what this comment refers to in relation to the preexisting code. Were we instantiating the object in a different way? I see you're no longer using kwargs, which is fine, but what's the intent here?

edit: Oh I think I get it now based on this below https://github.com/dbt-labs/dbt-bigquery/pull/1255/files#r1639268560. I really like the from style creation mechanism -- reminds me of Rust trait From, but I think we might alter how you've implemented it slightly!

Contributor Author

We were creating an options object with all options, not just those that changed, and it was creating issues when altering the options. For example, options that force a full refresh would always show up in the change even if they didn't change, hence changing an option that could be implemented via an ALTER statement would still trigger a full refresh due to the presence of the other options.
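
To illustrate the problem, here is a minimal sketch, not the code in this PR. The option names come from `BigQueryOptionsConfig`; `FULL_REFRESH_OPTIONS`, `changed_options`, and `requires_full_refresh` are hypothetical names, and which options actually force a full refresh is an assumption made only for the example. The `max_staleness` value is illustrative.

```python
from typing import Any, Dict

# Hypothetical: assume this option can only be applied by rebuilding the MV.
FULL_REFRESH_OPTIONS = {"allow_non_incremental_definition"}


def changed_options(existing: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the options whose value differs between the two configs."""
    return {k: v for k, v in new.items() if existing.get(k) != v}


def requires_full_refresh(options_in_change: Dict[str, Any]) -> bool:
    """A change needs a full refresh if it contains any full-refresh-only option."""
    return any(k in FULL_REFRESH_OPTIONS for k in options_in_change)


existing = {"max_staleness": None, "allow_non_incremental_definition": True}
new = {"max_staleness": "INTERVAL 4 HOUR", "allow_non_incremental_definition": True}

# Passing every option: the unchanged full-refresh option is present, so we rebuild.
print(requires_full_refresh(new))  # True
# Passing only the diff: max_staleness alone can be handled with an ALTER.
print(requires_full_refresh(changed_options(existing, new)))  # False
```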

Contributor

I'd like to indicate that better in the comment. I don't know the best way to say it, but something like "This is how it used to be done. This is why we are deliberately doing it this way now. The old way resulted in X. We want to avoid X."

Contributor Author

I'll update the comment.

Contributor Author

After going around the block on this a few times today, I'm now adopting the approach of simply submitting the entire new options config. This comment no longer applies. However, there are new comments for the new scenario. I'm looking for feedback on those comments as a result of this thread.

@@ -22,6 +23,7 @@ class BigQueryOptionsConfig(BigQueryBaseRelationConfig):
refresh_interval_minutes: Optional[float] = 30
expiration_timestamp: Optional[datetime] = None
max_staleness: Optional[str] = None
allow_non_incremental_definition: Optional[bool] = None
Contributor

Similar comment to above

Contributor Author

This continues the comment above regarding defaulting Optional[bool]s to None in BigQueryConfig. The intent in this section is to match the defaulting behavior of BigQuery according to their docs. The conclusion that we reached above is that we should have a comment noting that BQ effectively allows a third, falsy value (None) for some booleans (e.g. allow_non_incremental_definition). There is a link in the docstring to those configurations. While I don't want to pull too much information from that link (to avoid getting out of sync with the BQ docs), I will add a generic comment at the top indicating that some booleans are optional in BQ in the sense that there is literally no setting if one is not provided (versus defaulting to false). Does that work?
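
As a rough illustration of the three-state behavior described here (a sketch only: `OptionsSketch` and `as_ddl_options` are hypothetical stand-ins, not the real `BigQueryOptionsConfig` or its rendering logic), `None` means the option is left out of the DDL entirely, while `False` is rendered explicitly:

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class OptionsSketch:
    # Hypothetical stand-in for BigQueryOptionsConfig, reduced to two fields.
    enable_refresh: Optional[bool] = True
    # None means "do not set this option at all", which is not the same as False.
    allow_non_incremental_definition: Optional[bool] = None

    def as_ddl_options(self) -> str:
        """Render only the options that were explicitly provided."""
        rendered = [
            f"{f.name} = {str(getattr(self, f.name)).lower()}"
            for f in fields(self)
            if getattr(self, f.name) is not None
        ]
        return f"OPTIONS ({', '.join(rendered)})"


print(OptionsSketch().as_ddl_options())
# OPTIONS (enable_refresh = true)
print(OptionsSketch(allow_non_incremental_definition=False).as_ddl_options())
# OPTIONS (enable_refresh = true, allow_non_incremental_definition = false)
```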

if new_options_dict[k] != existing_options_dict[k]
}
option_diffs = BigQueryOptionsConfig.from_dict(option_diffs_dict)
return cls(action=RelationConfigChangeAction.alter, context=option_diffs)
Contributor

@VersusFacit VersusFacit Jun 14, 2024

Oh, I think I see what you were doing now. I'm not sure I like burying the alter here. It may be a matter of preference. I kind of want to float this action param up for visibility when creating the object, as opposed to hardcoding the value down here.

Contributor Author

I don't understand, but I'll comment with what's coming to mind.

Most things are an alter in this framework. The other options are create and drop, which were really only used for indexes in dbt-postgres, where the entire object (the index, not the relation) would be created or dropped instead of altered. To alter an index in dbt-postgres you need to issue two statements: one to drop the existing index, and another to recreate it with the new config. In this case, we're really just altering the options on the relation (here an MV, but it could be a table in the future). Sometimes that happens via a full refresh, which gets determined based on the context.

I agree we could spend more time on this framework to make it easier to parse (and I would like to do so if given capacity), but updating that would be outside of the scope of this particular change. Hence I left it with the default alter since I need to provide something. Unfortunately, if you follow the code through, this attribute never gets used, only context does.

Contributor

I agree on not scoping up. I'd just rather have an empty param here and pass in the alter deliberately at the function call level above. Right now you're making an assumption that I worry may not be obvious to the next person. But with the generic param passing this argument from the function call, it would be quite straightforward for adding others as needed (even if it never happens, I think it'd read clearer)

Contributor Author

When you say to pass it in to the function call above, are you saying to pass it in to from_options_configs? While it's static now, it may not be in the future. In fact, I could see accounting for "unsetting" options in the future, which would just be a drop. Then I would need to compare the two options configs to see if it's an alter or a drop/create scenario. That would make from_options_configs duplicative. My intent for this method was basically "here are my two options configs, please tell me what I need to do to get from A to B". I don't think supplying the action as an argument as well makes it more readable. I could set the action prior to the return, with a comment that all diffs are alter diffs at this point, which may change in the future.
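
A sketch of that intent, under stated assumptions: `Action`, `OptionsChange`, and `diff_options` are hypothetical names rather than the adapter's API, and treating an unset option as a `drop` is the speculative future case mentioned here, not current behavior. Using a single action for the whole diff is also a simplification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class Action(str, Enum):
    alter = "alter"
    drop = "drop"


@dataclass(frozen=True)
class OptionsChange:
    action: Action
    context: Dict[str, Any]


def diff_options(existing: Dict[str, Any], new: Dict[str, Any]) -> Optional[OptionsChange]:
    """Describe how to get from `existing` to `new`, or None if nothing changed."""
    diffs = {k: v for k, v in new.items() if existing.get(k) != v}
    if not diffs:
        return None
    # Assumption: if every differing option was unset (None), call it a drop.
    # Today all diffs are alters; this branch only models the possible future case.
    action = Action.drop if all(v is None for v in diffs.values()) else Action.alter
    return OptionsChange(action=action, context=diffs)
```

The caller hands over both configs and gets the action back, so the action does not also need to be supplied as an argument.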

Contributor Author

With the change of providing the entire options config as context for the change, there is no need for a from_options_config method. Hence this is now back in the calling method. I think this is what you were asking for, but please let me know if that's incorrect.

@@ -63,6 +65,9 @@ def change_config_via_replace(project, materialized_view):
initial_model.replace(old_partition, new_partition)
.replace("'my_base_table'", "'my_other_base_table'")
.replace('cluster_by=["id", "value"]', 'cluster_by="id"')
.replace(
Contributor

Should we also test going from allow_non_incremental_definition=False to allow_non_incremental_definition=True?
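
One way to cover both directions with the same string-replacement style used in this test helper might look like the sketch below. `INITIAL_MODEL` and `flip_allow_non_incremental` are hypothetical; the real test fixtures and model text in this PR may differ.

```python
# Hypothetical model text; the real fixture in the test suite may differ.
INITIAL_MODEL = """
{{ config(
    materialized="materialized_view",
    allow_non_incremental_definition=False
) }}
select id, value from {{ ref("my_base_table") }}
"""


def flip_allow_non_incremental(model_sql: str) -> str:
    """Toggle allow_non_incremental_definition in the model's config block."""
    as_true = "allow_non_incremental_definition=True"
    as_false = "allow_non_incremental_definition=False"
    if as_false in model_sql:
        return model_sql.replace(as_false, as_true)
    if as_true in model_sql:
        return model_sql.replace(as_true, as_false)
    raise ValueError("model does not set allow_non_incremental_definition")


# False -> True, then True -> False again.
enabled = flip_allow_non_incremental(INITIAL_MODEL)
disabled_again = flip_allow_non_incremental(enabled)
```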

@mikealfare mikealfare marked this pull request as draft June 14, 2024 21:46
@yu-iskw
Contributor

yu-iskw commented Nov 19, 2024

@mikealfare What remains to be done here? If there is anything I can help with, please let me know.

Development

Successfully merging this pull request may close these issues.

[ADAP-464] Materialized Views - Non-incremental support
5 participants