Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow masking without primary keys #5575

Merged
merged 30 commits into from
Jan 10, 2025

Conversation

galvana
Copy link
Contributor

@galvana galvana commented Dec 9, 2024

Closes LA-95

Description Of Changes

Removed the need to include primary keys for erasure requests for BigQuery and Postgres. Before this change, all connectors required collections to have one field that contained primary_key: True for that collection to be masked. In order to make this an incremental change, I added a requires_primary_keys property to BaseConnector which I defaulted to True. Every connector can either override this, or keep the default behavior.

Code Changes

  • BigQuery - Set requires_primary_keys to False
  • Postgres - Set requires_primary_keys to False
  • ScyllaDB - Set requires_primary_keys to True and included a comment about ScyllaDB requiring the where clause for an update to be a primary key

Steps to Confirm

  1. Remove all the primary_key annotations from src/fides/data/sample_project/sample_resources/postgres_example_test_dataset.yml
  2. Run nox -s "fides_env(test)"
  3. Run an access and erasure request for [email protected]. Verify both are still possible without primary keys.

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
  • Followup issues:
    • Followup issues created (include link)
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

Copy link

vercel bot commented Dec 9, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
fides-plus-nightly ⬜️ Ignored (Inspect) Visit Preview Jan 9, 2025 10:36pm

Copy link

cypress bot commented Dec 9, 2024

fides    Run #11684

Run Properties:  status check passed Passed #11684  •  git commit 5012041c66 ℹ️: Merge dec9b211e093ffa8a41ebba7728cc977cd0cba41 into a009dbb68834e540a42aac8889e1...
Project fides
Branch Review refs/pull/5575/merge
Run status status check passed Passed #11684
Run duration 00m 50s
Commit git commit 5012041c66 ℹ️: Merge dec9b211e093ffa8a41ebba7728cc977cd0cba41 into a009dbb68834e540a42aac8889e1...
Committer Adrian Galvan
View all properties for this run ↗︎

Test results
Tests that failed  Failures 0
Tests that were flaky  Flaky 0
Tests that did not run due to a developer annotating a test with .skip  Pending 0
Tests that did not run due to a failure in a mocha hook  Skipped 0
Tests that passed  Passing 4
View all changes introduced in this branch ↗︎

@galvana galvana added the run unsafe ci checks Runs fides-related CI checks that require sensitive credentials label Dec 10, 2024
@galvana galvana changed the base branch from main to split-query-config-files December 10, 2024 22:25
Copy link

codecov bot commented Dec 28, 2024

Codecov Report

Attention: Patch coverage is 73.80952% with 11 lines in your changes missing coverage. Please review.

Project coverage is 87.14%. Comparing base (12ce3d6) to head (8d552b0).

Files with missing lines Patch % Lines
.../connectors/query_configs/bigquery_query_config.py 0.00% 8 Missing ⚠️
...connectors/query_configs/snowflake_query_config.py 33.33% 2 Missing ⚠️
...fides/api/service/connectors/bigquery_connector.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@                      Coverage Diff                      @@
##           split-query-config-files    #5575       +/-   ##
=============================================================
+ Coverage                     71.31%   87.14%   +15.82%     
=============================================================
  Files                           388      388               
  Lines                         23988    24000       +12     
  Branches                       2594     2593        -1     
=============================================================
+ Hits                          17107    20914     +3807     
+ Misses                         6326     2525     -3801     
- Partials                        555      561        +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@galvana galvana marked this pull request as ready for review January 7, 2025 02:29
Comment on lines +609 to +614
if (
self.connector.requires_primary_keys
and not self.execution_node.collection.contains_field(
lambda f: f.primary_key
)
):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Key change - This is where I added the additional check for requires_primary_keys

Comment on lines +103 to +110
@property
def reference_field_paths(self) -> Dict[FieldPath, Field]:
"""Mapping of FieldPaths to Fields that have incoming identity or dataset references"""
return {
field_path: field
for field_path, field in self.field_map().items()
if field_path in {edge.f2.field_path for edge in self.node.incoming_edges}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very important change - this is what allows us to use the dataset references instead of the primary key fields for the where clauses

Copy link
Contributor

@eastandwestwind eastandwestwind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manually tested and this is working great!

Just 1 q to address, below

.github/workflows/backend_checks.yml Outdated Show resolved Hide resolved
@galvana galvana merged commit 53944d5 into split-query-config-files Jan 10, 2025
35 of 37 checks passed
@galvana galvana deleted the LA-95-masking-without-primary-keys branch January 10, 2025 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run unsafe ci checks Runs fides-related CI checks that require sensitive credentials
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants