Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catalog: return Iceberg snapshot log based on Nessie commit history #10033

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

snazy
Copy link
Member

@snazy snazy commented Dec 3, 2024

The current behavior of Nessie's Iceberg REST is to return only the most recent Iceberg snapshot. However, this seems to conflict with some Iceberg operations, which are not only maintenance operations, but rather related to "merge on read" / (equality) deletes.

This change changes Nessie's behavior by returning older snapshots from a load-table and update-table operations.

Register-table operation however do not change, because only the latest snapshot is actually imported. The behavior does change by returning an error if the table to be registered has more than 1 snapshots.

Fixes #10013
Fixes #9969

@snazy snazy force-pushed the iceberg-include-previous-snapshots branch 5 times, most recently from e0cac8c to 614f88c Compare December 3, 2024 15:56
@snazy snazy marked this pull request as ready for review December 3, 2024 16:14
Copy link
Member

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 A couple of minor comments below

var contentHistory =
treeService(apiContext)
.getContentHistory(
key, reference.name(), reference.hashWithRelativeSpec(), requestMeta);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it makes sense to limit the history to 2 by default? I suppose most client-side Iceberg operations need at most one parent snapshot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, we have to return all the previous snapshots, because we cannot know which ones are needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will it work with Nessie GC? We never remove snapshots from iceberg metadata, IIRC 🤔

@snazy snazy added this to the 0.101.0 milestone Dec 4, 2024
The current behavior of Nessie's Iceberg REST is to return only the most recent Iceberg snapshot. However, this seems to conflict with some Iceberg operations, which are not only maintenance operations, but rather related to "merge on read" / (equality) deletes.

This change changes Nessie's behavior by returning older snapshots from a load-table and update-table operations.

Register-table operation however do not change, because only the latest snapshot is actually imported. The behavior does change by returning an error if the table to be registered has more than 1 snapshots.

Fixes projectnessie#10013
Fixes projectnessie#9969
@snazy
Copy link
Member Author

snazy commented Dec 4, 2024

Ouch. So, a single Iceberg transaction can actually add multiple snapshots - but we only have one commit in Nessie - but need to be able to access both snapshots...
Need to think more about this, setting this change back to draft state.

@snazy snazy marked this pull request as draft December 4, 2024 10:54
@snazy snazy force-pushed the iceberg-include-previous-snapshots branch from 614f88c to 134a13c Compare December 4, 2024 14:30
@snazy snazy force-pushed the iceberg-include-previous-snapshots branch from 8673633 to e9d0abd Compare December 4, 2024 15:02
@snazy snazy marked this pull request as ready for review December 4, 2024 15:02
@snazy
Copy link
Member Author

snazy commented Dec 4, 2024

So, I hope that all tests are happy again.

The changes became more intrusive, but it should all work now.

Had to let the register-table endpoint not accept Nessie catalog URIs anymore, because that would lead to two Nessie tables pointing to the same snapshot (NessieTableSnapshot) - and it's actually not good in general to have different tables work in the same (storage) location. OTOH externally registered tables now get all snapshots imported via register-table. The check is a bit racy, but I suspect, it's okay-ish / better-than-nothing ™️ .

var implicitIcebergSnapshot =
snapFuture != null
? implicitIcebergSnapshot(
(NessieTableSnapshot) snapFuture.get(0, TimeUnit.SECONDS))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snapFuture.join()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? It's already finished

Copy link
Member

@dimas-b dimas-b Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. It's just a nicer method for this case, does not have any timeout parameters and does not have checked exceptions.

URI icebergSnapshot(
Reference effectiveReference, ContentKey key, NessieEntitySnapshot<?> snapshot);
}
boolean checkIcebergSnapshotPresent(String metadataLocation, long versionId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

versionId -> snapshotId (for clarity)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The property in IcebergContent is versionId

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the method checks for snapshots and it's not apparent that the versionId parameter here should be the snapshot ID.

if (snapId == iceberg.currentSnapshotId()) {
continue;
}
if (previous == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is previous ever non-null? IJ shows all non-test call paths leading to null 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those calls come from importIcebergTable... So, essentially if someone commits via Nessie API, each NessieTableSnapshot (after importing) will have duplicate "implicit" snapshot lists. WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - unfortunate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why keep previous if it is always null in production call paths?

@@ -1599,6 +1626,23 @@ public static void addSnapshot(AddSnapshot u, IcebergTableMetadataUpdateState st
.icebergManifestListLocation(icebergSnapshot.manifestList())
.icebergManifestFileLocations(icebergSnapshot.manifests());

// If another Iceberg snapshot was add in this same update-transaction, memoize the previous
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add -> added

@snazy snazy modified the milestones: 0.101.0, 0.102.0 Dec 5, 2024
- Nessie now returns the snapshot history in the `snapshots` and `snapshot-log` attributes of an Iceberg
table-metadata retrieved via Iceberg REST for table changes that have been committed via Iceberg REST
to Nessie 0.101.0 or newer. Commits made using older Nessie versions will not return older snapshots.
- Generally, not only for Nessie, It is recommended to keep the number of snapshots maintained in an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Generally, not only for Nessie, It is recommended to keep the number of snapshots maintained in an
- Generally, not only for Nessie, it is recommended to keep the number of snapshots maintained in an


/**
* Contains Iceberg snapshots for which no explicit Nessie commit exists. We need to memoize those
* snapshots here in case a single Iceberg transaction add multiple snapshots.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* snapshots here in case a single Iceberg transaction add multiple snapshots.
* snapshots here in case a single Iceberg transaction adds multiple snapshots.

// iterator
tuple(keyRenamed, "rename 1", hashRename1.getCommitHash(), contentRename1),
tuple(key, "update 2", hashUpdate2.getCommitHash(), contentUpdate2),
tuple(key, "create-other", hashCreateOther.getCommitHash(), contentUpdate1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this commit returned, since it does not change key? Imho it should return the "update 1" commit message here since that is the commit that introduced contentUpdate1.

@snazy snazy removed this from the 0.102.0 milestone Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants