feat(master): async pruning of orphan nodes #877

Closed
wants to merge 35 commits into from


czarcas7ic
Contributor

@czarcas7ic czarcas7ic commented Jan 31, 2024

This feature prevents the multi-hour waiting period that occurs when chains upgrade from previous versions of IAVL to IAVL v1. The time is spent pruning orphan nodes; this change prunes them asynchronously in the background. This branch has NOT been tested against osmosis mainnet; however, the v1.x.x version of this PR has.

mergify bot and others added 28 commits June 2, 2023 16:39
…806) (#821)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marko Baricevic <[email protected]>
#822)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marko Baricevic <[email protected]>
@czarcas7ic czarcas7ic requested a review from a team as a code owner January 31, 2024 01:56

coderabbitai bot commented Jan 31, 2024

Walkthrough

The recent updates focus on enhancing performance, ensuring thread safety, and improving code clarity. Key changes include upgrading the Go version in the CI workflow, introducing thread safety measures in the BatchWithFlusher type, and refining encoding functions for efficiency. Additionally, the project has seen structural code improvements for better modularity and clarity, alongside adopting a new FastPrefixFormatter for optimized key formatting. These changes collectively aim to refine the project's robustness and performance.

Changes

| File(s) | Summary of Changes |
| --- | --- |
| .github/workflows/lint.yml | Updated go-version to 1.21 and replaced golangci/golangci-lint-action with make lint command. |
| CHANGELOG.md | Updated to version v1.0.0 (October 30, 2023). |
| batch.go | Added sync.Mutex to BatchWithFlusher for thread safety in Set and Delete methods. |
| fastnode/fast_node.go, internal/encoding/encoding.go | Added comments regarding the assumption of input immutability in DeserializeNode and DecodeBytes; simplified DecodeBytes and added Encode32BytesHash in encoding.go. |
| keyformat/prefix_formatter.go, nodedb.go | Introduced FastPrefixFormatter for efficient key formatting and updated key formats in nodedb.go to use it. |
| mutable_tree.go | Refactored recursiveSet into recursiveSetLeaf for clearer leaf node setting logic. |
| node.go | Included comments for legacy and v1 versions of fields and updated encoding logic for 32-byte hashes. |
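
To make the "prefix formatter" idea in the summary concrete, here is a minimal sketch of a key format made of one prefix byte followed by a fixed-width big-endian version. The `prefixFormatter` type and its methods are hypothetical names chosen for illustration; they are not the `FastPrefixFormatter` API added in this PR, whose exact signatures are not shown on this page.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// prefixFormatter is a hypothetical stand-in for a prefix-based key
// formatter: every key is one prefix byte followed by an 8-byte payload.
type prefixFormatter struct {
	prefix byte
}

// KeyInt64 builds prefix || big-endian(version) with a single allocation.
func (f prefixFormatter) KeyInt64(version int64) []byte {
	key := make([]byte, 9)
	key[0] = f.prefix
	binary.BigEndian.PutUint64(key[1:], uint64(version))
	return key
}

// ScanInt64 reverses KeyInt64; ok is false if the key has the wrong shape.
func (f prefixFormatter) ScanInt64(key []byte) (version int64, ok bool) {
	if len(key) != 9 || key[0] != f.prefix {
		return 0, false
	}
	return int64(binary.BigEndian.Uint64(key[1:])), true
}

func main() {
	f := prefixFormatter{prefix: 'r'} // 'r' is an arbitrary example prefix
	key := f.KeyInt64(42)
	v, ok := f.ScanInt64(key)
	fmt.Println(len(key), v, ok)
}
```

Because the payload is big-endian, keys for non-negative versions sort in version order under a bytewise comparison, which is what makes prefix iteration over such keys useful.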


nodedb.go Outdated
Comment on lines 430 to 515
var prevVersion, curVersion int64
var rootKeys [][]byte
for ; itr.Valid(); itr.Next() {
legacyRootKeyFormat.Scan(itr.Key(), &curVersion)
rootKeys = append(rootKeys, itr.Key())
if prevVersion > 0 {
if err := ndb.traverseOrphans(prevVersion, curVersion, func(orphan *Node) error {
go func() {
defer func() {
isDeletingLegacyVersionsMutex.Lock()
isDeletingLegacyVersions = false
isDeletingLegacyVersionsMutex.Unlock()
}()

// Check if we have a legacy version
itr, err := dbm.IteratePrefix(ndb.db, legacyRootKeyFormat.Key())
if err != nil {
ndb.logger.Error(err.Error())
return
}
defer itr.Close()

// Delete orphans for all legacy versions
var prevVersion, curVersion int64
var rootKeys [][]byte
counter := 0
for ; itr.Valid(); itr.Next() {
legacyRootKeyFormat.Scan(itr.Key(), &curVersion)
rootKeys = append(rootKeys, itr.Key())
if prevVersion > 0 {
if err := ndb.traverseOrphans(prevVersion, curVersion, func(orphan *Node) error {
counter++
if counter == 1000 {
counter = 0
time.Sleep(1000 * time.Millisecond)
fmt.Println("IAVL sleep happening")
}
return ndb.batch.Delete(ndb.nodeKey(orphan.GetKey()))
}); err != nil {
ndb.logger.Error(err.Error())
return
}
}
prevVersion = curVersion
}
// Delete the last version for the legacyLastVersion
if curVersion > 0 {
legacyLatestVersion, err := ndb.getLegacyLatestVersion()
if err != nil {
ndb.logger.Error(err.Error())
return
}
if curVersion != legacyLatestVersion {
ndb.logger.Error("expected legacyLatestVersion to be %d, got %d", legacyLatestVersion, curVersion)
return
}
if err := ndb.traverseOrphans(curVersion, curVersion+1, func(orphan *Node) error {
return ndb.batch.Delete(ndb.nodeKey(orphan.GetKey()))
}); err != nil {
return err
ndb.logger.Error("failed to clean legacy orphans between versions", "err", err)
return
}
}
prevVersion = curVersion
}
// Delete the last version for the legacyLastVersion
if curVersion > 0 {
legacyLatestVersion, err := ndb.getLegacyLatestVersion()
if err != nil {
return err
}
if curVersion != legacyLatestVersion {
return fmt.Errorf("expected legacyLatestVersion to be %d, got %d", legacyLatestVersion, curVersion)

// Delete all roots of the legacy versions
for _, rootKey := range rootKeys {
if err := ndb.batch.Delete(rootKey); err != nil {
ndb.logger.Error("failed to clean legacy orphans root keys", "err", err)
return
}
}
if err := ndb.traverseOrphans(curVersion, curVersion+1, func(orphan *Node) error {
return ndb.batch.Delete(ndb.nodeKey(orphan.GetKey()))
}); err != nil {
return err

// Initialize the legacy latest version to -1 to demonstrate that all legacy versions have been deleted
ndb.legacyLatestVersion = -1

// Delete all orphan nodes of the legacy versions
// TODO: Is this just deadcode?????
if err := ndb.deleteOrphans(); err != nil {
ndb.logger.Error("failed to clean legacy orphans", "err", err)
return
}
}()

return nil
}

The implementation of deleteLegacyVersions uses a mutex to ensure that the deletion process is not initiated multiple times concurrently. This is a good practice for maintaining data integrity during potentially destructive operations. However, the presence of a sleep (time.Sleep(1000 * time.Millisecond)) within a loop (lines 463-467) could indicate an attempt to mitigate a resource contention issue or rate limit the deletion process. This approach might not be optimal for performance and could benefit from a review to explore more efficient synchronization or batching strategies.

- time.Sleep(1000 * time.Millisecond)
+ // Consider implementing a more efficient synchronization or batching strategy.
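
The "only one deletion run at a time" guard discussed above can also be written with an atomic flag instead of a bool plus mutex. The sketch below is an illustration under assumptions: the `pruner` type and `start` method are hypothetical, and `running` merely plays the role of `isDeletingLegacyVersions` from the diff.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pruner is a hypothetical type; running stands in for the
// isDeletingLegacyVersions flag and its mutex from the diff.
type pruner struct {
	running atomic.Bool
	wg      sync.WaitGroup
}

// start launches work in a goroutine unless a run is already in flight.
// It reports whether a new run was started.
func (p *pruner) start(work func()) bool {
	if !p.running.CompareAndSwap(false, true) {
		return false // another run is still active
	}
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		defer p.running.Store(false) // runs before wg.Done (LIFO order)
		work()
	}()
	return true
}

func main() {
	var p pruner
	release := make(chan struct{})
	first := p.start(func() { <-release }) // blocks until released
	second := p.start(func() {})           // rejected: first is still running
	close(release)
	p.wg.Wait()
	fmt.Println(first, second)
}
```

The compare-and-swap makes the check and the set a single step, so there is no window between reading the flag and writing it.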


Suggested change
	isDeletingLegacyVersionsMutex.Lock()
	if isDeletingLegacyVersions {
		isDeletingLegacyVersionsMutex.Unlock()
		return nil
	}
	isDeletingLegacyVersions = true
	isDeletingLegacyVersionsMutex.Unlock()
	go func() {
		defer func() {
			isDeletingLegacyVersionsMutex.Lock()
			isDeletingLegacyVersions = false
			isDeletingLegacyVersionsMutex.Unlock()
		}()
		// Check if we have a legacy version
		itr, err := dbm.IteratePrefix(ndb.db, legacyRootKeyFormat.Key())
		if err != nil {
			ndb.logger.Error(err.Error())
			return
		}
		defer itr.Close()
		// Delete orphans for all legacy versions
		var prevVersion, curVersion int64
		var rootKeys [][]byte
		counter := 0
		for ; itr.Valid(); itr.Next() {
			legacyRootKeyFormat.Scan(itr.Key(), &curVersion)
			rootKeys = append(rootKeys, itr.Key())
			if prevVersion > 0 {
				if err := ndb.traverseOrphans(prevVersion, curVersion, func(orphan *Node) error {
					counter++
					if counter == 1000 {
						counter = 0
						// Consider implementing a more efficient synchronization or batching strategy.
						fmt.Println("IAVL sleep happening")
					}
					return ndb.batch.Delete(ndb.nodeKey(orphan.GetKey()))
				}); err != nil {
					ndb.logger.Error(err.Error())
					return
				}
			}
			prevVersion = curVersion
		}
		// Delete the last version for the legacyLastVersion
		if curVersion > 0 {
			legacyLatestVersion, err := ndb.getLegacyLatestVersion()
			if err != nil {
				ndb.logger.Error(err.Error())
				return
			}
			if curVersion != legacyLatestVersion {
				ndb.logger.Error("expected legacyLatestVersion to be %d, got %d", legacyLatestVersion, curVersion)
				return
			}
			if err := ndb.traverseOrphans(curVersion, curVersion+1, func(orphan *Node) error {
				return ndb.batch.Delete(ndb.nodeKey(orphan.GetKey()))
			}); err != nil {
				ndb.logger.Error("failed to clean legacy orphans between versions", "err", err)
				return
			}
		}
		// Delete all roots of the legacy versions
		for _, rootKey := range rootKeys {
			if err := ndb.batch.Delete(rootKey); err != nil {
				ndb.logger.Error("failed to clean legacy orphans root keys", "err", err)
				return
			}
		}
		// Initialize the legacy latest version to -1 to demonstrate that all legacy versions have been deleted
		ndb.legacyLatestVersion = -1
		// Delete all orphan nodes of the legacy versions
		// TODO: Is this just deadcode?????
		if err := ndb.deleteOrphans(); err != nil {
			ndb.logger.Error("failed to clean legacy orphans", "err", err)
			return
		}
	}()
	return nil
}

@czarcas7ic czarcas7ic marked this pull request as draft January 31, 2024 02:08
@czarcas7ic czarcas7ic changed the title feat: sync pruning of orphan nodes feat(master): sync pruning of orphan nodes Jan 31, 2024
@czarcas7ic czarcas7ic marked this pull request as ready for review January 31, 2024 03:34
CHANGELOG.md Outdated (resolved)
Co-authored-by: cool-developer <[email protected]>
@cool-develope cool-develope changed the title feat(master): sync pruning of orphan nodes feat(master): async pruning of orphan nodes Feb 15, 2024
legacyRootKeyFormat.Scan(itr.Key(), &curVersion)
rootKeys = append(rootKeys, itr.Key())
if prevVersion > 0 {
if err := ndb.traverseOrphans(prevVersion, curVersion, func(orphan *Node) error {
Member

@cool-develope Isn't it true that v1 does not store orphans explicitly, and identifies them by traversing trees at version n and n+1? This being the case, why can deleteLegacyVersions not simply iterate on the orphan key prefix in leveldb and delete all of them? Am I missing something?

Collaborator

Yes, it can simply iterate over the orphan values.
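
The prefix-iteration approach suggested in this thread can be sketched as follows. This is an illustration under assumptions, not the nodedb code: `memDB` is a hypothetical in-memory stand-in for the key-value store, and the `"o/"` prefix is an assumed placeholder for the legacy orphan key prefix.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// memDB is a hypothetical in-memory stand-in for the key-value store.
type memDB map[string][]byte

// deletePrefix removes every key that starts with prefix and returns how
// many keys were removed. Keys are collected first and deleted afterwards,
// mirroring how deletes are staged in a batch while an iterator is open.
func deletePrefix(db memDB, prefix string) int {
	var doomed []string
	for k := range db {
		if strings.HasPrefix(k, prefix) {
			doomed = append(doomed, k)
		}
	}
	sort.Strings(doomed) // deterministic order, like an ordered iterator
	for _, k := range doomed {
		delete(db, k)
	}
	return len(doomed)
}

func main() {
	db := memDB{
		"o/1/a": nil, // assumed orphan entries
		"o/2/b": nil,
		"n/1":   nil, // unrelated node and root entries
		"r/1":   nil,
	}
	removed := deletePrefix(db, "o/")
	fmt.Println(removed, len(db))
}
```

Compared with traversing the trees at versions n and n+1, this touches only the keys under one prefix, which is the simplification the question is pointing at.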

@@ -46,6 +49,9 @@ func (b *BatchWithFlusher) estimateSizeAfterSetting(key []byte, value []byte) (i
// the batch is flushed to disk, cleared, and a new one is created with buffer pre-allocated to threshold.
// The addition entry is then added to the batch.
func (b *BatchWithFlusher) Set(key, value []byte) error {
b.mtx.Lock()
Member

If each batch accumulates in memory before flushing to disk, doesn't it make more sense to lock at write time instead of on every Set/Delete?
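
One way to reason about this question is with a simplified model of a self-flushing batch. The sketch below is an assumption-laden illustration: `flushingBatch`, `newFlushingBatch`, and `fillConcurrently` are hypothetical names, and the real BatchWithFlusher flushes by estimated byte size rather than by entry count. It shows that the in-memory staging buffer is itself shared state, so if two goroutines stage entries concurrently, the lock has to cover staging as well as the flush.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// flushingBatch is a hypothetical simplification of a batch that flushes
// itself once it holds threshold entries.
type flushingBatch struct {
	mtx       sync.Mutex
	pending   map[string][]byte
	threshold int
	flushes   int // number of simulated flushes
	written   int // total entries "written to disk"
}

func newFlushingBatch(threshold int) *flushingBatch {
	return &flushingBatch{pending: make(map[string][]byte), threshold: threshold}
}

// Set stages one entry. The lock covers both the staging step and the
// threshold check, because either one mutates or replaces pending.
func (b *flushingBatch) Set(key string, value []byte) {
	b.mtx.Lock()
	defer b.mtx.Unlock()
	b.pending[key] = value
	if len(b.pending) >= b.threshold {
		b.written += len(b.pending) // simulated flush
		b.flushes++
		b.pending = make(map[string][]byte, b.threshold)
	}
}

// fillConcurrently runs `writers` goroutines that each stage perWriter
// distinct keys, then waits for all of them.
func fillConcurrently(b *flushingBatch, writers, perWriter int) {
	var wg sync.WaitGroup
	for w := 0; w < writers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWriter; i++ {
				b.Set(strconv.Itoa(w)+"/"+strconv.Itoa(i), []byte{1})
			}
		}(w)
	}
	wg.Wait()
}

func main() {
	b := newFlushingBatch(100)
	fillConcurrently(b, 2, 500)
	fmt.Println(b.written, b.flushes, len(b.pending))
}
```

Locking only around the flush would leave the map write in `Set` unprotected, which Go reports as a concurrent map write. Whether per-call locking is the right trade-off for the pruning goroutine is exactly the open question in this thread.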

counter++
if counter == 1000 {
counter = 0
time.Sleep(1000 * time.Millisecond)
Member

Is this sleep to create a write gap so that SaveVersion doesn't block too frequently from the newly introduced mutex in BatchWithFlusher while a long prune operation is ongoing?

If deleteLegacyVersions is only called once on migration I guess it's OK, but this sleep is hard to understand and could result in unexpected wait times on the main I/O thread. I'm also curious about the overhead of acquiring and releasing the mutex on every Set/Delete call.
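
To make the intent of the sleep easier to read, the counter-and-sleep logic could be pulled into a small named helper with explicit parameters. This is only a sketch: `throttle`, `newThrottle`, and `Tick` are hypothetical names, and the values 1000 and one millisecond are placeholders, not recommendations.

```go
package main

import (
	"fmt"
	"time"
)

// throttle pauses the caller after every `every` calls to Tick.
// It is a hypothetical helper and not part of the PR.
type throttle struct {
	every  int
	pause  time.Duration
	count  int
	pauses int // how many times Tick has paused so far
}

func newThrottle(every int, pause time.Duration) *throttle {
	return &throttle{every: every, pause: pause}
}

// Tick records one operation and sleeps once the per-window budget is used.
func (t *throttle) Tick() {
	t.count++
	if t.count >= t.every {
		t.count = 0
		t.pauses++
		time.Sleep(t.pause) // yields so other writers can make progress
	}
}

func main() {
	th := newThrottle(1000, time.Millisecond)
	for i := 0; i < 2500; i++ {
		th.Tick() // in the diff, the orphan delete would happen here
	}
	fmt.Println("pauses:", th.pauses)
}
```

Naming the two parameters documents the trade-off directly (how much work per window, how long to yield) and would let callers tune or disable the pause without touching the traversal code.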

@cool-develope
Collaborator

@czarcas7ic could you please wire up this PR? only encoding and fastKeyFormatter features

@czarcas7ic
Contributor Author

@cool-develope Just verifying, you only want the FastPrefixFormatter change as well as encoding.go change, and remove everything else? My apologies, I am between multiple tasks right now so am losing context on this.

@cool-develope
Collaborator

> @cool-develope Just verifying, you only want the FastPrefixFormatter change as well as encoding.go change, and remove everything else? My apologies, I am between multiple tasks right now so am losing context on this.

we can close it, -> #923

@czarcas7ic
Contributor Author

Thanks, my apologies for the extra overhead that this may have caused!

@czarcas7ic czarcas7ic closed this Mar 26, 2024
5 participants