Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bookie ledger replication error lead to increased disk usage #1908

Open
codelipenghui opened this issue Jan 21, 2019 · 7 comments
Open

Bookie ledger replication error lead to increased disk usage #1908

codelipenghui opened this issue Jan 21, 2019 · 7 comments

Comments

@codelipenghui
Copy link
Contributor

BUG REPORT

Problem happens after reboot bookie(Bare metal), i can't remember a little bit about the detailed steps at the time.

Found that disk usage continued growth:
image

Then check the bookie log, found some ledger replicate error log:

00:02:15.246 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L426399 E4139-E4139, Sent to [172.30.92.15:3181, 172.30.92.16:3181], Heard from [] : bitset = {}. First unread entry is -1
00:02:15.246 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error reading ledger entry: 4139
org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
	at org.apache.bookkeeper.client.BKException.create(BKException.java:84) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$3.readComplete(LedgerFragmentReplicator.java:305) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:797) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
00:02:15.246 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.proto.BookkeeperInternalCallbacks - Error in multi callback : -13
00:02:15.259 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L426399 E4140-E4140, Sent to [172.30.92.15:3181, 172.30.92.16:3181], Heard from [] : bitset = {}. First unread entry is -1
00:02:15.259 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error reading ledger entry: 4140
org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
	at org.apache.bookkeeper.client.BKException.create(BKException.java:84) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$3.readComplete(LedgerFragmentReplicator.java:305) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:797) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
00:02:15.259 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.proto.BookkeeperInternalCallbacks - Error in multi callback : -13
00:02:15.262 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L426399 E4141-E4141, Sent to [172.30.92.15:3181, 172.30.92.16:3181], Heard from [] : bitset = {}. First unread entry is -1
00:02:15.262 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error reading ledger entry: 4141
org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
	at org.apache.bookkeeper.client.BKException.create(BKException.java:84) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$3.readComplete(LedgerFragmentReplicator.java:305) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:797) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
00:02:15.262 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.proto.BookkeeperInternalCallbacks - Error in multi callback : -13
00:02:15.269 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L426399 E4142-E4142, Sent to [172.30.92.15:3181, 172.30.92.16:3181], Heard from [] : bitset = {}. First unread entry is -1
00:02:15.269 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error reading ledger entry: 4142
org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
	at org.apache.bookkeeper.client.BKException.create(BKException.java:84) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$3.readComplete(LedgerFragmentReplicator.java:305) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:797) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
00:02:15.269 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.proto.BookkeeperInternalCallbacks - Error in multi callback : -13
00:02:15.275 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L426399 E4143-E4143, Sent to [172.30.92.15:3181, 172.30.92.16:3181], Heard from [] : bitset = {}. First unread entry is -1
00:02:15.275 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error reading ledger entry: 4143
org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
	at org.apache.bookkeeper.client.BKException.create(BKException.java:84) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$3.readComplete(LedgerFragmentReplicator.java:305) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:797) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
00:02:15.276 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.proto.BookkeeperInternalCallbacks - Error in multi callback : -13
00:02:15.280 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L426399 E4144-E4144, Sent to [172.30.92.15:3181, 172.30.92.16:3181], Heard from [] : bitset = {}. First unread entry is -1
00:02:15.280 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error reading ledger entry: 4144
org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
	at org.apache.bookkeeper.client.BKException.create(BKException.java:84) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$3.readComplete(LedgerFragmentReplicator.java:305) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:797) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
00:02:15.280 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.proto.BookkeeperInternalCallbacks - Error in multi callback : -13
00:02:15.280 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error replicating ledger fragments for ledger: 426399
org.apache.bookkeeper.client.BKException$BKLedgerRecoveryException: Error while recovering ledger
	at org.apache.bookkeeper.client.BKException.create(BKException.java:78) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$SingleFragmentCallback.processResult(LedgerFragmentReplicator.java:357) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$1.processResult(LedgerFragmentReplicator.java:184) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.proto.BookkeeperInternalCallbacks$MultiCallback.doCallback(BookkeeperInternalCallbacks.java:244) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.proto.BookkeeperInternalCallbacks$MultiCallback.callback(BookkeeperInternalCallbacks.java:236) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.proto.BookkeeperInternalCallbacks$MultiCallback.tick(BookkeeperInternalCallbacks.java:218) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.proto.BookkeeperInternalCallbacks$MultiCallback.processResult(BookkeeperInternalCallbacks.java:254) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$3.readComplete(LedgerFragmentReplicator.java:306) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:797) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
00:02:15.280 [ReplicationWorker] WARN  org.apache.bookkeeper.replication.ReplicationWorker - BKLedgerRecoveryException while replicating the fragment
org.apache.bookkeeper.client.BKException$BKLedgerRecoveryException: Error while recovering ledger
	at org.apache.bookkeeper.client.BKException.create(BKException.java:78) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.BookKeeperAdmin.replicateLedgerFragment(BookKeeperAdmin.java:1058) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.BookKeeperAdmin.replicateLedgerFragment(BookKeeperAdmin.java:1036) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:254) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:207) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.replication.ReplicationWorker.run(ReplicationWorker.java:168) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]

This won't always happen, I also restarted bookie before.
I don't know why lost these four entries[E4139 - E4144] in bookie and ledger.
And this is dangerous that ledger replicate so many times.

After we disable autorecovery and find the topic use this ledger, then delete the topic,
entry log was successfully cleaned up

@sijie
Copy link
Member

sijie commented Jan 21, 2019

@codelipenghui thank you for reporting this.

I think there are two problems behind this issue.

  1. why the disk usage keeps growing?
  2. why the four entries are not available?

I think the disk usage kept growing because replication worker keeps trying to replicate fragments but fail at the last 4 entries. when replication worker keeps doing that, it is writing entries again and again. so the disk usage is going up. this can be resolved when we fix #1066

for the second problem, it is an interesting problem. my guess is it is related to ledger metadata update during closing. but I have to take a deeper look before coming to any conclusions.

@jiazhai
Copy link
Member

jiazhai commented Oct 29, 2019

would like to put it into 4.11

@Raven888888
Copy link

Raven888888 commented Oct 21, 2022

Faced this issue in our 3-node setup, but it is always the same 1 node that faced this problem.
Disk grows uncontrollably and the node died out of storage. Had to bookieformat every time.
But after some time it will happen again.

Pulsar 2.10.1

How can I fix the issue?

@Raven888888
Copy link

This is what found in logs

Bookie

2022-10-20T09:53:23,089+0000 [db-storage-cleanup-10-1] INFO  org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex - Deleting indexes for ledgers: [501062, 501893, 501923, 501378, 501922, 501921, 500143, 500142, 501901, 500141, 501898, 501896, 501911, 501047, 501366, 501909, 501908, 501055, 501918, 501054, 501916, 501913, 501912]
2022-10-20T09:53:23,754+0000 [db-storage-cleanup-10-1] WARN  org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage - Failed to cleanup db indexes
java.io.IOException: Error in RocksDB compact
	at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.compact(KeyValueStorageRocksDB.java:295) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex.removeOffsetFromDeletedLedgers(EntryLocationIndex.java:256) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.lambda$checkpoint$7(SingleDirectoryDbLedgerStorage.java:638) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_342]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_342]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_342]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_342]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_342]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_342]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_342]
Caused by: org.rocksdb.RocksDBException: block checksum mismatch: expected 459167422, got 2336252423  in data/bookkeeper/ledgers/current/locations/000423.sst offset 612461 size 32254
	at org.rocksdb.RocksDB.compactRange(Native Method) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
	at org.rocksdb.RocksDB.compactRange(RocksDB.java:3167) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
	at org.rocksdb.RocksDB.compactRange(RocksDB.java:3135) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
	at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.compact(KeyValueStorageRocksDB.java:293) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	... 10 more
2022-10-20T09:54:23,026+0000 [SyncThread-7-1] ERROR org.apache.bookkeeper.bookie.SyncThread - Exception flushing ledgers
java.io.IOException: Failed to flush RocksDB batch
	at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB$RocksDBBatch.flush(KeyValueStorageRocksDB.java:451) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.checkpoint(SingleDirectoryDbLedgerStorage.java:621) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.checkpoint(DbLedgerStorage.java:231) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.bookie.SyncThread.checkpoint(SyncThread.java:152) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.bookie.SyncThread.lambda$doCheckpoint$0(SyncThread.java:99) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_342]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_342]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_342]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_342]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_342]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_342]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_342]
Caused by: org.rocksdb.RocksDBException: block checksum mismatch: expected 459167422, got 2336252423  in data/bookkeeper/ledgers/current/locations/000423.sst offset 612461 size 32254
	at org.rocksdb.RocksDB.write0(Native Method) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
	at org.rocksdb.RocksDB.write(RocksDB.java:1586) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
	at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB$RocksDBBatch.flush(KeyValueStorageRocksDB.java:449) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	... 12 more
2022-10-20T09:54:28,204+0000 [main-EventThread] INFO  org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
2022-10-20T09:54:28,206+0000 [main-EventThread] INFO  org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
2022-10-20T09:55:23,019+0000 [SyncThread-7-1] ERROR org.apache.bookkeeper.bookie.SyncThread - Exception flushing ledgers
java.io.IOException: Failed to flush RocksDB batch
	at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB$RocksDBBatch.flush(KeyValueStorageRocksDB.java:451) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.checkpoint(SingleDirectoryDbLedgerStorage.java:621) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.checkpoint(DbLedgerStorage.java:231) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.bookie.SyncThread.checkpoint(SyncThread.java:152) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at org.apache.bookkeeper.bookie.SyncThread.lambda$doCheckpoint$0(SyncThread.java:99) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_342]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_342]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_342]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_342]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_342]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_342]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_342]
Caused by: org.rocksdb.RocksDBException: block checksum mismatch: expected 459167422, got 2336252423  in data/bookkeeper/ledgers/current/locations/000423.sst offset 612461 size 32254
	at org.rocksdb.RocksDB.write0(Native Method) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
	at org.rocksdb.RocksDB.write(RocksDB.java:1586) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
	at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB$RocksDBBatch.flush(KeyValueStorageRocksDB.java:449) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
	... 12 more

broker

2022-10-20T13:34:33,566+0000 [pulsar-ph-kafka-79-21] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger public/kafka/persistent/hourly-datetime-Ingest-partition-0
2022-10-20T13:34:33,569+0000 [bookkeeper-ml-scheduler-OrderedScheduler-11-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/kafka/persistent/hourly-datetime-Ingest-partition-0] Closing managed ledger
2022-10-20T13:34:33,569+0000 [bookkeeper-ml-scheduler-OrderedScheduler-11-0] INFO  org.apache.pulsar.broker.PulsarService - No ledger offloader configured, using NULL instance
2022-10-20T13:34:33,569+0000 [bookkeeper-ml-scheduler-OrderedScheduler-11-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger public/kafka/persistent/hourly-datetime-Ingest
2022-10-20T13:34:33,574+0000 [bookkeeper-ml-scheduler-OrderedScheduler-11-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/kafka/persistent/hourly-datetime-Ingest] Closing managed ledger
2022-10-20T13:34:33,574+0000 [bookkeeper-ml-scheduler-OrderedScheduler-11-0] ERROR io.streamnative.pulsar.handlers.kop.KafkaTopicManager - [[id: 0x864f8b07, L:/192.168.3.3:9092 - R:/192.168.3.2:33020]]Get empty non-partitioned topic for name persistent://public/kafka/hourly-datetime-Ingest
2022-10-20T13:34:33,574+0000 [bookkeeper-ml-scheduler-OrderedScheduler-11-0] ERROR io.streamnative.pulsar.handlers.kop.KafkaTopicManager - [[id: 0x54c1cef3, L:/192.168.3.3:9092 - R:/192.168.3.2:33024]]Get empty non-partitioned topic for name persistent://public/kafka/hourly-datetime-Ingest
2022-10-20T13:34:33,675+0000 [pulsar-ph-kafka-79-21] INFO  org.apache.pulsar.broker.PulsarService - No ledger offloader configured, using NULL instance
2022-10-20T13:34:33,675+0000 [pulsar-ph-kafka-79-21] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger public/kafka/persistent/hourly-datetime-Ingest-partition-0
2022-10-20T13:34:33,679+0000 [bookkeeper-ml-scheduler-OrderedScheduler-11-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/kafka/persistent/hourly-datetime-Ingest-partition-0] Closing managed ledger
2022-10-20T13:34:33,679+0000 [bookkeeper-ml-scheduler-OrderedScheduler-11-0] INFO  org.apache.pulsar.broker.PulsarService - No ledger offloader configured, using NULL instance
2022-10-20T13:34:33,679+0000 [bookkeeper-ml-scheduler-OrderedScheduler-11-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger public/kafka/persistent/hourly-datetime-Ingest
2022-10-20T13:34:33,684+0000 [bookkeeper-ml-scheduler-OrderedScheduler-11-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/kafka/persistent/hourly-datetime-Ingest] Closing managed ledger
2022-10-20T13:34:33,684+0000 [bookkeeper-ml-scheduler-OrderedScheduler-11-0] ERROR io.streamnative.pulsar.handlers.kop.KafkaTopicManager - [[id: 0x864f8b07, L:/192.168.3.3:9092 - R:/192.168.3.2:33020]]Get empty non-partitioned topic for name persistent://public/kafka/hourly-datetime-Ingest
2022-10-20T13:34:33,684+0000 [bookkeeper-ml-scheduler-OrderedScheduler-11-0] ERROR io.streamnative.pulsar.handlers.kop.KafkaTopicManager - [[id: 0x54c1cef3, L:/192.168.3.3:9092 - R:/192.168.3.2:33024]]Get empty non-partitioned topic for name persistent://public/kafka/hourly-datetime-Ingest
2022-10-20T13:34:33,685+0000 [pulsar-web-35-9] INFO  org.eclipse.jetty.server.RequestLog - 192.168.3.3 - - [20/Oct/2022:13:34:33 +0000] "GET /admin/v2/persistent/public/kafka/hourly-datetime-Ingest/partitions HTTP/1.1" 200 16 "-" "Pulsar-Java-v2.10.0" 1

@Hongten
Copy link

Hongten commented Dec 28, 2022

Any update on this issue? I encountered the same error. Pulsar-2.10.1

Thanks.

@mnit016
Copy link

mnit016 commented Oct 8, 2023

Hi there, I'm facing a similar issue in Pulsar-2.9.5, bookie ledger utilization increased to 99.9%. But I can't see any ledger information in error logs there
Could some one please tell me how to get over this issue? any workaround?

Details logs:
Several days ago, a lot of "Entering Safepoint region..." "Leaving safepoint region..." in logs
After a day, I found this "Exception ledger flush" / "Error in Rocksdb put" starting in logs.
image
And 4 hours later, it turned to a lots "Error during flush"
image
Above logs happen until "Ledger directory ... is out-of-space"
image

@jyotirmoy112
Copy link

This issue we also recently faced within our cluster. It happens if you restart any bookie without touching autorecovery. At that time if there is any ledger is trying to replicate it and if you don't have enough writeable bookie available. The replication goes into infinite loop even later all the bookie is available. Yes, this is a bug which needs a permanent fix. The temporary fix we applied figured out how many ledgers are making noise. Just make a note of all faulty ledgers and remove it safely. Afterthat the storage will start making free.
Capture

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests