[6.11, 6.12] Constant heavy reads when there is unfinishable "Pending rebalance work" #795
I can confirm this also happens on actual hardware. There are heavy reads when the background target is full. Writes are unaffected.
It looks like the issue is related to
For example, we can create a filesystem with 2 disks (show-super):
Now, write some data to a folder having
We have enough free space in the
This causes heavy reads on the filesystem.
I have something similar to this, but with a few different elements which I think are worth mentioning.

I had an array in this state because I added an 8TB drive to the array without enough space on the other drives for it to mirror to. It was stuck constantly reading, with 4.5TB of pending rebalance work which wasn't decreasing. Setting background_target to none didn't stop it, nor did adding 2 more drives to the array, giving it plenty of extra space.

After rebooting, it seems like it figured out that it had more space and offloaded all of the user data from the SSDs to the HDDs, but after it finished doing that there's still 4TB of pending rebalance which isn't going down. It has stopped constantly reading at least.

kernel:
bcachefs fs usage -h:
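As a quick way to see whether that counter is moving at all over time, the same command can simply be polled; a minimal sketch, with the mount point and interval as placeholders:

    # Re-check the pending rebalance counter once a minute
    watch -n 60 'bcachefs fs usage -h /mnt/pool | grep -A 1 "Pending rebalance"'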
@Modoh Is your
Pending rebalance is still stuck:

    bcachefs fs usage | grep "Pending rebalance" -A 1

I have seen it go up from there and back down, but never below that point. I did run a rereplicate at some point hoping it would clear things up, which explains why all the data has mirrors even though the rebalance work was never done. I've never messed with specific file/folder fs attributes.

    bcachefs show-super /dev/sdc
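Just as a sketch for double-checking that, not something quoted in this thread: bcachefs per-file/per-directory options are, as far as I know, exposed as extended attributes, so listing them on the affected directories should show whether anything is set. The exact xattr namespaces and the path below are my assumptions:

    # Options set directly on an inode are assumed to appear in the "bcachefs"
    # xattr namespace, inherited/effective values in "bcachefs_effective".
    getfattr -d -m '^bcachefs' /mnt/pool/some/directory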
I need someone to reproduce this on my master branch, which has improved rebalance tracepoints that'll tell us what rebalance is trying to do.
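A sketch, not taken from this thread: the rebalance tracepoints can be captured through the standard kernel tracing interface, and enabling the whole bcachefs event group avoids needing the exact names of the new tracepoints (which are not quoted here):

    # tracefs is normally mounted at /sys/kernel/tracing
    cd /sys/kernel/tracing
    echo 1 > events/bcachefs/enable    # enable all bcachefs tracepoints
    cat trace_pipe                     # stream events while reproducing the reads
    echo 0 > events/bcachefs/enable    # disable again when done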
On Sun, Jan 19, 2025, 10:49 AM Modoh wrote:
Pending rebalance is still stuck:
bcachefs fs usage |grep "Pending rebalance" -A 1
4480283013120
I have seen it go up from there and back down, but never below that point.
I did run a rereplicate at some point hoping it would clear things up,
which explains why all the data has mirrors even though the rebalance work
was never done.
I've never messed with specific file/folder fs attributes.
bcachefs show-super /dev/sdc
External UUID: 2a54bce9-9c32-48a3-985e-19b7f94339d1
Internal UUID: 2bc64d5e-a46e-49da-848b-ea54d37b425a
Magic number: c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index: 9
Label: (none)
Version: 1.13: inode_has_child_snapshots
Version upgrade complete: 1.13: inode_has_child_snapshots
Oldest version on disk: 1.13: inode_has_child_snapshots
Created: Sat Jan 11 02:04:31 2025
Sequence number: 348
Time of last write: Sat Jan 18 18:42:11 2025
Superblock size: 7.10 KiB/1.00 MiB
Clean: 0
Devices: 8
Sections: members_v1,replicas_v0,disk_groups,clean,journal_v2,counters,members_v2,errors,ext,downgrade
Features: zstd,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features: alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done
Options:
block_size: 4.00 KiB
btree_node_size: 256 KiB
errors: continue [fix_safe] panic ro
metadata_replicas: 2
data_replicas: 2
metadata_replicas_required: 2
data_replicas_required: 1
encoded_extent_max: 64.0 KiB
metadata_checksum: none [crc32c] crc64 xxhash
data_checksum: none [crc32c] crc64 xxhash
compression: zstd:3
background_compression: none
str_hash: crc32c crc64 [siphash]
metadata_target: ssd
foreground_target: none
background_target: hdd
promote_target: ssd
erasure_code: 0
inodes_32bit: 1
shard_inode_numbers: 1
inodes_use_key_cache: 1
gc_reserve_percent: 8
gc_reserve_bytes: 0 B
root_reserve_percent: 0
wide_macs: 0
promote_whole_extents: 1
acl: 1
usrquota: 0
grpquota: 0
prjquota: 0
journal_flush_delay: 1000
journal_flush_disabled: 0
journal_reclaim_delay: 100
journal_transaction_names: 1
allocator_stuck_timeout: 30
version_upgrade: [compatible] incompatible none
nocow: 0
members_v2 (size 1600):
Device: 0
Label: ct500bx (1)
UUID: ea436059-fd7d-431b-b7bf-a5e0962100a2
Size: 466 GiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 256 KiB
First bucket: 0
Buckets: 1907760
Last mount: Sat Jan 18 17:28:52 2025
Last superblock write: 348
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user,cached
Btree allocated bitmap blocksize: 16.0 MiB
Btree allocated bitmap: 0000000000000000000000000010000000000000010000000010011011011111
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 1
Label: ct480m50 (2)
UUID: 822d0dc3-5296-48bc-bf43-07300e7ceb95
Size: 447 GiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 256 KiB
First bucket: 0
Buckets: 1831451
Last mount: Sat Jan 18 17:28:52 2025
Last superblock write: 348
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user,cached
Btree allocated bitmap blocksize: 16.0 MiB
Btree allocated bitmap: 0000000000000000000000000000010000000000010000011101000101111111
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 2
Label: red2t (4)
UUID: e05edce2-501c-4855-bc89-05ba8615e8cf
Size: 1.82 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 256 KiB
First bucket: 0
Buckets: 7630916
Last mount: Sat Jan 18 17:28:52 2025
Last superblock write: 348
State: rw
Data allowed: journal,btree,user
Has data: btree,user
Btree allocated bitmap blocksize: 128 KiB
Btree allocated bitmap: 0000000000000000000000000000001100000000000000000000000000000000
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 6
Label: ultra8 (7)
UUID: 432b2bba-bafe-452d-a452-9fa97990a21c
Size: 7.28 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 7630885
Last mount: Sat Jan 18 17:28:52 2025
Last superblock write: 348
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user
Btree allocated bitmap blocksize: 32.0 MiB
Btree allocated bitmap: 0000000000000000001000000000000000000001000000100001000100011011
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 7
Label: smr4 (8)
UUID: fe43fe38-e267-46df-95de-8d826cec1b52
Size: 3.64 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 3815447
Last mount: Sat Jan 18 17:28:52 2025
Last superblock write: 348
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user
Btree allocated bitmap blocksize: 1.00 MiB
Btree allocated bitmap: 0000010000000000000000011000000000011000000000010000000000000000
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 8
Label: red4 (9)
UUID: bc9150fb-85ba-4a4c-a64e-910d525110aa
Size: 3.64 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 3815447
Last mount: Sat Jan 18 17:28:52 2025
Last superblock write: 348
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user
Btree allocated bitmap blocksize: 128 MiB
Btree allocated bitmap: 0000000000000010000000000000000000000000000000001100000100010111
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 9
Label: red12a (10)
UUID: d3f12b8b-e7c5-45d9-95f5-b9dc0fa5f9ed
Size: 10.9 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 11444224
Last mount: Sat Jan 18 17:28:52 2025
Last superblock write: 348
State: rw
Data allowed: journal,btree,user
Has data: journal,user
Btree allocated bitmap blocksize: 512 KiB
Btree allocated bitmap: 0000000000000000000000000000000100000000000000000000000000000000
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 10
Label: red12b (11)
UUID: ba727bb1-6a11-4d08-86ed-b8e43d9bd967
Size: 10.9 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 11444224
Last mount: Sat Jan 18 17:28:52 2025
Last superblock write: 348
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user,cached
Btree allocated bitmap blocksize: 256 MiB
Btree allocated bitmap: 0000000000000000000000010000000010000000000000000001100000010011
Durability: 1
Discard: 0
Freespace initialized: 1
errors (size 8):
On a multi-device filesystem, I have noticed that whenever background_target becomes full, there are constant heavy reads by the rebalance thread.

Steps to reproduce (a consolidated command sketch follows after the list):

1. Create two loop devices. One will be used as foreground_target (disk0), the other as background_target (disk1). Here, both are 40GB disks.
2. Add them as loop devices (for mounting).
3. Format the loop devices as bcachefs: disk0's label is ssd (foreground_target) and disk1's label is hdd (background_target).
4. Mount the filesystem and write a 60GB file (bigger than background_target).
5. Check bcachefs fs usage. There is some pending rebalance work, but background_target is full, so it cannot move the data. I can see the rebalance thread doing constant reads even after the data is written.
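A minimal sketch of the steps above, assuming sparse files attached with losetup; the device paths, sizes, mount point, and dd parameters are illustrative rather than taken verbatim from the report:

    # 1. Create two 40GB backing files and attach them as loop devices
    truncate -s 40G disk0.img disk1.img
    losetup -fP --show disk0.img    # prints e.g. /dev/loop0 (foreground, "ssd")
    losetup -fP --show disk1.img    # prints e.g. /dev/loop1 (background, "hdd")

    # 2. Format both devices as one bcachefs filesystem, labelling disk0 "ssd"
    #    and disk1 "hdd", and pointing the targets at those labels
    bcachefs format \
        --label=ssd /dev/loop0 \
        --label=hdd /dev/loop1 \
        --foreground_target=ssd \
        --background_target=hdd

    # 3. Mount the multi-device filesystem and write more data than the
    #    background target can hold (60GB onto a 40GB hdd device)
    mkdir -p /mnt/test
    mount -t bcachefs /dev/loop0:/dev/loop1 /mnt/test
    dd if=/dev/urandom of=/mnt/test/bigfile bs=1M count=61440

    # 4. Observe the stuck rebalance work and the constant reads
    bcachefs fs usage -h /mnt/test | grep -A 1 "Pending rebalance"
    iostat -xm 5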
I expect some constant I/O by the filesystem to check whether background_target has free space, but 300+ MB/s seems excessive. I waited for more than an hour, but it did not stop. It triggers again if I remount the drive. It only stops if I delete the file I created and free up the background_target.
The underlying filesystem (where the loop devices are created) is btrfs (with compression=zstd:3).
Host:
I will do some more testing on actual hardware.