[6.11, 6.12] Constant heavy reads when there is unfinishable "Pending rebalance work" #795
I can confirm this also happens on actual hardware. There are heavy reads when the background target is full. Writes are unaffected.
It looks like the issue is related to
For example, we can create a filesystem with 2 disks (show-super):
Now, write some data to a folder having
We have enough free space in the
This causes heavy reads on the filesystem.
I have something similar to this, but with a few different elements which I think are worth mentioning.

I had an array in this state because I added an 8TB drive to the array without enough space on the other drives for it to mirror to. It was stuck constantly reading, with 4.5TB of pending rebalance work which wasn't decreasing. Setting background_target to none didn't stop it, nor did adding 2 more drives to the array, giving it plenty of extra space.

After rebooting, it seems like it figured out that it had more space and offloaded all of the user data from the SSDs to the HDDs, but after it finished doing that there's still 4TB of pending rebalance which isn't going down. It has stopped constantly reading at least.

kernel:
bcachefs fs usage -h:
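As a quick way to see whether that counter is moving at all over time, the same command can simply be polled; a minimal sketch, with the mount point and interval as placeholders:

    # Re-check the pending rebalance counter once a minute
    watch -n 60 'bcachefs fs usage -h /mnt/pool | grep -A 1 "Pending rebalance"'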
@Modoh Is your
Pending rebalance is still stuck:

    bcachefs fs usage | grep "Pending rebalance" -A 1

I have seen it go up from there and back down, but never below that point. I did run a rereplicate at some point hoping it would clear things up, which explains why all the data has mirrors even though the rebalance work was never done. I've never messed with specific file/folder fs attributes.

    bcachefs show-super /dev/sdc
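Just as a sketch for double-checking that, not something quoted in this thread: bcachefs per-file/per-directory options are, as far as I know, exposed as extended attributes, so listing them on the affected directories should show whether anything is set. The exact xattr namespaces and the path below are my assumptions:

    # Options set directly on an inode are assumed to appear in the "bcachefs"
    # xattr namespace, inherited/effective values in "bcachefs_effective".
    getfattr -d -m '^bcachefs' /mnt/pool/some/directory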
I need someone to reproduce this on my master branch, which has improved rebalance tracepoints that'll tell us what rebalance is trying to do.
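A sketch, not taken from this thread: the rebalance tracepoints can be captured through the standard kernel tracing interface, and enabling the whole bcachefs event group avoids needing the exact names of the new tracepoints (which are not quoted here):

    # tracefs is normally mounted at /sys/kernel/tracing
    cd /sys/kernel/tracing
    echo 1 > events/bcachefs/enable    # enable all bcachefs tracepoints
    cat trace_pipe                     # stream events while reproducing the reads
    echo 0 > events/bcachefs/enable    # disable again when done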
On Sun, Jan 19, 2025, 10:49 AM Modoh wrote:
Pending rebalance is still stuck:
bcachefs fs usage |grep "Pending rebalance" -A 1
4480283013120
I have seen it go up from there and back down, but never below that point.
I did run a rereplicate at some point hoping it would clear things up,
which explains why all the data has mirrors even though the rebalance work
was never done.
I've never messed with specific file/folder fs attributes.
bcachefs show-super /dev/sdc
External UUID: 2a54bce9-9c32-48a3-985e-19b7f94339d1
Internal UUID: 2bc64d5e-a46e-49da-848b-ea54d37b425a
Magic number: c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index: 9
Label: (none)
Version: 1.13: inode_has_child_snapshots
Version upgrade complete: 1.13: inode_has_child_snapshots
Oldest version on disk: 1.13: inode_has_child_snapshots
Created: Sat Jan 11 02:04:31 2025
Sequence number: 348
Time of last write: Sat Jan 18 18:42:11 2025
Superblock size: 7.10 KiB/1.00 MiB
Clean: 0
Devices: 8
Sections: members_v1,replicas_v0,disk_groups,clean,journal_v2,counters,members_v2,errors,ext,downgrade
Features: zstd,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features: alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done
Options:
block_size: 4.00 KiB
btree_node_size: 256 KiB
errors: continue [fix_safe] panic ro
metadata_replicas: 2
data_replicas: 2
metadata_replicas_required: 2
data_replicas_required: 1
encoded_extent_max: 64.0 KiB
metadata_checksum: none [crc32c] crc64 xxhash
data_checksum: none [crc32c] crc64 xxhash
compression: zstd:3
background_compression: none
str_hash: crc32c crc64 [siphash]
metadata_target: ssd
foreground_target: none
background_target: hdd
promote_target: ssd
erasure_code: 0
inodes_32bit: 1
shard_inode_numbers: 1
inodes_use_key_cache: 1
gc_reserve_percent: 8
gc_reserve_bytes: 0 B
root_reserve_percent: 0
wide_macs: 0
promote_whole_extents: 1
acl: 1
usrquota: 0
grpquota: 0
prjquota: 0
journal_flush_delay: 1000
journal_flush_disabled: 0
journal_reclaim_delay: 100
journal_transaction_names: 1
allocator_stuck_timeout: 30
version_upgrade: [compatible] incompatible none
nocow: 0
members_v2 (size 1600):
Device: 0
Label: ct500bx (1)
UUID: ea436059-fd7d-431b-b7bf-a5e0962100a2
Size: 466 GiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 256 KiB
First bucket: 0
Buckets: 1907760
Last mount: Sat Jan 18 17:28:52 2025
Last superblock write: 348
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user,cached
Btree allocated bitmap blocksize: 16.0 MiB
Btree allocated bitmap: 0000000000000000000000000010000000000000010000000010011011011111
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 1
Label: ct480m50 (2)
UUID: 822d0dc3-5296-48bc-bf43-07300e7ceb95
Size: 447 GiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 256 KiB
First bucket: 0
Buckets: 1831451
Last mount: Sat Jan 18 17:28:52 2025
Last superblock write: 348
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user,cached
Btree allocated bitmap blocksize: 16.0 MiB
Btree allocated bitmap: 0000000000000000000000000000010000000000010000011101000101111111
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 2
Label: red2t (4)
UUID: e05edce2-501c-4855-bc89-05ba8615e8cf
Size: 1.82 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 256 KiB
First bucket: 0
Buckets: 7630916
Last mount: Sat Jan 18 17:28:52 2025
Last superblock write: 348
State: rw
Data allowed: journal,btree,user
Has data: btree,user
Btree allocated bitmap blocksize: 128 KiB
Btree allocated bitmap: 0000000000000000000000000000001100000000000000000000000000000000
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 6
Label: ultra8 (7)
UUID: 432b2bba-bafe-452d-a452-9fa97990a21c
Size: 7.28 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 7630885
Last mount: Sat Jan 18 17:28:52 2025
Last superblock write: 348
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user
Btree allocated bitmap blocksize: 32.0 MiB
Btree allocated bitmap: 0000000000000000001000000000000000000001000000100001000100011011
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 7
Label: smr4 (8)
UUID: fe43fe38-e267-46df-95de-8d826cec1b52
Size: 3.64 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 3815447
Last mount: Sat Jan 18 17:28:52 2025
Last superblock write: 348
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user
Btree allocated bitmap blocksize: 1.00 MiB
Btree allocated bitmap: 0000010000000000000000011000000000011000000000010000000000000000
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 8
Label: red4 (9)
UUID: bc9150fb-85ba-4a4c-a64e-910d525110aa
Size: 3.64 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 3815447
Last mount: Sat Jan 18 17:28:52 2025
Last superblock write: 348
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user
Btree allocated bitmap blocksize: 128 MiB
Btree allocated bitmap: 0000000000000010000000000000000000000000000000001100000100010111
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 9
Label: red12a (10)
UUID: d3f12b8b-e7c5-45d9-95f5-b9dc0fa5f9ed
Size: 10.9 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 11444224
Last mount: Sat Jan 18 17:28:52 2025
Last superblock write: 348
State: rw
Data allowed: journal,btree,user
Has data: journal,user
Btree allocated bitmap blocksize: 512 KiB
Btree allocated bitmap: 0000000000000000000000000000000100000000000000000000000000000000
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 10
Label: red12b (11)
UUID: ba727bb1-6a11-4d08-86ed-b8e43d9bd967
Size: 10.9 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 11444224
Last mount: Sat Jan 18 17:28:52 2025
Last superblock write: 348
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user,cached
Btree allocated bitmap blocksize: 256 MiB
Btree allocated bitmap: 0000000000000000000000010000000010000000000000000001100000010011
Durability: 1
Discard: 0
Freespace initialized: 1
errors (size 8):
On a multi-device filesystem, I have noticed that whenever background_target becomes full, there are constant heavy reads by the rebalance thread.

Steps to reproduce (a consolidated command sketch follows after the list):

1. Create two loop devices. One will be used as foreground_target (disk0), the other as background_target (disk1). Here, both are 40GB disks.
2. Add them as loop devices (for mounting).
3. Format the loop devices as bcachefs: disk0's label is ssd (foreground_target) and disk1's label is hdd (background_target).
4. Mount the filesystem and write a 60GB file (bigger than background_target).
5. Check bcachefs fs usage. There is some pending rebalance work, but background_target is full, so it cannot move the data. I can see the rebalance thread doing constant reads even after the data is written.
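A minimal sketch of the steps above, assuming sparse files attached with losetup; the device paths, sizes, mount point, and dd parameters are illustrative rather than taken verbatim from the report:

    # 1. Create two 40GB backing files and attach them as loop devices
    truncate -s 40G disk0.img disk1.img
    losetup -fP --show disk0.img    # prints e.g. /dev/loop0 (foreground, "ssd")
    losetup -fP --show disk1.img    # prints e.g. /dev/loop1 (background, "hdd")

    # 2. Format both devices as one bcachefs filesystem, labelling disk0 "ssd"
    #    and disk1 "hdd", and pointing the targets at those labels
    bcachefs format \
        --label=ssd /dev/loop0 \
        --label=hdd /dev/loop1 \
        --foreground_target=ssd \
        --background_target=hdd

    # 3. Mount the multi-device filesystem and write more data than the
    #    background target can hold (60GB onto a 40GB hdd device)
    mkdir -p /mnt/test
    mount -t bcachefs /dev/loop0:/dev/loop1 /mnt/test
    dd if=/dev/urandom of=/mnt/test/bigfile bs=1M count=61440

    # 4. Observe the stuck rebalance work and the constant reads
    bcachefs fs usage -h /mnt/test | grep -A 1 "Pending rebalance"
    iostat -xm 5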
I expect some constant I/O by the filesystem to check whether background_target has free space, but 300+ MB/s seems excessive. I waited for more than an hour, but it did not stop. It triggers again if I remount the drive. It only stops if I delete the file I created and free up the background_target.
The underlying filesystem (where the loop devices are created) is btrfs (with compression=zstd:3).
Host:
I will do some more testing on actual hardware.