Tail latencies of IO operations #16844
Comments
Side note: after reboot, I created testpool using
then ran this command 2 times:
output
In ZFS, due to its transactional nature and the resulting heavy write buffering, write latency is determined not so much by disk write latency as by the artificial write throttling algorithm ZFS implements to keep latency more or less constant across the several seconds of the transaction group commit process. But it would be good to investigate those outliers indeed. It may be that the throttling needs to be tuned, or there is something else unrelated.
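For anyone who wants to experiment with that throttle, it is governed by a handful of OpenZFS module parameters. A minimal sketch for inspecting and adjusting them on Linux follows; the values written here are illustrative assumptions, not recommendations:

```sh
# Show the current throttle-related tunables (OpenZFS on Linux).
grep . /sys/module/zfs/parameters/zfs_dirty_data_max \
       /sys/module/zfs/parameters/zfs_delay_min_dirty_percent \
       /sys/module/zfs/parameters/zfs_delay_scale \
       /sys/module/zfs/parameters/zfs_txg_timeout

# Illustrative example (as root): start delaying writes at a lower
# dirty-data threshold and commit transaction groups more often.
echo 40 > /sys/module/zfs/parameters/zfs_delay_min_dirty_percent
echo 3  > /sys/module/zfs/parameters/zfs_txg_timeout
```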
I tried adjusting it. This also affected the percentiles. This time, they look much closer to those of ext4 and provide better throughput than ZFS with fsync writes (though with 15x worse latency at the 95th percentile). However, I see a potential issue with this approach:
After tuning
vs before tuning
vs fsync after each file
fio
zpool iostat -w -T d testpool 5
The same happens if I tune
fio
zpool iostat -w -T d testpool 5
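(Side note for anyone reproducing the percentile comparison above: fio can be asked to print the exact percentiles directly; a sketch, with the percentile list chosen as an assumption:)

```sh
# Report explicit total-latency percentiles alongside the usual stats.
fio jobfile.fio --lat_percentiles=1 --percentile_list=50:95:99:99.9
```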
Edit: the pool was created like this:
For comparison, here are the jobfile.fio changes:
No fsync writes, ext4: lat (usec): min=31, max=53265, avg=675.26, stdev=4421.49
System information
Describe the problem you're observing
The disk in use is a SAMSUNG MZQL2960HCJR-00A07 (sequential write: 1500 MB/s). While using ZFS on this disk, I found that tail latencies for I/O operations are excessively high (0.9-1.2 seconds) when performing sequential writes without fsync.
These issues are not reproducible when using ext4 on the same disk. Encryption is disabled in both cases. Could you provide any guidelines to reduce these latencies?
smartctl output
Describe how to reproduce the problem
I used the following fio configuration to measure the maximum latencies without fsync (section dirty-writers-thoughput) and with fsync (section fsync-writers-thoughput). Each test starts with recreating the filesystem, dropping system caches, and running fstrim.
jobfile.fio
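As a sketch of the per-test preparation described above (the device path and pool layout are assumptions, not the author's exact commands):

```sh
# Per-test preparation: recreate the filesystem, drop caches, trim.
zpool destroy testpool 2>/dev/null || true
zpool create testpool /dev/nvme0n1     # hypothetical device path
sync
echo 3 > /proc/sys/vm/drop_caches      # drop page cache, dentries, inodes
zpool trim -w testpool                 # fstrim <mountpoint> for the ext4 runs
fio jobfile.fio
```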
zfs pool was created with the following command:
Below are the properties set for the pool.
zfs get all testpool
No fsync writes
In this test, fio writes 10 MiB files using 10 threads without fsync. Each thread generates 200 files. The difference in maximum latency between ZFS and ext4 is significant: roughly 298x (918.5 ms vs. 3.1 ms). And this is not even a worst-case scenario; the highest latency observed for ZFS was 1.2 seconds.
zfs: bw=1589MiB/s and lat (usec): min=10, max=918533, avg=731.80, stdev=22025.52
ext4: bw=1437MiB/s and lat (usec): min=65, max=3086, avg=846.51, stdev=297.89
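A command-line approximation of this job, for anyone reproducing it without the full jobfile (every option value below is inferred from the description above and should be treated as an assumption):

```sh
# 10 writer threads, each sequentially writing 200 files of 10 MiB, no fsync.
fio --name=dirty-writers --directory=/testpool \
    --rw=write --bs=1m --nrfiles=200 --filesize=10m \
    --numjobs=10 --ioengine=psync --group_reporting
```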
Fsync writes
The same test as above is conducted, but this time an fsync is performed after each file is written.
zfs: bw=1210MiB/s and lat (usec): min=9, max=2205, avg=19.47, stdev=35.62
ext4: bw=1421MiB/s and lat (usec): min=67, max=4474, avg=840.67, stdev=318.26
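The fsync variant presumably differs from the job above by a single flag; in fio terms it could be expressed as follows (again an assumption, not the author's exact jobfile):

```sh
# Same job, but fsync each file as it is closed.
fio --name=fsync-writers --directory=/testpool \
    --rw=write --bs=1m --nrfiles=200 --filesize=10m \
    --numjobs=10 --ioengine=psync --fsync_on_close=1 --group_reporting
```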