[pull] master from axboe:master #314
pull wants to merge 1,310 commits into kubestone:master from axboe:master
…roact-de/fio * 'iouring-spellingfix-2025-03-18' of https://github.com/proact-de/fio: Fix spelling error in IO uring engine.
The image used by GitHub-hosted runners recently changed the default kvm device permissions, so we can no longer start guest VMs. The error message:
Could not access KVM kernel module: Permission denied
qemu-system-x86_64: failed to initialize kvm: Permission denied
Working run: https://github.com/fiotestbot/fio/actions/runs/14186873066
Failed run: https://github.com/fiotestbot/fio/actions/runs/14211189491
Explicitly give the GitHub Actions runner user permission to access the /dev/kvm device, following the guide at https://github.blog/changelog/2024-04-02-github-actions-hardware-accelerated-android-virtualization-now-available/ Signed-off-by: Vincent Fu <[email protected]>
Fio has the ability to verify trim operations by running a verify workload and setting the trim_percentage, trim_backlog, and trim_verify_zero options. Some of the written blocks will then be trimmed and then read back to see if they are zeroed out. This patch changes fio_ro_check to allow trim operations when fio is running a verify/trim workload. Fixes: 196ccc4 ("fio.h: also check trim operations in fio_ro_check") Signed-off-by: Vincent Fu <[email protected]>
The trim bit in td_ddir is not set when trim_percentage/backlog is enabled yet fio still issues trim operations. Detect these cases and produce output describing trim operations if we issued any. This is similar to the fix (615c794) committed for verify_backlog. Signed-off-by: Vincent Fu <[email protected]>
Fio may issue trim commands for a verify/trim job. Abort and print an error message if this type of job is run with the --readonly option. Signed-off-by: Vincent Fu <[email protected]>
If we have drained the list of trim operations but its original contents were fewer than a full batch we should zero out the running batch count to make sure that we issue another full set of trim_backlog write operations before considering trims again. Otherwise we will immediately trim after each subsequent write operation until we have met the batch size requirement. Signed-off-by: Vincent Fu <[email protected]>
In order to detect when we are at the beginning of a trim phase we check io_hist_len and should check that the previous operation was not a *trim* (instead of not a read). Without this change trim_backlog_batch will have no effect because after one batch is done, fio will simply start a new batch because io_hist_len is still a multiple of trim_backlog and the last operation in a batch was a trim which is not a read. For check_get_verify checking against read is appropriate but for check_get_trim we must check against a trim. Also we need to decrement the trim_batch count for the first trim operation we send through. Signed-off-by: Vincent Fu <[email protected]>
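The corrected batch-boundary condition described above can be sketched as follows. This is a minimal illustrative model, not fio's actual code: the helper name, the enum, and the parameters are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

enum ddir { DDIR_READ, DDIR_WRITE, DDIR_TRIM };

/* Sketch of the fixed check: a new trim batch may start when io_hist_len
 * is a non-zero multiple of trim_backlog AND the previous completed
 * operation was not itself a trim. Checking "not a read" instead (as the
 * old code effectively did) would immediately start a new batch right
 * after the previous batch's last trim, defeating trim_backlog_batch. */
static bool start_new_trim_batch(unsigned int io_hist_len,
                                 unsigned int trim_backlog,
                                 enum ddir last_ddir)
{
    if (io_hist_len == 0 || io_hist_len % trim_backlog)
        return false;
    return last_ddir != DDIR_TRIM;
}
```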
Fio can verify trim operations. This script adds some simple test cases for this feature. Signed-off-by: Vincent Fu <[email protected]>
On GitHub Actions we cannot insert kernel modules, so skip this script on tests that run with pull requests and after every push. Instead run this test with our nightly tests that run in a QEMU environment. Signed-off-by: Vincent Fu <[email protected]>
Currently, when a write target zone has fewer remainder sectors than the block size, fio finishes the zone to make the zone inactive (not open), so that another zone can be open and used as a write target zone. This zone finish operation is implemented in zbd_adjust_block(). However, this placement is less ideal because zbd_adjust_block() manages not just write requests but also read and trim requests. Since the zone finish operation is exclusively necessary for write requests, implement it into zbd_convert_to_write_zone(). While at it, improve the function comment. Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
To prepare for the following fix, factor out a part of zbd_convert_to_write_zone() to the new function zbd_pick_write_zone(). This function randomly chooses a zone in the array of write zones. Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
When a random write target offset points to a zone that is not writable, zbd_convert_to_write_zone() attempts to convert the write offset to a different, writable zone. However, the conversion fails when all of the following conditions are met: 1) the workload has the max_open_zones limit 2) every write target zone, up to the max_open_zones limit, has remainder sectors smaller than the block size 3) the next random write request targets a zone not in the write target zone list In this case, zbd_convert_to_write_zone() cannot open another zone without exceeding the max_open_zones constraint. Therefore, it does not convert the write to a different zone and prints the debug message "did not choose another write zone". This leads to an unexpected stop of the random write workload. To prevent the unexpected write stop, finish one of the write target zones with small remainder sectors. Check whether all write target zones have small remainders, and store the result in the new local boolean variable all_write_zones_have_small_remainder. When this condition is true, choose one of the write target zones and finish it. Then return the zone from zbd_convert_to_write_zone(), enabling the write process to continue. Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
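The all_write_zones_have_small_remainder check can be modeled as below. The struct layout and helper name are illustrative assumptions, not fio's real zone definitions: a zone has "small remainder" when the sectors between its write pointer and its capacity are fewer than the block size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative zone model: capacity and current write pointer, in the
 * same units as the block size (assumption for this sketch). */
struct wz {
    uint64_t capacity;
    uint64_t wp;
};

/* True when no write target zone can accept even one more block, i.e.
 * the condition under which one zone must be finished so the random
 * write workload can continue. */
static bool all_write_zones_have_small_remainder(const struct wz *z,
                                                 int nr, uint64_t bs)
{
    for (int i = 0; i < nr; i++)
        if (z[i].capacity - z[i].wp >= bs)
            return false;
    return true;
}
```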
The previous commit fixed the unexpected write stop when all write target zones have small remainder sectors to write. Add a test case to confirm the fix. Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
The Windows poll function does not clear the revents field before populating it. As a result, subsequent calls to poll using the same pollfd reference return with revents set even when there is nothing available to read. This later results in a hang in recv(). Signed-off-by: James Rizzo <[email protected]>
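The stale-revents failure mode can be demonstrated with a small model. This is not the Windows implementation; fake_poll_buggy/fake_poll_fixed are hypothetical stand-ins that only show why a poll that never zeroes revents leaves a stale readiness bit behind, which a caller then misreads as "data available" and blocks in recv().

```c
#include <assert.h>

#define FAKE_POLLIN 0x0001

/* Minimal pollfd stand-in for the illustration. */
struct fake_pollfd {
    int fd;
    short events;
    short revents;
};

/* Buggy model: only ORs in new readiness, never clears old bits, so a
 * POLLIN from a previous call survives into the next one. */
static int fake_poll_buggy(struct fake_pollfd *pfd, int ready)
{
    if (ready)
        pfd->revents |= FAKE_POLLIN;
    return ready ? 1 : 0;
}

/* Fixed model: zero revents up front, then report current readiness. */
static int fake_poll_fixed(struct fake_pollfd *pfd, int ready)
{
    pfd->revents = 0;
    if (ready)
        pfd->revents |= FAKE_POLLIN;
    return ready ? 1 : 0;
}
```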
* 'master' of https://github.com/blah325/fio: Fix hang on Windows when multiple --client args are present
As a preparation for continue_on_error option support for zonemode=zbd, introduce a new function blkzoned_move_zone_wp(). It moves the write pointer by writing data. If a data buffer is provided, it calls the pwrite() system call; if not, it calls fallocate() to write zero data. Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
As a preparation for continue_on_error option support for zonemode=zbd, introduce a new callback move_zone_wp() for the IO engines. It moves the write pointer by writing data in the specified buffer. Also bump up FIO_IOOPS_VERSION to note that the new callback is added. Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
As a preparation for continue_on_error option support for zonemode=zbd, implement move_zone_wp() callback for libzbc IO engine. Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
As a preparation for continue_on_error option support for zonemode=zbd, introduce the function zbd_move_zone_wp(). It moves write pointers by calling blkzoned_move_zone_wp() or move_zone_wp() callback of IO engines. Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
When the continue_on_error option is specified, the workload is expected to continue running when non-critical errors happen. However, write workloads with the zonemode=zbd option cannot continue after errors if the failed writes leave partially written data on the target device. The partial write creates a write pointer gap between the device and fio, and the next write requests by fio then fail with unaligned write command errors. This restriction results in undesirable test stops during long runs on SMR drives, which can recover defect sectors. To allow write workloads with zonemode=zbd to continue after write failures with partial data writes, introduce the new option recover_zbd_write_error. When this option is specified together with the continue_on_error option, fio checks the write pointer positions of the write target zones in the error handling step, then fixes each write pointer by moving it to the position that the failed writes would have reached. Bump up FIO_SERVER_VER to note that the new option is added. For that purpose, add a new function zbd_recover_write_error(). Call it from zbd_queue_io() for sync IO engines, and from io_completed() for async IO engines. Modify zbd_queue_io() to pass a pointer to the status so that zbd_recover_write_error() can modify the status to ignore the errors. Add three fields to struct fio_zone_info. The two new fields writes_in_flight and max_write_error_offset track the status of in-flight writes at the write error, so that the write pointer positions can be fixed after the in-flight writes complete. The field fixing_zone_wp records that a write pointer fix is ongoing, prohibiting new writes from being issued to the zone. When the failed write is synchronous, the write pointer fix is done by writing the remaining data of the failed write. This keeps the verify patterns written to the device, so verify works together with the continue_on_zbd_write_error option.
When the failed write is asynchronous, other in-flight writes fail together. In this case, fio waits for all in-flight writes to complete and then fixes the write pointer. The verify data of the failed writes is then lost and verify does not work. Check that the continue_on_zbd_write_error option is not specified together with a verify workload and an asynchronous IO engine. Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
As a preparation to add test cases which check that the continue_on_error option and the recover_zbd_write_error option work when bad blocks cause IO errors, set the additional null_blk parameters badblocks_once and badblocks_partial_io. These parameters were added in Linux kernel version 6.15-rc1 and allow a more realistic scenario of write failures on zoned block devices. The former parameter makes the specified bad blocks recover after the first write, and the latter leaves partially written data on the device. Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
When the continue_on_error option is specified, write workloads are expected not to stop even when bad blocks cause IO errors and leave partially written data. Add test cases to confirm this with zonemode=zbd and the new option recover_zbd_write_error. To create the IO errors as expected, use null_blk and scsi_debug. In particular, use null_blk and its parameters badblocks and badblocks_once, which can control the block that causes the IO error. Introduce helper functions which confirm that the bad block parameters are available and set up the bad blocks. Using the helper functions, add four new test cases. The first two cases confirm that fio recovers after the IO error with a partial write. One test case covers the psync IO engine. The other covers async IO with the libaio engine with high queue depth and multiple jobs. The last two test cases cover another IO error happening during the recovery process from the first IO error. Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
The newly added test cases 72 and 73 in t/zbd/test-zbd-support require an error injection feature. They can be run with either null_blk or scsi_debug, which provide the error injection feature. To run the test cases easily with scsi_debug, add another script, run-tests-against-scsi_debug. It simply prepares a zoned scsi_debug device and runs the two test cases. Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
Commit 4175f4d ("oslib: blkzoned: add blkzoned_move_zone_wp() helper function") introduced the new function for Linux, but did not add its stub function for OSes that lack the blkzoned feature. This caused build failures on macOS and Windows. Add the missing stub to fix it. Fixes: 4175f4d ("oslib: blkzoned: add blkzoned_move_zone_wp() helper function") Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
This caused me some headache, let's add some details on how fio expects the http_host and file to be formatted. Signed-off-by: Avritt Rohwer [email protected]
* 'patch-1' of https://github.com/avrittrohwer/fio: Document expected filename format for s3 http engine.
This patch adds comprehensive ZBD support to io_uring engine, enabling it to work with zonemode=zbd for traditional zoned block devices like SMR (Shingled Magnetic Recording) devices. Changes include:
- ZBD function implementations for zone management operations
- Integration with existing blkzoned interfaces
- Support for zone reporting, reset, finish, and write pointer operations in SMR/ZBD devices
I've tested this on SMR drives with the io_uring engine and it works well. Sample fio job:
[zbd-test]
ioengine=io_uring
direct=1
bs=128k
zonemode=zbd
filename=/dev/sdg
Signed-off-by: Vishal Jose Mannanal <[email protected]>
There is no need for a 64-bit random generator for fdp_state. The FDP random generator is only used to randomly select from available placement IDs. For NVMe FDP devices the placement ID is a 16-bit value. Signed-off-by: Vincent Fu <[email protected]>
The trim_state random generator determines whether to trim an offset or not with the given probability specified by the trim_percentage option. There is no need for it to ever be 64 bits. So make it always 32 bits. Signed-off-by: Vincent Fu <[email protected]>
random_state is a random generator that generates the random offsets to use for I/O operations. Change its name to offset_state to make this more obvious. Signed-off-by: Vincent Fu <[email protected]>
Clarify that this is the state used for generating random offsets. The bitmap reference must be a holdover from old code. Signed-off-by: Vincent Fu <[email protected]>
If the combination of file size and minimum block size exceeds the limits of the default tausworthe random generator, switch for real to the tausworthe64 random generator. The original code changed the random_generator option to tausworthe64 but by the time the change was made the random generators had already been initialized. So the original code had no effect on the offsets generated. This patch re-initializes the relevant random generator after the switch to tausworthe64 so that this change can actually take effect. The random generators are initialized as Fio parses the command line and job files. Later on thread_main() calls init_random_map() which calls check_rand_gen_limits() which actually carries out the random generator limits check for each file. Fixes: #2036 Signed-off-by: Vincent Fu <[email protected]>
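The limit check described above amounts to comparing the number of block-sized units in a file against the range of a 32-bit generator. The helper below is an illustrative sketch of that comparison; the function name and exact threshold are assumptions, not fio's actual check_rand_gen_limits() code.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: a 32-bit tausworthe generator can only index about 2^32
 * distinct offsets, so once file_size / min_bs exceeds UINT32_MAX the
 * 64-bit generator is required to reach every block. */
static int needs_tausworthe64(uint64_t file_size, uint64_t min_bs)
{
    uint64_t nr_blocks = file_size / min_bs;

    return nr_blocks > UINT32_MAX;
}
```

This matches the example in the log output below: a 128T file with 4k blocks has 2^35 blocks, well past the 32-bit limit, which is why the switch message appears.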
With `verify_only` option, no writes are actually issued. Update the man page and the HOWTO doc explaining that the writes in the fio output are phantom writes. Fixes: #1499 Signed-off-by: Gautam Menghani <[email protected]>
There are no problems when output is directed to the console, but when --output=somefile is set, child process log messages may not make it to the file since buffers are not flushed when the child process terminates. Make sure child process messages appear in the output by flushing the buffer before exiting. Also try to make sure the child process starts with an empty info log buffer.
Example: tausworthe64 message does not appear without this patch
----------------------------------------------------------------
root@localhost:~# ./fio-canonical/fio --output=test --name=test --rw=randread --filesize=128T --number_ios=1 --ioengine=null
root@localhost:~# cat test
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=null, iodepth=1
fio-3.41-77-gadec
Starting 1 process
test: (groupid=0, jobs=1): err= 0: pid=206045: Thu Jan 15 15:46:43 2026
read: IOPS=1, BW=6860B/s (6860B/s)(4096B/597msec)
...
Example: with this patch tausworthe64 message appears
-----------------------------------------------------
root@localhost:~# ./fio --output=test --name=test --rw=randread --filesize=128T --number_ios=1 --ioengine=null
root@localhost:~# cat test
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=null, iodepth=1
fio-3.41-82-g7a1b-dirty
Starting 1 process
fio: file test.0.0 exceeds 32-bit tausworthe random generator.
fio: Switching to tausworthe64. Use the random_generator= option to get rid of this warning.
test: (groupid=0, jobs=1): err= 0: pid=206135: Thu Jan 15 15:53:14 2026
read: IOPS=1, BW=6781B/s (6781B/s)(4096B/604msec)
...
Signed-off-by: Vincent Fu <[email protected]>
Move random seed debug prints immediately after they are set. This makes clearer that the subsequent call to td_fill_rand_seeds does not affect these random seeds. Signed-off-by: Vincent Fu <[email protected]>
Add a test script to confirm that we successfully switch to the tausworthe64 random generator when the combination of minimum block size and file size exceeds the limits of the default tausworthe32 random generator. Detect a successful switch by seeing how many duplicate offsets are generated. Also add this script to our automated test harness. To run on Windows there will need to be some changes to the test runner. Signed-off-by: Vincent Fu <[email protected]>
* 'io_uring_zbc_support' of https://github.com/mannanal/fio: Add ZBD (Zoned Block Device) support to io_uring engine
Currently, the rbd engine can only attach to unencrypted images. This prevents users from benchmarking the performance impact of librbd's client-side encryption features. This patch adds two new options, 'rbd_encryption_format' and 'rbd_encryption_passphrase', allowing fio to perform encryption/decryption IO with librbd before starting I/O. Signed-off-by: David Mohren <[email protected]>
…er15/fio * 'rbd-encryption-support' of https://github.com/Greenpepper15/fio: engines/rbd: add support for LUKS encryption
Just various style issues, that GitHub very helpfully never highlights in their code view... Signed-off-by: Jens Axboe <[email protected]>
* 'issue_1499' of https://github.com/gautammenghani/fio: man: Update the description for `verify_only` explaining phantom writes
We should not flag iowait for waiting on events, if the kernel supports bypassing that. Signed-off-by: Jens Axboe <[email protected]>
Some storage systems may exhibit different behaviors depending on the application running on top, identified by its process comm string. It can be useful for fio jobs to present themselves with a specified comm string to trigger those behaviors in a testing setup. Signed-off-by: Robert Baldyga <[email protected]>
* 'job-process-comm' of https://github.com/robertbaldyga/fio: Add option to specify job process comm
When running in fork mode, the parent process was manually freeing
td->eo (ioengine options) immediately after forking the child.
However, the parent process continues to manage the thread_data structure
until the job completes, and fio_backend() eventually calls
fio_options_free(td) during the final cleanup.
This premature free in the parent process loop caused a double-free
error when fio_backend() later attempted to free the same options
via fio_options_free().
Remove the explicit free of td->eo in the run_threads() loop to let
the standard cleanup path in fio_backend() handle it safely.
This double-free issue was detected by AddressSanitizer:
==1276==ERROR: AddressSanitizer: attempting double-free on 0x7c254ebe0260 in thread T0:
#0 0x7fc5500e5beb in free.part.0 (/usr/lib64/libasan.so.8+0xe5beb)
#1 0x000000449d36 in fio_options_free /tmp/fio/options.c:6161
#2 0x00000046709b in fio_backend /tmp/fio/backend.c:2779
Signed-off-by: Yehor Malikov <[email protected]>
* 'fix-fork-memleak' of https://github.com/malikoyv/fio: backend: remove premature free of td->eo in parent process
Terminate the error message with a newline. Fixes: #2047 Signed-off-by: Tomas Winkler <[email protected]>
* 'sprandom-fix' of https://github.com/tomas-winkler-sndk/fio: fio: sprandom: append newline to error message
When starting a submit worker, start_worker() sets the worker state to
IDLE immediately after pthread_create():
sw->flags = SW_F_IDLE;
However, worker_thread() may run and set SW_F_RUNNING very early in its
own startup path:
pthread_mutex_lock(&sw->lock);
sw->flags |= SW_F_RUNNING;
if (ret)
sw->flags |= SW_F_ERROR;
pthread_mutex_unlock(&sw->lock);
If worker_thread() wins the race and sets SW_F_RUNNING before
start_worker() assigns SW_F_IDLE, the unconditional assignment in
start_worker() clobbers previously set bits and leaves the worker flags
as *only* SW_F_IDLE. As a result, workqueue_init() waits forever for all
workers to report SW_F_RUNNING:
thread_main()
-> rate_submit_init()
-> workqueue_init()
...
running = 0;
for (i = 0; i < wq->max_workers; i++) {
pthread_mutex_lock(&sw->lock);
if (sw->flags & SW_F_RUNNING)
running++;
if (sw->flags & SW_F_ERROR)
error++;
pthread_mutex_unlock(&sw->lock);
}
if (error || running == wq->max_workers)
break;
pthread_cond_wait(&wq->flush_cond, &wq->flush_lock);
Because SW_F_RUNNING was overwritten, `running` never reaches
`wq->max_workers`, and the init loop blocks on `flush_cond` indefinitely.
Signed-off-by: Chana Zaks <[email protected]>
Signed-off-by: Tomas Winkler <[email protected]>
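The race and its fix reduce to one line: start_worker() must OR SW_F_IDLE into the flags (under the worker's lock) rather than store it, so a worker that already set SW_F_RUNNING keeps that bit. The model below is illustrative; flag values and function names are assumptions, not fio's workqueue code.

```c
#include <assert.h>

#define SW_F_IDLE    (1 << 0)
#define SW_F_RUNNING (1 << 1)

/* Buggy interleaving: worker_thread() wins the race and sets
 * SW_F_RUNNING, then start_worker()'s unconditional store clobbers it.
 * workqueue_init() then never sees SW_F_RUNNING and waits forever. */
static unsigned int race_buggy(void)
{
    unsigned int flags = 0;

    flags |= SW_F_RUNNING;   /* worker_thread() startup */
    flags = SW_F_IDLE;       /* start_worker(): sw->flags = SW_F_IDLE */
    return flags;
}

/* Fixed interleaving: start_worker() ORs the bit in, so both bits
 * survive regardless of which side runs first. */
static unsigned int race_fixed(void)
{
    unsigned int flags = 0;

    flags |= SW_F_RUNNING;
    flags |= SW_F_IDLE;      /* start_worker(): sw->flags |= SW_F_IDLE */
    return flags;
}
```

In the real code both updates would of course happen under sw->lock; the lock alone does not help here, since the store is wrong whichever order the lock grants.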
* 'offload_fix' of https://github.com/tomas-winkler-sndk/fio: workqueue: fix threads stall when running with io_submit_mode offload
For string parsing functions, check for a NULL input before passing it to callbacks or attempting to duplicate the string. Otherwise, ill-formed jobs that specify fdp_pli or verify_async_cpus without providing any actual data will crash the parser. Link: #2055 Signed-off-by: Jens Axboe <[email protected]>
This reverts commit 9387e61. Some callbacks do handle and expect the input to be NULL, so unfortunately we cannot rely on that to do a generic punt for a NULL input. Signed-off-by: Jens Axboe <[email protected]>
Various callback handlers weren't prepared for a NULL input. Ensure that they are, they should just fail if the input is required but none is given. Signed-off-by: Jens Axboe <[email protected]>
Define a new SPRandom variable to allow an SSD cache size (spr_cs) to be defined. Preserve the original SPRandom invalidation behavior if spr_cs is zero (the default). If spr_cs is non-zero, then region X invalidations are done after region X+1 writes are completed instead of before. This creates an additional region size of distance between the writes and the associated invalidations. Return an error if the defined cache size is greater than the region size, and tell the user the maximum number of regions supported with the defined cache size. Attempting to resolve #2043 Signed-off-by: Charles Henry <[email protected]>
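The lagged-invalidation rule can be sketched as a tiny scheduling helper: with a non-zero spr_cs, finishing region X+1's writes triggers the invalidation of region X, one region behind. This is a loose illustrative model of the behavior described above; the function name and the zero/non-zero split are assumptions, not the patch's actual implementation.

```c
#include <assert.h>

/* Sketch: which region to invalidate once writes to completed_region
 * have finished. spr_cs == 0 keeps the original in-place behavior;
 * non-zero spr_cs lags invalidation by one region, leaving a full
 * region of distance between writes and their invalidations. */
static int region_to_invalidate(int completed_region, unsigned int spr_cs)
{
    if (spr_cs == 0)
        return completed_region;
    return completed_region - 1;
}
```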
…chyyyk/fio-spr-cache * 'sprandom-cache-implementation' of https://github.com/cachyyyk/fio-spr-cache: SPRandom Cache Size Behavior Implementation
Increment the server version since we now have a new sprandom cache size option. Signed-off-by: Vincent Fu <[email protected]>
Created by pull[bot]