Conversation

@pull pull bot commented Dec 10, 2021

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

Martin Steigerwald and others added 27 commits March 18, 2025 11:50
…roact-de/fio

* 'iouring-spellingfix-2025-03-18' of https://github.com/proact-de/fio:
  Fix spelling error in IO uring engine.
The image used by GitHub-hosted runners recently changed the default kvm
device permissions, so we can no longer start guest VMs. The error
message:

Could not access KVM kernel module: Permission denied
qemu-system-x86_64: failed to initialize kvm: Permission denied

Working run: https://github.com/fiotestbot/fio/actions/runs/14186873066
Failed run: https://github.com/fiotestbot/fio/actions/runs/14211189491

Explicitly give the GitHub Actions runner user permission to access the
/dev/kvm device following the guide at

https://github.blog/changelog/2024-04-02-github-actions-hardware-accelerated-android-virtualization-now-available/

Signed-off-by: Vincent Fu <[email protected]>
Fio has the ability to verify trim operations by running a verify
workload and setting the trim_percentage, trim_backlog, and
trim_verify_zero options. Some of the written blocks will then be
trimmed and then read back to see if they are zeroed out.
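
For reference, a hypothetical job file exercising this kind of verify/trim
workload (the device path and values are illustrative, not from the patch):

[trim-verify]
filename=/dev/nvme0n1
rw=randwrite
bs=4k
verify=crc32c
trim_percentage=50
trim_backlog=16
trim_verify_zero=1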

This patch changes fio_ro_check to allow trim operations when fio is
running a verify/trim workload.

Fixes: 196ccc4 ("fio.h: also check trim operations in fio_ro_check")
Signed-off-by: Vincent Fu <[email protected]>
The trim bit in td_ddir is not set when trim_percentage/backlog is
enabled, yet fio still issues trim operations. Detect these cases and
produce output describing trim operations if we issued any.

This is similar to the fix (615c794)
committed for verify_backlog.

Signed-off-by: Vincent Fu <[email protected]>
Fio may issue trim commands for a verify/trim job. Abort and print an
error message if this type of job is run with the --readonly option.

Signed-off-by: Vincent Fu <[email protected]>
If we have drained the list of trim operations but its original contents
were fewer than a full batch, we should zero out the running batch count
to make sure that we issue another full set of trim_backlog write
operations before considering trims again. Otherwise we will immediately
trim after each subsequent write operation until we have met the batch
size requirement.

Signed-off-by: Vincent Fu <[email protected]>
To detect when we are at the beginning of a trim phase we check
io_hist_len, but we should also check that the previous operation was not
a *trim* (rather than not a read). Without this change trim_backlog_batch
has no effect: after one batch is done, fio simply starts a new batch
because io_hist_len is still a multiple of trim_backlog and the last
operation in a batch was a trim, which is not a read.

For check_get_verify checking against read is appropriate but for
check_get_trim we must check against a trim.

Also we need to decrement the trim_batch count for the first trim
operation we send through.

Signed-off-by: Vincent Fu <[email protected]>
Fio can verify trim operations. This script adds some simple test cases
for this feature.

Signed-off-by: Vincent Fu <[email protected]>
On GitHub Actions we cannot insert kernel modules, so skip this script
in the tests that run on pull requests and after every push. Instead, run
this test with our nightly tests that run in a QEMU environment.

Signed-off-by: Vincent Fu <[email protected]>
Currently, when a write target zone has fewer remainder sectors than
the block size, fio finishes the zone to make it inactive (not open), so
that another zone can be opened and used as a write target zone. This
zone finish operation is implemented in zbd_adjust_block(). However, this
placement is less than ideal because zbd_adjust_block() handles not just
write requests but also read and trim requests.

Since the zone finish operation is exclusively necessary for write
requests, implement it into zbd_convert_to_write_zone(). While at it,
improve the function comment.

Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
To prepare for the following fix, factor out a part of
zbd_convert_to_write_zone() to the new function zbd_pick_write_zone().
This function randomly chooses a zone in the array of write zones.

Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
When a random write target offset points to a zone that is not writable,
zbd_convert_to_write_zone() attempts to convert the write offset to a
different, writable zone. However, the conversion fails when all of the
following conditions are met:

1) the workload has the max_open_zones limit
2) every write target zone, up to the max_open_zones limit, has
   remainder sectors smaller than the block size
3) the next random write request targets a zone not in the write target
   zone list

In this case, zbd_convert_to_write_zone() cannot open another zone
without exceeding the max_open_zones constraint. Therefore, it does not
convert the write to a different zone and prints the debug message
"did not choose another write zone". This leads to an unexpected stop of
the random write workload.

To prevent the unexpected write stop, finish one of the write target
zones with small remainder sectors. Check if all write target zones have
small remainder, and store the result in the new local boolean variable
all_write_zones_have_small_remainder. When this condition is true,
choose one of the write target zones and finish it. Then return the zone
from zbd_convert_to_write_zone() enabling the write process to continue.
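
A rough sketch of the flow this describes inside zbd_convert_to_write_zone();
apart from zbd_pick_write_zone() and the variable named above, the helper and
field names (and argument lists) are illustrative, not taken from the patch:

	bool all_write_zones_have_small_remainder = true;

	for (i = 0; i < zbdi->num_write_zones; i++) {
		z = zone_by_index(f, zbdi->write_zones[i]);	/* illustrative */
		if (zone_remainder(z) >= min_bs) {
			all_write_zones_have_small_remainder = false;
			break;
		}
	}

	if (all_write_zones_have_small_remainder) {
		/* every write target zone is nearly full: finish one of
		 * them so the write can continue within max_open_zones */
		z = zbd_pick_write_zone(td, f);		/* arguments assumed */
		finish_zone(td, f, z);			/* illustrative */
		return z;
	}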

Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
The previous commit fixed the unexpected write stop when all write
target zones have small remainder sectors to write. Add a test case to
confirm the fix.

Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
The Windows poll function does not clear the revents field before
populating it.  As a result, subsequent calls to poll using the same
pollfd reference return with revents set even when there is nothing
available to read.  This later results in a hang in recv().
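
A minimal sketch of the kind of caller-side workaround this implies (not
the actual patch): clear revents before every poll() so stale bits cannot
leak into the next iteration.

	/* clear stale revents left over from the previous poll() call */
	for (i = 0; i < nfds; i++)
		fds[i].revents = 0;

	ret = poll(fds, nfds, timeout);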

Signed-off-by: James Rizzo <[email protected]>
* 'master' of https://github.com/blah325/fio:
  Fix hang on Windows when multiple --client args are present
As a preparation for continue_on_error option support for zonemode=zbd,
introduce a new function blkzoned_move_zone_wp(). It moves the write
pointer by writing data. If a data buffer is provided, it calls the
pwrite() system call. If no data buffer is provided, it calls fallocate()
to write zero data.
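
A simplified sketch of that behavior; the real function's signature and the
exact fallocate() mode are assumptions here, not taken from the patch:

	/* sketch only: needs _GNU_SOURCE, <fcntl.h>, <unistd.h>, <errno.h>,
	 * <stdint.h>; signature and fallocate() mode are assumed */
	static int blkzoned_move_zone_wp(int fd, uint64_t offset,
					 uint64_t length, const char *buf)
	{
		if (buf) {
			/* data provided: advance the write pointer with a real write */
			if (pwrite(fd, buf, length, offset) < 0)
				return -errno;
			return 0;
		}

		/* no data buffer: zero the range to advance the write pointer */
		if (fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, length) < 0)
			return -errno;

		return 0;
	}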

Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
As a preparation for continue_on_error option support for zonemode=zbd,
introduce a new callback move_zone_wp() for the IO engines. It moves the
write pointer by writing data in the specified buffer. Also bump up
FIO_IOOPS_VERSION to note that the new callback is added.

Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
As a preparation for continue_on_error option support for zonemode=zbd,
implement move_zone_wp() callback for libzbc IO engine.

Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
As a preparation for continue_on_error option support for zonemode=zbd,
introduce the function zbd_move_zone_wp(). It moves write pointers by
calling blkzoned_move_zone_wp() or move_zone_wp() callback of IO
engines.

Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
When the continue_on_error option is specified, it is expected that the
workload continues to run when non-critical errors happen. However,
write workloads with the zonemode=zbd option cannot continue after
errors if the failed writes leave partially written data on the target
device. This partial write creates a write pointer gap between the device
and fio, and the next write requests by fio then fail due to unaligned
write command errors. This restriction results in undesirable test stops
during long runs on SMR drives that can recover defective sectors.

To allow write workloads with zonemode=zbd to continue after write
failures that leave partial data writes, introduce the new option
recover_zbd_write_error. When this option is specified together with the
continue_on_error option, fio checks the write pointer positions of the
write target zones in the error handling step, then fixes the write
pointer by moving it to the position the failed writes would have
advanced it to. Bump up FIO_SERVER_VER to note that the new option is
added.
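
A hypothetical job using the new option together with continue_on_error
(device path and values are illustrative):

[zbd-recover]
filename=/dev/sdX
zonemode=zbd
rw=randwrite
bs=256k
ioengine=psync
continue_on_error=write
recover_zbd_write_error=1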

For that purpose, add a new function zbd_recover_write_error(). Call it
from zbd_queue_io() for sync IO engines, and from io_completed() for
async IO engines. Modify zbd_queue_io() to pass the pointer to the
status so that zbd_recover_write_error() can modify the status to ignore
the errors. Add three fields to struct fio_zone_info. The two new fields
writes_in_flight and max_write_error_offset track the status of in-flight
writes at the time of the write error, so that the write pointer
positions can be fixed after the in-flight writes have completed. The
field fixing_zone_wp records that a write pointer fix is in progress, so
that no new writes are issued to the zone.

When the failed write is synchronous, the write pointer fix is done by
writing the remaining data of the failed write. This keeps the verify
patterns written to the device, so verify works together with the
continue_on_zbd_write_error option. When the failed write is
asynchronous, other in-flight writes fail together. In this case, fio
waits for all in-flight writes to complete and then fixes the write
pointer. The verify data of the failed writes are then lost, and verify
does not work. Check that the continue_on_zbd_write_error option is not
specified together with a verify workload and an asynchronous IO engine.

Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
As a preparation to add test cases which check that the
continue_on_error option and the recover_zbd_write_error option work
when bad blocks cause IO errors, set additional null_blk parameters
badblocks_once and badblocks_partial_io. These parameters were added to
Linux kernel version 6.15-rc1 and allow a more realistic scenario of
write failures on zoned block devices. The former parameter makes the
specified badblocks recover after the first write, and the latter
parameter leaves partially written data on the device.

Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
When the continue_on_error option is specified, it is expected that
write workloads do not stop even when bad blocks cause IO errors and
leave partially written data. Add test cases to confirm this with
zonemode=zbd and the new option recover_zbd_write_error.

To create the IO errors as expected, use null_blk and scsi_debug. In
particular, use null_blk and its badblocks and badblocks_once parameters,
which control which block causes the IO error. Introduce helper functions
that confirm the bad block parameters are available and that set up the
bad blocks.

Using the helper functions, add four new test cases. The first two cases
confirm that fio recovers after an IO error with a partial write. One
test case covers the psync IO engine. The other covers async IO with the
libaio engine at a high queue depth with multiple jobs. The last two test
cases cover the case where another IO error happens again during the
recovery from the first IO error.

Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
The newly added test cases 72 and 73 in t/zbd/test-zbd-support require
an error injection feature. They can be run with either null_blk or
scsi_debug, both of which provide error injection. To run the test cases
easily with scsi_debug, add another script, run-tests-against-scsi_debug.
It simply prepares a zoned scsi_debug device and runs the two test cases.

Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Commit 4175f4d ("oslib: blkzoned: add blkzoned_move_zone_wp()
helper function") introduced the new function for Linux, but did not add
its stub function for OSes that lack the blkzoned feature. This caused
build failures on MacOS and Windows. Add the missing stub to fix it.

Fixes: 4175f4d ("oslib: blkzoned: add blkzoned_move_zone_wp() helper function")
Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
This caused me some headache; let's add some details on how fio expects
the http_host and file to be formatted.

Signed-off-by: Avritt Rohwer <[email protected]>
* 'patch-1' of https://github.com/avrittrohwer/fio:
  Document expected filename format for s3 http engine.
Vishal Jose Mannanal and others added 30 commits January 12, 2026 14:43
This patch adds comprehensive ZBD support to the io_uring engine, enabling
it to work with zonemode=zbd for traditional zoned block devices like
SMR (Shingled Magnetic Recording) devices.

Changes include:
- ZBD function implementations for zone management operations
- Integration with existing blkzoned interfaces
- Support for zone reporting, reset, finish, and write pointer
  operations in SMR/ZBD devices

I've tested this on SMR drives with the io_uring engine and it works well.

Sample fio job:

[zbd-test]
ioengine=io_uring
direct=1
bs=128k
zonemode=zbd
filename=/dev/sdg

Signed-off-by: Vishal Jose Mannanal <[email protected]>
There is no need for a 64-bit random generator for fdp_state. The FDP
random generator is only used to randomly select from available
placement IDs. For NVMe FDP devices the placement ID is a 16-bit value.

Signed-off-by: Vincent Fu <[email protected]>
The trim_state random generator determines whether to trim an offset or
not with the given probability specified by the trim_percentage option.
There is no need for it to ever be 64 bits. So make it always 32 bits.

Signed-off-by: Vincent Fu <[email protected]>
random_state is a random generator that generates the random offsets to
use for I/O operations. Change its name to offset_state to make this
more obvious.

Signed-off-by: Vincent Fu <[email protected]>
Clarify that this is the state used for generating random offsets. The
bitmap reference must be a holdover from old code.

Signed-off-by: Vincent Fu <[email protected]>
If the combination of file size and minimum block size exceeds the
limits of the default tausworthe random generator, switch for real to
the tausworthe64 random generator.

The original code changed the random_generator option to tausworthe64
but by the time the change was made the random generators had already
been initialized. So the original code had no effect on the offsets
generated. This patch re-initializes the relevant random generator after
the switch to tausworthe64 so that this change can actually take effect.

The random generators are initialized as Fio parses the command line and
job files. Later on thread_main() calls init_random_map() which calls
check_rand_gen_limits() which actually carries out the random generator
limits check for each file.

Fixes: #2036
Signed-off-by: Vincent Fu <[email protected]>
With the `verify_only` option, no writes are actually issued. Update the man
page and the HOWTO doc explaining that the writes in the fio output are
phantom writes.

Fixes: #1499
Signed-off-by: Gautam Menghani <[email protected]>
There are no problems when output is directed to the console, but when
--output=somefile is set, child process log messages may not make it to
the file since buffers are not flushed when the child process
terminates. Make sure child process messages appear in the output by
flushing the buffer before exiting.

Also try to make sure the child process starts with an empty info log
buffer.
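
The general idea, as a sketch (fio's actual logging internals may differ):

	/* in the forked child, right before exiting: flush buffered log
	 * output so messages reach the --output file */
	fflush(f_out);
	if (f_err != f_out)
		fflush(f_err);
	_exit(ret);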

Example: tausworthe64 message does not appear without this patch
----------------------------------------------------------------
root@localhost:~# ./fio-canonical/fio --output=test --name=test --rw=randread --filesize=128T --number_ios=1 --ioengine=null
root@localhost:~# cat test
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=null, iodepth=1
fio-3.41-77-gadec
Starting 1 process

test: (groupid=0, jobs=1): err= 0: pid=206045: Thu Jan 15 15:46:43 2026
  read: IOPS=1, BW=6860B/s (6860B/s)(4096B/597msec)
...

Example: with this patch tausworthe64 message appears
-----------------------------------------------------
root@localhost:~# ./fio --output=test --name=test --rw=randread --filesize=128T --number_ios=1 --ioengine=null
root@localhost:~# cat test
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=null, iodepth=1
fio-3.41-82-g7a1b-dirty
Starting 1 process
fio: file test.0.0 exceeds 32-bit tausworthe random generator.
fio: Switching to tausworthe64. Use the random_generator= option to get rid of this warning.

test: (groupid=0, jobs=1): err= 0: pid=206135: Thu Jan 15 15:53:14 2026
  read: IOPS=1, BW=6781B/s (6781B/s)(4096B/604msec)
...

Signed-off-by: Vincent Fu <[email protected]>
Move random seed debug prints to immediately after they are set. This makes
it clearer that the subsequent call to td_fill_rand_seeds does not affect
these random seeds.

Signed-off-by: Vincent Fu <[email protected]>
Add a test script to confirm that we successfully switch to the
tausworthe64 random generator when the combination of minimum block size
and file size exceeds the limits of the default tausworthe32 random
generator.

Detect a successful switch by seeing how many duplicate offsets are
generated.

Also add this script to our automated test harness.

To run on Windows there will need to be some changes to the test runner.

Signed-off-by: Vincent Fu <[email protected]>
* 'io_uring_zbc_support' of https://github.com/mannanal/fio:
  Add ZBD (Zoned Block Device) support to io_uring engine
Currently, the rbd engine can only attach to unencrypted images.
This prevents users from benchmarking the performance impact of
librbd's client-side encryption features.

This patch adds two new options, 'rbd_encryption_format' and
'rbd_encryption_passphrase', allowing fio to perform
encryption/decryption IO with librbd before starting I/O.
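
A hypothetical job using the new options (pool, image name and the format
value are illustrative):

[rbd-encrypted]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=encrypted-image
rw=randwrite
bs=4k
rbd_encryption_format=luks2
rbd_encryption_passphrase=mypassphrase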

Signed-off-by: David Mohren <[email protected]>
…er15/fio

* 'rbd-encryption-support' of https://github.com/Greenpepper15/fio:
  engines/rbd: add support for LUKS encryption
Just various style issues that GitHub very helpfully never highlights
in their code view...

Signed-off-by: Jens Axboe <[email protected]>
* 'issue_1499' of https://github.com/gautammenghani/fio:
  man: Update the description for `verify_only` explaining phantom writes
We should not flag iowait for waiting on events, if the kernel
supports bypassing that.

Signed-off-by: Jens Axboe <[email protected]>
Some storage systems may exhibit different behaviors depending on the
application running on top, identified by its process comm string.
It can be useful for fio jobs to present themselves with a specified
comm string to trigger those behaviors in a testing setup.
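
On Linux a process comm string is normally set with prctl(PR_SET_NAME); a
minimal sketch of that idea, not necessarily how the patch wires up the
option:

	/* needs <sys/prctl.h>; comm is truncated to 15 characters plus
	 * the terminating NUL */
	if (prctl(PR_SET_NAME, (unsigned long) comm_str, 0, 0, 0) < 0)
		log_err("fio: failed to set comm string\n");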

Signed-off-by: Robert Baldyga <[email protected]>
When running in fork mode, the parent process was manually freeing
td->eo (ioengine options) immediately after forking the child.
However, the parent process continues to manage the thread_data structure
until the job completes, and fio_backend() eventually calls
fio_options_free(td) during the final cleanup.

This premature free in the parent process loop caused a double-free
error when fio_backend() later attempted to free the same options
via fio_options_free().

Remove the explicit free of td->eo in the run_threads() loop to let
the standard cleanup path in fio_backend() handle it safely.

This double-free issue was detected by AddressSanitizer:
==1276==ERROR: AddressSanitizer: attempting double-free on 0x7c254ebe0260 in thread T0:
    #0 0x7fc5500e5beb in free.part.0 (/usr/lib64/libasan.so.8+0xe5beb)
    #1 0x000000449d36 in fio_options_free /tmp/fio/options.c:6161
    #2 0x00000046709b in fio_backend /tmp/fio/backend.c:2779

Signed-off-by: Yehor Malikov <[email protected]>
* 'fix-fork-memleak' of https://github.com/malikoyv/fio:
  backend: remove premature free of td->eo in parent process
Terminate the error message with a newline.

Fixes: #2047
Signed-off-by: Tomas Winkler <[email protected]>
When starting a submit worker, start_worker() sets the worker state to
IDLE immediately after pthread_create():

sw->flags = SW_F_IDLE;

However, worker_thread() may run and set SW_F_RUNNING very early in its
own startup path:

	pthread_mutex_lock(&sw->lock);
	sw->flags |= SW_F_RUNNING;
	if (ret)
		sw->flags |= SW_F_ERROR;
	pthread_mutex_unlock(&sw->lock);

If worker_thread() wins the race and sets SW_F_RUNNING before
start_worker() assigns SW_F_IDLE, the unconditional assignment in
start_worker() clobbers previously set bits and leaves the worker flags
as *only* SW_F_IDLE. As a result, workqueue_init() waits forever for all
workers to report SW_F_RUNNING:

      thread_main()
        -> rate_submit_init()
          -> workqueue_init()
             ...
             running = 0;
             for (i = 0; i < wq->max_workers; i++) {
                 pthread_mutex_lock(&sw->lock);

                 if (sw->flags & SW_F_RUNNING)
                     running++;
                 if (sw->flags & SW_F_ERROR)
                     error++;
                 pthread_mutex_unlock(&sw->lock);
             }

             if (error || running == wq->max_workers)
                 break;

             pthread_cond_wait(&wq->flush_cond, &wq->flush_lock);

Because SW_F_RUNNING was overwritten, `running` never reaches
`wq->max_workers`, and the init loop blocks on `flush_cond` indefinitely.
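
One way to avoid the clobbering described above is to OR the flag in under
the worker lock instead of assigning it unconditionally (sketch only; the
actual patch may differ):

	pthread_mutex_lock(&sw->lock);
	sw->flags |= SW_F_IDLE;	/* keep SW_F_RUNNING if the worker won the race */
	pthread_mutex_unlock(&sw->lock);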

Signed-off-by: Chana Zaks <[email protected]>
Signed-off-by: Tomas Winkler <[email protected]>
* 'offload_fix' of https://github.com/tomas-winkler-sndk/fio:
  workqueue: fix threads stall when running with io_submit_mode offload
For string parsing functions, check for a NULL input before passing it
to callbacks or attempting to duplicate the string. Otherwise, ill-formed
jobs that do:

fdp_pli

or

verify_async_cpus

without providing any actual data will crash the parser.

Link: #2055
Signed-off-by: Jens Axboe <[email protected]>
This reverts commit 9387e61.

Some callbacks do handle and expect the input to be NULL, so
unfortunately we cannot rely on that to do a generic punt for a NULL
input.

Signed-off-by: Jens Axboe <[email protected]>
Various callback handlers weren't prepared for a NULL input. Ensure
that they are; they should simply fail if the input is required but
none is given.
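
The shape of the guard, as a sketch (the callback and helper names are
illustrative):

	static int str_example_cb(void *data, const char *input)
	{
		/* this option requires a value: fail instead of crashing
		 * on a NULL input */
		if (!input)
			return 1;

		return parse_example_value(data, input);
	}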

Signed-off-by: Jens Axboe <[email protected]>
Define a new SPRandom variable to allow an SSD cache size (spr_cs) to
be defined.
Preserve the original SPRandom invalidation behavior if spr_cs is
zero (the default).
If spr_cs is non-zero then region X invalidations are done after
region X+1 writes are completed instead of before. This creates an
additional region size of distance between the writes and the
associated invalidations.
Return an error if the defined cache size is greater than the region
size and tell the user the maximum number of regions supported with
the defined cache size.
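
A sketch of the validation described above (field names, and the way the
maximum region count is computed, are assumptions):

	/* region_size, total_size and the max_regions math are assumed */
	if (spr_cs > region_size) {
		uint64_t max_regions = total_size / spr_cs;

		log_err("fio: sprandom cache size is larger than the region "
			"size; at most %llu regions are supported with this "
			"cache size\n", (unsigned long long) max_regions);
		return 1;
	}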

Attempting to resolve #2043

Signed-off-by:  Charles Henry <[email protected]>
Increment the server version since we now have a new sprandom cache size
option.

Signed-off-by: Vincent Fu <[email protected]>