android_kernel_samsung_sm8650

Gabriel Krisman Bertazi 0f312961c7 sbitmap: Use single per-bitmap counting to wake up queued tags

[ Upstream commit 4f8126bb2308066b877859e4b5923ffb54143630 ]

sbitmap suffers from code complexity, as demonstrated by recent fixes,
and eventual lost wake ups on nested I/O completion.  The latter happens,
from what I understand, due to the non-atomic nature of the updates to
wait_cnt, which needs to be subtracted and eventually reset when equal
to zero.  This two step process can eventually miss an update when a
nested completion happens to interrupt the CPU in between the wait_cnt
updates.  This is very hard to fix, as shown by the recent changes to
this code.
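
As an illustration only, the two-step update described above can be replayed in a single-threaded userspace model. This is a simplified sketch, not the kernel code: `struct old_ws`, `old_step_dec`, `old_step_reset` and `replay_interrupted_update` are hypothetical names, and plain integers stand in for the kernel's atomics because the interleaving is replayed sequentially.

```c
#include <assert.h>

/* Hypothetical model of one wait queue under the old scheme. */
struct old_ws {
	int wait_cnt;   /* completions left before a batch wakeup */
	int wake_batch; /* value wait_cnt is reset to */
};

/* Step 1: subtract one completion and return the new count. */
static int old_step_dec(struct old_ws *ws)
{
	assert(ws->wake_batch > 0);
	return --ws->wait_cnt;
}

/* Step 2: re-arm the counter once it has reached zero. */
static void old_step_reset(struct old_ws *ws)
{
	ws->wait_cnt = ws->wake_batch;
}

/*
 * Replay one bad interleaving: the outer completion drives wait_cnt to
 * zero, a nested completion decrements it again, and only then does the
 * outer completion perform the reset. Returns the final wait_cnt.
 */
static int replay_interrupted_update(struct old_ws *ws)
{
	while (old_step_dec(ws) > 0)
		;                 /* completions until the batch is used up */
	old_step_dec(ws);         /* nested completion lands before the reset */
	old_step_reset(ws);       /* outer completion resumes and re-arms */
	return ws->wait_cnt;
}
```

In this model the counter ends at `wake_batch`, whereas a serialized execution (reset first, then the nested decrement) would end at `wake_batch - 1`, so the nested completion leaves no trace.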

The code complexity arises mostly from the corner cases to avoid missed
wakes in this scenario.  In addition, the handling of wake_batch
recalculation plus the synchronization with sbitmap_queue_wake_up is
non-trivial.

This patchset implements the idea originally proposed by Jan [1], which
removes the need for the two-step updates of wait_cnt.  This is done by
tracking the number of completions and wakeups in always increasing,
per-bitmap counters.  Instead of having to reset the wait_cnt when it
reaches zero, we simply keep counting, and attempt to wake up N threads
in a single wait queue whenever there is enough space for a batch.
Waking up fewer than wake_batch waiters shouldn't be a problem, because we
haven't changed the conditions for wake up, and the existing batch
calculation guarantees at least enough remaining completions to wake up
a batch for each queue at any time.
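
The counting scheme in the paragraph above can be sketched in userspace with C11 atomics. This is a minimal model under stated assumptions, not the patch itself: `struct wake_counters` and `account_completions` are hypothetical names, and the function only reports how many waiters the caller should wake instead of touching a real wait queue.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-bitmap state; both counters only ever increase. */
struct wake_counters {
	atomic_uint completion_cnt; /* completions recorded so far */
	atomic_uint wakeup_cnt;     /* wakeups already claimed */
	unsigned int wake_batch;    /* waiters to wake per batch */
};

/*
 * Record nr completions. Returns wake_batch if this caller claimed a
 * batch of wakeups, or 0 if not enough completions have accumulated.
 */
static unsigned int account_completions(struct wake_counters *wc,
					unsigned int nr)
{
	unsigned int wakeups;

	assert(wc->wake_batch > 0);
	atomic_fetch_add(&wc->completion_cnt, nr);
	wakeups = atomic_load(&wc->wakeup_cnt);

	do {
		/* Unsigned subtraction stays valid across wraparound. */
		if (atomic_load(&wc->completion_cnt) - wakeups <
		    wc->wake_batch)
			return 0;
		/* On failure, wakeups is refreshed and the check repeats. */
	} while (!atomic_compare_exchange_weak(&wc->wakeup_cnt, &wakeups,
					       wakeups + wc->wake_batch));

	return wc->wake_batch;
}
```

Because a batch is claimed with a single compare-and-exchange on `wakeup_cnt`, there is no separate reset step for a nested completion to interrupt; a completion that loses the race simply re-evaluates the difference between the two counters.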

Performance-wise, one should expect very similar performance to the
original algorithm for the case where there is no queueing.  In both the
old algorithm and this implementation, the first thing is to check
ws_active, which bails out if there is no queueing to be managed. In the
new code, we took care to avoid accounting completions and wakeups when
there is no queueing, to not pay the cost of atomic operations
unnecessarily, since it doesn't skew the numbers.
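
A rough sketch of that fast path, again as a userspace model with hypothetical names (`struct queue_model`, `note_completion`): when no wait queue is active, the function returns before performing any atomic read-modify-write.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model: ws_active counts wait queues that have sleepers. */
struct queue_model {
	atomic_int ws_active;
	atomic_uint completion_cnt;
};

/*
 * Account nr completions only when someone is queued.
 * Returns 1 if the completions were counted, 0 if the fast path bailed out.
 */
static int note_completion(struct queue_model *q, unsigned int nr)
{
	assert(nr > 0);

	/* A plain load is enough here; no counter is modified. */
	if (atomic_load(&q->ws_active) == 0)
		return 0;

	atomic_fetch_add(&q->completion_cnt, nr);
	return 1;
}
```

Skipping the accounting while nobody waits leaves both counters untouched, so their difference is unchanged, which is consistent with the remark that it does not skew the numbers.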

For more interesting cases, where there is queueing, we need to take
into account the cross-communication of the atomic operations.  I've
been benchmarking by running parallel fio jobs against a single hctx
nullb in different hardware queue depth scenarios, and verifying both
IOPS and queueing.

Each experiment was repeated 5 times on a 20-CPU box, with 20 parallel
jobs. fio was issuing fixed-size randwrites with qd=64 against nullb,
varying only the hardware queue length per test.

queue size 2                 4                 8                 16                 32                 64
6.1-rc2    1681.1K (1.6K)    2633.0K (12.7K)   6940.8K (16.3K)   8172.3K (617.5K)   8391.7K (367.1K)   8606.1K (351.2K)
patched    1721.8K (15.1K)   3016.7K (3.8K)    7543.0K (89.4K)   8132.5K (303.4K)   8324.2K (230.6K)   8401.8K (284.7K)

The following is a similar experiment, run against a nullb with a single
bitmap shared by 20 hctx spread across 2 NUMA nodes. This has 40
parallel fio jobs operating on the same device.

queue size 2                 4                 8                 16                 32                 64
6.1-rc2    1081.0K (2.3K)    957.2K (1.5K)     1699.1K (5.7K)    6178.2K (124.6K)   12227.9K (37.7K)   13286.6K (92.9K)
patched    1081.8K (2.8K)    1316.5K (5.4K)    2364.4K (1.8K)    6151.4K (20.0K)    11893.6K (17.5K)   12385.6K (18.4K)

It has also survived blktests and a 12h stress run against nullb. I also
ran the code against nvme and a scsi SSD, and I didn't observe any
performance regression in those. If there are other tests you think I
should run, please let me know and I will follow up with results.

[1] https://lore.kernel.org/all/aef9de29-e9f5-259a-f8be-12d1b734e72@google.com/

Cc: Hugh Dickins <hughd@google.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Liu Song <liusong@linux.alibaba.com>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20221105231055.25953-1-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: b5fcf7871acb ("sbitmap: correct wake_batch recalculation to avoid potential IO hung")
Signed-off-by: Sasha Levin <sashal@kernel.org>

2023-03-10 09:32:42 +01:00

arch

ARM: dts: imx7s: correct iomuxc gpr mux controller cells

2023-03-10 09:32:42 +01:00

block

block, bfq: fix uaf for bfqq in bic_set_bfqq()

2023-02-09 11:28:06 +01:00

certs

certs: Fix build error when PKCS#11 URI contains semicolon

2023-02-09 11:28:11 +01:00

crypto

use less confusing names for iov_iter direction initializers

2023-02-09 11:28:04 +01:00

Documentation

attr: use consistent sgid stripping checks

2023-03-03 11:52:25 +01:00

drivers

ublk_drv: don't probe partitions if the ubq daemon isn't trusted

2023-03-10 09:32:41 +01:00

fs

attr: use consistent sgid stripping checks

2023-03-03 11:52:25 +01:00

include

sbitmap: Use single per-bitmap counting to wake up queued tags

2023-03-10 09:32:42 +01:00

init

gcc: disable -Warray-bounds for gcc-11 too

2023-01-14 10:33:43 +01:00

io_uring

use less confusing names for iov_iter direction initializers

2023-02-09 11:28:04 +01:00

ipc

ipc: fix memory leak in init_mqueue_fs()

2022-12-31 13:32:01 +01:00

kernel

locking/rwsem: Disable preemption in all down_read*() and up_read() code paths

2023-03-10 09:32:41 +01:00

lib

sbitmap: Use single per-bitmap counting to wake up queued tags

2023-03-10 09:32:42 +01:00

LICENSES

LICENSES/LGPL-2.1: Add LGPL-2.1-or-later as valid identifiers

2021-12-16 14:33:10 +01:00

mm

mm/gup: add folio to list when folio_isolate_lru() succeed

2023-02-22 12:59:54 +01:00

net

net: Remove WARN_ON_ONCE(sk->sk_forward_alloc) from sk_stream_kill_queues().

2023-03-03 11:52:23 +01:00

rust

rust: print: avoid evaluating arguments in pr_* macros in unsafe blocks

2023-02-06 08:06:34 +01:00

samples

ftrace: Export ftrace_free_filter() to modules

2023-02-01 08:34:37 +01:00

scripts

scripts/tags.sh: fix incompatibility with PCRE2

2023-03-03 11:52:25 +01:00

security

randstruct: disable Clang 15 support

2023-02-25 11:25:43 +01:00

sound

ASoC: codecs: es8326: Fix DTS properties reading

2023-03-03 11:52:22 +01:00

tools

selftests: ocelot: tc_flower_chains: make test_vlan_ingress_modify() more comprehensive

2023-03-03 11:52:22 +01:00

usr

usr/gen_init_cpio.c: remove unnecessary -1 values from int file

2022-10-03 14:21:44 -07:00

virt

kvm/vfio: Fix potential deadlock on vfio group_lock

2023-02-01 08:34:36 +01:00

.clang-format

inet: ping: use hlist_nulls rcu iterator during lookup

2022-12-01 12:42:46 +01:00

.cocciconfig

scripts: add Linux .cocciconfig for coccinelle

2016-07-22 12:13:39 +02:00

.get_maintainer.ignore

get_maintainer: add Alan to .get_maintainer.ignore

2022-08-20 15:17:44 -07:00

.gitattributes

.gitattributes: use 'dts' diff driver for dts files

2019-12-04 19:44:11 -08:00

.gitignore

Kbuild: add Rust support

2022-09-28 09:02:20 +02:00

.mailmap

9 hotfixes. 6 for MM, 3 for other areas. Four of these patches address

2022-12-10 17:10:52 -08:00

.rustfmt.toml

rust: add .rustfmt.toml

2022-09-28 09:02:20 +02:00

COPYING

COPYING: state that all contributions really are covered by this file

2020-02-10 13:32:20 -08:00

CREDITS

MAINTAINERS: Remove Michal Marek from Kbuild maintainers

2022-11-16 14:53:00 +09:00

Kbuild

Kbuild updates for v6.1

2022-10-10 12:00:45 -07:00

Kconfig

kbuild: ensure full rebuild when the compiler is updated

2020-05-12 13:28:33 +09:00

MAINTAINERS

audit: update the mailing list in MAINTAINERS

2023-02-25 11:25:42 +01:00

Makefile

Linux 6.1.15

2023-03-03 11:52:25 +01:00

README

Drop all 00-INDEX files from Documentation/

2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.

Languages

C 97.8%

Assembly 1.1%

Shell 0.4%

Makefile 0.3%

Python 0.1%