Kernel for Galaxy S24, rebased on CLO sources (WIP)
Go to file
Ziwei Dai c6cbb4e1c1 rcu/kvfree: Avoid freeing new kfree_rcu() memory after old grace period
commit 5da7cb193db32da783a3f3e77d8b639989321d48 upstream.

Memory passed to kvfree_rcu() that is to be freed is tracked by a
per-CPU kfree_rcu_cpu structure, which in turn contains pointers
to kvfree_rcu_bulk_data structures that contain pointers to memory
that has not yet been handed to RCU, along with an kfree_rcu_cpu_work
structure that tracks the memory that has already been handed to RCU.
These structures track three categories of memory: (1) Memory for
kfree(), (2) Memory for kvfree(), and (3) Memory for both that arrived
during an OOM episode.  The first two categories are tracked in a
cache-friendly manner involving a dynamically allocated page of pointers
(the aforementioned kvfree_rcu_bulk_data structures), while the third
uses a simple (but decidedly cache-unfriendly) linked list through the
rcu_head structures in each block of memory.

On a given CPU, these three categories are handled as a unit, with that
CPU's kfree_rcu_cpu_work structure having one pointer for each of the
three categories.  Clearly, new memory for a given category cannot be
placed in the corresponding kfree_rcu_cpu_work structure until any old
memory has had its grace period elapse and thus has been removed.  And
the kfree_rcu_monitor() function does in fact check for this.

Except that the kfree_rcu_monitor() function checks these pointers one
at a time.  This means that if the previous kfree_rcu() memory passed
to RCU had only category 1 and the current one has only category 2, the
kfree_rcu_monitor() function will send that current category-2 memory
along immediately.  This can result in memory being freed too soon,
that is, out from under unsuspecting RCU readers.

To see this, consider the following sequence of events, in which:

o	Task A on CPU 0 calls rcu_read_lock(), then uses "from_cset",
	then is preempted.

o	CPU 1 calls kfree_rcu(cset, rcu_head) in order to free "from_cset"
	after a later grace period.  Except that "from_cset" is freed
	right after the previous grace period ended, so that "from_cset"
	is immediately freed.  Task A resumes and references "from_cset"'s
	member, after which nothing good happens.

In full detail:

CPU 0					CPU 1
----------------------			----------------------
count_memcg_event_mm()
|rcu_read_lock()  <---
|mem_cgroup_from_task()
 |// css_set_ptr is the "from_cset" mentioned on CPU 1
 |css_set_ptr = rcu_dereference((task)->cgroups)
 |// Hard irq comes, current task is scheduled out.

					cgroup_attach_task()
					|cgroup_migrate()
					|cgroup_migrate_execute()
					|css_set_move_task(task, from_cset, to_cset, true)
					|cgroup_move_task(task, to_cset)
					|rcu_assign_pointer(.., to_cset)
					|...
					|cgroup_migrate_finish()
					|put_css_set_locked(from_cset)
					|from_cset->refcount return 0
					|kfree_rcu(cset, rcu_head) // free from_cset after new gp
					|add_ptr_to_bulk_krc_lock()
					|schedule_delayed_work(&krcp->monitor_work, ..)

					kfree_rcu_monitor()
					|krcp->bulk_head[0]'s work attached to krwp->bulk_head_free[]
					|queue_rcu_work(system_wq, &krwp->rcu_work)
					|if rwork->rcu.work is not in WORK_STRUCT_PENDING_BIT state,
					|call_rcu(&rwork->rcu, rcu_work_rcufn) <--- request new gp

					// There is a perious call_rcu(.., rcu_work_rcufn)
					// gp end, rcu_work_rcufn() is called.
					rcu_work_rcufn()
					|__queue_work(.., rwork->wq, &rwork->work);

					|kfree_rcu_work()
					|krwp->bulk_head_free[0] bulk is freed before new gp end!!!
					|The "from_cset" is freed before new gp end.

// the task resumes some time later.
 |css_set_ptr->subsys[(subsys_id) <--- Caused kernel crash, because css_set_ptr is freed.

This commit therefore causes kfree_rcu_monitor() to refrain from moving
kfree_rcu() memory to the kfree_rcu_cpu_work structure until the RCU
grace period has completed for all three categories.

v2: Use helper function instead of inserted code block at kfree_rcu_monitor().

Fixes: 34c8817455 ("rcu: Support kfree_bulk() interface in kfree_rcu()")
Fixes: 5f3c8d6204 ("rcu/tree: Maintain separate array for vmalloc ptrs")
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Ziwei Dai <ziwei.dai@unisoc.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-21 16:01:02 +02:00
arch parisc: Delete redundant register definitions in <asm/assembly.h> 2023-06-21 16:01:02 +02:00
block blk-mq: fix blk_mq_hw_ctx active request accounting 2023-06-14 11:15:31 +02:00
certs certs: Fix build error when PKCS#11 URI contains semicolon 2023-02-09 11:28:11 +01:00
crypto KEYS: asymmetric: Copy sig and digest in public_key_verify_signature() 2023-06-09 10:34:28 +02:00
Documentation mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM 2023-06-14 11:15:29 +02:00
drivers octeon_ep: Add missing check for ioremap 2023-06-21 16:01:02 +02:00
fs afs: Fix vlserver probe RTT handling 2023-06-21 16:01:02 +02:00
include net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting 2023-06-21 16:01:01 +02:00
init gcc: disable '-Warray-bounds' for gcc-13 too 2023-04-26 14:28:43 +02:00
io_uring io_uring/net: save msghdr->msg_control for retries 2023-06-21 16:00:55 +02:00
ipc ipc: fix memory leak in init_mqueue_fs() 2022-12-31 13:32:01 +01:00
kernel rcu/kvfree: Avoid freeing new kfree_rcu() memory after old grace period 2023-06-21 16:01:02 +02:00
lib test_firmware: prevent race conditions by a correct implementation of locking 2023-06-21 16:00:51 +02:00
LICENSES LICENSES/LGPL-2.1: Add LGPL-2.1-or-later as valid identifiers 2021-12-16 14:33:10 +01:00
mm zswap: do not shrink if cgroup may not zswap 2023-06-21 16:00:54 +02:00
net net: tipc: resize nlattr array to correct size 2023-06-21 16:01:02 +02:00
rust rust: kernel: Mark rust_fmt_argument as extern "C" 2023-04-26 14:28:38 +02:00
samples samples/bpf: Fix fout leak in hbm's run_bpf_prog 2023-05-24 17:32:38 +01:00
scripts recordmcount: Fix memory leaks in the uwrite function 2023-05-24 17:32:41 +01:00
security selinux: don't use make's grouped targets feature yet 2023-06-09 10:34:24 +02:00
sound ALSA: hda/realtek: Add a quirk for Compaq N14JP6 2023-06-21 16:00:56 +02:00
tools selftests/ptp: Fix timestamp printf format for PTP_SYS_OFFSET 2023-06-21 16:01:02 +02:00
usr usr/gen_init_cpio.c: remove unnecessary -1 values from int file 2022-10-03 14:21:44 -07:00
virt KVM: Fix vcpu_array[0] races 2023-05-24 17:32:50 +01:00
.clang-format inet: ping: use hlist_nulls rcu iterator during lookup 2022-12-01 12:42:46 +01:00
.cocciconfig
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore Kbuild: add Rust support 2022-09-28 09:02:20 +02:00
.mailmap 9 hotfixes. 6 for MM, 3 for other areas. Four of these patches address 2022-12-10 17:10:52 -08:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Remove Michal Marek from Kbuild maintainers 2022-11-16 14:53:00 +09:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS platform/x86: Move existing HP drivers to a new hp subdir 2023-05-24 17:32:42 +01:00
Makefile Linux 6.1.34 2023-06-14 11:15:34 +02:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.