android_kernel_xiaomi_sm8450/kernel
Munehisa Kamata ec9c7aa088 sched/psi: Fix use-after-free in ep_remove_wait_queue()
commit c2dbe32d5db5c4ead121cf86dabd5ab691fb47fe upstream.

If a non-root cgroup gets removed when there is a thread that registered
trigger and is polling on a pressure file within the cgroup, the polling
waitqueue gets freed in the following path:

 do_rmdir
   cgroup_rmdir
     kernfs_drain_open_files
       cgroup_file_release
         cgroup_pressure_release
           psi_trigger_destroy

However, the polling thread still has a reference to the pressure file and
will access the freed waitqueue when the file is closed or upon exit:

 fput
   ep_eventpoll_release
     ep_free
       ep_remove_wait_queue
         remove_wait_queue

This results in use-after-free as pasted below.

The fundamental problem here is that cgroup_file_release() (and
consequently waitqueue's lifetime) is not tied to the file's real lifetime.
Using wake_up_pollfree() here might be less than ideal, but it is in line
with the comment at commit 42288cb44c4b ("wait: add wake_up_pollfree()")
since the waitqueue's lifetime is not tied to file's one and can be
considered as another special case. While this would be fixable by somehow
making cgroup_file_release() be tied to the fput(), it would require
sizable refactoring at cgroups or higher layer which might be more
justifiable if we identify more cases like this.

  BUG: KASAN: use-after-free in _raw_spin_lock_irqsave+0x60/0xc0
  Write of size 4 at addr ffff88810e625328 by task a.out/4404

	CPU: 19 PID: 4404 Comm: a.out Not tainted 6.2.0-rc6 #38
	Hardware name: Amazon EC2 c5a.8xlarge/, BIOS 1.0 10/16/2017
	Call Trace:
	<TASK>
	dump_stack_lvl+0x73/0xa0
	print_report+0x16c/0x4e0
	kasan_report+0xc3/0xf0
	kasan_check_range+0x2d2/0x310
	_raw_spin_lock_irqsave+0x60/0xc0
	remove_wait_queue+0x1a/0xa0
	ep_free+0x12c/0x170
	ep_eventpoll_release+0x26/0x30
	__fput+0x202/0x400
	task_work_run+0x11d/0x170
	do_exit+0x495/0x1130
	do_group_exit+0x100/0x100
	get_signal+0xd67/0xde0
	arch_do_signal_or_restart+0x2a/0x2b0
	exit_to_user_mode_prepare+0x94/0x100
	syscall_exit_to_user_mode+0x20/0x40
	do_syscall_64+0x52/0x90
	entry_SYSCALL_64_after_hwframe+0x63/0xcd
	</TASK>

 Allocated by task 4404:

	kasan_set_track+0x3d/0x60
	__kasan_kmalloc+0x85/0x90
	psi_trigger_create+0x113/0x3e0
	pressure_write+0x146/0x2e0
	cgroup_file_write+0x11c/0x250
	kernfs_fop_write_iter+0x186/0x220
	vfs_write+0x3d8/0x5c0
	ksys_write+0x90/0x110
	do_syscall_64+0x43/0x90
	entry_SYSCALL_64_after_hwframe+0x63/0xcd

 Freed by task 4407:

	kasan_set_track+0x3d/0x60
	kasan_save_free_info+0x27/0x40
	____kasan_slab_free+0x11d/0x170
	slab_free_freelist_hook+0x87/0x150
	__kmem_cache_free+0xcb/0x180
	psi_trigger_destroy+0x2e8/0x310
	cgroup_file_release+0x4f/0xb0
	kernfs_drain_open_files+0x165/0x1f0
	kernfs_drain+0x162/0x1a0
	__kernfs_remove+0x1fb/0x310
	kernfs_remove_by_name_ns+0x95/0xe0
	cgroup_addrm_files+0x67f/0x700
	cgroup_destroy_locked+0x283/0x3c0
	cgroup_rmdir+0x29/0x100
	kernfs_iop_rmdir+0xd1/0x140
	vfs_rmdir+0xfe/0x240
	do_rmdir+0x13d/0x280
	__x64_sys_rmdir+0x2c/0x30
	do_syscall_64+0x43/0x90
	entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: 0e94682b73 ("psi: introduce psi monitor")
Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
Signed-off-by: Mengchi Cheng <mengcc@amazon.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/20230106224859.4123476-1-kamatam@amazon.com/
Link: https://lore.kernel.org/r/20230214212705.4058045-1-kamatam@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:55:56 +01:00
..
bpf bpf: Do not reject when the stack read size is different from the tracked scalar size 2023-02-15 17:22:21 +01:00
cgroup memcg: fix possible use-after-free in memcg_write_event_control() 2022-12-14 11:31:57 +01:00
configs compiler: remove CONFIG_OPTIMIZE_INLINING entirely 2020-04-07 10:43:42 -07:00
debug lockdown: also lock down previous kgdb use 2022-05-30 09:33:22 +02:00
dma swiotlb: max mapping size takes min align mask into account 2022-10-05 10:38:40 +02:00
entry entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set 2023-01-04 11:39:22 +01:00
events perf/core: Call LSM hook after copying perf_event_attr 2023-01-14 10:16:33 +01:00
futex futex: Resend potentially swallowed owner death notification 2023-01-14 10:15:20 +01:00
gcov gcov: add support for checksum field 2023-01-14 10:16:24 +01:00
irq genirq: Add IRQF_NO_AUTOEN for request_irq/nmi() 2023-01-14 10:15:58 +01:00
kcsan panic: Consolidate open-coded panic_on_warn checks 2023-02-01 08:23:20 +01:00
livepatch livepatch: fix race between fork and KLP transition 2022-10-26 13:25:14 +02:00
locking locking/lockdep: Fix lockdep_init_map_*() confusion 2022-08-21 15:15:33 +02:00
power PM: hibernate: Fix mistake in kerneldoc comment 2023-01-14 10:15:15 +01:00
printk printk: fix return value of printk.devkmsg __setup handler 2022-04-08 14:40:08 +02:00
rcu rcu: Prevent lockdep-RCU splats on lock acquisition/release 2023-01-14 10:16:29 +01:00
sched sched/psi: Fix use-after-free in ep_remove_wait_queue() 2023-02-22 12:55:56 +01:00
time timekeeping: contribute wall clock to rng on time change 2022-08-21 15:16:20 +02:00
trace tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw 2023-02-15 17:22:22 +01:00
.gitignore kbuild: update config_data.gz only when the content of .config is changed 2021-05-11 14:47:37 +02:00
acct.c acct: fix potential integer overflow in encode_comp_t() 2023-01-14 10:16:14 +01:00
async.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2022-02-23 12:01:00 +01:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-08-31 17:15:13 +02:00
audit_tree.c audit: move put_tree() to avoid trim_trees refcount underflow and UAF 2021-09-03 10:09:31 +02:00
audit_watch.c fsnotify: generalize handle_inode_event() 2020-12-30 11:54:18 +01:00
audit.c audit: improve audit queue handling when "audit=1" on cmdline 2022-02-08 18:30:34 +01:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:40:00 +02:00
auditfilter.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
auditsc.c audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:40:00 +02:00
backtracetest.c treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() 2020-07-30 11:15:58 -07:00
bounds.c
capability.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
compat.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
configs.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
context_tracking.c context_tracking: Ensure that the critical path cannot be instrumented 2020-06-11 15:14:36 +02:00
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-09-15 09:50:40 +02:00
cpu.c cpu/hotplug: Make target_store() a nop when target == state 2023-01-14 10:15:20 +01:00
crash_core.c crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo 2021-06-23 14:42:52 +02:00
crash_dump.c crash_dump: Remove no longer used saved_max_pfn 2020-04-15 11:21:54 +02:00
cred.c Revert "Add a reference to ucounts for each cred" 2021-09-08 08:49:00 +02:00
delayacct.c
dma.c
exec_domain.c
exit.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-02-01 08:23:21 +01:00
extable.c kernel/extable.c: use address-of operator on section symbols 2020-04-07 10:43:42 -07:00
fail_function.c fail_function: Remove a redundant mutex unlock 2020-11-19 11:58:16 -08:00
fork.c io_uring: import 5.15-stable io_uring 2023-01-04 11:39:23 +01:00
freezer.c Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing" 2021-04-07 15:00:14 +02:00
gen_kheaders.sh kbuild: add variables for compression tools 2020-06-06 23:42:01 +09:00
groups.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
hung_task.c kernel/hung_task.c: make type annotations consistent 2020-11-02 12:14:19 -08:00
iomem.c
irq_work.c irq_work, smp: Allow irq_work on call_single_queue 2020-05-28 10:54:15 +02:00
jump_label.c jump_label: Fix jump_label_text_reserved() vs __init 2021-07-20 16:05:58 +02:00
kallsyms.c treewide: Convert macro and uses of __section(foo) to __section("foo") 2020-10-25 14:51:49 -07:00
kcmp.c exec: Transform exec_update_mutex into a rw_semaphore 2021-01-09 13:46:24 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: make some symbols static 2020-08-12 10:58:02 -07:00
kexec_core.c kernel: kexec: remove the lock operation of system_transition_mutex 2021-02-03 23:28:37 +01:00
kexec_elf.c
kexec_file.c ima: force signature verification when CONFIG_KEXEC_SIG is configured 2022-07-21 21:20:11 +02:00
kexec_internal.h
kexec.c LSM: Introduce kernel_post_load_data() hook 2020-10-05 13:37:03 +02:00
kheaders.c
kmod.c kmod: remove redundant "be an" in the comment 2020-08-12 10:58:01 -07:00
kprobes.c kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case 2022-11-25 17:45:55 +01:00
ksysfs.c
kthread.c kthread: Fix PF_KTHREAD vs to_kthread() race 2021-09-03 10:09:31 +02:00
latencytop.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
Makefile futex: Move to kernel/futex/ 2023-01-14 10:15:20 +01:00
module_signature.c module: harden ELF info handling 2021-03-25 09:04:11 +01:00
module_signing.c module: harden ELF info handling 2021-03-25 09:04:11 +01:00
module-internal.h
module.c module: Don't wait for GOING modules 2023-02-01 08:23:22 +01:00
notifier.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
nsproxy.c nsproxy: support CLONE_NEWTIME with setns() 2020-07-08 11:14:22 +02:00
padata.c padata: Fix list iterator in padata_do_serial() 2023-01-14 10:15:51 +01:00
panic.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-02-01 08:23:21 +01:00
params.c params: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
pid_namespace.c memcg: enable accounting for pids in nested pid namespaces 2021-09-18 13:40:36 +02:00
pid.c exec: Transform exec_update_mutex into a rw_semaphore 2021-01-09 13:46:24 +01:00
profile.c profiling: fix shift too large makes kernel panic 2022-08-21 15:16:05 +02:00
ptrace.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2022-06-09 10:20:49 +02:00
range.c kernel.h: split out min()/max() et al. helpers 2020-10-16 11:11:19 -07:00
reboot.c reboot: fix overflow parsing reboot cpu number 2020-11-14 11:26:03 -08:00
regset.c regset: kill ->get() 2020-07-27 14:31:12 -04:00
relay.c relay: fix type mismatch when allocating memory in relay_create_buf() 2023-01-14 10:15:22 +01:00
resource.c kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources 2021-05-19 10:13:09 +02:00
rseq.c rseq: Remove broken uapi field layout on 32-bit little endian 2022-04-08 14:40:03 +02:00
scftorture.c scftorture: Fix distribution of short handler delays 2022-06-09 10:21:01 +02:00
scs.c mm: memcontrol: account kernel stack per node 2020-08-07 11:33:25 -07:00
seccomp.c seccomp: Fix setting loaded filter count during TSYNC 2021-08-18 08:59:06 +02:00
signal.c task_work: unconditionally run task_work from get_signal() 2023-01-04 11:39:23 +01:00
smp.c smp: Fix offline cpu check in flush_smp_call_function_queue() 2022-04-20 09:23:29 +02:00
smpboot.c sched/core: Initialize the idle task with preemption disabled 2021-07-14 16:55:50 +02:00
smpboot.h
softirq.c softirq: Add debug check to __raise_softirq_irqoff() 2020-09-16 15:18:56 +02:00
stackleak.c gcc-plugins/stackleak: Use noinstr in favor of notrace 2022-02-23 12:01:00 +01:00
stacktrace.c stacktrace: Remove reliable argument from arch_stack_walk() callback 2020-09-18 14:24:16 +01:00
static_call.c static_call: Fix unused variable warn w/o MODULE 2021-09-08 08:49:00 +02:00
stop_machine.c stop_machine, rcu: Mark functions as notrace 2020-10-26 12:12:27 +01:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2022-08-31 17:15:13 +02:00
sys.c prlimit: do_prlimit needs to have a speculation check 2023-01-24 07:19:58 +01:00
sysctl-test.c
sysctl.c kernel/panic: move panic sysctls to its own file 2023-02-01 08:23:18 +01:00
task_work.c task_work: add helper for more targeted task_work canceling 2023-01-04 11:39:23 +01:00
taskstats.c taskstats: move specifying netlink policy back to ops 2020-10-02 19:11:12 -07:00
test_kprobes.c
torture.c torture: Dump ftrace at shutdown only if requested 2020-06-29 12:01:45 -07:00
tracepoint.c tracepoint: Use rcu get state and cond sync for static call updates 2021-09-03 10:09:30 +02:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-01-27 10:54:33 +01:00
ucount.c Revert "Add a reference to ucounts for each cred" 2021-09-08 08:49:00 +02:00
uid16.c
uid16.h
umh.c usermodehelper: reset umask to default before executing user process 2020-10-06 10:31:52 -07:00
up.c smp: Fix smp_call_function_single_async prototype 2021-05-14 09:50:46 +02:00
user_namespace.c Revert "Add a reference to ucounts for each cred" 2021-09-08 08:49:00 +02:00
user-return-notifier.c
user.c user.c: make uidhash_table static 2020-06-04 19:06:24 -07:00
usermode_driver.c bpf: Fix umd memory leak in copy_process() 2021-03-30 14:32:03 +02:00
utsname_sysctl.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
utsname.c nsproxy: add struct nsset 2020-05-09 13:57:12 +02:00
watch_queue.c watch_queue: Fix missing locking in add_watch_to_object() 2022-08-03 12:00:44 +02:00
watchdog_hld.c
watchdog.c watchdog: export lockup_detector_reconfigure 2022-08-25 11:38:20 +02:00
workqueue_internal.h
workqueue.c workqueue: don't skip lockdep work dependency in cancel_work_sync() 2022-09-28 11:10:40 +02:00