android_kernel_xiaomi_sm8450/kernel
Haifeng Xu a335ad77bd perf/core: Fix missing wakeup when waiting for context reference
[ Upstream commit 74751ef5c1912ebd3e65c3b65f45587e05ce5d36 ]

In our production environment, we found many hung tasks which are
blocked for more than 18 hours. Their call traces are like this:

[346278.191038] __schedule+0x2d8/0x890
[346278.191046] schedule+0x4e/0xb0
[346278.191049] perf_event_free_task+0x220/0x270
[346278.191056] ? init_wait_var_entry+0x50/0x50
[346278.191060] copy_process+0x663/0x18d0
[346278.191068] kernel_clone+0x9d/0x3d0
[346278.191072] __do_sys_clone+0x5d/0x80
[346278.191076] __x64_sys_clone+0x25/0x30
[346278.191079] do_syscall_64+0x5c/0xc0
[346278.191083] ? syscall_exit_to_user_mode+0x27/0x50
[346278.191086] ? do_syscall_64+0x69/0xc0
[346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20
[346278.191092] ? irqentry_exit+0x19/0x30
[346278.191095] ? exc_page_fault+0x89/0x160
[346278.191097] ? asm_exc_page_fault+0x8/0x30
[346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae

The task was waiting for the refcount become to 1, but from the vmcore,
we found the refcount has already been 1. It seems that the task didn't
get woken up by perf_event_release_kernel() and got stuck forever. The
below scenario may cause the problem.

Thread A					Thread B
...						...
perf_event_free_task				perf_event_release_kernel
						   ...
						   acquire event->child_mutex
						   ...
						   get_ctx
   ...						   release event->child_mutex
   acquire ctx->mutex
   ...
   perf_free_event (acquire/release event->child_mutex)
   ...
   release ctx->mutex
   wait_var_event
						   acquire ctx->mutex
						   acquire event->child_mutex
						   # move existing events to free_list
						   release event->child_mutex
						   release ctx->mutex
						   put_ctx
...						...

In this case, all events of the ctx have been freed, so we couldn't
find the ctx in free_list and Thread A will miss the wakeup. It's thus
necessary to add a wakeup after dropping the reference.

Fixes: 1cf8dfe8a6 ("perf/core: Fix race between close() and fork()")
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20240513103948.33570-1-haifeng.xu@shopee.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-05 09:12:44 +02:00
..
bpf fs: add file and path permissions helpers 2024-06-21 14:52:58 +02:00
cgroup sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level 2024-06-16 13:32:15 +02:00
configs
debug kdb: Use format-specifiers rather than memset() for padding in kdb_read() 2024-06-16 13:32:35 +02:00
dma dma-mapping: clear dev->dma_mem to NULL after freeing it 2024-01-25 14:37:45 -08:00
entry entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set 2023-01-04 11:39:22 +01:00
events perf/core: Fix missing wakeup when waiting for context reference 2024-07-05 09:12:44 +02:00
futex futex: Don't include process MM in futex key on no-MMU 2023-11-20 11:06:44 +01:00
gcov gcov: add support for GCC 14 2024-07-05 09:12:41 +02:00
irq genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline 2024-06-16 13:32:30 +02:00
kcsan kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures 2023-07-27 08:43:57 +02:00
livepatch kallsyms: refactor {,module_}kallsyms_on_each_symbol 2024-06-21 14:52:58 +02:00
locking lockdep: Fix block chain corruption 2023-12-08 08:46:09 +01:00
power PM: suspend: Set mem_sleep_current during kernel command line setup 2024-04-13 12:58:13 +02:00
printk printk: Update @console_may_schedule in console_trylock_spinning() 2024-04-13 12:58:54 +02:00
rcu rcutorture: Fix invalid context warning when enable srcu barrier testing 2024-07-05 09:12:33 +02:00
sched sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level 2024-06-16 13:32:15 +02:00
time tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device() 2024-07-05 09:12:32 +02:00
trace tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_test 2024-07-05 09:12:43 +02:00
.gitignore kbuild: update config_data.gz only when the content of .config is changed 2021-05-11 14:47:37 +02:00
acct.c acct: fix potential integer overflow in encode_comp_t() 2023-01-14 10:16:14 +01:00
async.c async: Introduce async_schedule_dev_nocall() 2024-02-23 08:41:53 +01:00
audit_fsnotify.c fsnotify: make allow_dups a property of the group 2024-06-21 14:53:39 +02:00
audit_tree.c fsnotify: pass flags argument to fsnotify_alloc_group() 2024-06-21 14:53:39 +02:00
audit_watch.c fsnotify: pass flags argument to fsnotify_alloc_group() 2024-06-21 14:53:39 +02:00
audit.c audit: Send netlink ACK before setting connection in auditd_set 2024-02-23 08:42:03 +01:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:40:00 +02:00
auditfilter.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
auditsc.c audit: fix possible soft lockup in __audit_inode_child() 2023-09-19 12:20:13 +02:00
backtracetest.c treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() 2020-07-30 11:15:58 -07:00
bounds.c bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS 2024-05-02 16:23:46 +02:00
capability.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-04-05 11:23:45 +02:00
configs.c
context_tracking.c context_tracking: Ensure that the critical path cannot be instrumented 2020-06-11 15:14:36 +02:00
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-09-15 09:50:40 +02:00
cpu.c cpu: Re-enable CPU mitigations by default for !X86 architectures 2024-05-02 16:23:44 +02:00
crash_core.c crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo 2021-06-23 14:42:52 +02:00
crash_dump.c
cred.c cred: switch to using atomic_long_t 2023-12-20 15:44:30 +01:00
delayacct.c
dma.c
exec_domain.c
exit.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-02-01 08:23:21 +01:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-03-11 16:40:18 +01:00
fork.c exec: Simplify unshare_files 2024-06-21 14:52:47 +02:00
freezer.c Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing" 2021-04-07 15:00:14 +02:00
gen_kheaders.sh kheaders: explicitly define file modes for archived headers 2024-07-05 09:12:44 +02:00
groups.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
hung_task.c kernel/hung_task.c: make type annotations consistent 2020-11-02 12:14:19 -08:00
iomem.c
irq_work.c irq_work, smp: Allow irq_work on call_single_queue 2020-05-28 10:54:15 +02:00
jump_label.c jump_label: Fix jump_label_text_reserved() vs __init 2021-07-20 16:05:58 +02:00
kallsyms.c kallsyms: only build {,module_}kallsyms_on_each_symbol when required 2024-06-21 14:52:58 +02:00
kcmp.c kcmp: In get_file_raw_ptr use task_lookup_fd_rcu 2024-06-21 14:52:48 +02:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: don't lose track of remote references during softirqs 2024-07-05 09:12:41 +02:00
kexec_core.c kexec: fix a memory leak in crash_shrink_memory() 2023-07-27 08:43:40 +02:00
kexec_elf.c
kexec_file.c kexec: support purgatories with .text.hot sections 2023-06-21 15:45:37 +02:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2023-04-20 12:10:29 +02:00
kexec.c panic, kexec: make __crash_kexec() NMI safe 2023-04-20 12:10:29 +02:00
kheaders.c kheaders: Use array declaration instead of char 2023-05-17 11:47:33 +02:00
kmod.c kmod: remove redundant "be an" in the comment 2020-08-12 10:58:01 -07:00
kprobes.c kprobes: Fix possible use-after-free issue on kprobe registration 2024-05-02 16:23:36 +02:00
ksysfs.c kexec: turn all kexec_mutex acquisitions into trylocks 2023-04-20 12:10:29 +02:00
kthread.c exit: Implement kthread_exit 2024-06-21 14:53:28 +02:00
latencytop.c
Makefile futex: Move to kernel/futex/ 2023-01-14 10:15:20 +01:00
module_signature.c module: harden ELF info handling 2021-03-25 09:04:11 +01:00
module_signing.c module: harden ELF info handling 2021-03-25 09:04:11 +01:00
module-internal.h
module.c NFSD: Remove svc_serv_ops::svo_module 2024-06-21 14:53:37 +02:00
notifier.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
nsproxy.c nsproxy: support CLONE_NEWTIME with setns() 2020-07-08 11:14:22 +02:00
padata.c padata: Disable BH when taking works lock on MT path 2024-07-05 09:12:33 +02:00
panic.c panic: Flush kernel log buffer at the end 2024-04-13 12:59:40 +02:00
params.c params: lift param_set_uint_minmax to common code 2024-06-16 13:32:26 +02:00
pid_namespace.c zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING 2024-07-05 09:12:33 +02:00
pid.c kernel/pid.c: implement additional checks upon pidfd_create() parameters 2024-06-21 14:53:17 +02:00
profile.c profiling: fix shift too large makes kernel panic 2022-08-21 15:16:05 +02:00
ptrace.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2022-06-09 10:20:49 +02:00
range.c kernel.h: split out min()/max() et al. helpers 2020-10-16 11:11:19 -07:00
reboot.c kernel/reboot: emergency_restart: Set correct system_state 2023-11-28 16:54:58 +00:00
regset.c regset: kill ->get() 2020-07-27 14:31:12 -04:00
relay.c relayfs: fix out-of-bounds access in relay_file_read 2023-05-17 11:47:34 +02:00
resource.c dax/kmem: Fix leak of memory-hotplug resources 2023-03-11 16:40:04 +01:00
rseq.c rseq: Remove broken uapi field layout on 32-bit little endian 2022-04-08 14:40:03 +02:00
scftorture.c scftorture: Forgive memory-allocation failure if KASAN 2023-09-23 11:01:05 +02:00
scs.c mm: memcontrol: account kernel stack per node 2020-08-07 11:33:25 -07:00
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2024-03-01 13:16:46 +01:00
signal.c task_work: unconditionally run task_work from get_signal() 2023-01-04 11:39:23 +01:00
smp.c smp: Fix offline cpu check in flush_smp_call_function_queue() 2022-04-20 09:23:29 +02:00
smpboot.c sched/core: Initialize the idle task with preemption disabled 2021-07-14 16:55:50 +02:00
smpboot.h
softirq.c softirq: Add debug check to __raise_softirq_irqoff() 2020-09-16 15:18:56 +02:00
stackleak.c gcc-plugins/stackleak: Use noinstr in favor of notrace 2022-02-23 12:01:00 +01:00
stacktrace.c stacktrace: Remove reliable argument from arch_stack_walk() callback 2020-09-18 14:24:16 +01:00
static_call.c static_call: Fix unused variable warn w/o MODULE 2021-09-08 08:49:00 +02:00
stop_machine.c stop_machine, rcu: Mark functions as notrace 2020-10-26 12:12:27 +01:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2022-08-31 17:15:13 +02:00
sys.c fs: add file and path permissions helpers 2024-06-21 14:52:58 +02:00
sysctl-test.c
sysctl.c sysctl: introduce new proc handler proc_dobool 2024-06-21 14:53:18 +02:00
task_work.c task_work: add helper for more targeted task_work canceling 2023-01-04 11:39:23 +01:00
taskstats.c taskstats: move specifying netlink policy back to ops 2020-10-02 19:11:12 -07:00
test_kprobes.c
torture.c torture: Fix hang during kthread shutdown phase 2023-08-30 16:23:17 +02:00
tracepoint.c tracepoint: Use rcu get state and cond sync for static call updates 2021-09-03 10:09:30 +02:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-01-27 10:54:33 +01:00
ucount.c fanotify: configurable limits via sysfs 2024-06-21 14:53:06 +02:00
uid16.c
uid16.h
umh.c usermodehelper: reset umask to default before executing user process 2020-10-06 10:31:52 -07:00
up.c smp: Fix smp_call_function_single_async prototype 2021-05-14 09:50:46 +02:00
user_namespace.c Revert "Add a reference to ucounts for each cred" 2021-09-08 08:49:00 +02:00
user-return-notifier.c
user.c user.c: make uidhash_table static 2020-06-04 19:06:24 -07:00
usermode_driver.c bpf: Fix umd memory leak in copy_process() 2021-03-30 14:32:03 +02:00
utsname_sysctl.c
utsname.c nsproxy: add struct nsset 2020-05-09 13:57:12 +02:00
watch_queue.c watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths 2023-03-17 08:45:13 +01:00
watchdog_hld.c watchdog/perf: more properly prevent false positives with turbo modes 2023-07-27 08:43:40 +02:00
watchdog.c watchdog: move softlockup_panic back to early_param 2023-11-28 16:54:56 +00:00
workqueue_internal.h
workqueue.c Revert "workqueue: remove unused cancel_work()" 2023-12-08 08:46:13 +01:00