RCU pull request for v5.20 (or whatever)
This pull request contains the following branches:

doc.2022.06.21a: Documentation updates.

fixes.2022.07.19a: Miscellaneous fixes.

nocb.2022.07.19a: Callback-offload updates, perhaps most notably a new
	RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to be
	offloaded at boot time, regardless of kernel boot parameters.  This is
	useful to battery-powered systems such as ChromeOS and Android.  In
	addition, a new RCU_NOCB_CPU_CB_BOOST kernel boot parameter prevents
	offloaded callbacks from interfering with real-time workloads and with
	energy-efficiency mechanisms.

poll.2022.07.21a: Polled grace-period updates, perhaps most notably making
	these APIs account for both normal and expedited grace periods.

rcu-tasks.2022.06.21a: Tasks RCU updates, perhaps most notably reducing the
	CPU overhead of RCU tasks trace grace periods by more than a factor of
	two on a system with 15,000 tasks.  The reduction is expected to
	increase with the number of tasks, so it seems reasonable to
	hypothesize that a system with 150,000 tasks might see a 20-fold
	reduction in CPU overhead.

torture.2022.06.21a: Torture-test updates.

ctxt.2022.07.05a: Updates that merge RCU's dyntick-idle tracking into
	context tracking, thus reducing the overhead of transitioning to
	kernel mode from either idle or nohz_full userspace execution for
	kernels that track context independently of RCU.  This is expected to
	be helpful primarily for kernels built with CONFIG_NO_HZ_FULL=y.

-----BEGIN PGP SIGNATURE-----

iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmLgMcgTHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jArXD/0fjbCwqpRjHVTzjMY8jN4zDkqZZD6m
g8Fx27hZ4ToNFwRptyHwNezrNj14skjAJEXfdjaVw32W62ivXvf0HINvSzsTLCSq
k2kWyBdXLc9CwY5p5W4smnpn5VoAScjg5PoPL59INoZ/Zziji323C7Zepl/1DYJt
0T6bPCQjo1ZQoDUCyVpSjDmAqxnderWG0MeJVt74GkLqmnYLANg0GH8c7mH4+9LL
kVGlLp5nlPgNJ4FEoFdMwNU8T/ETmaVld/m2dkiawjkXjJzB2XKtBigU91DDmXz5
7DIdV4ABrxiy4kGNqtIe/jFgnKyVD7xiDpyfjd6KTeDr/rDS8u2ZH7+1iHsyz3g0
Np/tS3vcd0KR+gI/d0eXxPbgm5sKlCmKw/nU2eArpW/+4LmVXBUfHTG9Jg+LJmBc
JrUh6aEdIZJZHgv/nOQBNig7GJW43IG50rjuJxAuzcxiZNEG5lUSS23ysaA9CPCL
PxRWKSxIEfK3kdmvVO5IIbKTQmIBGWlcWMTcYictFSVfBgcCXpPAksGvqA5JiUkc
egW+xLFo/7K+E158vSKsVqlWZcEeUbsNJ88QOlpqnRgH++I2Yv/LhK41XfJfpH+Y
ALxVaDd+mAq6v+qSHNVq9wT3ozXIPy/zK1hDlMIqx40h2YvaEsH4je+521oSoN9r
vX60+QNxvUBLwA==
=vUNm
-----END PGP SIGNATURE-----

Merge tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU updates from Paul McKenney:

 - Documentation updates

 - Miscellaneous fixes

 - Callback-offload updates, perhaps most notably a new
   RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to be
   offloaded at boot time, regardless of kernel boot parameters.  This
   is useful to battery-powered systems such as ChromeOS and Android.
   In addition, a new RCU_NOCB_CPU_CB_BOOST kernel boot parameter
   prevents offloaded callbacks from interfering with real-time
   workloads and with energy-efficiency mechanisms

 - Polled grace-period updates, perhaps most notably making these APIs
   account for both normal and expedited grace periods

 - Tasks RCU updates, perhaps most notably reducing the CPU overhead of
   RCU tasks trace grace periods by more than a factor of two on a
   system with 15,000 tasks.  The reduction is expected to increase
   with the number of tasks, so it seems reasonable to hypothesize that
   a system with 150,000 tasks might see a 20-fold reduction in CPU
   overhead

 - Torture-test updates

 - Updates that merge RCU's dyntick-idle tracking into context
   tracking, thus reducing the overhead of transitioning to kernel mode
   from either idle or nohz_full userspace execution for kernels that
   track context independently of RCU.  This is expected to be helpful
   primarily for kernels built with CONFIG_NO_HZ_FULL=y

* tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (98 commits)
  rcu: Add irqs-disabled indicator to expedited RCU CPU stall warnings
  rcu: Diagnose extended sync_rcu_do_polled_gp() loops
  rcu: Put panic_on_rcu_stall() after expedited RCU CPU stall warnings
  rcutorture: Test polled expedited grace-period primitives
  rcu: Add polled expedited grace-period primitives
  rcutorture: Verify that polled GP API sees synchronous grace periods
  rcu: Make Tiny RCU grace periods visible to polled APIs
  rcu: Make polled grace-period API account for expedited grace periods
  rcu: Switch polled grace-period APIs to ->gp_seq_polled
  rcu/nocb: Avoid polling when my_rdp->nocb_head_rdp list is empty
  rcu/nocb: Add option to opt rcuo kthreads out of RT priority
  rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
  rcu/nocb: Add an option to offload all CPUs on boot
  rcu/nocb: Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload() direct call
  rcu/nocb: Invert rcu_state.barrier_mutex VS hotplug lock locking order
  rcu/nocb: Add/del rdp to iterate from rcuog itself
  rcu/tree: Add comment to describe GP-done condition in fqs loop
  rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs()
  rcu/kvfree: Remove useless monitor_todo flag
  rcu: Cleanup RCU urgency state for offline CPU
  ...
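As background for the polled grace-period item above, the sketch below shows how a hypothetical caller might use the polled API in place of a blocking synchronize_rcu().  It is not code from this series: struct foo, global_foo, and foo_replace() are made-up names, but start_poll_synchronize_rcu(), poll_state_synchronize_rcu(), cond_synchronize_rcu(), and the new start_poll_synchronize_rcu_expedited()/cond_synchronize_rcu_expedited() variants added here are the real kernel APIs.

	/*
	 * Hypothetical caller of the polled grace-period API (not part of
	 * this series): replace a pointer, then overlap the grace period
	 * with other work instead of sleeping in synchronize_rcu().
	 */
	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct foo {
		int data;
	};

	static struct foo __rcu *global_foo;

	static void foo_replace(struct foo *newp)
	{
		struct foo *oldp;
		unsigned long cookie;

		/* Caller is assumed to serialize updaters, hence "true". */
		oldp = rcu_replace_pointer(global_foo, newp, true);

		/* Start a grace period if one is not already in flight. */
		cookie = start_poll_synchronize_rcu();

		/* ... do unrelated work while the grace period progresses ... */

		if (poll_state_synchronize_rcu(cookie)) {
			kfree(oldp);			/* grace period already elapsed */
		} else {
			cond_synchronize_rcu(cookie);	/* sleep only for the remainder */
			kfree(oldp);
		}
	}

The new expedited variants follow the same pattern, but start and wait for an expedited grace period instead of a normal one.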
commit 7d9d077c78
@@ -1844,10 +1844,10 @@ that meets this requirement.

 Furthermore, NMI handlers can be interrupted by what appear to RCU to be
 normal interrupts. One way that this can happen is for code that
-directly invokes rcu_irq_enter() and rcu_irq_exit() to be called
+directly invokes ct_irq_enter() and ct_irq_exit() to be called
 from an NMI handler. This astonishing fact of life prompted the current
-code structure, which has rcu_irq_enter() invoking
-rcu_nmi_enter() and rcu_irq_exit() invoking rcu_nmi_exit().
+code structure, which has ct_irq_enter() invoking
+ct_nmi_enter() and ct_irq_exit() invoking ct_nmi_exit().
 And yes, I also learned of this requirement the hard way.

 Loadable Modules
@@ -2195,7 +2195,7 @@ scheduling-clock interrupt be enabled when RCU needs it to be:
 sections, and RCU believes this CPU to be idle, no problem. This
 sort of thing is used by some architectures for light-weight
 exception handlers, which can then avoid the overhead of
-rcu_irq_enter() and rcu_irq_exit() at exception entry and
+ct_irq_enter() and ct_irq_exit() at exception entry and
 exit, respectively. Some go further and avoid the entireties of
 irq_enter() and irq_exit().
 Just make very sure you are running some of your tests with
@@ -2226,7 +2226,7 @@ scheduling-clock interrupt be enabled when RCU needs it to be:
 +-----------------------------------------------------------------------+
 | **Answer**: |
 +-----------------------------------------------------------------------+
-| One approach is to do ``rcu_irq_exit();rcu_irq_enter();`` every so |
+| One approach is to do ``ct_irq_exit();ct_irq_enter();`` every so |
 | often. But given that long-running interrupt handlers can cause other |
 | problems, not least for response time, shouldn't you work to keep |
 | your interrupt handler's runtime within reasonable bounds? |
@@ -97,12 +97,12 @@ warnings:
 which will include additional debugging information.

 - A low-level kernel issue that either fails to invoke one of the
-  variants of rcu_user_enter(), rcu_user_exit(), rcu_idle_enter(),
-  rcu_idle_exit(), rcu_irq_enter(), or rcu_irq_exit() on the one
+  variants of rcu_eqs_enter(true), rcu_eqs_exit(true), ct_idle_enter(),
+  ct_idle_exit(), ct_irq_enter(), or ct_irq_exit() on the one
   hand, or that invokes one of them too many times on the other.
   Historically, the most frequent issue has been an omission
   of either irq_enter() or irq_exit(), which in turn invoke
-  rcu_irq_enter() or rcu_irq_exit(), respectively. Building your
+  ct_irq_enter() or ct_irq_exit(), respectively. Building your
   kernel with CONFIG_RCU_EQS_DEBUG=y can help track down these types
   of issues, which sometimes arise in architecture-specific code.

@@ -3667,6 +3667,9 @@
 just as if they had also been called out in the
 rcu_nocbs= boot parameter.

+Note that this argument takes precedence over
+the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
+
 noiotrap [SH] Disables trapped I/O port accesses.

 noirqdebug [X86-32] Disables the code which attempts to detect and
@@ -4560,6 +4563,9 @@
 no-callback mode from boot but the mode may be
 toggled at runtime via cpusets.

+Note that this argument takes precedence over
+the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
+
 rcu_nocb_poll [KNL]
 Rather than requiring that offloaded CPUs
 (specified by rcu_nocbs= above) explicitly
@@ -4669,6 +4675,34 @@
 When RCU_NOCB_CPU is set, also adjust the
 priority of NOCB callback kthreads.

+rcutree.rcu_divisor= [KNL]
+Set the shift-right count to use to compute
+the callback-invocation batch limit bl from
+the number of callbacks queued on this CPU.
+The result will be bounded below by the value of
+the rcutree.blimit kernel parameter. Every bl
+callbacks, the softirq handler will exit in
+order to allow the CPU to do other work.
+
+Please note that this callback-invocation batch
+limit applies only to non-offloaded callback
+invocation. Offloaded callbacks are instead
+invoked in the context of an rcuoc kthread, which
+scheduler will preempt as it does any other task.
+
+rcutree.nocb_nobypass_lim_per_jiffy= [KNL]
+On callback-offloaded (rcu_nocbs) CPUs,
+RCU reduces the lock contention that would
+otherwise be caused by callback floods through
+use of the ->nocb_bypass list. However, in the
+common non-flooded case, RCU queues directly to
+the main ->cblist in order to avoid the extra
+overhead of the ->nocb_bypass list and its lock.
+But if there are too many callbacks queued during
+a single jiffy, RCU pre-queues the callbacks into
+the ->nocb_bypass queue. The definition of "too
+many" is supplied by this kernel boot parameter.
+
 rcutree.rcu_nocb_gp_stride= [KNL]
 Set the number of NOCB callback kthreads in
 each group, which defaults to the square root
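As an aside, the rcutree.rcu_divisor and rcutree.blimit text above amounts to the computation sketched here.  This is a simplified illustration of the documented behavior, with made-up names (example_cb_batch_limit and its parameters), not the actual code in kernel/rcu/tree.c.

	/*
	 * Batch limit as documented: the queued-callback count shifted
	 * right by rcu_divisor, bounded below by blimit.
	 */
	static long example_cb_batch_limit(long n_cbs_queued, long blimit,
					   int rcu_divisor)
	{
		long bl = n_cbs_queued >> rcu_divisor;

		return bl < blimit ? blimit : bl;
	}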
@@ -1,7 +1,7 @@
 #
-# Feature name: context-tracking
-# Kconfig: HAVE_CONTEXT_TRACKING
-# description: arch supports context tracking for NO_HZ_FULL
+# Feature name: user-context-tracking
+# Kconfig: HAVE_CONTEXT_TRACKING_USER
+# description: arch supports user context tracking for NO_HZ_FULL
 #
 -----------------------
 | arch |status|
@@ -5165,6 +5165,7 @@ F: include/linux/console*

 CONTEXT TRACKING
 M: Frederic Weisbecker <frederic@kernel.org>
+M: "Paul E. McKenney" <paulmck@kernel.org>
 S: Maintained
 F: kernel/context_tracking.c
 F: include/linux/context_tracking*
@ -784,7 +784,7 @@ config HAVE_ARCH_WITHIN_STACK_FRAMES
|
||||
and similar) by implementing an inline arch_within_stack_frames(),
|
||||
which is used by CONFIG_HARDENED_USERCOPY.
|
||||
|
||||
config HAVE_CONTEXT_TRACKING
|
||||
config HAVE_CONTEXT_TRACKING_USER
|
||||
bool
|
||||
help
|
||||
Provide kernel/user boundaries probes necessary for subsystems
|
||||
@ -792,10 +792,10 @@ config HAVE_CONTEXT_TRACKING
|
||||
Syscalls need to be wrapped inside user_exit()-user_enter(), either
|
||||
optimized behind static key or through the slow path using TIF_NOHZ
|
||||
flag. Exceptions handlers must be wrapped as well. Irqs are already
|
||||
protected inside rcu_irq_enter/rcu_irq_exit() but preemption or signal
|
||||
protected inside ct_irq_enter/ct_irq_exit() but preemption or signal
|
||||
handling on irq exit still need to be protected.
|
||||
|
||||
config HAVE_CONTEXT_TRACKING_OFFSTACK
|
||||
config HAVE_CONTEXT_TRACKING_USER_OFFSTACK
|
||||
bool
|
||||
help
|
||||
Architecture neither relies on exception_enter()/exception_exit()
|
||||
@ -807,7 +807,7 @@ config HAVE_CONTEXT_TRACKING_OFFSTACK
|
||||
|
||||
- Critical entry code isn't preemptible (or better yet:
|
||||
not interruptible).
|
||||
- No use of RCU read side critical sections, unless rcu_nmi_enter()
|
||||
- No use of RCU read side critical sections, unless ct_nmi_enter()
|
||||
got called.
|
||||
- No use of instrumentation, unless instrumentation_begin() got
|
||||
called.
|
||||
|
@ -84,7 +84,7 @@ config ARM
|
||||
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if ARM_LPAE
|
||||
select HAVE_ARM_SMCCC if CPU_V7
|
||||
select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32
|
||||
select HAVE_CONTEXT_TRACKING
|
||||
select HAVE_CONTEXT_TRACKING_USER
|
||||
select HAVE_C_RECORDMCOUNT
|
||||
select HAVE_BUILDTIME_MCOUNT_SORT
|
||||
select HAVE_DEBUG_KMEMLEAK if !XIP_KERNEL
|
||||
|
@ -28,7 +28,7 @@
|
||||
#include "entry-header.S"
|
||||
|
||||
saved_psr .req r8
|
||||
#if defined(CONFIG_TRACE_IRQFLAGS) || defined(CONFIG_CONTEXT_TRACKING)
|
||||
#if defined(CONFIG_TRACE_IRQFLAGS) || defined(CONFIG_CONTEXT_TRACKING_USER)
|
||||
saved_pc .req r9
|
||||
#define TRACE(x...) x
|
||||
#else
|
||||
@ -38,7 +38,7 @@ saved_pc .req lr
|
||||
|
||||
.section .entry.text,"ax",%progbits
|
||||
.align 5
|
||||
#if !(IS_ENABLED(CONFIG_TRACE_IRQFLAGS) || IS_ENABLED(CONFIG_CONTEXT_TRACKING) || \
|
||||
#if !(IS_ENABLED(CONFIG_TRACE_IRQFLAGS) || IS_ENABLED(CONFIG_CONTEXT_TRACKING_USER) || \
|
||||
IS_ENABLED(CONFIG_DEBUG_RSEQ))
|
||||
/*
|
||||
* This is the fast syscall return path. We do as little as possible here,
|
||||
|
@ -366,25 +366,25 @@ ALT_UP_B(.L1_\@)
|
||||
* between user and kernel mode.
|
||||
*/
|
||||
.macro ct_user_exit, save = 1
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_USER
|
||||
.if \save
|
||||
stmdb sp!, {r0-r3, ip, lr}
|
||||
bl context_tracking_user_exit
|
||||
bl user_exit_callable
|
||||
ldmia sp!, {r0-r3, ip, lr}
|
||||
.else
|
||||
bl context_tracking_user_exit
|
||||
bl user_exit_callable
|
||||
.endif
|
||||
#endif
|
||||
.endm
|
||||
|
||||
.macro ct_user_enter, save = 1
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_USER
|
||||
.if \save
|
||||
stmdb sp!, {r0-r3, ip, lr}
|
||||
bl context_tracking_user_enter
|
||||
bl user_enter_callable
|
||||
ldmia sp!, {r0-r3, ip, lr}
|
||||
.else
|
||||
bl context_tracking_user_enter
|
||||
bl user_enter_callable
|
||||
.endif
|
||||
#endif
|
||||
.endm
|
||||
|
@ -3,6 +3,7 @@
|
||||
* Copyright (C) 2012 Freescale Semiconductor, Inc.
|
||||
*/
|
||||
|
||||
#include <linux/context_tracking.h>
|
||||
#include <linux/cpuidle.h>
|
||||
#include <linux/module.h>
|
||||
#include <asm/cpuidle.h>
|
||||
@ -24,9 +25,9 @@ static int imx6q_enter_wait(struct cpuidle_device *dev,
|
||||
imx6_set_lpm(WAIT_UNCLOCKED);
|
||||
raw_spin_unlock(&cpuidle_lock);
|
||||
|
||||
rcu_idle_enter();
|
||||
ct_idle_enter();
|
||||
cpu_do_idle();
|
||||
rcu_idle_exit();
|
||||
ct_idle_exit();
|
||||
|
||||
raw_spin_lock(&cpuidle_lock);
|
||||
if (num_idle_cpus-- == num_online_cpus())
|
||||
|
@ -176,7 +176,7 @@ config ARM64
|
||||
select HAVE_C_RECORDMCOUNT
|
||||
select HAVE_CMPXCHG_DOUBLE
|
||||
select HAVE_CMPXCHG_LOCAL
|
||||
select HAVE_CONTEXT_TRACKING
|
||||
select HAVE_CONTEXT_TRACKING_USER
|
||||
select HAVE_DEBUG_KMEMLEAK
|
||||
select HAVE_DMA_CONTIGUOUS
|
||||
select HAVE_DYNAMIC_FTRACE
|
||||
|
@ -41,7 +41,7 @@ static __always_inline void __enter_from_kernel_mode(struct pt_regs *regs)
|
||||
|
||||
if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
|
||||
lockdep_hardirqs_off(CALLER_ADDR0);
|
||||
rcu_irq_enter();
|
||||
ct_irq_enter();
|
||||
trace_hardirqs_off_finish();
|
||||
|
||||
regs->exit_rcu = true;
|
||||
@ -76,7 +76,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
|
||||
if (regs->exit_rcu) {
|
||||
trace_hardirqs_on_prepare();
|
||||
lockdep_hardirqs_on_prepare();
|
||||
rcu_irq_exit();
|
||||
ct_irq_exit();
|
||||
lockdep_hardirqs_on(CALLER_ADDR0);
|
||||
return;
|
||||
}
|
||||
@ -84,7 +84,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
|
||||
trace_hardirqs_on();
|
||||
} else {
|
||||
if (regs->exit_rcu)
|
||||
rcu_irq_exit();
|
||||
ct_irq_exit();
|
||||
}
|
||||
}
|
||||
|
||||
@ -161,7 +161,7 @@ static void noinstr arm64_enter_nmi(struct pt_regs *regs)
|
||||
__nmi_enter();
|
||||
lockdep_hardirqs_off(CALLER_ADDR0);
|
||||
lockdep_hardirq_enter();
|
||||
rcu_nmi_enter();
|
||||
ct_nmi_enter();
|
||||
|
||||
trace_hardirqs_off_finish();
|
||||
ftrace_nmi_enter();
|
||||
@ -182,7 +182,7 @@ static void noinstr arm64_exit_nmi(struct pt_regs *regs)
|
||||
lockdep_hardirqs_on_prepare();
|
||||
}
|
||||
|
||||
rcu_nmi_exit();
|
||||
ct_nmi_exit();
|
||||
lockdep_hardirq_exit();
|
||||
if (restore)
|
||||
lockdep_hardirqs_on(CALLER_ADDR0);
|
||||
@ -199,7 +199,7 @@ static void noinstr arm64_enter_el1_dbg(struct pt_regs *regs)
|
||||
regs->lockdep_hardirqs = lockdep_hardirqs_enabled();
|
||||
|
||||
lockdep_hardirqs_off(CALLER_ADDR0);
|
||||
rcu_nmi_enter();
|
||||
ct_nmi_enter();
|
||||
|
||||
trace_hardirqs_off_finish();
|
||||
}
|
||||
@ -218,7 +218,7 @@ static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs)
|
||||
lockdep_hardirqs_on_prepare();
|
||||
}
|
||||
|
||||
rcu_nmi_exit();
|
||||
ct_nmi_exit();
|
||||
if (restore)
|
||||
lockdep_hardirqs_on(CALLER_ADDR0);
|
||||
}
|
||||
|
@ -42,7 +42,7 @@ config CSKY
|
||||
select HAVE_ARCH_AUDITSYSCALL
|
||||
select HAVE_ARCH_MMAP_RND_BITS
|
||||
select HAVE_ARCH_SECCOMP_FILTER
|
||||
select HAVE_CONTEXT_TRACKING
|
||||
select HAVE_CONTEXT_TRACKING_USER
|
||||
select HAVE_VIRT_CPU_ACCOUNTING_GEN
|
||||
select HAVE_DEBUG_BUGVERBOSE
|
||||
select HAVE_DEBUG_KMEMLEAK
|
||||
|
@ -19,11 +19,11 @@
|
||||
.endm
|
||||
|
||||
.macro context_tracking
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_USER
|
||||
mfcr a0, epsr
|
||||
btsti a0, 31
|
||||
bt 1f
|
||||
jbsr context_tracking_user_exit
|
||||
jbsr user_exit_callable
|
||||
ldw a0, (sp, LSAVE_A0)
|
||||
ldw a1, (sp, LSAVE_A1)
|
||||
ldw a2, (sp, LSAVE_A2)
|
||||
@ -159,8 +159,8 @@ ret_from_exception:
|
||||
and r10, r9
|
||||
cmpnei r10, 0
|
||||
bt exit_work
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
jbsr context_tracking_user_enter
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_USER
|
||||
jbsr user_enter_callable
|
||||
#endif
|
||||
1:
|
||||
#ifdef CONFIG_PREEMPTION
|
||||
|
@ -75,7 +75,7 @@ config LOONGARCH
|
||||
select HAVE_ARCH_TRACEHOOK
|
||||
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
|
||||
select HAVE_ASM_MODVERSIONS
|
||||
select HAVE_CONTEXT_TRACKING
|
||||
select HAVE_CONTEXT_TRACKING_USER
|
||||
select HAVE_DEBUG_STACKOVERFLOW
|
||||
select HAVE_DMA_CONTIGUOUS
|
||||
select HAVE_EXIT_THREAD
|
||||
|
@ -56,7 +56,7 @@ config MIPS
|
||||
select HAVE_ARCH_TRACEHOOK
|
||||
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if CPU_SUPPORTS_HUGEPAGES
|
||||
select HAVE_ASM_MODVERSIONS
|
||||
select HAVE_CONTEXT_TRACKING
|
||||
select HAVE_CONTEXT_TRACKING_USER
|
||||
select HAVE_TIF_NOHZ
|
||||
select HAVE_C_RECORDMCOUNT
|
||||
select HAVE_DEBUG_KMEMLEAK
|
||||
|
@ -202,7 +202,7 @@ config PPC
|
||||
select HAVE_ARCH_SECCOMP_FILTER
|
||||
select HAVE_ARCH_TRACEHOOK
|
||||
select HAVE_ASM_MODVERSIONS
|
||||
select HAVE_CONTEXT_TRACKING if PPC64
|
||||
select HAVE_CONTEXT_TRACKING_USER if PPC64
|
||||
select HAVE_C_RECORDMCOUNT
|
||||
select HAVE_DEBUG_KMEMLEAK
|
||||
select HAVE_DEBUG_STACKOVERFLOW
|
||||
|
@ -2,7 +2,7 @@
|
||||
#ifndef _ASM_POWERPC_CONTEXT_TRACKING_H
|
||||
#define _ASM_POWERPC_CONTEXT_TRACKING_H
|
||||
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_USER
|
||||
#define SCHEDULE_USER bl schedule_user
|
||||
#else
|
||||
#define SCHEDULE_USER bl schedule
|
||||
|
@ -86,7 +86,7 @@ config RISCV
|
||||
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
|
||||
select HAVE_ARCH_VMAP_STACK if MMU && 64BIT
|
||||
select HAVE_ASM_MODVERSIONS
|
||||
select HAVE_CONTEXT_TRACKING
|
||||
select HAVE_CONTEXT_TRACKING_USER
|
||||
select HAVE_DEBUG_KMEMLEAK
|
||||
select HAVE_DMA_CONTIGUOUS if MMU
|
||||
select HAVE_EBPF_JIT if MMU
|
||||
|
@ -111,12 +111,12 @@ _save_context:
|
||||
call __trace_hardirqs_off
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
/* If previous state is in user mode, call context_tracking_user_exit. */
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_USER
|
||||
/* If previous state is in user mode, call user_exit_callable(). */
|
||||
li a0, SR_PP
|
||||
and a0, s1, a0
|
||||
bnez a0, skip_context_tracking
|
||||
call context_tracking_user_exit
|
||||
call user_exit_callable
|
||||
skip_context_tracking:
|
||||
#endif
|
||||
|
||||
@ -176,7 +176,7 @@ handle_syscall:
|
||||
*/
|
||||
csrs CSR_STATUS, SR_IE
|
||||
#endif
|
||||
#if defined(CONFIG_TRACE_IRQFLAGS) || defined(CONFIG_CONTEXT_TRACKING)
|
||||
#if defined(CONFIG_TRACE_IRQFLAGS) || defined(CONFIG_CONTEXT_TRACKING_USER)
|
||||
/* Recover a0 - a7 for system calls */
|
||||
REG_L a0, PT_A0(sp)
|
||||
REG_L a1, PT_A1(sp)
|
||||
@ -269,8 +269,8 @@ resume_userspace:
|
||||
andi s1, s0, _TIF_WORK_MASK
|
||||
bnez s1, work_pending
|
||||
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
call context_tracking_user_enter
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_USER
|
||||
call user_enter_callable
|
||||
#endif
|
||||
|
||||
/* Save unwound kernel stack pointer in thread_info */
|
||||
|
@ -73,7 +73,7 @@ config SPARC64
|
||||
select HAVE_DYNAMIC_FTRACE
|
||||
select HAVE_FTRACE_MCOUNT_RECORD
|
||||
select HAVE_SYSCALL_TRACEPOINTS
|
||||
select HAVE_CONTEXT_TRACKING
|
||||
select HAVE_CONTEXT_TRACKING_USER
|
||||
select HAVE_TIF_NOHZ
|
||||
select HAVE_DEBUG_KMEMLEAK
|
||||
select IOMMU_HELPER
|
||||
|
@ -15,7 +15,7 @@
|
||||
#include <asm/visasm.h>
|
||||
#include <asm/processor.h>
|
||||
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_USER
|
||||
# define SCHEDULE_USER schedule_user
|
||||
#else
|
||||
# define SCHEDULE_USER schedule
|
||||
|
@ -186,8 +186,8 @@ config X86
|
||||
select HAVE_ASM_MODVERSIONS
|
||||
select HAVE_CMPXCHG_DOUBLE
|
||||
select HAVE_CMPXCHG_LOCAL
|
||||
select HAVE_CONTEXT_TRACKING if X86_64
|
||||
select HAVE_CONTEXT_TRACKING_OFFSTACK if HAVE_CONTEXT_TRACKING
|
||||
select HAVE_CONTEXT_TRACKING_USER if X86_64
|
||||
select HAVE_CONTEXT_TRACKING_USER_OFFSTACK if HAVE_CONTEXT_TRACKING_USER
|
||||
select HAVE_C_RECORDMCOUNT
|
||||
select HAVE_OBJTOOL_MCOUNT if HAVE_OBJTOOL
|
||||
select HAVE_BUILDTIME_MCOUNT_SORT
|
||||
|
@ -1526,7 +1526,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
|
||||
|
||||
/*
|
||||
* Entry handling for valid #PF from kernel mode is slightly
|
||||
* different: RCU is already watching and rcu_irq_enter() must not
|
||||
* different: RCU is already watching and ct_irq_enter() must not
|
||||
* be invoked because a kernel fault on a user space address might
|
||||
* sleep.
|
||||
*
|
||||
|
@ -33,7 +33,7 @@ config XTENSA
|
||||
select HAVE_ARCH_KCSAN
|
||||
select HAVE_ARCH_SECCOMP_FILTER
|
||||
select HAVE_ARCH_TRACEHOOK
|
||||
select HAVE_CONTEXT_TRACKING
|
||||
select HAVE_CONTEXT_TRACKING_USER
|
||||
select HAVE_DEBUG_KMEMLEAK
|
||||
select HAVE_DMA_CONTIGUOUS
|
||||
select HAVE_EXIT_THREAD
|
||||
|
@ -455,10 +455,10 @@ KABI_W or a3, a3, a2
|
||||
abi_call trace_hardirqs_off
|
||||
1:
|
||||
#endif
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_USER
|
||||
l32i abi_tmp0, a1, PT_PS
|
||||
bbci.l abi_tmp0, PS_UM_BIT, 1f
|
||||
abi_call context_tracking_user_exit
|
||||
abi_call user_exit_callable
|
||||
1:
|
||||
#endif
|
||||
|
||||
@ -544,8 +544,8 @@ common_exception_return:
|
||||
j .Lrestore_state
|
||||
|
||||
.Lexit_tif_loop_user:
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
abi_call context_tracking_user_enter
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_USER
|
||||
abi_call user_enter_callable
|
||||
#endif
|
||||
#ifdef CONFIG_HAVE_HW_BREAKPOINT
|
||||
_bbci.l abi_saved0, TIF_DB_DISABLED, 1f
|
||||
|
@ -23,6 +23,7 @@
|
||||
#include <linux/minmax.h>
|
||||
#include <linux/perf_event.h>
|
||||
#include <acpi/processor.h>
|
||||
#include <linux/context_tracking.h>
|
||||
|
||||
/*
|
||||
* Include the apic definitions for x86 to have the APIC timer related defines
|
||||
@ -647,11 +648,11 @@ static int __cpuidle acpi_idle_enter_bm(struct cpuidle_driver *drv,
|
||||
raw_spin_unlock(&c3_lock);
|
||||
}
|
||||
|
||||
rcu_idle_enter();
|
||||
ct_idle_enter();
|
||||
|
||||
acpi_idle_do_entry(cx);
|
||||
|
||||
rcu_idle_exit();
|
||||
ct_idle_exit();
|
||||
|
||||
/* Re-enable bus master arbitration */
|
||||
if (dis_bm) {
|
||||
|
@ -69,12 +69,12 @@ static int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
|
||||
return -1;
|
||||
|
||||
/* Do runtime PM to manage a hierarchical CPU toplogy. */
|
||||
rcu_irq_enter_irqson();
|
||||
ct_irq_enter_irqson();
|
||||
if (s2idle)
|
||||
dev_pm_genpd_suspend(pd_dev);
|
||||
else
|
||||
pm_runtime_put_sync_suspend(pd_dev);
|
||||
rcu_irq_exit_irqson();
|
||||
ct_irq_exit_irqson();
|
||||
|
||||
state = psci_get_domain_state();
|
||||
if (!state)
|
||||
@ -82,12 +82,12 @@ static int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
|
||||
|
||||
ret = psci_cpu_suspend_enter(state) ? -1 : idx;
|
||||
|
||||
rcu_irq_enter_irqson();
|
||||
ct_irq_enter_irqson();
|
||||
if (s2idle)
|
||||
dev_pm_genpd_resume(pd_dev);
|
||||
else
|
||||
pm_runtime_get_sync(pd_dev);
|
||||
rcu_irq_exit_irqson();
|
||||
ct_irq_exit_irqson();
|
||||
|
||||
cpu_pm_exit();
|
||||
|
||||
|
@ -116,12 +116,12 @@ static int __sbi_enter_domain_idle_state(struct cpuidle_device *dev,
|
||||
return -1;
|
||||
|
||||
/* Do runtime PM to manage a hierarchical CPU toplogy. */
|
||||
rcu_irq_enter_irqson();
|
||||
ct_irq_enter_irqson();
|
||||
if (s2idle)
|
||||
dev_pm_genpd_suspend(pd_dev);
|
||||
else
|
||||
pm_runtime_put_sync_suspend(pd_dev);
|
||||
rcu_irq_exit_irqson();
|
||||
ct_irq_exit_irqson();
|
||||
|
||||
if (sbi_is_domain_state_available())
|
||||
state = sbi_get_domain_state();
|
||||
@ -130,12 +130,12 @@ static int __sbi_enter_domain_idle_state(struct cpuidle_device *dev,
|
||||
|
||||
ret = sbi_suspend(state) ? -1 : idx;
|
||||
|
||||
rcu_irq_enter_irqson();
|
||||
ct_irq_enter_irqson();
|
||||
if (s2idle)
|
||||
dev_pm_genpd_resume(pd_dev);
|
||||
else
|
||||
pm_runtime_get_sync(pd_dev);
|
||||
rcu_irq_exit_irqson();
|
||||
ct_irq_exit_irqson();
|
||||
|
||||
cpu_pm_exit();
|
||||
|
||||
|
@ -23,6 +23,7 @@
|
||||
#include <linux/suspend.h>
|
||||
#include <linux/tick.h>
|
||||
#include <linux/mmu_context.h>
|
||||
#include <linux/context_tracking.h>
|
||||
#include <trace/events/power.h>
|
||||
|
||||
#include "cpuidle.h"
|
||||
@ -150,12 +151,12 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
|
||||
*/
|
||||
stop_critical_timings();
|
||||
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
|
||||
rcu_idle_enter();
|
||||
ct_idle_enter();
|
||||
target_state->enter_s2idle(dev, drv, index);
|
||||
if (WARN_ON_ONCE(!irqs_disabled()))
|
||||
local_irq_disable();
|
||||
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
|
||||
rcu_idle_exit();
|
||||
ct_idle_exit();
|
||||
tick_unfreeze();
|
||||
start_critical_timings();
|
||||
|
||||
@ -233,10 +234,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
|
||||
|
||||
stop_critical_timings();
|
||||
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
|
||||
rcu_idle_enter();
|
||||
ct_idle_enter();
|
||||
entered_state = target_state->enter(dev, drv, index);
|
||||
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
|
||||
rcu_idle_exit();
|
||||
ct_idle_exit();
|
||||
start_critical_timings();
|
||||
|
||||
sched_clock_idle_wakeup_event();
|
||||
|
@ -10,71 +10,72 @@
|
||||
#include <asm/ptrace.h>
|
||||
|
||||
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
extern void context_tracking_cpu_set(int cpu);
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_USER
|
||||
extern void ct_cpu_track_user(int cpu);
|
||||
|
||||
/* Called with interrupts disabled. */
|
||||
extern void __context_tracking_enter(enum ctx_state state);
|
||||
extern void __context_tracking_exit(enum ctx_state state);
|
||||
extern void __ct_user_enter(enum ctx_state state);
|
||||
extern void __ct_user_exit(enum ctx_state state);
|
||||
|
||||
extern void context_tracking_enter(enum ctx_state state);
|
||||
extern void context_tracking_exit(enum ctx_state state);
|
||||
extern void context_tracking_user_enter(void);
|
||||
extern void context_tracking_user_exit(void);
|
||||
extern void ct_user_enter(enum ctx_state state);
|
||||
extern void ct_user_exit(enum ctx_state state);
|
||||
|
||||
extern void user_enter_callable(void);
|
||||
extern void user_exit_callable(void);
|
||||
|
||||
static inline void user_enter(void)
|
||||
{
|
||||
if (context_tracking_enabled())
|
||||
context_tracking_enter(CONTEXT_USER);
|
||||
ct_user_enter(CONTEXT_USER);
|
||||
|
||||
}
|
||||
static inline void user_exit(void)
|
||||
{
|
||||
if (context_tracking_enabled())
|
||||
context_tracking_exit(CONTEXT_USER);
|
||||
ct_user_exit(CONTEXT_USER);
|
||||
}
|
||||
|
||||
/* Called with interrupts disabled. */
|
||||
static __always_inline void user_enter_irqoff(void)
|
||||
{
|
||||
if (context_tracking_enabled())
|
||||
__context_tracking_enter(CONTEXT_USER);
|
||||
__ct_user_enter(CONTEXT_USER);
|
||||
|
||||
}
|
||||
static __always_inline void user_exit_irqoff(void)
|
||||
{
|
||||
if (context_tracking_enabled())
|
||||
__context_tracking_exit(CONTEXT_USER);
|
||||
__ct_user_exit(CONTEXT_USER);
|
||||
}
|
||||
|
||||
static inline enum ctx_state exception_enter(void)
|
||||
{
|
||||
enum ctx_state prev_ctx;
|
||||
|
||||
if (IS_ENABLED(CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK) ||
|
||||
if (IS_ENABLED(CONFIG_HAVE_CONTEXT_TRACKING_USER_OFFSTACK) ||
|
||||
!context_tracking_enabled())
|
||||
return 0;
|
||||
|
||||
prev_ctx = this_cpu_read(context_tracking.state);
|
||||
prev_ctx = __ct_state();
|
||||
if (prev_ctx != CONTEXT_KERNEL)
|
||||
context_tracking_exit(prev_ctx);
|
||||
ct_user_exit(prev_ctx);
|
||||
|
||||
return prev_ctx;
|
||||
}
|
||||
|
||||
static inline void exception_exit(enum ctx_state prev_ctx)
|
||||
{
|
||||
if (!IS_ENABLED(CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK) &&
|
||||
if (!IS_ENABLED(CONFIG_HAVE_CONTEXT_TRACKING_USER_OFFSTACK) &&
|
||||
context_tracking_enabled()) {
|
||||
if (prev_ctx != CONTEXT_KERNEL)
|
||||
context_tracking_enter(prev_ctx);
|
||||
ct_user_enter(prev_ctx);
|
||||
}
|
||||
}
|
||||
|
||||
static __always_inline bool context_tracking_guest_enter(void)
|
||||
{
|
||||
if (context_tracking_enabled())
|
||||
__context_tracking_enter(CONTEXT_GUEST);
|
||||
__ct_user_enter(CONTEXT_GUEST);
|
||||
|
||||
return context_tracking_enabled_this_cpu();
|
||||
}
|
||||
@ -82,40 +83,56 @@ static __always_inline bool context_tracking_guest_enter(void)
|
||||
static __always_inline void context_tracking_guest_exit(void)
|
||||
{
|
||||
if (context_tracking_enabled())
|
||||
__context_tracking_exit(CONTEXT_GUEST);
|
||||
__ct_user_exit(CONTEXT_GUEST);
|
||||
}
|
||||
|
||||
/**
|
||||
* ct_state() - return the current context tracking state if known
|
||||
*
|
||||
* Returns the current cpu's context tracking state if context tracking
|
||||
* is enabled. If context tracking is disabled, returns
|
||||
* CONTEXT_DISABLED. This should be used primarily for debugging.
|
||||
*/
|
||||
static __always_inline enum ctx_state ct_state(void)
|
||||
{
|
||||
return context_tracking_enabled() ?
|
||||
this_cpu_read(context_tracking.state) : CONTEXT_DISABLED;
|
||||
}
|
||||
#define CT_WARN_ON(cond) WARN_ON(context_tracking_enabled() && (cond))
|
||||
|
||||
#else
|
||||
static inline void user_enter(void) { }
|
||||
static inline void user_exit(void) { }
|
||||
static inline void user_enter_irqoff(void) { }
|
||||
static inline void user_exit_irqoff(void) { }
|
||||
static inline enum ctx_state exception_enter(void) { return 0; }
|
||||
static inline int exception_enter(void) { return 0; }
|
||||
static inline void exception_exit(enum ctx_state prev_ctx) { }
|
||||
static inline enum ctx_state ct_state(void) { return CONTEXT_DISABLED; }
|
||||
static inline int ct_state(void) { return -1; }
|
||||
static __always_inline bool context_tracking_guest_enter(void) { return false; }
|
||||
static inline void context_tracking_guest_exit(void) { }
|
||||
#define CT_WARN_ON(cond) do { } while (0)
|
||||
#endif /* !CONFIG_CONTEXT_TRACKING_USER */
|
||||
|
||||
#endif /* !CONFIG_CONTEXT_TRACKING */
|
||||
|
||||
#define CT_WARN_ON(cond) WARN_ON(context_tracking_enabled() && (cond))
|
||||
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_FORCE
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_USER_FORCE
|
||||
extern void context_tracking_init(void);
|
||||
#else
|
||||
static inline void context_tracking_init(void) { }
|
||||
#endif /* CONFIG_CONTEXT_TRACKING_FORCE */
|
||||
#endif /* CONFIG_CONTEXT_TRACKING_USER_FORCE */
|
||||
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_IDLE
|
||||
extern void ct_idle_enter(void);
|
||||
extern void ct_idle_exit(void);
|
||||
|
||||
/*
|
||||
* Is the current CPU in an extended quiescent state?
|
||||
*
|
||||
* No ordering, as we are sampling CPU-local information.
|
||||
*/
|
||||
static __always_inline bool rcu_dynticks_curr_cpu_in_eqs(void)
|
||||
{
|
||||
return !(arch_atomic_read(this_cpu_ptr(&context_tracking.state)) & RCU_DYNTICKS_IDX);
|
||||
}
|
||||
|
||||
/*
|
||||
* Increment the current CPU's context_tracking structure's ->state field
|
||||
* with ordering. Return the new value.
|
||||
*/
|
||||
static __always_inline unsigned long ct_state_inc(int incby)
|
||||
{
|
||||
return arch_atomic_add_return(incby, this_cpu_ptr(&context_tracking.state));
|
||||
}
|
||||
|
||||
#else
|
||||
static inline void ct_idle_enter(void) { }
|
||||
static inline void ct_idle_exit(void) { }
|
||||
#endif /* !CONFIG_CONTEXT_TRACKING_IDLE */
|
||||
|
||||
#endif
|
||||
|
new file: include/linux/context_tracking_irq.h (21 lines)
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CONTEXT_TRACKING_IRQ_H
+#define _LINUX_CONTEXT_TRACKING_IRQ_H
+
+#ifdef CONFIG_CONTEXT_TRACKING_IDLE
+void ct_irq_enter(void);
+void ct_irq_exit(void);
+void ct_irq_enter_irqson(void);
+void ct_irq_exit_irqson(void);
+void ct_nmi_enter(void);
+void ct_nmi_exit(void);
+#else
+static inline void ct_irq_enter(void) { }
+static inline void ct_irq_exit(void) { }
+static inline void ct_irq_enter_irqson(void) { }
+static inline void ct_irq_exit_irqson(void) { }
+static inline void ct_nmi_enter(void) { }
+static inline void ct_nmi_exit(void) { }
+#endif
+
+#endif
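For orientation, the sketch below shows how the ct_irq_*() helpers declared in this new header are intended to bracket code that might run while RCU is not watching.  It is a hypothetical example (example_call_with_rcu_watching() is a made-up name) that mirrors the pattern used by the kernel/cfi.c and architecture entry-code hunks elsewhere in this series.

	/*
	 * Hypothetical helper, not part of this patch: make sure RCU is
	 * watching around a call that may contain RCU read-side critical
	 * sections, even when invoked from idle.
	 */
	#include <linux/context_tracking_irq.h>
	#include <linux/irqflags.h>
	#include <linux/rcupdate.h>

	static void example_call_with_rcu_watching(void (*fn)(void))
	{
		unsigned long flags;
		bool rcu_idle = !rcu_is_watching();

		if (rcu_idle) {
			local_irq_save(flags);	/* ct_irq_enter() requires irqs off */
			ct_irq_enter();		/* leave the RCU extended quiescent state */
		}

		fn();				/* RCU readers are now legal here */

		if (rcu_idle) {
			ct_irq_exit();		/* re-enter the extended quiescent state */
			local_irq_restore(flags);
		}
	}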
@ -4,8 +4,28 @@
|
||||
|
||||
#include <linux/percpu.h>
|
||||
#include <linux/static_key.h>
|
||||
#include <linux/context_tracking_irq.h>
|
||||
|
||||
/* Offset to allow distinguishing irq vs. task-based idle entry/exit. */
|
||||
#define DYNTICK_IRQ_NONIDLE ((LONG_MAX / 2) + 1)
|
||||
|
||||
enum ctx_state {
|
||||
CONTEXT_DISABLED = -1, /* returned by ct_state() if unknown */
|
||||
CONTEXT_KERNEL = 0,
|
||||
CONTEXT_IDLE = 1,
|
||||
CONTEXT_USER = 2,
|
||||
CONTEXT_GUEST = 3,
|
||||
CONTEXT_MAX = 4,
|
||||
};
|
||||
|
||||
/* Even value for idle, else odd. */
|
||||
#define RCU_DYNTICKS_IDX CONTEXT_MAX
|
||||
|
||||
#define CT_STATE_MASK (CONTEXT_MAX - 1)
|
||||
#define CT_DYNTICKS_MASK (~CT_STATE_MASK)
|
||||
|
||||
struct context_tracking {
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_USER
|
||||
/*
|
||||
* When active is false, probes are unset in order
|
||||
* to minimize overhead: TIF flags are cleared
|
||||
@ -14,18 +34,73 @@ struct context_tracking {
|
||||
*/
|
||||
bool active;
|
||||
int recursion;
|
||||
enum ctx_state {
|
||||
CONTEXT_DISABLED = -1, /* returned by ct_state() if unknown */
|
||||
CONTEXT_KERNEL = 0,
|
||||
CONTEXT_USER,
|
||||
CONTEXT_GUEST,
|
||||
} state;
|
||||
#endif
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
atomic_t state;
|
||||
#endif
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_IDLE
|
||||
long dynticks_nesting; /* Track process nesting level. */
|
||||
long dynticks_nmi_nesting; /* Track irq/NMI nesting level. */
|
||||
#endif
|
||||
};
|
||||
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
extern struct static_key_false context_tracking_key;
|
||||
DECLARE_PER_CPU(struct context_tracking, context_tracking);
|
||||
|
||||
static __always_inline int __ct_state(void)
|
||||
{
|
||||
return arch_atomic_read(this_cpu_ptr(&context_tracking.state)) & CT_STATE_MASK;
|
||||
}
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_IDLE
|
||||
static __always_inline int ct_dynticks(void)
|
||||
{
|
||||
return atomic_read(this_cpu_ptr(&context_tracking.state)) & CT_DYNTICKS_MASK;
|
||||
}
|
||||
|
||||
static __always_inline int ct_dynticks_cpu(int cpu)
|
||||
{
|
||||
struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
|
||||
|
||||
return atomic_read(&ct->state) & CT_DYNTICKS_MASK;
|
||||
}
|
||||
|
||||
static __always_inline int ct_dynticks_cpu_acquire(int cpu)
|
||||
{
|
||||
struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
|
||||
|
||||
return atomic_read_acquire(&ct->state) & CT_DYNTICKS_MASK;
|
||||
}
|
||||
|
||||
static __always_inline long ct_dynticks_nesting(void)
|
||||
{
|
||||
return __this_cpu_read(context_tracking.dynticks_nesting);
|
||||
}
|
||||
|
||||
static __always_inline long ct_dynticks_nesting_cpu(int cpu)
|
||||
{
|
||||
struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
|
||||
|
||||
return ct->dynticks_nesting;
|
||||
}
|
||||
|
||||
static __always_inline long ct_dynticks_nmi_nesting(void)
|
||||
{
|
||||
return __this_cpu_read(context_tracking.dynticks_nmi_nesting);
|
||||
}
|
||||
|
||||
static __always_inline long ct_dynticks_nmi_nesting_cpu(int cpu)
|
||||
{
|
||||
struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
|
||||
|
||||
return ct->dynticks_nmi_nesting;
|
||||
}
|
||||
#endif /* #ifdef CONFIG_CONTEXT_TRACKING_IDLE */
|
||||
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_USER
|
||||
extern struct static_key_false context_tracking_key;
|
||||
|
||||
static __always_inline bool context_tracking_enabled(void)
|
||||
{
|
||||
return static_branch_unlikely(&context_tracking_key);
|
||||
@ -41,15 +116,31 @@ static inline bool context_tracking_enabled_this_cpu(void)
|
||||
return context_tracking_enabled() && __this_cpu_read(context_tracking.active);
|
||||
}
|
||||
|
||||
static __always_inline bool context_tracking_in_user(void)
|
||||
/**
|
||||
* ct_state() - return the current context tracking state if known
|
||||
*
|
||||
* Returns the current cpu's context tracking state if context tracking
|
||||
* is enabled. If context tracking is disabled, returns
|
||||
* CONTEXT_DISABLED. This should be used primarily for debugging.
|
||||
*/
|
||||
static __always_inline int ct_state(void)
|
||||
{
|
||||
return __this_cpu_read(context_tracking.state) == CONTEXT_USER;
|
||||
int ret;
|
||||
|
||||
if (!context_tracking_enabled())
|
||||
return CONTEXT_DISABLED;
|
||||
|
||||
preempt_disable();
|
||||
ret = __ct_state();
|
||||
preempt_enable();
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
#else
|
||||
static __always_inline bool context_tracking_in_user(void) { return false; }
|
||||
static __always_inline bool context_tracking_enabled(void) { return false; }
|
||||
static __always_inline bool context_tracking_enabled_cpu(int cpu) { return false; }
|
||||
static __always_inline bool context_tracking_enabled_this_cpu(void) { return false; }
|
||||
#endif /* CONFIG_CONTEXT_TRACKING */
|
||||
#endif /* CONFIG_CONTEXT_TRACKING_USER */
|
||||
|
||||
#endif
|
||||
|
@ -357,7 +357,7 @@ void irqentry_exit_to_user_mode(struct pt_regs *regs);
|
||||
/**
|
||||
* struct irqentry_state - Opaque object for exception state storage
|
||||
* @exit_rcu: Used exclusively in the irqentry_*() calls; signals whether the
|
||||
* exit path has to invoke rcu_irq_exit().
|
||||
* exit path has to invoke ct_irq_exit().
|
||||
* @lockdep: Used exclusively in the irqentry_nmi_*() calls; ensures that
|
||||
* lockdep state is restored correctly on exit from nmi.
|
||||
*
|
||||
@ -395,12 +395,12 @@ typedef struct irqentry_state {
|
||||
*
|
||||
* For kernel mode entries RCU handling is done conditional. If RCU is
|
||||
* watching then the only RCU requirement is to check whether the tick has
|
||||
* to be restarted. If RCU is not watching then rcu_irq_enter() has to be
|
||||
* invoked on entry and rcu_irq_exit() on exit.
|
||||
* to be restarted. If RCU is not watching then ct_irq_enter() has to be
|
||||
* invoked on entry and ct_irq_exit() on exit.
|
||||
*
|
||||
* Avoiding the rcu_irq_enter/exit() calls is an optimization but also
|
||||
* Avoiding the ct_irq_enter/exit() calls is an optimization but also
|
||||
* solves the problem of kernel mode pagefaults which can schedule, which
|
||||
* is not possible after invoking rcu_irq_enter() without undoing it.
|
||||
* is not possible after invoking ct_irq_enter() without undoing it.
|
||||
*
|
||||
* For user mode entries irqentry_enter_from_user_mode() is invoked to
|
||||
* establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
|
||||
|
@ -92,14 +92,6 @@ void irq_exit_rcu(void);
|
||||
#define arch_nmi_exit() do { } while (0)
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_TINY_RCU
|
||||
static inline void rcu_nmi_enter(void) { }
|
||||
static inline void rcu_nmi_exit(void) { }
|
||||
#else
|
||||
extern void rcu_nmi_enter(void);
|
||||
extern void rcu_nmi_exit(void);
|
||||
#endif
|
||||
|
||||
/*
|
||||
* NMI vs Tracing
|
||||
* --------------
|
||||
@ -124,7 +116,7 @@ extern void rcu_nmi_exit(void);
|
||||
do { \
|
||||
__nmi_enter(); \
|
||||
lockdep_hardirq_enter(); \
|
||||
rcu_nmi_enter(); \
|
||||
ct_nmi_enter(); \
|
||||
instrumentation_begin(); \
|
||||
ftrace_nmi_enter(); \
|
||||
instrumentation_end(); \
|
||||
@ -143,7 +135,7 @@ extern void rcu_nmi_exit(void);
|
||||
instrumentation_begin(); \
|
||||
ftrace_nmi_exit(); \
|
||||
instrumentation_end(); \
|
||||
rcu_nmi_exit(); \
|
||||
ct_nmi_exit(); \
|
||||
lockdep_hardirq_exit(); \
|
||||
__nmi_exit(); \
|
||||
} while (0)
|
||||
|
@ -29,6 +29,7 @@
|
||||
#include <linux/lockdep.h>
|
||||
#include <asm/processor.h>
|
||||
#include <linux/cpumask.h>
|
||||
#include <linux/context_tracking_irq.h>
|
||||
|
||||
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
|
||||
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))
|
||||
@ -41,6 +42,7 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func);
|
||||
void rcu_barrier_tasks(void);
|
||||
void rcu_barrier_tasks_rude(void);
|
||||
void synchronize_rcu(void);
|
||||
unsigned long get_completed_synchronize_rcu(void);
|
||||
|
||||
#ifdef CONFIG_PREEMPT_RCU
|
||||
|
||||
@ -103,13 +105,11 @@ static inline void rcu_sysrq_start(void) { }
|
||||
static inline void rcu_sysrq_end(void) { }
|
||||
#endif /* #else #ifdef CONFIG_RCU_STALL_COMMON */
|
||||
|
||||
#ifdef CONFIG_NO_HZ_FULL
|
||||
void rcu_user_enter(void);
|
||||
void rcu_user_exit(void);
|
||||
#if defined(CONFIG_NO_HZ_FULL) && (!defined(CONFIG_GENERIC_ENTRY) || !defined(CONFIG_KVM_XFER_TO_GUEST_WORK))
|
||||
void rcu_irq_work_resched(void);
|
||||
#else
|
||||
static inline void rcu_user_enter(void) { }
|
||||
static inline void rcu_user_exit(void) { }
|
||||
#endif /* CONFIG_NO_HZ_FULL */
|
||||
static inline void rcu_irq_work_resched(void) { }
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_RCU_NOCB_CPU
|
||||
void rcu_init_nohz(void);
|
||||
@ -128,7 +128,7 @@ static inline void rcu_nocb_flush_deferred_wakeup(void) { }
|
||||
* @a: Code that RCU needs to pay attention to.
|
||||
*
|
||||
* RCU read-side critical sections are forbidden in the inner idle loop,
|
||||
* that is, between the rcu_idle_enter() and the rcu_idle_exit() -- RCU
|
||||
* that is, between the ct_idle_enter() and the ct_idle_exit() -- RCU
|
||||
* will happily ignore any such read-side critical sections. However,
|
||||
* things like powertop need tracepoints in the inner idle loop.
|
||||
*
|
||||
@ -143,9 +143,9 @@ static inline void rcu_nocb_flush_deferred_wakeup(void) { }
|
||||
*/
|
||||
#define RCU_NONIDLE(a) \
|
||||
do { \
|
||||
rcu_irq_enter_irqson(); \
|
||||
ct_irq_enter_irqson(); \
|
||||
do { a; } while (0); \
|
||||
rcu_irq_exit_irqson(); \
|
||||
ct_irq_exit_irqson(); \
|
||||
} while (0)
|
||||
|
||||
/*
|
||||
@ -169,13 +169,24 @@ void synchronize_rcu_tasks(void);
|
||||
# endif
|
||||
|
||||
# ifdef CONFIG_TASKS_TRACE_RCU
|
||||
# define rcu_tasks_trace_qs(t) \
|
||||
do { \
|
||||
if (!likely(READ_ONCE((t)->trc_reader_checked)) && \
|
||||
!unlikely(READ_ONCE((t)->trc_reader_nesting))) { \
|
||||
smp_store_release(&(t)->trc_reader_checked, true); \
|
||||
smp_mb(); /* Readers partitioned by store. */ \
|
||||
} \
|
||||
// Bits for ->trc_reader_special.b.need_qs field.
|
||||
#define TRC_NEED_QS 0x1 // Task needs a quiescent state.
|
||||
#define TRC_NEED_QS_CHECKED 0x2 // Task has been checked for needing quiescent state.
|
||||
|
||||
u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new);
|
||||
void rcu_tasks_trace_qs_blkd(struct task_struct *t);
|
||||
|
||||
# define rcu_tasks_trace_qs(t) \
|
||||
do { \
|
||||
int ___rttq_nesting = READ_ONCE((t)->trc_reader_nesting); \
|
||||
\
|
||||
if (likely(!READ_ONCE((t)->trc_reader_special.b.need_qs)) && \
|
||||
likely(!___rttq_nesting)) { \
|
||||
rcu_trc_cmpxchg_need_qs((t), 0, TRC_NEED_QS_CHECKED); \
|
||||
} else if (___rttq_nesting && ___rttq_nesting != INT_MIN && \
|
||||
!READ_ONCE((t)->trc_reader_special.b.blocked)) { \
|
||||
rcu_tasks_trace_qs_blkd(t); \
|
||||
} \
|
||||
} while (0)
|
||||
# else
|
||||
# define rcu_tasks_trace_qs(t) do { } while (0)
|
||||
@ -184,7 +195,7 @@ void synchronize_rcu_tasks(void);
|
||||
#define rcu_tasks_qs(t, preempt) \
|
||||
do { \
|
||||
rcu_tasks_classic_qs((t), (preempt)); \
|
||||
rcu_tasks_trace_qs((t)); \
|
||||
rcu_tasks_trace_qs(t); \
|
||||
} while (0)
|
||||
|
||||
# ifdef CONFIG_TASKS_RUDE_RCU
|
||||
|
@ -75,7 +75,7 @@ static inline void rcu_read_unlock_trace(void)
|
||||
nesting = READ_ONCE(t->trc_reader_nesting) - 1;
|
||||
barrier(); // Critical section before disabling.
|
||||
// Disable IPI-based setting of .need_qs.
|
||||
WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
|
||||
WRITE_ONCE(t->trc_reader_nesting, INT_MIN + nesting);
|
||||
if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
|
||||
WRITE_ONCE(t->trc_reader_nesting, nesting);
|
||||
return; // We assume shallow reader nesting.
|
||||
|
@ -23,6 +23,16 @@ static inline void cond_synchronize_rcu(unsigned long oldstate)
|
||||
might_sleep();
|
||||
}
|
||||
|
||||
static inline unsigned long start_poll_synchronize_rcu_expedited(void)
|
||||
{
|
||||
return start_poll_synchronize_rcu();
|
||||
}
|
||||
|
||||
static inline void cond_synchronize_rcu_expedited(unsigned long oldstate)
|
||||
{
|
||||
cond_synchronize_rcu(oldstate);
|
||||
}
|
||||
|
||||
extern void rcu_barrier(void);
|
||||
|
||||
static inline void synchronize_rcu_expedited(void)
|
||||
@ -38,7 +48,7 @@ static inline void synchronize_rcu_expedited(void)
|
||||
*/
|
||||
extern void kvfree(const void *addr);
|
||||
|
||||
static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
|
||||
static inline void __kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
|
||||
{
|
||||
if (head) {
|
||||
call_rcu(head, func);
|
||||
@ -51,6 +61,15 @@ static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
|
||||
kvfree((void *) func);
|
||||
}
|
||||
|
||||
#ifdef CONFIG_KASAN_GENERIC
|
||||
void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
|
||||
#else
|
||||
static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
|
||||
{
|
||||
__kvfree_call_rcu(head, func);
|
||||
}
|
||||
#endif
|
||||
|
||||
void rcu_qs(void);
|
||||
|
||||
static inline void rcu_softirq_qs(void)
|
||||
@ -76,12 +95,6 @@ static inline int rcu_needs_cpu(void)
|
||||
static inline void rcu_virt_note_context_switch(int cpu) { }
|
||||
static inline void rcu_cpu_stall_reset(void) { }
|
||||
static inline int rcu_jiffies_till_stall_check(void) { return 21 * HZ; }
|
||||
static inline void rcu_idle_enter(void) { }
|
||||
static inline void rcu_idle_exit(void) { }
|
||||
static inline void rcu_irq_enter(void) { }
|
||||
static inline void rcu_irq_exit_irqson(void) { }
|
||||
static inline void rcu_irq_enter_irqson(void) { }
|
||||
static inline void rcu_irq_exit(void) { }
|
||||
static inline void rcu_irq_exit_check_preempt(void) { }
|
||||
#define rcu_is_idle_cpu(cpu) \
|
||||
(is_idle_task(current) && !in_nmi() && !in_hardirq() && !in_serving_softirq())
|
||||
|
@ -40,17 +40,13 @@ bool rcu_eqs_special_set(int cpu);
|
||||
void rcu_momentary_dyntick_idle(void);
|
||||
void kfree_rcu_scheduler_running(void);
|
||||
bool rcu_gp_might_be_stalled(void);
|
||||
unsigned long start_poll_synchronize_rcu_expedited(void);
|
||||
void cond_synchronize_rcu_expedited(unsigned long oldstate);
|
||||
unsigned long get_state_synchronize_rcu(void);
|
||||
unsigned long start_poll_synchronize_rcu(void);
|
||||
bool poll_state_synchronize_rcu(unsigned long oldstate);
|
||||
void cond_synchronize_rcu(unsigned long oldstate);
|
||||
|
||||
void rcu_idle_enter(void);
|
||||
void rcu_idle_exit(void);
|
||||
void rcu_irq_enter(void);
|
||||
void rcu_irq_exit(void);
|
||||
void rcu_irq_enter_irqson(void);
|
||||
void rcu_irq_exit_irqson(void);
|
||||
bool rcu_is_idle_cpu(int cpu);
|
||||
|
||||
#ifdef CONFIG_PROVE_RCU
|
||||
@ -59,6 +55,9 @@ void rcu_irq_exit_check_preempt(void);
|
||||
static inline void rcu_irq_exit_check_preempt(void) { }
|
||||
#endif
|
||||
|
||||
struct task_struct;
|
||||
void rcu_preempt_deferred_qs(struct task_struct *t);
|
||||
|
||||
void exit_rcu(void);
|
||||
|
||||
void rcu_scheduler_starting(void);
|
||||
|
@ -843,8 +843,9 @@ struct task_struct {
|
||||
int trc_reader_nesting;
|
||||
int trc_ipi_to_cpu;
|
||||
union rcu_special trc_reader_special;
|
||||
bool trc_reader_checked;
|
||||
struct list_head trc_holdout_list;
|
||||
struct list_head trc_blkd_node;
|
||||
int trc_blkd_cpu;
|
||||
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
|
||||
|
||||
struct sched_info sched_info;
|
||||
@ -2223,6 +2224,7 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
|
||||
|
||||
extern bool sched_task_on_rq(struct task_struct *p);
|
||||
extern unsigned long get_wchan(struct task_struct *p);
|
||||
extern struct task_struct *cpu_curr_snapshot(int cpu);
|
||||
|
||||
/*
|
||||
* In order to reduce various lock holder preemption latencies provide an
|
||||
|
@ -200,13 +200,13 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
|
||||
*/ \
|
||||
if (rcuidle) { \
|
||||
__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
|
||||
rcu_irq_enter_irqson(); \
|
||||
ct_irq_enter_irqson(); \
|
||||
} \
|
||||
\
|
||||
__DO_TRACE_CALL(name, TP_ARGS(args)); \
|
||||
\
|
||||
if (rcuidle) { \
|
||||
rcu_irq_exit_irqson(); \
|
||||
ct_irq_exit_irqson(); \
|
||||
srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
|
||||
} \
|
||||
\
|
||||
|
@ -494,11 +494,11 @@ config VIRT_CPU_ACCOUNTING_NATIVE
|
||||
|
||||
config VIRT_CPU_ACCOUNTING_GEN
|
||||
bool "Full dynticks CPU time accounting"
|
||||
depends on HAVE_CONTEXT_TRACKING
|
||||
depends on HAVE_CONTEXT_TRACKING_USER
|
||||
depends on HAVE_VIRT_CPU_ACCOUNTING_GEN
|
||||
depends on GENERIC_CLOCKEVENTS
|
||||
select VIRT_CPU_ACCOUNTING
|
||||
select CONTEXT_TRACKING
|
||||
select CONTEXT_TRACKING_USER
|
||||
help
|
||||
Select this option to enable task and CPU time accounting on full
|
||||
dynticks systems. This accounting is implemented by watching every
|
||||
|
@ -157,6 +157,7 @@ struct task_struct init_task
|
||||
.trc_reader_nesting = 0,
|
||||
.trc_reader_special.s = 0,
|
||||
.trc_holdout_list = LIST_HEAD_INIT(init_task.trc_holdout_list),
|
||||
.trc_blkd_node = LIST_HEAD_INIT(init_task.trc_blkd_node),
|
||||
#endif
|
||||
#ifdef CONFIG_CPUSETS
|
||||
.mems_allowed_seq = SEQCNT_SPINLOCK_ZERO(init_task.mems_allowed_seq,
|
||||
|
@ -295,7 +295,7 @@ static inline cfi_check_fn find_check_fn(unsigned long ptr)
|
||||
rcu_idle = !rcu_is_watching();
|
||||
if (rcu_idle) {
|
||||
local_irq_save(flags);
|
||||
rcu_irq_enter();
|
||||
ct_irq_enter();
|
||||
}
|
||||
|
||||
if (IS_ENABLED(CONFIG_CFI_CLANG_SHADOW))
|
||||
@ -304,7 +304,7 @@ static inline cfi_check_fn find_check_fn(unsigned long ptr)
|
||||
fn = find_module_check_fn(ptr);
|
||||
|
||||
if (rcu_idle) {
|
||||
rcu_irq_exit();
|
||||
ct_irq_exit();
|
||||
local_irq_restore(flags);
|
||||
}
|
||||
|
||||
|
@ -1,18 +1,20 @@
|
||||
// SPDX-License-Identifier: GPL-2.0-only
|
||||
/*
|
||||
* Context tracking: Probe on high level context boundaries such as kernel
|
||||
* and userspace. This includes syscalls and exceptions entry/exit.
|
||||
* Context tracking: Probe on high level context boundaries such as kernel,
|
||||
* userspace, guest or idle.
|
||||
*
|
||||
* This is used by RCU to remove its dependency on the timer tick while a CPU
|
||||
* runs in userspace.
|
||||
* runs in idle, userspace or guest mode.
|
||||
*
|
||||
* Started by Frederic Weisbecker:
|
||||
* User/guest tracking started by Frederic Weisbecker:
|
||||
*
|
||||
* Copyright (C) 2012 Red Hat, Inc., Frederic Weisbecker <fweisbec@redhat.com>
|
||||
* Copyright (C) 2012 Red Hat, Inc., Frederic Weisbecker
|
||||
*
|
||||
* Many thanks to Gilad Ben-Yossef, Paul McKenney, Ingo Molnar, Andrew Morton,
|
||||
* Steven Rostedt, Peter Zijlstra for suggestions and improvements.
|
||||
*
|
||||
* RCU extended quiescent state bits imported from kernel/rcu/tree.c
|
||||
* where the relevant authorship may be found.
|
||||
*/
|
||||
|
||||
#include <linux/context_tracking.h>
|
||||
@ -21,6 +23,411 @@
|
||||
#include <linux/hardirq.h>
|
||||
#include <linux/export.h>
|
||||
#include <linux/kprobes.h>
|
||||
#include <trace/events/rcu.h>
|
||||
|
||||
|
||||
DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_IDLE
|
||||
.dynticks_nesting = 1,
|
||||
.dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE,
|
||||
#endif
|
||||
.state = ATOMIC_INIT(RCU_DYNTICKS_IDX),
|
||||
};
|
||||
EXPORT_SYMBOL_GPL(context_tracking);
|
||||
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_IDLE
|
||||
#define TPS(x) tracepoint_string(x)
|
||||
|
||||
/* Record the current task on dyntick-idle entry. */
|
||||
static __always_inline void rcu_dynticks_task_enter(void)
|
||||
{
|
||||
#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
|
||||
WRITE_ONCE(current->rcu_tasks_idle_cpu, smp_processor_id());
|
||||
#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */
|
||||
}
|
||||
|
||||
/* Record no current task on dyntick-idle exit. */
|
||||
static __always_inline void rcu_dynticks_task_exit(void)
|
||||
{
|
||||
#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
|
||||
WRITE_ONCE(current->rcu_tasks_idle_cpu, -1);
|
||||
#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */
|
||||
}
|
||||
|
||||
/* Turn on heavyweight RCU tasks trace readers on idle/user entry. */
|
||||
static __always_inline void rcu_dynticks_task_trace_enter(void)
|
||||
{
|
||||
#ifdef CONFIG_TASKS_TRACE_RCU
|
||||
if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB))
|
||||
current->trc_reader_special.b.need_mb = true;
|
||||
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
|
||||
}
|
||||
|
||||
/* Turn off heavyweight RCU tasks trace readers on idle/user exit. */
|
||||
static __always_inline void rcu_dynticks_task_trace_exit(void)
|
||||
{
|
||||
#ifdef CONFIG_TASKS_TRACE_RCU
|
||||
if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB))
|
||||
current->trc_reader_special.b.need_mb = false;
|
||||
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
|
||||
}
|
||||
|
||||
/*
|
||||
* Record entry into an extended quiescent state. This is only to be
|
||||
* called when not already in an extended quiescent state, that is,
|
||||
* RCU is watching prior to the call to this function and is no longer
|
||||
* watching upon return.
|
||||
*/
|
||||
static noinstr void ct_kernel_exit_state(int offset)
|
||||
{
|
||||
int seq;
|
||||
|
||||
/*
|
||||
* CPUs seeing atomic_add_return() must see prior RCU read-side
|
||||
* critical sections, and we also must force ordering with the
|
||||
* next idle sojourn.
|
||||
*/
|
||||
rcu_dynticks_task_trace_enter(); // Before ->dynticks update!
|
||||
seq = ct_state_inc(offset);
|
||||
// RCU is no longer watching. Better be in extended quiescent state!
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && (seq & RCU_DYNTICKS_IDX));
|
||||
}
|
||||
|
||||
/*
|
||||
* Record exit from an extended quiescent state. This is only to be
|
||||
* called from an extended quiescent state, that is, RCU is not watching
|
||||
* prior to the call to this function and is watching upon return.
|
||||
*/
|
||||
static noinstr void ct_kernel_enter_state(int offset)
|
||||
{
|
||||
int seq;
|
||||
|
||||
/*
|
||||
* CPUs seeing atomic_add_return() must see prior idle sojourns,
|
||||
* and we also must force ordering with the next RCU read-side
|
||||
* critical section.
|
||||
*/
|
||||
seq = ct_state_inc(offset);
|
||||
// RCU is now watching. Better not be in an extended quiescent state!
|
||||
rcu_dynticks_task_trace_exit(); // After ->dynticks update!
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !(seq & RCU_DYNTICKS_IDX));
|
||||
}
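To make the ->state convention used by ct_kernel_exit_state()/ct_kernel_enter_state() easier to follow, here is a minimal stand-alone user-space model. It is not part of the patch: the RCU_DYNTICKS_IDX stand-in value is an arbitrary placeholder, and the helper only loosely mimics ct_state_inc() (assumed to return the post-increment counter value). The point is simply that each EQS transition adds the constant, so the corresponding bit toggles and a set bit means "RCU is watching".

#include <stdio.h>
#include <stdatomic.h>

#define MODEL_DYNTICKS_IDX 0x10			/* placeholder for RCU_DYNTICKS_IDX */

static atomic_int model_state = MODEL_DYNTICKS_IDX;	/* boot: RCU watching */

static int model_state_inc(int incby)
{
	/* Loose model of ct_state_inc(): return the post-increment value. */
	return atomic_fetch_add(&model_state, incby) + incby;
}

int main(void)
{
	int seq;

	seq = model_state_inc(MODEL_DYNTICKS_IDX);		/* enter EQS */
	printf("watching=%d\n", !!(seq & MODEL_DYNTICKS_IDX));	/* prints 0 */

	seq = model_state_inc(MODEL_DYNTICKS_IDX);		/* exit EQS */
	printf("watching=%d\n", !!(seq & MODEL_DYNTICKS_IDX));	/* prints 1 */
	return 0;
}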
|
||||
|
||||
/*
|
||||
* Enter an RCU extended quiescent state, which can be either the
|
||||
* idle loop or adaptive-tickless usermode execution.
|
||||
*
|
||||
* We crowbar the ->dynticks_nmi_nesting field to zero to allow for
|
||||
* the possibility of usermode upcalls having messed up our count
|
||||
* of interrupt nesting level during the prior busy period.
|
||||
*/
|
||||
static void noinstr ct_kernel_exit(bool user, int offset)
|
||||
{
|
||||
struct context_tracking *ct = this_cpu_ptr(&context_tracking);
|
||||
|
||||
WARN_ON_ONCE(ct_dynticks_nmi_nesting() != DYNTICK_IRQ_NONIDLE);
|
||||
WRITE_ONCE(ct->dynticks_nmi_nesting, 0);
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
|
||||
ct_dynticks_nesting() == 0);
|
||||
if (ct_dynticks_nesting() != 1) {
|
||||
// RCU will still be watching, so just do accounting and leave.
|
||||
ct->dynticks_nesting--;
|
||||
return;
|
||||
}
|
||||
|
||||
instrumentation_begin();
|
||||
lockdep_assert_irqs_disabled();
|
||||
trace_rcu_dyntick(TPS("Start"), ct_dynticks_nesting(), 0, ct_dynticks());
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
|
||||
rcu_preempt_deferred_qs(current);
|
||||
|
||||
// instrumentation for the noinstr ct_kernel_exit_state()
|
||||
instrument_atomic_write(&ct->state, sizeof(ct->state));
|
||||
|
||||
instrumentation_end();
|
||||
WRITE_ONCE(ct->dynticks_nesting, 0); /* Avoid irq-access tearing. */
|
||||
// RCU is watching here ...
|
||||
ct_kernel_exit_state(offset);
|
||||
// ... but is no longer watching here.
|
||||
rcu_dynticks_task_enter();
|
||||
}
|
||||
|
||||
/*
|
||||
* Exit an RCU extended quiescent state, which can be either the
|
||||
* idle loop or adaptive-tickless usermode execution.
|
||||
*
|
||||
* We crowbar the ->dynticks_nmi_nesting field to DYNTICK_IRQ_NONIDLE to
|
||||
* allow for the possibility of usermode upcalls messing up our count of
|
||||
* interrupt nesting level during the busy period that is just now starting.
|
||||
*/
|
||||
static void noinstr ct_kernel_enter(bool user, int offset)
|
||||
{
|
||||
struct context_tracking *ct = this_cpu_ptr(&context_tracking);
|
||||
long oldval;
|
||||
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !raw_irqs_disabled());
|
||||
oldval = ct_dynticks_nesting();
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && oldval < 0);
|
||||
if (oldval) {
|
||||
// RCU was already watching, so just do accounting and leave.
|
||||
ct->dynticks_nesting++;
|
||||
return;
|
||||
}
|
||||
rcu_dynticks_task_exit();
|
||||
// RCU is not watching here ...
|
||||
ct_kernel_enter_state(offset);
|
||||
// ... but is watching here.
|
||||
instrumentation_begin();
|
||||
|
||||
// instrumentation for the noinstr ct_kernel_enter_state()
|
||||
instrument_atomic_write(&ct->state, sizeof(ct->state));
|
||||
|
||||
trace_rcu_dyntick(TPS("End"), ct_dynticks_nesting(), 1, ct_dynticks());
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
|
||||
WRITE_ONCE(ct->dynticks_nesting, 1);
|
||||
WARN_ON_ONCE(ct_dynticks_nmi_nesting());
|
||||
WRITE_ONCE(ct->dynticks_nmi_nesting, DYNTICK_IRQ_NONIDLE);
|
||||
instrumentation_end();
|
||||
}
|
||||
|
||||
/**
|
||||
* ct_nmi_exit - inform RCU of exit from NMI context
|
||||
*
|
||||
* If we are returning from the outermost NMI handler that interrupted an
|
||||
* RCU-idle period, update ct->state and ct->dynticks_nmi_nesting
|
||||
* to let the RCU grace-period handling know that the CPU is back to
|
||||
* being RCU-idle.
|
||||
*
|
||||
* If you add or remove a call to ct_nmi_exit(), be sure to test
|
||||
* with CONFIG_RCU_EQS_DEBUG=y.
|
||||
*/
|
||||
void noinstr ct_nmi_exit(void)
|
||||
{
|
||||
struct context_tracking *ct = this_cpu_ptr(&context_tracking);
|
||||
|
||||
instrumentation_begin();
|
||||
/*
|
||||
* Check for ->dynticks_nmi_nesting underflow and bad ->dynticks.
|
||||
* (We are exiting an NMI handler, so RCU better be paying attention
|
||||
* to us!)
|
||||
*/
|
||||
WARN_ON_ONCE(ct_dynticks_nmi_nesting() <= 0);
|
||||
WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs());
|
||||
|
||||
/*
|
||||
* If the nesting level is not 1, the CPU wasn't RCU-idle, so
|
||||
* leave it in non-RCU-idle state.
|
||||
*/
|
||||
if (ct_dynticks_nmi_nesting() != 1) {
|
||||
trace_rcu_dyntick(TPS("--="), ct_dynticks_nmi_nesting(), ct_dynticks_nmi_nesting() - 2,
|
||||
ct_dynticks());
|
||||
WRITE_ONCE(ct->dynticks_nmi_nesting, /* No store tearing. */
|
||||
ct_dynticks_nmi_nesting() - 2);
|
||||
instrumentation_end();
|
||||
return;
|
||||
}
|
||||
|
||||
/* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
|
||||
trace_rcu_dyntick(TPS("Startirq"), ct_dynticks_nmi_nesting(), 0, ct_dynticks());
|
||||
WRITE_ONCE(ct->dynticks_nmi_nesting, 0); /* Avoid store tearing. */
|
||||
|
||||
// instrumentation for the noinstr ct_kernel_exit_state()
|
||||
instrument_atomic_write(&ct->state, sizeof(ct->state));
|
||||
instrumentation_end();
|
||||
|
||||
// RCU is watching here ...
|
||||
ct_kernel_exit_state(RCU_DYNTICKS_IDX);
|
||||
// ... but is no longer watching here.
|
||||
|
||||
if (!in_nmi())
|
||||
rcu_dynticks_task_enter();
|
||||
}
|
||||
|
||||
/**
|
||||
* ct_nmi_enter - inform RCU of entry to NMI context
|
||||
*
|
||||
* If the CPU was idle from RCU's viewpoint, update ct->state and
|
||||
* ct->dynticks_nmi_nesting to let the RCU grace-period handling know
|
||||
* that the CPU is active. This implementation permits nested NMIs, as
|
||||
* long as the nesting level does not overflow an int. (You will probably
|
||||
* run out of stack space first.)
|
||||
*
|
||||
* If you add or remove a call to ct_nmi_enter(), be sure to test
|
||||
* with CONFIG_RCU_EQS_DEBUG=y.
|
||||
*/
|
||||
void noinstr ct_nmi_enter(void)
|
||||
{
|
||||
long incby = 2;
|
||||
struct context_tracking *ct = this_cpu_ptr(&context_tracking);
|
||||
|
||||
/* Complain about underflow. */
|
||||
WARN_ON_ONCE(ct_dynticks_nmi_nesting() < 0);
|
||||
|
||||
/*
|
||||
* If idle from RCU viewpoint, atomically increment ->dynticks
|
||||
* to mark non-idle and increment ->dynticks_nmi_nesting by one.
|
||||
* Otherwise, increment ->dynticks_nmi_nesting by two. This means
|
||||
* if ->dynticks_nmi_nesting is equal to one, we are guaranteed
|
||||
* to be in the outermost NMI handler that interrupted an RCU-idle
|
||||
* period (observation due to Andy Lutomirski).
|
||||
*/
|
||||
if (rcu_dynticks_curr_cpu_in_eqs()) {
|
||||
|
||||
if (!in_nmi())
|
||||
rcu_dynticks_task_exit();
|
||||
|
||||
// RCU is not watching here ...
|
||||
ct_kernel_enter_state(RCU_DYNTICKS_IDX);
|
||||
// ... but is watching here.
|
||||
|
||||
instrumentation_begin();
|
||||
// instrumentation for the noinstr rcu_dynticks_curr_cpu_in_eqs()
|
||||
instrument_atomic_read(&ct->state, sizeof(ct->state));
|
||||
// instrumentation for the noinstr ct_kernel_enter_state()
|
||||
instrument_atomic_write(&ct->state, sizeof(ct->state));
|
||||
|
||||
incby = 1;
|
||||
} else if (!in_nmi()) {
|
||||
instrumentation_begin();
|
||||
rcu_irq_enter_check_tick();
|
||||
} else {
|
||||
instrumentation_begin();
|
||||
}
|
||||
|
||||
trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="),
|
||||
ct_dynticks_nmi_nesting(),
|
||||
ct_dynticks_nmi_nesting() + incby, ct_dynticks());
|
||||
instrumentation_end();
|
||||
WRITE_ONCE(ct->dynticks_nmi_nesting, /* Prevent store tearing. */
|
||||
ct_dynticks_nmi_nesting() + incby);
|
||||
barrier();
|
||||
}
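The incby bookkeeping in ct_nmi_enter()/ct_nmi_exit() can be hard to follow from the diff alone. Below is a small user-space sketch, not part of the patch, that models only the ->dynticks_nmi_nesting arithmetic described in the comment above: the outermost NMI that interrupts an RCU-idle period contributes 1, every other level contributes 2, so a nesting value of exactly 1 on exit identifies that outermost handler.

#include <stdio.h>

static long nmi_nesting;	/* models ct->dynticks_nmi_nesting */
static int in_eqs = 1;		/* models rcu_dynticks_curr_cpu_in_eqs() */

static void model_nmi_enter(void)
{
	long incby = 2;

	if (in_eqs) {		/* idle from RCU's viewpoint? */
		in_eqs = 0;	/* mark non-idle ... */
		incby = 1;	/* ... and count this level only once */
	}
	nmi_nesting += incby;
}

static void model_nmi_exit(void)
{
	if (nmi_nesting != 1) {		/* not the outermost handler */
		nmi_nesting -= 2;
		return;
	}
	nmi_nesting = 0;		/* outermost: restore RCU-idleness */
	in_eqs = 1;
}

int main(void)
{
	model_nmi_enter();			/* outermost NMI from idle */
	model_nmi_enter();			/* nested NMI */
	printf("%ld\n", nmi_nesting);		/* prints 3 */
	model_nmi_exit();
	model_nmi_exit();
	printf("%ld %d\n", nmi_nesting, in_eqs);	/* prints 0 1 */
	return 0;
}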
|
||||
|
||||
/**
|
||||
* ct_idle_enter - inform RCU that current CPU is entering idle
|
||||
*
|
||||
* Enter idle mode, in other words, -leave- the mode in which RCU
|
||||
* read-side critical sections can occur. (Though RCU read-side
|
||||
* critical sections can occur in irq handlers in idle, a possibility
|
||||
* handled by irq_enter() and irq_exit().)
|
||||
*
|
||||
* If you add or remove a call to ct_idle_enter(), be sure to test with
|
||||
* CONFIG_RCU_EQS_DEBUG=y.
|
||||
*/
|
||||
void noinstr ct_idle_enter(void)
|
||||
{
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !raw_irqs_disabled());
|
||||
ct_kernel_exit(false, RCU_DYNTICKS_IDX + CONTEXT_IDLE);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(ct_idle_enter);
|
||||
|
||||
/**
|
||||
* ct_idle_exit - inform RCU that current CPU is leaving idle
|
||||
*
|
||||
* Exit idle mode, in other words, -enter- the mode in which RCU
|
||||
* read-side critical sections can occur.
|
||||
*
|
||||
* If you add or remove a call to ct_idle_exit(), be sure to test with
|
||||
* CONFIG_RCU_EQS_DEBUG=y.
|
||||
*/
|
||||
void noinstr ct_idle_exit(void)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
raw_local_irq_save(flags);
|
||||
ct_kernel_enter(false, RCU_DYNTICKS_IDX - CONTEXT_IDLE);
|
||||
raw_local_irq_restore(flags);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(ct_idle_exit);
|
||||
|
||||
/**
|
||||
* ct_irq_enter - inform RCU that current CPU is entering irq away from idle
|
||||
*
|
||||
* Enter an interrupt handler, which might possibly result in exiting
|
||||
* idle mode, in other words, entering the mode in which read-side critical
|
||||
* sections can occur. The caller must have disabled interrupts.
|
||||
*
|
||||
* Note that the Linux kernel is fully capable of entering an interrupt
|
||||
* handler that it never exits, for example when doing upcalls to user mode!
|
||||
* This code assumes that the idle loop never does upcalls to user mode.
|
||||
* If your architecture's idle loop does do upcalls to user mode (or does
|
||||
* anything else that results in unbalanced calls to the irq_enter() and
|
||||
* irq_exit() functions), RCU will give you what you deserve, good and hard.
|
||||
* But very infrequently and irreproducibly.
|
||||
*
|
||||
* Use things like work queues to work around this limitation.
|
||||
*
|
||||
* You have been warned.
|
||||
*
|
||||
* If you add or remove a call to ct_irq_enter(), be sure to test with
|
||||
* CONFIG_RCU_EQS_DEBUG=y.
|
||||
*/
|
||||
noinstr void ct_irq_enter(void)
|
||||
{
|
||||
lockdep_assert_irqs_disabled();
|
||||
ct_nmi_enter();
|
||||
}
|
||||
|
||||
/**
|
||||
* ct_irq_exit - inform RCU that current CPU is exiting irq towards idle
|
||||
*
|
||||
* Exit from an interrupt handler, which might possibly result in entering
|
||||
* idle mode, in other words, leaving the mode in which read-side critical
|
||||
* sections can occur. The caller must have disabled interrupts.
|
||||
*
|
||||
* This code assumes that the idle loop never does anything that might
|
||||
* result in unbalanced calls to irq_enter() and irq_exit(). If your
|
||||
* architecture's idle loop violates this assumption, RCU will give you what
|
||||
* you deserve, good and hard. But very infrequently and irreproducibly.
|
||||
*
|
||||
* Use things like work queues to work around this limitation.
|
||||
*
|
||||
* You have been warned.
|
||||
*
|
||||
* If you add or remove a call to ct_irq_exit(), be sure to test with
|
||||
* CONFIG_RCU_EQS_DEBUG=y.
|
||||
*/
|
||||
noinstr void ct_irq_exit(void)
|
||||
{
|
||||
lockdep_assert_irqs_disabled();
|
||||
ct_nmi_exit();
|
||||
}
|
||||
|
||||
/*
|
||||
* Wrapper for ct_irq_enter() where interrupts are enabled.
|
||||
*
|
||||
* If you add or remove a call to ct_irq_enter_irqson(), be sure to test
|
||||
* with CONFIG_RCU_EQS_DEBUG=y.
|
||||
*/
|
||||
void ct_irq_enter_irqson(void)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
local_irq_save(flags);
|
||||
ct_irq_enter();
|
||||
local_irq_restore(flags);
|
||||
}
|
||||
|
||||
/*
|
||||
* Wrapper for ct_irq_exit() where interrupts are enabled.
|
||||
*
|
||||
* If you add or remove a call to ct_irq_exit_irqson(), be sure to test
|
||||
* with CONFIG_RCU_EQS_DEBUG=y.
|
||||
*/
|
||||
void ct_irq_exit_irqson(void)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
local_irq_save(flags);
|
||||
ct_irq_exit();
|
||||
local_irq_restore(flags);
|
||||
}
|
||||
#else
|
||||
static __always_inline void ct_kernel_exit(bool user, int offset) { }
|
||||
static __always_inline void ct_kernel_enter(bool user, int offset) { }
|
||||
#endif /* #ifdef CONFIG_CONTEXT_TRACKING_IDLE */
|
||||
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_USER
|
||||
|
||||
#define CREATE_TRACE_POINTS
|
||||
#include <trace/events/context_tracking.h>
|
||||
@ -28,9 +435,6 @@
|
||||
DEFINE_STATIC_KEY_FALSE(context_tracking_key);
|
||||
EXPORT_SYMBOL_GPL(context_tracking_key);
|
||||
|
||||
DEFINE_PER_CPU(struct context_tracking, context_tracking);
|
||||
EXPORT_SYMBOL_GPL(context_tracking);
|
||||
|
||||
static noinstr bool context_tracking_recursion_enter(void)
|
||||
{
|
||||
int recursion;
|
||||
@ -51,29 +455,32 @@ static __always_inline void context_tracking_recursion_exit(void)
|
||||
}
|
||||
|
||||
/**
|
||||
* context_tracking_enter - Inform the context tracking that the CPU is going
|
||||
* enter user or guest space mode.
|
||||
* __ct_user_enter - Inform the context tracking that the CPU is going
|
||||
* to enter user or guest space mode.
|
||||
*
|
||||
* This function must be called right before we switch from the kernel
|
||||
* to user or guest space, when it's guaranteed the remaining kernel
|
||||
* instructions to execute won't use any RCU read side critical section
|
||||
* because this function sets RCU in extended quiescent state.
|
||||
*/
|
||||
void noinstr __context_tracking_enter(enum ctx_state state)
|
||||
void noinstr __ct_user_enter(enum ctx_state state)
|
||||
{
|
||||
struct context_tracking *ct = this_cpu_ptr(&context_tracking);
|
||||
lockdep_assert_irqs_disabled();
|
||||
|
||||
/* Kernel threads aren't supposed to go to userspace */
|
||||
WARN_ON_ONCE(!current->mm);
|
||||
|
||||
if (!context_tracking_recursion_enter())
|
||||
return;
|
||||
|
||||
if ( __this_cpu_read(context_tracking.state) != state) {
|
||||
if (__this_cpu_read(context_tracking.active)) {
|
||||
if (__ct_state() != state) {
|
||||
if (ct->active) {
|
||||
/*
|
||||
* At this stage, only low level arch entry code remains and
|
||||
* then we'll run in userspace. We can assume there won't be
|
||||
* any RCU read-side critical section until the next call to
|
||||
* user_exit() or rcu_irq_enter(). Let's remove RCU's dependency
|
||||
* user_exit() or ct_irq_enter(). Let's remove RCU's dependency
|
||||
* on the tick.
|
||||
*/
|
||||
if (state == CONTEXT_USER) {
|
||||
@ -82,35 +489,77 @@ void noinstr __context_tracking_enter(enum ctx_state state)
|
||||
vtime_user_enter(current);
|
||||
instrumentation_end();
|
||||
}
|
||||
rcu_user_enter();
|
||||
/*
|
||||
* Other than generic entry implementation, we may be past the last
|
||||
* rescheduling opportunity in the entry code. Trigger a self IPI
|
||||
* that will fire and reschedule once we resume in user/guest mode.
|
||||
*/
|
||||
rcu_irq_work_resched();
|
||||
|
||||
/*
|
||||
* Enter RCU idle mode right before resuming userspace. No use of RCU
|
||||
* is permitted between this call and rcu_eqs_exit(). This way the
|
||||
* CPU doesn't need to maintain the tick for RCU maintenance purposes
|
||||
* when the CPU runs in userspace.
|
||||
*/
|
||||
ct_kernel_exit(true, RCU_DYNTICKS_IDX + state);
|
||||
|
||||
/*
|
||||
* Special case if we only track user <-> kernel transitions for tickless
|
||||
* cputime accounting but we don't support RCU extended quiescent state.
|
||||
* In this case we don't care about any concurrency/ordering.
|
||||
*/
|
||||
if (!IS_ENABLED(CONFIG_CONTEXT_TRACKING_IDLE))
|
||||
atomic_set(&ct->state, state);
|
||||
} else {
|
||||
/*
|
||||
* Even if context tracking is disabled on this CPU, because it's outside
|
||||
* the full dynticks mask for example, we still have to keep track of the
|
||||
* context transitions and states to prevent inconsistency on those of
|
||||
* other CPUs.
|
||||
* If a task triggers an exception in userspace, sleep on the exception
|
||||
* handler and then migrate to another CPU, that new CPU must know where
|
||||
* the exception returns by the time we call exception_exit().
|
||||
* This information can only be provided by the previous CPU when it called
|
||||
* exception_enter().
|
||||
* OTOH we can spare the calls to vtime and RCU when context_tracking.active
|
||||
* is false because we know that CPU is not tickless.
|
||||
*/
|
||||
if (!IS_ENABLED(CONFIG_CONTEXT_TRACKING_IDLE)) {
|
||||
/* Tracking for vtime only, no concurrent RCU EQS accounting */
|
||||
atomic_set(&ct->state, state);
|
||||
} else {
|
||||
/*
|
||||
* Tracking for vtime and RCU EQS. Make sure we don't race
|
||||
* with NMIs. OTOH we don't care about ordering here since
|
||||
* RCU only requires RCU_DYNTICKS_IDX increments to be fully
|
||||
* ordered.
|
||||
*/
|
||||
atomic_add(state, &ct->state);
|
||||
}
|
||||
}
|
||||
/*
|
||||
* Even if context tracking is disabled on this CPU, because it's outside
|
||||
* the full dynticks mask for example, we still have to keep track of the
|
||||
* context transitions and states to prevent inconsistency on those of
|
||||
* other CPUs.
|
||||
* If a task triggers an exception in userspace, sleep on the exception
|
||||
* handler and then migrate to another CPU, that new CPU must know where
|
||||
* the exception returns by the time we call exception_exit().
|
||||
* This information can only be provided by the previous CPU when it called
|
||||
* exception_enter().
|
||||
* OTOH we can spare the calls to vtime and RCU when context_tracking.active
|
||||
* is false because we know that CPU is not tickless.
|
||||
*/
|
||||
__this_cpu_write(context_tracking.state, state);
|
||||
}
|
||||
context_tracking_recursion_exit();
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__context_tracking_enter);
|
||||
EXPORT_SYMBOL_GPL(__ct_user_enter);
|
||||
|
||||
void context_tracking_enter(enum ctx_state state)
|
||||
/*
|
||||
* OBSOLETE:
|
||||
* This function should be noinstr but the below local_irq_restore() is
|
||||
* unsafe because it involves illegal RCU uses through tracing and lockdep.
|
||||
* This is unlikely to be fixed as this function is obsolete. The preferred
|
||||
* way is to call __context_tracking_enter() through user_enter_irqoff()
|
||||
* or context_tracking_guest_enter(). It should be the arch entry code
|
||||
* responsibility to call into context tracking with IRQs disabled.
|
||||
*/
|
||||
void ct_user_enter(enum ctx_state state)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
/*
|
||||
* Some contexts may involve an exception occurring in an irq,
|
||||
* leading to that nesting:
|
||||
* rcu_irq_enter() rcu_user_exit() rcu_user_exit() rcu_irq_exit()
|
||||
* ct_irq_enter() rcu_eqs_exit(true) rcu_eqs_enter(true) ct_irq_exit()
|
||||
* This would mess up the dyntick_nesting count though. And rcu_irq_*()
|
||||
* helpers are enough to protect RCU uses inside the exception. So
|
||||
* just return immediately if we detect we are in an IRQ.
|
||||
@ -119,21 +568,32 @@ void context_tracking_enter(enum ctx_state state)
|
||||
return;
|
||||
|
||||
local_irq_save(flags);
|
||||
__context_tracking_enter(state);
|
||||
__ct_user_enter(state);
|
||||
local_irq_restore(flags);
|
||||
}
|
||||
NOKPROBE_SYMBOL(context_tracking_enter);
|
||||
EXPORT_SYMBOL_GPL(context_tracking_enter);
|
||||
NOKPROBE_SYMBOL(ct_user_enter);
|
||||
EXPORT_SYMBOL_GPL(ct_user_enter);
|
||||
|
||||
void context_tracking_user_enter(void)
|
||||
/**
|
||||
* user_enter_callable() - Unfortunate ASM callable version of user_enter() for
|
||||
* archs that didn't manage to check the context tracking
|
||||
* static key from low level code.
|
||||
*
|
||||
* This OBSOLETE function should be noinstr but it unsafely calls
|
||||
* local_irq_restore(), involving illegal RCU uses through tracing and lockdep.
|
||||
* This is unlikely to be fixed as this function is obsolete. The preferred
|
||||
* way is to call user_enter_irqoff(). It should be the arch entry code
|
||||
* responsibility to call into context tracking with IRQs disabled.
|
||||
*/
|
||||
void user_enter_callable(void)
|
||||
{
|
||||
user_enter();
|
||||
}
|
||||
NOKPROBE_SYMBOL(context_tracking_user_enter);
|
||||
NOKPROBE_SYMBOL(user_enter_callable);
|
||||
|
||||
/**
|
||||
* context_tracking_exit - Inform the context tracking that the CPU is
|
||||
* exiting user or guest mode and entering the kernel.
|
||||
* __ct_user_exit - Inform the context tracking that the CPU is
|
||||
* exiting user or guest mode and entering the kernel.
|
||||
*
|
||||
* This function must be called after we entered the kernel from user or
|
||||
* guest space before any use of RCU read side critical section. This
|
||||
@ -143,32 +603,64 @@ NOKPROBE_SYMBOL(context_tracking_user_enter);
|
||||
* This call supports re-entrancy. This way it can be called from any exception
|
||||
* handler without needing to know if we came from userspace or not.
|
||||
*/
|
||||
void noinstr __context_tracking_exit(enum ctx_state state)
|
||||
void noinstr __ct_user_exit(enum ctx_state state)
|
||||
{
|
||||
struct context_tracking *ct = this_cpu_ptr(&context_tracking);
|
||||
|
||||
if (!context_tracking_recursion_enter())
|
||||
return;
|
||||
|
||||
if (__this_cpu_read(context_tracking.state) == state) {
|
||||
if (__this_cpu_read(context_tracking.active)) {
|
||||
if (__ct_state() == state) {
|
||||
if (ct->active) {
|
||||
/*
|
||||
* We are going to run code that may use RCU. Inform
|
||||
* RCU core about that (ie: we may need the tick again).
|
||||
* Exit RCU idle mode while entering the kernel because it can
* run an RCU read-side critical section anytime.
|
||||
*/
|
||||
rcu_user_exit();
|
||||
ct_kernel_enter(true, RCU_DYNTICKS_IDX - state);
|
||||
if (state == CONTEXT_USER) {
|
||||
instrumentation_begin();
|
||||
vtime_user_exit(current);
|
||||
trace_user_exit(0);
|
||||
instrumentation_end();
|
||||
}
|
||||
|
||||
/*
|
||||
* Special case if we only track user <-> kernel transitions for tickless
|
||||
* cputime accounting but we don't support RCU extended quiescent state.
|
||||
* In this case we don't care about any concurrency/ordering.
|
||||
*/
|
||||
if (!IS_ENABLED(CONFIG_CONTEXT_TRACKING_IDLE))
|
||||
atomic_set(&ct->state, CONTEXT_KERNEL);
|
||||
|
||||
} else {
|
||||
if (!IS_ENABLED(CONFIG_CONTEXT_TRACKING_IDLE)) {
|
||||
/* Tracking for vtime only, no concurrent RCU EQS accounting */
|
||||
atomic_set(&ct->state, CONTEXT_KERNEL);
|
||||
} else {
|
||||
/*
|
||||
* Tracking for vtime and RCU EQS. Make sure we don't race
|
||||
* with NMIs. OTOH we don't care about ordering here since
|
||||
* RCU only requires RCU_DYNTICKS_IDX increments to be fully
|
||||
* ordered.
|
||||
*/
|
||||
atomic_sub(state, &ct->state);
|
||||
}
|
||||
}
|
||||
__this_cpu_write(context_tracking.state, CONTEXT_KERNEL);
|
||||
}
|
||||
context_tracking_recursion_exit();
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__context_tracking_exit);
|
||||
EXPORT_SYMBOL_GPL(__ct_user_exit);
|
||||
|
||||
void context_tracking_exit(enum ctx_state state)
|
||||
/*
|
||||
* OBSOLETE:
|
||||
* This function should be noinstr but the below local_irq_save() is
|
||||
* unsafe because it involves illegal RCU uses through tracing and lockdep.
|
||||
* This is unlikely to be fixed as this function is obsolete. The preferred
|
||||
* way is to call __context_tracking_exit() through user_exit_irqoff()
|
||||
* or context_tracking_guest_exit(). It should be the arch entry code
|
||||
* responsibility to call into context tracking with IRQs disabled.
|
||||
*/
|
||||
void ct_user_exit(enum ctx_state state)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
@ -176,19 +668,30 @@ void context_tracking_exit(enum ctx_state state)
|
||||
return;
|
||||
|
||||
local_irq_save(flags);
|
||||
__context_tracking_exit(state);
|
||||
__ct_user_exit(state);
|
||||
local_irq_restore(flags);
|
||||
}
|
||||
NOKPROBE_SYMBOL(context_tracking_exit);
|
||||
EXPORT_SYMBOL_GPL(context_tracking_exit);
|
||||
NOKPROBE_SYMBOL(ct_user_exit);
|
||||
EXPORT_SYMBOL_GPL(ct_user_exit);
|
||||
|
||||
void context_tracking_user_exit(void)
|
||||
/**
|
||||
* user_exit_callable() - Unfortunate ASM callable version of user_exit() for
|
||||
* archs that didn't manage to check the context tracking
|
||||
* static key from low level code.
|
||||
*
|
||||
* This OBSOLETE function should be noinstr but it unsafely calls local_irq_save(),
|
||||
* involving illegal RCU uses through tracing and lockdep. This is unlikely
|
||||
* to be fixed as this function is obsolete. The preferred way is to call
|
||||
* user_exit_irqoff(). It should be the arch entry code responsibility to
|
||||
* call into context tracking with IRQs disabled.
|
||||
*/
|
||||
void user_exit_callable(void)
|
||||
{
|
||||
user_exit();
|
||||
}
|
||||
NOKPROBE_SYMBOL(context_tracking_user_exit);
|
||||
NOKPROBE_SYMBOL(user_exit_callable);
|
||||
|
||||
void __init context_tracking_cpu_set(int cpu)
|
||||
void __init ct_cpu_track_user(int cpu)
|
||||
{
|
||||
static __initdata bool initialized = false;
|
||||
|
||||
@ -212,12 +715,14 @@ void __init context_tracking_cpu_set(int cpu)
|
||||
initialized = true;
|
||||
}
|
||||
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_FORCE
|
||||
#ifdef CONFIG_CONTEXT_TRACKING_USER_FORCE
|
||||
void __init context_tracking_init(void)
|
||||
{
|
||||
int cpu;
|
||||
|
||||
for_each_possible_cpu(cpu)
|
||||
context_tracking_cpu_set(cpu);
|
||||
ct_cpu_track_user(cpu);
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* #ifdef CONFIG_CONTEXT_TRACKING_USER */
|
||||
|
@ -35,11 +35,11 @@ static int cpu_pm_notify(enum cpu_pm_event event)
|
||||
* dysfunctional in cpu idle. Copy RCU_NONIDLE code to let RCU know
|
||||
* this.
|
||||
*/
|
||||
rcu_irq_enter_irqson();
|
||||
ct_irq_enter_irqson();
|
||||
rcu_read_lock();
|
||||
ret = raw_notifier_call_chain(&cpu_pm_notifier.chain, event, NULL);
|
||||
rcu_read_unlock();
|
||||
rcu_irq_exit_irqson();
|
||||
ct_irq_exit_irqson();
|
||||
|
||||
return notifier_to_errno(ret);
|
||||
}
|
||||
@ -49,11 +49,11 @@ static int cpu_pm_notify_robust(enum cpu_pm_event event_up, enum cpu_pm_event ev
|
||||
unsigned long flags;
|
||||
int ret;
|
||||
|
||||
rcu_irq_enter_irqson();
|
||||
ct_irq_enter_irqson();
|
||||
raw_spin_lock_irqsave(&cpu_pm_notifier.lock, flags);
|
||||
ret = raw_notifier_call_chain_robust(&cpu_pm_notifier.chain, event_up, event_down, NULL);
|
||||
raw_spin_unlock_irqrestore(&cpu_pm_notifier.lock, flags);
|
||||
rcu_irq_exit_irqson();
|
||||
ct_irq_exit_irqson();
|
||||
|
||||
return notifier_to_errno(ret);
|
||||
}
|
||||
|
@ -321,7 +321,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
|
||||
}
|
||||
|
||||
/*
|
||||
* If this entry hit the idle task invoke rcu_irq_enter() whether
|
||||
* If this entry hit the idle task invoke ct_irq_enter() whether
|
||||
* RCU is watching or not.
|
||||
*
|
||||
* Interrupts can nest when the first interrupt invokes softirq
|
||||
@ -332,12 +332,12 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
|
||||
* not nested into another interrupt.
|
||||
*
|
||||
* Checking for rcu_is_watching() here would prevent the nesting
|
||||
* interrupt to invoke rcu_irq_enter(). If that nested interrupt is
|
||||
* interrupt to invoke ct_irq_enter(). If that nested interrupt is
|
||||
* the tick then rcu_flavor_sched_clock_irq() would wrongfully
|
||||
* assume that it is the first interrupt and eventually claim
|
||||
* quiescent state and end grace periods prematurely.
|
||||
*
|
||||
* Unconditionally invoke rcu_irq_enter() so RCU state stays
|
||||
* Unconditionally invoke ct_irq_enter() so RCU state stays
|
||||
* consistent.
|
||||
*
|
||||
* TINY_RCU does not support EQS, so let the compiler eliminate
|
||||
@ -350,7 +350,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
|
||||
* as in irqentry_enter_from_user_mode().
|
||||
*/
|
||||
lockdep_hardirqs_off(CALLER_ADDR0);
|
||||
rcu_irq_enter();
|
||||
ct_irq_enter();
|
||||
instrumentation_begin();
|
||||
trace_hardirqs_off_finish();
|
||||
instrumentation_end();
|
||||
@ -418,7 +418,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
|
||||
trace_hardirqs_on_prepare();
|
||||
lockdep_hardirqs_on_prepare();
|
||||
instrumentation_end();
|
||||
rcu_irq_exit();
|
||||
ct_irq_exit();
|
||||
lockdep_hardirqs_on(CALLER_ADDR0);
|
||||
return;
|
||||
}
|
||||
@ -436,7 +436,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
|
||||
* was not watching on entry.
|
||||
*/
|
||||
if (state.exit_rcu)
|
||||
rcu_irq_exit();
|
||||
ct_irq_exit();
|
||||
}
|
||||
}
|
||||
|
||||
@ -449,7 +449,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
|
||||
__nmi_enter();
|
||||
lockdep_hardirqs_off(CALLER_ADDR0);
|
||||
lockdep_hardirq_enter();
|
||||
rcu_nmi_enter();
|
||||
ct_nmi_enter();
|
||||
|
||||
instrumentation_begin();
|
||||
trace_hardirqs_off_finish();
|
||||
@ -469,7 +469,7 @@ void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state)
|
||||
}
|
||||
instrumentation_end();
|
||||
|
||||
rcu_nmi_exit();
|
||||
ct_nmi_exit();
|
||||
lockdep_hardirq_exit();
|
||||
if (irq_state.lockdep)
|
||||
lockdep_hardirqs_on(CALLER_ADDR0);
|
||||
|
@ -114,7 +114,7 @@ int kernel_text_address(unsigned long addr)
|
||||
|
||||
/* Treat this like an NMI as it can happen anywhere */
|
||||
if (no_rcu)
|
||||
rcu_nmi_enter();
|
||||
ct_nmi_enter();
|
||||
|
||||
if (is_module_text_address(addr))
|
||||
goto out;
|
||||
@ -127,7 +127,7 @@ int kernel_text_address(unsigned long addr)
|
||||
ret = 0;
|
||||
out:
|
||||
if (no_rcu)
|
||||
rcu_nmi_exit();
|
||||
ct_nmi_exit();
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
@ -1814,6 +1814,7 @@ static inline void rcu_copy_process(struct task_struct *p)
|
||||
p->trc_reader_nesting = 0;
|
||||
p->trc_reader_special.s = 0;
|
||||
INIT_LIST_HEAD(&p->trc_holdout_list);
|
||||
INIT_LIST_HEAD(&p->trc_blkd_node);
|
||||
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
|
||||
}
|
||||
|
||||
|
@ -6571,7 +6571,7 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
|
||||
|
||||
/*
|
||||
* If a CPU is in the RCU-free window in idle (ie: in the section
|
||||
* between rcu_idle_enter() and rcu_idle_exit(), then RCU
|
||||
* between ct_idle_enter() and ct_idle_exit(), then RCU
|
||||
* considers that CPU to be in an "extended quiescent state",
|
||||
* which means that RCU will be completely ignoring that CPU.
|
||||
* Therefore, rcu_read_lock() and friends have absolutely no
|
||||
|
@ -8,6 +8,8 @@ menu "RCU Subsystem"
|
||||
config TREE_RCU
|
||||
bool
|
||||
default y if SMP
|
||||
# Dynticks-idle tracking
|
||||
select CONTEXT_TRACKING_IDLE
|
||||
help
|
||||
This option selects the RCU implementation that is
|
||||
designed for very large SMP systems with hundreds or
|
||||
@ -262,6 +264,35 @@ config RCU_NOCB_CPU
|
||||
Say Y here if you need reduced OS jitter, despite added overhead.
|
||||
Say N here if you are unsure.
|
||||
|
||||
config RCU_NOCB_CPU_DEFAULT_ALL
|
||||
bool "Offload RCU callback processing from all CPUs by default"
|
||||
depends on RCU_NOCB_CPU
|
||||
default n
|
||||
help
|
||||
Use this option to offload callback processing from all CPUs
|
||||
by default, in the absence of the rcu_nocbs or nohz_full boot
|
||||
parameter. This also avoids the need to use any boot parameters
|
||||
to achieve the effect of offloading all CPUs on boot.
|
||||
|
||||
Say Y here if you want to offload all CPUs by default on boot.
|
||||
Say N here if you are unsure.
|
||||
|
||||
config RCU_NOCB_CPU_CB_BOOST
|
||||
bool "Offload RCU callback from real-time kthread"
|
||||
depends on RCU_NOCB_CPU && RCU_BOOST
|
||||
default y if PREEMPT_RT
|
||||
help
|
||||
Use this option to invoke offloaded callbacks as SCHED_FIFO
|
||||
to avoid starvation by heavy SCHED_OTHER background load.
|
||||
Of course, running as SCHED_FIFO during callback floods will
|
||||
cause the rcuo[ps] kthreads to monopolize the CPU for hundreds
|
||||
of milliseconds or more. Therefore, when enabling this option,
|
||||
it is your responsibility to ensure that latency-sensitive
|
||||
tasks either run with higher priority or run on some other CPU.
|
||||
|
||||
Say Y here if you want to set RT priority for offloading kthreads.
|
||||
Say N here if you are building a !PREEMPT_RT kernel and are unsure.
|
||||
|
||||
config TASKS_TRACE_RCU_READ_MB
|
||||
bool "Tasks Trace RCU readers use memory barriers in user and idle"
|
||||
depends on RCU_EXPERT && TASKS_TRACE_RCU
|
||||
|
@ -121,7 +121,7 @@ config RCU_EQS_DEBUG
|
||||
|
||||
config RCU_STRICT_GRACE_PERIOD
|
||||
bool "Provide debug RCU implementation with short grace periods"
|
||||
depends on DEBUG_KERNEL && RCU_EXPERT && NR_CPUS <= 4
|
||||
depends on DEBUG_KERNEL && RCU_EXPERT && NR_CPUS <= 4 && !TINY_RCU
|
||||
default n
|
||||
select PREEMPT_COUNT if PREEMPT=n
|
||||
help
|
||||
|
@ -12,10 +12,6 @@
|
||||
|
||||
#include <trace/events/rcu.h>
|
||||
|
||||
/* Offset to allow distinguishing irq vs. task-based idle entry/exit. */
|
||||
#define DYNTICK_IRQ_NONIDLE ((LONG_MAX / 2) + 1)
|
||||
|
||||
|
||||
/*
|
||||
* Grace-period counter management.
|
||||
*/
|
||||
@ -23,6 +19,9 @@
|
||||
#define RCU_SEQ_CTR_SHIFT 2
|
||||
#define RCU_SEQ_STATE_MASK ((1 << RCU_SEQ_CTR_SHIFT) - 1)
|
||||
|
||||
/* Low-order bit definition for polled grace-period APIs. */
|
||||
#define RCU_GET_STATE_COMPLETED 0x1
|
||||
|
||||
extern int sysctl_sched_rt_runtime;
|
||||
|
||||
/*
|
||||
@ -119,6 +118,18 @@ static inline bool rcu_seq_done(unsigned long *sp, unsigned long s)
|
||||
return ULONG_CMP_GE(READ_ONCE(*sp), s);
|
||||
}
|
||||
|
||||
/*
|
||||
* Given a snapshot from rcu_seq_snap(), determine whether or not a
|
||||
* full update-side operation has occurred, but do not allow the
|
||||
* (ULONG_MAX / 2) safety-factor/guard-band.
|
||||
*/
|
||||
static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
|
||||
{
|
||||
unsigned long cur_s = READ_ONCE(*sp);
|
||||
|
||||
return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1));
|
||||
}
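As a quick illustration of the guard-band arithmetic in rcu_seq_done_exact(), here is a stand-alone sketch, not part of the patch, that re-creates the check with local copies of the ULONG_CMP_* helpers and the RCU_SEQ_STATE_MASK value defined earlier in this file; the sample sequence numbers are made up.

#include <stdio.h>
#include <limits.h>

#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b)	(ULONG_MAX / 2 < (a) - (b))
#define RCU_SEQ_STATE_MASK	((1 << 2) - 1)

static int seq_done_exact(unsigned long cur_s, unsigned long s)
{
	/* Done if cur_s has reached s, or if s lies just behind cur_s. */
	return ULONG_CMP_GE(cur_s, s) ||
	       ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1));
}

int main(void)
{
	printf("%d\n", seq_done_exact(8, 12));	/* prints 0: not yet done */
	printf("%d\n", seq_done_exact(16, 12));	/* prints 1: done */
	return 0;
}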
|
||||
|
||||
/*
|
||||
* Has a grace period completed since the time the old gp_seq was collected?
|
||||
*/
|
||||
|
@ -419,6 +419,7 @@ rcu_scale_writer(void *arg)
|
||||
VERBOSE_SCALEOUT_STRING("rcu_scale_writer task started");
|
||||
WARN_ON(!wdpp);
|
||||
set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
|
||||
current->flags |= PF_NO_SETAFFINITY;
|
||||
sched_set_fifo_low(current);
|
||||
|
||||
if (holdoff)
|
||||
|
@ -75,62 +75,47 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com> and Josh Triplett <josh@
|
||||
|
||||
torture_param(int, extendables, RCUTORTURE_MAX_EXTEND,
|
||||
"Extend readers by disabling bh (1), irqs (2), or preempt (4)");
|
||||
torture_param(int, fqs_duration, 0,
|
||||
"Duration of fqs bursts (us), 0 to disable");
|
||||
torture_param(int, fqs_duration, 0, "Duration of fqs bursts (us), 0 to disable");
|
||||
torture_param(int, fqs_holdoff, 0, "Holdoff time within fqs bursts (us)");
|
||||
torture_param(int, fqs_stutter, 3, "Wait time between fqs bursts (s)");
|
||||
torture_param(int, fwd_progress, 1, "Test grace-period forward progress");
|
||||
torture_param(int, fwd_progress, 1, "Number of grace-period forward progress tasks (0 to disable)");
|
||||
torture_param(int, fwd_progress_div, 4, "Fraction of CPU stall to wait");
|
||||
torture_param(int, fwd_progress_holdoff, 60,
|
||||
"Time between forward-progress tests (s)");
|
||||
torture_param(bool, fwd_progress_need_resched, 1,
|
||||
"Hide cond_resched() behind need_resched()");
|
||||
torture_param(int, fwd_progress_holdoff, 60, "Time between forward-progress tests (s)");
|
||||
torture_param(bool, fwd_progress_need_resched, 1, "Hide cond_resched() behind need_resched()");
|
||||
torture_param(bool, gp_cond, false, "Use conditional/async GP wait primitives");
|
||||
torture_param(bool, gp_cond_exp, false, "Use conditional/async expedited GP wait primitives");
|
||||
torture_param(bool, gp_exp, false, "Use expedited GP wait primitives");
|
||||
torture_param(bool, gp_normal, false,
|
||||
"Use normal (non-expedited) GP wait primitives");
|
||||
torture_param(bool, gp_normal, false, "Use normal (non-expedited) GP wait primitives");
|
||||
torture_param(bool, gp_poll, false, "Use polling GP wait primitives");
|
||||
torture_param(bool, gp_poll_exp, false, "Use polling expedited GP wait primitives");
|
||||
torture_param(bool, gp_sync, false, "Use synchronous GP wait primitives");
|
||||
torture_param(int, irqreader, 1, "Allow RCU readers from irq handlers");
|
||||
torture_param(int, leakpointer, 0, "Leak pointer dereferences from readers");
|
||||
torture_param(int, n_barrier_cbs, 0,
|
||||
"# of callbacks/kthreads for barrier testing");
|
||||
torture_param(int, n_barrier_cbs, 0, "# of callbacks/kthreads for barrier testing");
|
||||
torture_param(int, nfakewriters, 4, "Number of RCU fake writer threads");
|
||||
torture_param(int, nreaders, -1, "Number of RCU reader threads");
|
||||
torture_param(int, object_debug, 0,
|
||||
"Enable debug-object double call_rcu() testing");
|
||||
torture_param(int, object_debug, 0, "Enable debug-object double call_rcu() testing");
|
||||
torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)");
|
||||
torture_param(int, onoff_interval, 0,
|
||||
"Time between CPU hotplugs (jiffies), 0=disable");
|
||||
torture_param(int, onoff_interval, 0, "Time between CPU hotplugs (jiffies), 0=disable");
|
||||
torture_param(int, nocbs_nthreads, 0, "Number of NOCB toggle threads, 0 to disable");
|
||||
torture_param(int, nocbs_toggle, 1000, "Time between toggling nocb state (ms)");
|
||||
torture_param(int, read_exit_delay, 13,
|
||||
"Delay between read-then-exit episodes (s)");
|
||||
torture_param(int, read_exit_burst, 16,
|
||||
"# of read-then-exit bursts per episode, zero to disable");
|
||||
torture_param(int, read_exit_delay, 13, "Delay between read-then-exit episodes (s)");
|
||||
torture_param(int, read_exit_burst, 16, "# of read-then-exit bursts per episode, zero to disable");
|
||||
torture_param(int, shuffle_interval, 3, "Number of seconds between shuffles");
|
||||
torture_param(int, shutdown_secs, 0, "Shutdown time (s), <= zero to disable.");
|
||||
torture_param(int, stall_cpu, 0, "Stall duration (s), zero to disable.");
|
||||
torture_param(int, stall_cpu_holdoff, 10,
|
||||
"Time to wait before starting stall (s).");
|
||||
torture_param(bool, stall_no_softlockup, false,
|
||||
"Avoid softlockup warning during cpu stall.");
|
||||
torture_param(int, stall_cpu_holdoff, 10, "Time to wait before starting stall (s).");
|
||||
torture_param(bool, stall_no_softlockup, false, "Avoid softlockup warning during cpu stall.");
|
||||
torture_param(int, stall_cpu_irqsoff, 0, "Disable interrupts while stalling.");
|
||||
torture_param(int, stall_cpu_block, 0, "Sleep while stalling.");
|
||||
torture_param(int, stall_gp_kthread, 0,
|
||||
"Grace-period kthread stall duration (s).");
|
||||
torture_param(int, stat_interval, 60,
|
||||
"Number of seconds between stats printk()s");
|
||||
torture_param(int, stall_gp_kthread, 0, "Grace-period kthread stall duration (s).");
|
||||
torture_param(int, stat_interval, 60, "Number of seconds between stats printk()s");
|
||||
torture_param(int, stutter, 5, "Number of seconds to run/halt test");
|
||||
torture_param(int, test_boost, 1, "Test RCU prio boost: 0=no, 1=maybe, 2=yes.");
|
||||
torture_param(int, test_boost_duration, 4,
|
||||
"Duration of each boost test, seconds.");
|
||||
torture_param(int, test_boost_interval, 7,
|
||||
"Interval between boost tests, seconds.");
|
||||
torture_param(bool, test_no_idle_hz, true,
|
||||
"Test support for tickless idle CPUs");
|
||||
torture_param(int, verbose, 1,
|
||||
"Enable verbose debugging printk()s");
|
||||
torture_param(int, test_boost_duration, 4, "Duration of each boost test, seconds.");
|
||||
torture_param(int, test_boost_interval, 7, "Interval between boost tests, seconds.");
|
||||
torture_param(bool, test_no_idle_hz, true, "Test support for tickless idle CPUs");
|
||||
torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
|
||||
|
||||
static char *torture_type = "rcu";
|
||||
module_param(torture_type, charp, 0444);
|
||||
@ -209,12 +194,16 @@ static int rcu_torture_writer_state;
|
||||
#define RTWS_DEF_FREE 3
|
||||
#define RTWS_EXP_SYNC 4
|
||||
#define RTWS_COND_GET 5
|
||||
#define RTWS_COND_SYNC 6
|
||||
#define RTWS_POLL_GET 7
|
||||
#define RTWS_POLL_WAIT 8
|
||||
#define RTWS_SYNC 9
|
||||
#define RTWS_STUTTER 10
|
||||
#define RTWS_STOPPING 11
|
||||
#define RTWS_COND_GET_EXP 6
|
||||
#define RTWS_COND_SYNC 7
|
||||
#define RTWS_COND_SYNC_EXP 8
|
||||
#define RTWS_POLL_GET 9
|
||||
#define RTWS_POLL_GET_EXP 10
|
||||
#define RTWS_POLL_WAIT 11
|
||||
#define RTWS_POLL_WAIT_EXP 12
|
||||
#define RTWS_SYNC 13
|
||||
#define RTWS_STUTTER 14
|
||||
#define RTWS_STOPPING 15
|
||||
static const char * const rcu_torture_writer_state_names[] = {
|
||||
"RTWS_FIXED_DELAY",
|
||||
"RTWS_DELAY",
|
||||
@ -222,9 +211,13 @@ static const char * const rcu_torture_writer_state_names[] = {
|
||||
"RTWS_DEF_FREE",
|
||||
"RTWS_EXP_SYNC",
|
||||
"RTWS_COND_GET",
|
||||
"RTWS_COND_GET_EXP",
|
||||
"RTWS_COND_SYNC",
|
||||
"RTWS_COND_SYNC_EXP",
|
||||
"RTWS_POLL_GET",
|
||||
"RTWS_POLL_GET_EXP",
|
||||
"RTWS_POLL_WAIT",
|
||||
"RTWS_POLL_WAIT_EXP",
|
||||
"RTWS_SYNC",
|
||||
"RTWS_STUTTER",
|
||||
"RTWS_STOPPING",
|
||||
@ -337,7 +330,12 @@ struct rcu_torture_ops {
|
||||
void (*deferred_free)(struct rcu_torture *p);
|
||||
void (*sync)(void);
|
||||
void (*exp_sync)(void);
|
||||
unsigned long (*get_gp_state_exp)(void);
|
||||
unsigned long (*start_gp_poll_exp)(void);
|
||||
bool (*poll_gp_state_exp)(unsigned long oldstate);
|
||||
void (*cond_sync_exp)(unsigned long oldstate);
|
||||
unsigned long (*get_gp_state)(void);
|
||||
unsigned long (*get_gp_completed)(void);
|
||||
unsigned long (*start_gp_poll)(void);
|
||||
bool (*poll_gp_state)(unsigned long oldstate);
|
||||
void (*cond_sync)(unsigned long oldstate);
|
||||
@ -504,9 +502,14 @@ static struct rcu_torture_ops rcu_ops = {
|
||||
.sync = synchronize_rcu,
|
||||
.exp_sync = synchronize_rcu_expedited,
|
||||
.get_gp_state = get_state_synchronize_rcu,
|
||||
.get_gp_completed = get_completed_synchronize_rcu,
|
||||
.start_gp_poll = start_poll_synchronize_rcu,
|
||||
.poll_gp_state = poll_state_synchronize_rcu,
|
||||
.cond_sync = cond_synchronize_rcu,
|
||||
.get_gp_state_exp = get_state_synchronize_rcu,
|
||||
.start_gp_poll_exp = start_poll_synchronize_rcu_expedited,
|
||||
.poll_gp_state_exp = poll_state_synchronize_rcu,
|
||||
.cond_sync_exp = cond_synchronize_rcu_expedited,
|
||||
.call = call_rcu,
|
||||
.cb_barrier = rcu_barrier,
|
||||
.fqs = rcu_force_quiescent_state,
|
||||
@ -1136,9 +1139,8 @@ rcu_torture_fqs(void *arg)
|
||||
return 0;
|
||||
}
|
||||
|
||||
// Used by writers to randomly choose from the available grace-period
|
||||
// primitives. The only purpose of the initialization is to size the array.
|
||||
static int synctype[] = { RTWS_DEF_FREE, RTWS_EXP_SYNC, RTWS_COND_GET, RTWS_POLL_GET, RTWS_SYNC };
|
||||
// Used by writers to randomly choose from the available grace-period primitives.
|
||||
static int synctype[ARRAY_SIZE(rcu_torture_writer_state_names)] = { };
|
||||
static int nsynctypes;
|
||||
|
||||
/*
|
||||
@ -1146,18 +1148,27 @@ static int nsynctypes;
|
||||
*/
|
||||
static void rcu_torture_write_types(void)
|
||||
{
|
||||
bool gp_cond1 = gp_cond, gp_exp1 = gp_exp, gp_normal1 = gp_normal;
|
||||
bool gp_poll1 = gp_poll, gp_sync1 = gp_sync;
|
||||
bool gp_cond1 = gp_cond, gp_cond_exp1 = gp_cond_exp, gp_exp1 = gp_exp;
|
||||
bool gp_poll_exp1 = gp_poll_exp, gp_normal1 = gp_normal, gp_poll1 = gp_poll;
|
||||
bool gp_sync1 = gp_sync;
|
||||
|
||||
/* Initialize synctype[] array. If none set, take default. */
|
||||
if (!gp_cond1 && !gp_exp1 && !gp_normal1 && !gp_poll1 && !gp_sync1)
|
||||
gp_cond1 = gp_exp1 = gp_normal1 = gp_poll1 = gp_sync1 = true;
|
||||
if (!gp_cond1 && !gp_cond_exp1 && !gp_exp1 && !gp_poll_exp &&
|
||||
!gp_normal1 && !gp_poll1 && !gp_sync1)
|
||||
gp_cond1 = gp_cond_exp1 = gp_exp1 = gp_poll_exp1 =
|
||||
gp_normal1 = gp_poll1 = gp_sync1 = true;
|
||||
if (gp_cond1 && cur_ops->get_gp_state && cur_ops->cond_sync) {
|
||||
synctype[nsynctypes++] = RTWS_COND_GET;
|
||||
pr_info("%s: Testing conditional GPs.\n", __func__);
|
||||
} else if (gp_cond && (!cur_ops->get_gp_state || !cur_ops->cond_sync)) {
|
||||
pr_alert("%s: gp_cond without primitives.\n", __func__);
|
||||
}
|
||||
if (gp_cond_exp1 && cur_ops->get_gp_state_exp && cur_ops->cond_sync_exp) {
|
||||
synctype[nsynctypes++] = RTWS_COND_GET_EXP;
|
||||
pr_info("%s: Testing conditional expedited GPs.\n", __func__);
|
||||
} else if (gp_cond_exp && (!cur_ops->get_gp_state_exp || !cur_ops->cond_sync_exp)) {
|
||||
pr_alert("%s: gp_cond_exp without primitives.\n", __func__);
|
||||
}
|
||||
if (gp_exp1 && cur_ops->exp_sync) {
|
||||
synctype[nsynctypes++] = RTWS_EXP_SYNC;
|
||||
pr_info("%s: Testing expedited GPs.\n", __func__);
|
||||
@ -1176,6 +1187,12 @@ static void rcu_torture_write_types(void)
|
||||
} else if (gp_poll && (!cur_ops->start_gp_poll || !cur_ops->poll_gp_state)) {
|
||||
pr_alert("%s: gp_poll without primitives.\n", __func__);
|
||||
}
|
||||
if (gp_poll_exp1 && cur_ops->start_gp_poll_exp && cur_ops->poll_gp_state_exp) {
|
||||
synctype[nsynctypes++] = RTWS_POLL_GET_EXP;
|
||||
pr_info("%s: Testing polling expedited GPs.\n", __func__);
|
||||
} else if (gp_poll_exp && (!cur_ops->start_gp_poll_exp || !cur_ops->poll_gp_state_exp)) {
|
||||
pr_alert("%s: gp_poll_exp without primitives.\n", __func__);
|
||||
}
|
||||
if (gp_sync1 && cur_ops->sync) {
|
||||
synctype[nsynctypes++] = RTWS_SYNC;
|
||||
pr_info("%s: Testing normal GPs.\n", __func__);
|
||||
@ -1254,6 +1271,10 @@ rcu_torture_writer(void *arg)
|
||||
rcu_torture_writer_state_getname(),
|
||||
rcu_torture_writer_state,
|
||||
cookie, cur_ops->get_gp_state());
|
||||
if (cur_ops->get_gp_completed) {
|
||||
cookie = cur_ops->get_gp_completed();
|
||||
WARN_ON_ONCE(!cur_ops->poll_gp_state(cookie));
|
||||
}
|
||||
cur_ops->readunlock(idx);
|
||||
}
|
||||
switch (synctype[torture_random(&rand) % nsynctypes]) {
|
||||
@ -1263,7 +1284,12 @@ rcu_torture_writer(void *arg)
|
||||
break;
|
||||
case RTWS_EXP_SYNC:
|
||||
rcu_torture_writer_state = RTWS_EXP_SYNC;
|
||||
if (cur_ops->get_gp_state && cur_ops->poll_gp_state)
|
||||
cookie = cur_ops->get_gp_state();
|
||||
cur_ops->exp_sync();
|
||||
cur_ops->exp_sync();
|
||||
if (cur_ops->get_gp_state && cur_ops->poll_gp_state)
|
||||
WARN_ON_ONCE(!cur_ops->poll_gp_state(cookie));
|
||||
rcu_torture_pipe_update(old_rp);
|
||||
break;
|
||||
case RTWS_COND_GET:
|
||||
@ -1274,6 +1300,14 @@ rcu_torture_writer(void *arg)
|
||||
cur_ops->cond_sync(gp_snap);
|
||||
rcu_torture_pipe_update(old_rp);
|
||||
break;
|
||||
case RTWS_COND_GET_EXP:
|
||||
rcu_torture_writer_state = RTWS_COND_GET_EXP;
|
||||
gp_snap = cur_ops->get_gp_state_exp();
|
||||
torture_hrtimeout_jiffies(torture_random(&rand) % 16, &rand);
|
||||
rcu_torture_writer_state = RTWS_COND_SYNC_EXP;
|
||||
cur_ops->cond_sync_exp(gp_snap);
|
||||
rcu_torture_pipe_update(old_rp);
|
||||
break;
|
||||
case RTWS_POLL_GET:
|
||||
rcu_torture_writer_state = RTWS_POLL_GET;
|
||||
gp_snap = cur_ops->start_gp_poll();
|
||||
@ -1283,9 +1317,23 @@ rcu_torture_writer(void *arg)
|
||||
&rand);
|
||||
rcu_torture_pipe_update(old_rp);
|
||||
break;
|
||||
case RTWS_POLL_GET_EXP:
|
||||
rcu_torture_writer_state = RTWS_POLL_GET_EXP;
|
||||
gp_snap = cur_ops->start_gp_poll_exp();
|
||||
rcu_torture_writer_state = RTWS_POLL_WAIT_EXP;
|
||||
while (!cur_ops->poll_gp_state_exp(gp_snap))
|
||||
torture_hrtimeout_jiffies(torture_random(&rand) % 16,
|
||||
&rand);
|
||||
rcu_torture_pipe_update(old_rp);
|
||||
break;
|
||||
case RTWS_SYNC:
|
||||
rcu_torture_writer_state = RTWS_SYNC;
|
||||
if (cur_ops->get_gp_state && cur_ops->poll_gp_state)
|
||||
cookie = cur_ops->get_gp_state();
|
||||
cur_ops->sync();
|
||||
cur_ops->sync();
|
||||
if (cur_ops->get_gp_state && cur_ops->poll_gp_state)
|
||||
WARN_ON_ONCE(!cur_ops->poll_gp_state(cookie));
|
||||
rcu_torture_pipe_update(old_rp);
|
||||
break;
|
||||
default:
|
||||
@ -1321,8 +1369,9 @@ rcu_torture_writer(void *arg)
|
||||
if (list_empty(&rcu_tortures[i].rtort_free) &&
|
||||
rcu_access_pointer(rcu_torture_current) !=
|
||||
&rcu_tortures[i]) {
|
||||
rcu_ftrace_dump(DUMP_ALL);
|
||||
tracing_off();
|
||||
WARN(1, "%s: rtort_pipe_count: %d\n", __func__, rcu_tortures[i].rtort_pipe_count);
|
||||
rcu_ftrace_dump(DUMP_ALL);
|
||||
}
|
||||
if (stutter_waited)
|
||||
sched_set_normal(current, oldnice);
|
||||
@ -1384,6 +1433,11 @@ rcu_torture_fakewriter(void *arg)
|
||||
torture_hrtimeout_jiffies(torture_random(&rand) % 16, &rand);
|
||||
cur_ops->cond_sync(gp_snap);
|
||||
break;
|
||||
case RTWS_COND_GET_EXP:
|
||||
gp_snap = cur_ops->get_gp_state_exp();
|
||||
torture_hrtimeout_jiffies(torture_random(&rand) % 16, &rand);
|
||||
cur_ops->cond_sync_exp(gp_snap);
|
||||
break;
|
||||
case RTWS_POLL_GET:
|
||||
gp_snap = cur_ops->start_gp_poll();
|
||||
while (!cur_ops->poll_gp_state(gp_snap)) {
|
||||
@ -1391,6 +1445,13 @@ rcu_torture_fakewriter(void *arg)
|
||||
&rand);
|
||||
}
|
||||
break;
|
||||
case RTWS_POLL_GET_EXP:
|
||||
gp_snap = cur_ops->start_gp_poll_exp();
|
||||
while (!cur_ops->poll_gp_state_exp(gp_snap)) {
|
||||
torture_hrtimeout_jiffies(torture_random(&rand) % 16,
|
||||
&rand);
|
||||
}
|
||||
break;
|
||||
case RTWS_SYNC:
|
||||
cur_ops->sync();
|
||||
break;
|
||||
@ -1868,7 +1929,7 @@ rcu_torture_stats_print(void)
|
||||
batchsummary[i] += READ_ONCE(per_cpu(rcu_torture_batch, cpu)[i]);
|
||||
}
|
||||
}
|
||||
for (i = RCU_TORTURE_PIPE_LEN - 1; i >= 0; i--) {
|
||||
for (i = RCU_TORTURE_PIPE_LEN; i >= 0; i--) {
|
||||
if (pipesummary[i] != 0)
|
||||
break;
|
||||
}
|
||||
@ -1990,7 +2051,13 @@ static void rcu_torture_mem_dump_obj(void)
|
||||
static int z;
|
||||
|
||||
kcp = kmem_cache_create("rcuscale", 136, 8, SLAB_STORE_USER, NULL);
|
||||
if (WARN_ON_ONCE(!kcp))
|
||||
return;
|
||||
rhp = kmem_cache_alloc(kcp, GFP_KERNEL);
|
||||
if (WARN_ON_ONCE(!rhp)) {
|
||||
kmem_cache_destroy(kcp);
|
||||
return;
|
||||
}
|
||||
pr_alert("mem_dump_obj() slab test: rcu_torture_stats = %px, &rhp = %px, rhp = %px, &z = %px\n", stats_task, &rhp, rhp, &z);
|
||||
pr_alert("mem_dump_obj(ZERO_SIZE_PTR):");
|
||||
mem_dump_obj(ZERO_SIZE_PTR);
|
||||
@ -2007,6 +2074,8 @@ static void rcu_torture_mem_dump_obj(void)
|
||||
kmem_cache_free(kcp, rhp);
|
||||
kmem_cache_destroy(kcp);
|
||||
rhp = kmalloc(sizeof(*rhp), GFP_KERNEL);
|
||||
if (WARN_ON_ONCE(!rhp))
|
||||
return;
|
||||
pr_alert("mem_dump_obj() kmalloc test: rcu_torture_stats = %px, &rhp = %px, rhp = %px\n", stats_task, &rhp, rhp);
|
||||
pr_alert("mem_dump_obj(kmalloc %px):", rhp);
|
||||
mem_dump_obj(rhp);
|
||||
@ -2014,6 +2083,8 @@ static void rcu_torture_mem_dump_obj(void)
|
||||
mem_dump_obj(&rhp->func);
|
||||
kfree(rhp);
|
||||
rhp = vmalloc(4096);
|
||||
if (WARN_ON_ONCE(!rhp))
|
||||
return;
|
||||
pr_alert("mem_dump_obj() vmalloc test: rcu_torture_stats = %px, &rhp = %px, rhp = %px\n", stats_task, &rhp, rhp);
|
||||
pr_alert("mem_dump_obj(vmalloc %px):", rhp);
|
||||
mem_dump_obj(rhp);
|
||||
@ -2075,6 +2146,19 @@ static int rcutorture_booster_init(unsigned int cpu)
|
||||
if (boost_tasks[cpu] != NULL)
|
||||
return 0; /* Already created, nothing more to do. */
|
||||
|
||||
// Testing RCU priority boosting requires rcutorture do
|
||||
// some serious abuse. Counter this by running ksoftirqd
|
||||
// at higher priority.
|
||||
if (IS_BUILTIN(CONFIG_RCU_TORTURE_TEST)) {
|
||||
struct sched_param sp;
|
||||
struct task_struct *t;
|
||||
|
||||
t = per_cpu(ksoftirqd, cpu);
|
||||
WARN_ON_ONCE(!t);
|
||||
sp.sched_priority = 2;
|
||||
sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
|
||||
}
|
||||
|
||||
/* Don't allow time recalculation while creating a new task. */
|
||||
mutex_lock(&boost_mutex);
|
||||
rcu_torture_disable_rt_throttle();
|
||||
@ -2873,7 +2957,6 @@ static int rcu_torture_read_exit_child(void *trsp_in)
|
||||
// Parent kthread which creates and destroys read-exit child kthreads.
|
||||
static int rcu_torture_read_exit(void *unused)
|
||||
{
|
||||
int count = 0;
|
||||
bool errexit = false;
|
||||
int i;
|
||||
struct task_struct *tsp;
|
||||
@ -2885,34 +2968,28 @@ static int rcu_torture_read_exit(void *unused)
|
||||
|
||||
// Each pass through this loop does one read-exit episode.
|
||||
do {
|
||||
if (++count > read_exit_burst) {
|
||||
VERBOSE_TOROUT_STRING("rcu_torture_read_exit: End of episode");
|
||||
rcu_barrier(); // Wait for task_struct free, avoid OOM.
|
||||
for (i = 0; i < read_exit_delay; i++) {
|
||||
schedule_timeout_uninterruptible(HZ);
|
||||
if (READ_ONCE(read_exit_child_stop))
|
||||
break;
|
||||
VERBOSE_TOROUT_STRING("rcu_torture_read_exit: Start of episode");
|
||||
for (i = 0; i < read_exit_burst; i++) {
|
||||
if (READ_ONCE(read_exit_child_stop))
|
||||
break;
|
||||
stutter_wait("rcu_torture_read_exit");
|
||||
// Spawn child.
|
||||
tsp = kthread_run(rcu_torture_read_exit_child,
|
||||
&trs, "%s", "rcu_torture_read_exit_child");
|
||||
if (IS_ERR(tsp)) {
|
||||
TOROUT_ERRSTRING("out of memory");
|
||||
errexit = true;
|
||||
break;
|
||||
}
|
||||
if (!READ_ONCE(read_exit_child_stop))
|
||||
VERBOSE_TOROUT_STRING("rcu_torture_read_exit: Start of episode");
|
||||
count = 0;
|
||||
cond_resched();
|
||||
kthread_stop(tsp);
|
||||
n_read_exits++;
|
||||
}
|
||||
if (READ_ONCE(read_exit_child_stop))
|
||||
break;
|
||||
// Spawn child.
|
||||
tsp = kthread_run(rcu_torture_read_exit_child,
|
||||
&trs, "%s",
|
||||
"rcu_torture_read_exit_child");
|
||||
if (IS_ERR(tsp)) {
|
||||
TOROUT_ERRSTRING("out of memory");
|
||||
errexit = true;
|
||||
tsp = NULL;
|
||||
break;
|
||||
}
|
||||
cond_resched();
|
||||
kthread_stop(tsp);
|
||||
n_read_exits++;
|
||||
stutter_wait("rcu_torture_read_exit");
|
||||
VERBOSE_TOROUT_STRING("rcu_torture_read_exit: End of episode");
|
||||
rcu_barrier(); // Wait for task_struct free, avoid OOM.
|
||||
i = 0;
|
||||
for (; !errexit && !READ_ONCE(read_exit_child_stop) && i < read_exit_delay; i++)
|
||||
schedule_timeout_uninterruptible(HZ);
|
||||
} while (!errexit && !READ_ONCE(read_exit_child_stop));
|
||||
|
||||
// Clean up and exit.
|
||||
@ -3122,6 +3199,7 @@ static void rcu_test_debug_objects(void)
|
||||
pr_alert("%s: WARN: Duplicate call_rcu() test complete.\n", KBUILD_MODNAME);
|
||||
destroy_rcu_head_on_stack(&rh1);
|
||||
destroy_rcu_head_on_stack(&rh2);
|
||||
kfree(rhp);
|
||||
#else /* #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD */
|
||||
pr_alert("%s: !CONFIG_DEBUG_OBJECTS_RCU_HEAD, not testing duplicate call_rcu()\n", KBUILD_MODNAME);
|
||||
#endif /* #else #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD */
|
||||
@ -3329,21 +3407,6 @@ rcu_torture_init(void)
|
||||
rcutor_hp = firsterr;
|
||||
if (torture_init_error(firsterr))
|
||||
goto unwind;
|
||||
|
||||
// Testing RCU priority boosting requires rcutorture do
|
||||
// some serious abuse. Counter this by running ksoftirqd
|
||||
// at higher priority.
|
||||
if (IS_BUILTIN(CONFIG_RCU_TORTURE_TEST)) {
|
||||
for_each_online_cpu(cpu) {
|
||||
struct sched_param sp;
|
||||
struct task_struct *t;
|
||||
|
||||
t = per_cpu(ksoftirqd, cpu);
|
||||
WARN_ON_ONCE(!t);
|
||||
sp.sched_priority = 2;
|
||||
sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
|
||||
}
|
||||
}
|
||||
}
|
||||
shutdown_jiffies = jiffies + shutdown_secs * HZ;
|
||||
firsterr = torture_shutdown_init(shutdown_secs, rcu_torture_cleanup);
|
||||
|
@ -385,7 +385,7 @@ static struct ref_scale_ops rwsem_ops = {
|
||||
};
|
||||
|
||||
// Definitions for global spinlock
|
||||
static DEFINE_SPINLOCK(test_lock);
|
||||
static DEFINE_RAW_SPINLOCK(test_lock);
|
||||
|
||||
static void ref_lock_section(const int nloops)
|
||||
{
|
||||
@ -393,8 +393,8 @@ static void ref_lock_section(const int nloops)
|
||||
|
||||
preempt_disable();
|
||||
for (i = nloops; i >= 0; i--) {
|
||||
spin_lock(&test_lock);
|
||||
spin_unlock(&test_lock);
|
||||
raw_spin_lock(&test_lock);
|
||||
raw_spin_unlock(&test_lock);
|
||||
}
|
||||
preempt_enable();
|
||||
}
|
||||
@ -405,9 +405,9 @@ static void ref_lock_delay_section(const int nloops, const int udl, const int nd
|
||||
|
||||
preempt_disable();
|
||||
for (i = nloops; i >= 0; i--) {
|
||||
spin_lock(&test_lock);
|
||||
raw_spin_lock(&test_lock);
|
||||
un_delay(udl, ndl);
|
||||
spin_unlock(&test_lock);
|
||||
raw_spin_unlock(&test_lock);
|
||||
}
|
||||
preempt_enable();
|
||||
}
|
||||
@ -427,8 +427,8 @@ static void ref_lock_irq_section(const int nloops)
|
||||
|
||||
preempt_disable();
|
||||
for (i = nloops; i >= 0; i--) {
|
||||
spin_lock_irqsave(&test_lock, flags);
|
||||
spin_unlock_irqrestore(&test_lock, flags);
|
||||
raw_spin_lock_irqsave(&test_lock, flags);
|
||||
raw_spin_unlock_irqrestore(&test_lock, flags);
|
||||
}
|
||||
preempt_enable();
|
||||
}
|
||||
@ -440,9 +440,9 @@ static void ref_lock_irq_delay_section(const int nloops, const int udl, const in
|
||||
|
||||
preempt_disable();
|
||||
for (i = nloops; i >= 0; i--) {
|
||||
spin_lock_irqsave(&test_lock, flags);
|
||||
raw_spin_lock_irqsave(&test_lock, flags);
|
||||
un_delay(udl, ndl);
|
||||
spin_unlock_irqrestore(&test_lock, flags);
|
||||
raw_spin_unlock_irqrestore(&test_lock, flags);
|
||||
}
|
||||
preempt_enable();
|
||||
}
|
||||
|
@ -14,7 +14,7 @@
|
||||
|
||||
struct rcu_tasks;
|
||||
typedef void (*rcu_tasks_gp_func_t)(struct rcu_tasks *rtp);
|
||||
typedef void (*pregp_func_t)(void);
|
||||
typedef void (*pregp_func_t)(struct list_head *hop);
|
||||
typedef void (*pertask_func_t)(struct task_struct *t, struct list_head *hop);
|
||||
typedef void (*postscan_func_t)(struct list_head *hop);
|
||||
typedef void (*holdouts_func_t)(struct list_head *hop, bool ndrpt, bool *frptp);
|
||||
@ -29,6 +29,7 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
|
||||
* @rtp_work: Work queue for invoking callbacks.
|
||||
* @rtp_irq_work: IRQ work queue for deferred wakeups.
|
||||
* @barrier_q_head: RCU callback for barrier operation.
|
||||
* @rtp_blkd_tasks: List of tasks blocked as readers.
|
||||
* @cpu: CPU number corresponding to this entry.
|
||||
* @rtpp: Pointer to the rcu_tasks structure.
|
||||
*/
|
||||
@ -40,6 +41,7 @@ struct rcu_tasks_percpu {
|
||||
struct work_struct rtp_work;
|
||||
struct irq_work rtp_irq_work;
|
||||
struct rcu_head barrier_q_head;
|
||||
struct list_head rtp_blkd_tasks;
|
||||
int cpu;
|
||||
struct rcu_tasks *rtpp;
|
||||
};
|
||||
@ -48,6 +50,7 @@ struct rcu_tasks_percpu {
|
||||
* struct rcu_tasks - Definition for a Tasks-RCU-like mechanism.
|
||||
* @cbs_wait: RCU wait allowing a new callback to get kthread's attention.
|
||||
* @cbs_gbl_lock: Lock protecting callback list.
|
||||
* @tasks_gp_mutex: Mutex protecting grace period, needed during mid-boot dead zone.
|
||||
* @kthread_ptr: This flavor's grace-period/callback-invocation kthread.
|
||||
* @gp_func: This flavor's grace-period-wait function.
|
||||
* @gp_state: Grace period's most recent state transition (debugging).
|
||||
@ -79,6 +82,7 @@ struct rcu_tasks_percpu {
|
||||
struct rcu_tasks {
|
||||
struct rcuwait cbs_wait;
|
||||
raw_spinlock_t cbs_gbl_lock;
|
||||
struct mutex tasks_gp_mutex;
|
||||
int gp_state;
|
||||
int gp_sleep;
|
||||
int init_fract;
|
||||
@ -119,6 +123,7 @@ static struct rcu_tasks rt_name = \
|
||||
{ \
|
||||
.cbs_wait = __RCUWAIT_INITIALIZER(rt_name.wait), \
|
||||
.cbs_gbl_lock = __RAW_SPIN_LOCK_UNLOCKED(rt_name.cbs_gbl_lock), \
|
||||
.tasks_gp_mutex = __MUTEX_INITIALIZER(rt_name.tasks_gp_mutex), \
|
||||
.gp_func = gp, \
|
||||
.call_func = call, \
|
||||
.rtpcpu = &rt_name ## __percpu, \
|
||||
@ -140,6 +145,7 @@ static int rcu_task_ipi_delay __read_mostly = RCU_TASK_IPI_DELAY;
|
||||
module_param(rcu_task_ipi_delay, int, 0644);
|
||||
|
||||
/* Control stall timeouts. Disable with <= 0, otherwise jiffies till stall. */
|
||||
#define RCU_TASK_BOOT_STALL_TIMEOUT (HZ * 30)
|
||||
#define RCU_TASK_STALL_TIMEOUT (HZ * 60 * 10)
|
||||
static int rcu_task_stall_timeout __read_mostly = RCU_TASK_STALL_TIMEOUT;
|
||||
module_param(rcu_task_stall_timeout, int, 0644);
|
||||
@ -253,6 +259,8 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
|
||||
INIT_WORK(&rtpcp->rtp_work, rcu_tasks_invoke_cbs_wq);
|
||||
rtpcp->cpu = cpu;
|
||||
rtpcp->rtpp = rtp;
|
||||
if (!rtpcp->rtp_blkd_tasks.next)
|
||||
INIT_LIST_HEAD(&rtpcp->rtp_blkd_tasks);
|
||||
raw_spin_unlock_rcu_node(rtpcp); // irqs remain disabled.
|
||||
}
|
||||
raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
|
||||
@ -323,17 +331,6 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
|
||||
irq_work_queue(&rtpcp->rtp_irq_work);
|
||||
}
|
||||
|
||||
// Wait for a grace period for the specified flavor of Tasks RCU.
|
||||
static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
|
||||
{
|
||||
/* Complain if the scheduler has not started. */
|
||||
RCU_LOCKDEP_WARN(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
|
||||
"synchronize_rcu_tasks called too soon");
|
||||
|
||||
/* Wait for the grace period. */
|
||||
wait_rcu_gp(rtp->call_func);
|
||||
}
|
||||
|
||||
// RCU callback function for rcu_barrier_tasks_generic().
|
||||
static void rcu_barrier_tasks_generic_cb(struct rcu_head *rhp)
|
||||
{
|
||||
@ -439,6 +436,11 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
|
||||
WRITE_ONCE(rtp->percpu_dequeue_lim, 1);
|
||||
pr_info("Completing switch %s to CPU-0 callback queuing.\n", rtp->name);
|
||||
}
|
||||
for (cpu = rtp->percpu_dequeue_lim; cpu < nr_cpu_ids; cpu++) {
|
||||
struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
|
||||
|
||||
WARN_ON_ONCE(rcu_segcblist_n_cbs(&rtpcp->cblist));
|
||||
}
|
||||
raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
|
||||
}
|
||||
|
||||
@ -497,10 +499,41 @@ static void rcu_tasks_invoke_cbs_wq(struct work_struct *wp)
|
||||
rcu_tasks_invoke_cbs(rtp, rtpcp);
|
||||
}
|
||||
|
||||
/* RCU-tasks kthread that detects grace periods and invokes callbacks. */
|
||||
static int __noreturn rcu_tasks_kthread(void *arg)
|
||||
// Wait for one grace period.
|
||||
static void rcu_tasks_one_gp(struct rcu_tasks *rtp, bool midboot)
|
||||
{
|
||||
int needgpcb;
|
||||
|
||||
mutex_lock(&rtp->tasks_gp_mutex);
|
||||
|
||||
// If there were none, wait a bit and start over.
|
||||
if (unlikely(midboot)) {
|
||||
needgpcb = 0x2;
|
||||
} else {
|
||||
set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
|
||||
rcuwait_wait_event(&rtp->cbs_wait,
|
||||
(needgpcb = rcu_tasks_need_gpcb(rtp)),
|
||||
TASK_IDLE);
|
||||
}
|
||||
|
||||
if (needgpcb & 0x2) {
|
||||
// Wait for one grace period.
|
||||
set_tasks_gp_state(rtp, RTGS_WAIT_GP);
|
||||
rtp->gp_start = jiffies;
|
||||
rcu_seq_start(&rtp->tasks_gp_seq);
|
||||
rtp->gp_func(rtp);
|
||||
rcu_seq_end(&rtp->tasks_gp_seq);
|
||||
}
|
||||
|
||||
// Invoke callbacks.
|
||||
set_tasks_gp_state(rtp, RTGS_INVOKE_CBS);
|
||||
rcu_tasks_invoke_cbs(rtp, per_cpu_ptr(rtp->rtpcpu, 0));
|
||||
mutex_unlock(&rtp->tasks_gp_mutex);
|
||||
}
|
||||
|
||||
// RCU-tasks kthread that detects grace periods and invokes callbacks.
|
||||
static int __noreturn rcu_tasks_kthread(void *arg)
|
||||
{
|
||||
struct rcu_tasks *rtp = arg;
|
||||
|
||||
/* Run on housekeeping CPUs by default. Sysadm can move if desired. */
|
||||
@ -514,31 +547,30 @@ static int __noreturn rcu_tasks_kthread(void *arg)
|
||||
* This loop is terminated by the system going down. ;-)
|
||||
*/
|
||||
for (;;) {
|
||||
set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
|
||||
// Wait for one grace period and invoke any callbacks
|
||||
// that are ready.
|
||||
rcu_tasks_one_gp(rtp, false);
|
||||
|
||||
/* If there were none, wait a bit and start over. */
|
||||
rcuwait_wait_event(&rtp->cbs_wait,
|
||||
(needgpcb = rcu_tasks_need_gpcb(rtp)),
|
||||
TASK_IDLE);
|
||||
|
||||
if (needgpcb & 0x2) {
|
||||
// Wait for one grace period.
|
||||
set_tasks_gp_state(rtp, RTGS_WAIT_GP);
|
||||
rtp->gp_start = jiffies;
|
||||
rcu_seq_start(&rtp->tasks_gp_seq);
|
||||
rtp->gp_func(rtp);
|
||||
rcu_seq_end(&rtp->tasks_gp_seq);
|
||||
}
|
||||
|
||||
/* Invoke callbacks. */
|
||||
set_tasks_gp_state(rtp, RTGS_INVOKE_CBS);
|
||||
rcu_tasks_invoke_cbs(rtp, per_cpu_ptr(rtp->rtpcpu, 0));
|
||||
|
||||
/* Paranoid sleep to keep this from entering a tight loop */
|
||||
// Paranoid sleep to keep this from entering a tight loop.
|
||||
schedule_timeout_idle(rtp->gp_sleep);
|
||||
}
|
||||
}
|
||||
|
||||
// Wait for a grace period for the specified flavor of Tasks RCU.
|
||||
static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
|
||||
{
|
||||
/* Complain if the scheduler has not started. */
|
||||
RCU_LOCKDEP_WARN(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
|
||||
"synchronize_rcu_tasks called too soon");
|
||||
|
||||
// If the grace-period kthread is running, use it.
|
||||
if (READ_ONCE(rtp->kthread_ptr)) {
|
||||
wait_rcu_gp(rtp->call_func);
|
||||
return;
|
||||
}
|
||||
rcu_tasks_one_gp(rtp, true);
|
||||
}
|
||||
|
||||
/* Spawn RCU-tasks grace-period kthread. */
|
||||
static void __init rcu_spawn_tasks_kthread_generic(struct rcu_tasks *rtp)
|
||||
{
|
||||
@ -630,7 +662,7 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
|
||||
struct task_struct *t;
|
||||
|
||||
set_tasks_gp_state(rtp, RTGS_PRE_WAIT_GP);
|
||||
rtp->pregp_func();
|
||||
rtp->pregp_func(&holdouts);
|
||||
|
||||
/*
|
||||
* There were callbacks, so we need to wait for an RCU-tasks
|
||||
@ -639,10 +671,12 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
|
||||
* and make a list of them in holdouts.
|
||||
*/
|
||||
set_tasks_gp_state(rtp, RTGS_SCAN_TASKLIST);
|
||||
rcu_read_lock();
|
||||
for_each_process_thread(g, t)
|
||||
rtp->pertask_func(t, &holdouts);
|
||||
rcu_read_unlock();
|
||||
if (rtp->pertask_func) {
|
||||
rcu_read_lock();
|
||||
for_each_process_thread(g, t)
|
||||
rtp->pertask_func(t, &holdouts);
|
||||
rcu_read_unlock();
|
||||
}
|
||||
|
||||
set_tasks_gp_state(rtp, RTGS_POST_SCAN_TASKLIST);
|
||||
rtp->postscan_func(&holdouts);
|
||||
@ -760,7 +794,7 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
|
||||
// disabling.
|
||||
|
||||
/* Pre-grace-period preparation. */
|
||||
static void rcu_tasks_pregp_step(void)
|
||||
static void rcu_tasks_pregp_step(struct list_head *hop)
|
||||
{
|
||||
/*
|
||||
* Wait for all pre-existing t->on_rq and t->nvcsw transitions
|
||||
@ -1105,11 +1139,10 @@ EXPORT_SYMBOL_GPL(show_rcu_tasks_rude_gp_kthread);
|
||||
// 3. Avoids expensive read-side instructions, having overhead similar
|
||||
// to that of Preemptible RCU.
|
||||
//
|
||||
// There are of course downsides. The grace-period code can send IPIs to
|
||||
// CPUs, even when those CPUs are in the idle loop or in nohz_full userspace.
|
||||
// It is necessary to scan the full tasklist, much as for Tasks RCU. There
|
||||
// is a single callback queue guarded by a single lock, again, much as for
|
||||
// Tasks RCU. If needed, these downsides can be at least partially remedied.
|
||||
// There are of course downsides. For example, the grace-period code
|
||||
// can send IPIs to CPUs, even when those CPUs are in the idle loop or
|
||||
// in nohz_full userspace. If needed, these downsides can be at least
|
||||
// partially remedied.
|
||||
//
|
||||
// Perhaps most important, this variant of RCU does not affect the vanilla
|
||||
// flavors, rcu_preempt and rcu_sched. The fact that RCU Tasks Trace
|
||||
@ -1122,38 +1155,30 @@ EXPORT_SYMBOL_GPL(show_rcu_tasks_rude_gp_kthread);
|
||||
// invokes these functions in this order:
|
||||
//
|
||||
// rcu_tasks_trace_pregp_step():
|
||||
// Initialize the count of readers and block CPU-hotplug operations.
|
||||
// rcu_tasks_trace_pertask(), invoked on every non-idle task:
|
||||
// Initialize per-task state and attempt to identify an immediate
|
||||
// quiescent state for that task, or, failing that, attempt to
|
||||
// set that task's .need_qs flag so that task's next outermost
|
||||
// rcu_read_unlock_trace() will report the quiescent state (in which
|
||||
// case the count of readers is incremented). If both attempts fail,
|
||||
// the task is added to a "holdout" list. Note that IPIs are used
|
||||
// to invoke trc_read_check_handler() in the context of running tasks
|
||||
// in order to avoid ordering overhead on common-case shared-variable
|
||||
// accessses.
|
||||
// Disables CPU hotplug, adds all currently executing tasks to the
|
||||
// holdout list, then checks the state of all tasks that blocked
|
||||
// or were preempted within their current RCU Tasks Trace read-side
|
||||
// critical section, adding them to the holdout list if appropriate.
|
||||
// Finally, this function re-enables CPU hotplug.
|
||||
// The ->pertask_func() pointer is NULL, so there is no per-task processing.
|
||||
// rcu_tasks_trace_postscan():
|
||||
// Initialize state and attempt to identify an immediate quiescent
|
||||
// state as above (but only for idle tasks), unblock CPU-hotplug
|
||||
// operations, and wait for an RCU grace period to avoid races with
|
||||
// tasks that are in the process of exiting.
|
||||
// Invokes synchronize_rcu() to wait for late-stage exiting tasks
|
||||
// to finish exiting.
|
||||
// check_all_holdout_tasks_trace(), repeatedly until holdout list is empty:
|
||||
// Scans the holdout list, attempting to identify a quiescent state
|
||||
// for each task on the list. If there is a quiescent state, the
|
||||
// corresponding task is removed from the holdout list.
|
||||
// corresponding task is removed from the holdout list. Once this
|
||||
// list is empty, the grace period has completed.
|
||||
// rcu_tasks_trace_postgp():
|
||||
// Wait for the count of readers do drop to zero, reporting any stalls.
|
||||
// Also execute full memory barriers to maintain ordering with code
|
||||
// executing after the grace period.
|
||||
// Provides the needed full memory barrier and does debug checks.
|
||||
//
|
||||
// The exit_tasks_rcu_finish_trace() synchronizes with exiting tasks.
|
||||
//
|
||||
// Pre-grace-period update-side code is ordered before the grace
|
||||
// period via the ->cbs_lock and barriers in rcu_tasks_kthread().
|
||||
// Pre-grace-period read-side code is ordered before the grace period by
|
||||
// atomic_dec_and_test() of the count of readers (for IPIed readers) and by
|
||||
// scheduler context-switch ordering (for locked-down non-running readers).
|
||||
// Pre-grace-period update-side code is ordered before the grace period
|
||||
// via the ->cbs_lock and barriers in rcu_tasks_kthread(). Pre-grace-period
|
||||
// read-side code is ordered before the grace period by atomic operations
|
||||
// on .b.need_qs flag of each task involved in this process, or by scheduler
|
||||
// context-switch ordering (for locked-down non-running readers).
|
||||
|
||||
// The lockdep state must be outside of #ifdef to be useful.
|
||||
#ifdef CONFIG_DEBUG_LOCK_ALLOC
|
||||
@ -1165,9 +1190,6 @@ EXPORT_SYMBOL_GPL(rcu_trace_lock_map);
|
||||
|
||||
#ifdef CONFIG_TASKS_TRACE_RCU
|
||||
|
||||
static atomic_t trc_n_readers_need_end; // Number of waited-for readers.
|
||||
static DECLARE_WAIT_QUEUE_HEAD(trc_wait); // List of holdout tasks.
|
||||
|
||||
// Record outstanding IPIs to each CPU. No point in sending two...
|
||||
static DEFINE_PER_CPU(bool, trc_ipi_to_cpu);
|
||||
|
||||
@ -1176,44 +1198,104 @@ static DEFINE_PER_CPU(bool, trc_ipi_to_cpu);
|
||||
static unsigned long n_heavy_reader_attempts;
|
||||
static unsigned long n_heavy_reader_updates;
|
||||
static unsigned long n_heavy_reader_ofl_updates;
|
||||
static unsigned long n_trc_holdouts;
|
||||
|
||||
void call_rcu_tasks_trace(struct rcu_head *rhp, rcu_callback_t func);
|
||||
DEFINE_RCU_TASKS(rcu_tasks_trace, rcu_tasks_wait_gp, call_rcu_tasks_trace,
|
||||
"RCU Tasks Trace");
|
||||
|
||||
/*
|
||||
* This irq_work handler allows rcu_read_unlock_trace() to be invoked
|
||||
* while the scheduler locks are held.
|
||||
*/
|
||||
static void rcu_read_unlock_iw(struct irq_work *iwp)
|
||||
/* Load from ->trc_reader_special.b.need_qs with proper ordering. */
|
||||
static u8 rcu_ld_need_qs(struct task_struct *t)
|
||||
{
|
||||
wake_up(&trc_wait);
|
||||
smp_mb(); // Enforce full grace-period ordering.
|
||||
return smp_load_acquire(&t->trc_reader_special.b.need_qs);
|
||||
}
|
||||
static DEFINE_IRQ_WORK(rcu_tasks_trace_iw, rcu_read_unlock_iw);
|
||||
|
||||
/* If we are the last reader, wake up the grace-period kthread. */
|
||||
/* Store to ->trc_reader_special.b.need_qs with proper ordering. */
|
||||
static void rcu_st_need_qs(struct task_struct *t, u8 v)
|
||||
{
|
||||
smp_store_release(&t->trc_reader_special.b.need_qs, v);
|
||||
smp_mb(); // Enforce full grace-period ordering.
|
||||
}
|
||||
|
||||
/*
|
||||
* Do a cmpxchg() on ->trc_reader_special.b.need_qs, allowing for
|
||||
* the four-byte operand-size restriction of some platforms.
|
||||
* Returns the old value, which is often ignored.
|
||||
*/
|
||||
u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new)
|
||||
{
|
||||
union rcu_special ret;
|
||||
union rcu_special trs_old = READ_ONCE(t->trc_reader_special);
|
||||
union rcu_special trs_new = trs_old;
|
||||
|
||||
if (trs_old.b.need_qs != old)
|
||||
return trs_old.b.need_qs;
|
||||
trs_new.b.need_qs = new;
|
||||
ret.s = cmpxchg(&t->trc_reader_special.s, trs_old.s, trs_new.s);
|
||||
return ret.b.need_qs;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(rcu_trc_cmpxchg_need_qs);
|
||||
|
||||
/*
|
||||
* If we are the last reader, signal the grace-period kthread.
|
||||
* Also remove from the per-CPU list of blocked tasks.
|
||||
*/
|
||||
void rcu_read_unlock_trace_special(struct task_struct *t)
|
||||
{
|
||||
int nq = READ_ONCE(t->trc_reader_special.b.need_qs);
|
||||
unsigned long flags;
|
||||
struct rcu_tasks_percpu *rtpcp;
|
||||
union rcu_special trs;
|
||||
|
||||
if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) &&
|
||||
t->trc_reader_special.b.need_mb)
|
||||
// Open-coded full-word version of rcu_ld_need_qs().
|
||||
smp_mb(); // Enforce full grace-period ordering.
|
||||
trs = smp_load_acquire(&t->trc_reader_special);
|
||||
|
||||
if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) && t->trc_reader_special.b.need_mb)
|
||||
smp_mb(); // Pairs with update-side barriers.
|
||||
// Update .need_qs before ->trc_reader_nesting for irq/NMI handlers.
|
||||
if (nq)
|
||||
WRITE_ONCE(t->trc_reader_special.b.need_qs, false);
|
||||
if (trs.b.need_qs == (TRC_NEED_QS_CHECKED | TRC_NEED_QS)) {
|
||||
u8 result = rcu_trc_cmpxchg_need_qs(t, TRC_NEED_QS_CHECKED | TRC_NEED_QS,
|
||||
TRC_NEED_QS_CHECKED);
|
||||
|
||||
WARN_ONCE(result != trs.b.need_qs, "%s: result = %d", __func__, result);
|
||||
}
|
||||
if (trs.b.blocked) {
|
||||
rtpcp = per_cpu_ptr(rcu_tasks_trace.rtpcpu, t->trc_blkd_cpu);
|
||||
raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
|
||||
list_del_init(&t->trc_blkd_node);
|
||||
WRITE_ONCE(t->trc_reader_special.b.blocked, false);
|
||||
raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
|
||||
}
|
||||
WRITE_ONCE(t->trc_reader_nesting, 0);
|
||||
if (nq && atomic_dec_and_test(&trc_n_readers_need_end))
|
||||
irq_work_queue(&rcu_tasks_trace_iw);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(rcu_read_unlock_trace_special);
|
||||
|
||||
/* Add a newly blocked reader task to its CPU's list. */
|
||||
void rcu_tasks_trace_qs_blkd(struct task_struct *t)
|
||||
{
|
||||
unsigned long flags;
|
||||
struct rcu_tasks_percpu *rtpcp;
|
||||
|
||||
local_irq_save(flags);
|
||||
rtpcp = this_cpu_ptr(rcu_tasks_trace.rtpcpu);
|
||||
raw_spin_lock_rcu_node(rtpcp); // irqs already disabled
|
||||
t->trc_blkd_cpu = smp_processor_id();
|
||||
if (!rtpcp->rtp_blkd_tasks.next)
|
||||
INIT_LIST_HEAD(&rtpcp->rtp_blkd_tasks);
|
||||
list_add(&t->trc_blkd_node, &rtpcp->rtp_blkd_tasks);
|
||||
WRITE_ONCE(t->trc_reader_special.b.blocked, true);
|
||||
raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(rcu_tasks_trace_qs_blkd);
|
||||
|
||||
/* Add a task to the holdout list, if it is not already on the list. */
|
||||
static void trc_add_holdout(struct task_struct *t, struct list_head *bhp)
|
||||
{
|
||||
if (list_empty(&t->trc_holdout_list)) {
|
||||
get_task_struct(t);
|
||||
list_add(&t->trc_holdout_list, bhp);
|
||||
n_trc_holdouts++;
|
||||
}
|
||||
}
|
||||
|
||||
@ -1223,37 +1305,36 @@ static void trc_del_holdout(struct task_struct *t)
|
||||
if (!list_empty(&t->trc_holdout_list)) {
|
||||
list_del_init(&t->trc_holdout_list);
|
||||
put_task_struct(t);
|
||||
n_trc_holdouts--;
|
||||
}
|
||||
}
|
||||
|
||||
/* IPI handler to check task state. */
|
||||
static void trc_read_check_handler(void *t_in)
|
||||
{
|
||||
int nesting;
|
||||
struct task_struct *t = current;
|
||||
struct task_struct *texp = t_in;
|
||||
|
||||
// If the task is no longer running on this CPU, leave.
|
||||
if (unlikely(texp != t)) {
|
||||
if (unlikely(texp != t))
|
||||
goto reset_ipi; // Already on holdout list, so will check later.
|
||||
}
|
||||
|
||||
// If the task is not in a read-side critical section, and
|
||||
// if this is the last reader, awaken the grace-period kthread.
|
||||
if (likely(!READ_ONCE(t->trc_reader_nesting))) {
|
||||
WRITE_ONCE(t->trc_reader_checked, true);
|
||||
nesting = READ_ONCE(t->trc_reader_nesting);
|
||||
if (likely(!nesting)) {
|
||||
rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS_CHECKED);
|
||||
goto reset_ipi;
|
||||
}
|
||||
// If we are racing with an rcu_read_unlock_trace(), try again later.
|
||||
if (unlikely(READ_ONCE(t->trc_reader_nesting) < 0))
|
||||
if (unlikely(nesting < 0))
|
||||
goto reset_ipi;
|
||||
WRITE_ONCE(t->trc_reader_checked, true);
|
||||
|
||||
// Get here if the task is in a read-side critical section. Set
|
||||
// its state so that it will awaken the grace-period kthread upon
|
||||
// exit from that critical section.
|
||||
atomic_inc(&trc_n_readers_need_end); // One more to wait on.
|
||||
WARN_ON_ONCE(READ_ONCE(t->trc_reader_special.b.need_qs));
|
||||
WRITE_ONCE(t->trc_reader_special.b.need_qs, true);
|
||||
// Get here if the task is in a read-side critical section.
|
||||
// Set its state so that it will update state for the grace-period
|
||||
// kthread upon exit from that critical section.
|
||||
rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS | TRC_NEED_QS_CHECKED);
|
||||
|
||||
reset_ipi:
|
||||
// Allow future IPIs to be sent on CPU and for task.
|
||||
@ -1264,48 +1345,50 @@ static void trc_read_check_handler(void *t_in)
|
||||
}
|
||||
|
||||
/* Callback function for scheduler to check locked-down task. */
|
||||
static int trc_inspect_reader(struct task_struct *t, void *arg)
|
||||
static int trc_inspect_reader(struct task_struct *t, void *bhp_in)
|
||||
{
|
||||
struct list_head *bhp = bhp_in;
|
||||
int cpu = task_cpu(t);
|
||||
int nesting;
|
||||
bool ofl = cpu_is_offline(cpu);
|
||||
|
||||
if (task_curr(t)) {
|
||||
WARN_ON_ONCE(ofl && !is_idle_task(t));
|
||||
|
||||
if (task_curr(t) && !ofl) {
|
||||
// If no chance of heavyweight readers, do it the hard way.
|
||||
if (!ofl && !IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB))
|
||||
if (!IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB))
|
||||
return -EINVAL;
|
||||
|
||||
// If heavyweight readers are enabled on the remote task,
|
||||
// we can inspect its state despite its currently running.
|
||||
// However, we cannot safely change its state.
|
||||
n_heavy_reader_attempts++;
|
||||
if (!ofl && // Check for "running" idle tasks on offline CPUs.
|
||||
!rcu_dynticks_zero_in_eqs(cpu, &t->trc_reader_nesting))
|
||||
// Check for "running" idle tasks on offline CPUs.
|
||||
if (!rcu_dynticks_zero_in_eqs(cpu, &t->trc_reader_nesting))
|
||||
return -EINVAL; // No quiescent state, do it the hard way.
|
||||
n_heavy_reader_updates++;
|
||||
if (ofl)
|
||||
n_heavy_reader_ofl_updates++;
|
||||
nesting = 0;
|
||||
} else {
|
||||
// The task is not running, so C-language access is safe.
|
||||
nesting = t->trc_reader_nesting;
|
||||
WARN_ON_ONCE(ofl && task_curr(t) && !is_idle_task(t));
|
||||
if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) && ofl)
|
||||
n_heavy_reader_ofl_updates++;
|
||||
}
|
||||
|
||||
// If not exiting a read-side critical section, mark as checked
|
||||
// so that the grace-period kthread will remove it from the
|
||||
// holdout list.
|
||||
t->trc_reader_checked = nesting >= 0;
|
||||
if (nesting <= 0)
|
||||
return nesting ? -EINVAL : 0; // If in QS, done, otherwise try again later.
|
||||
if (!nesting) {
|
||||
rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS_CHECKED);
|
||||
return 0; // In QS, so done.
|
||||
}
|
||||
if (nesting < 0)
|
||||
return -EINVAL; // Reader transitioning, try again later.
|
||||
|
||||
// The task is in a read-side critical section, so set up its
|
||||
// state so that it will awaken the grace-period kthread upon exit
|
||||
// from that critical section.
|
||||
atomic_inc(&trc_n_readers_need_end); // One more to wait on.
|
||||
WARN_ON_ONCE(READ_ONCE(t->trc_reader_special.b.need_qs));
|
||||
WRITE_ONCE(t->trc_reader_special.b.need_qs, true);
|
||||
// state so that it will update state upon exit from that critical
|
||||
// section.
|
||||
if (!rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS | TRC_NEED_QS_CHECKED))
|
||||
trc_add_holdout(t, bhp);
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -1321,14 +1404,14 @@ static void trc_wait_for_one_reader(struct task_struct *t,
|
||||
|
||||
// The current task had better be in a quiescent state.
|
||||
if (t == current) {
|
||||
t->trc_reader_checked = true;
|
||||
rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS_CHECKED);
|
||||
WARN_ON_ONCE(READ_ONCE(t->trc_reader_nesting));
|
||||
return;
|
||||
}
|
||||
|
||||
// Attempt to nail down the task for inspection.
|
||||
get_task_struct(t);
|
||||
if (!task_call_func(t, trc_inspect_reader, NULL)) {
|
||||
if (!task_call_func(t, trc_inspect_reader, bhp)) {
|
||||
put_task_struct(t);
|
||||
return;
|
||||
}
|
||||
@ -1366,56 +1449,93 @@ static void trc_wait_for_one_reader(struct task_struct *t,
|
||||
}
|
||||
}
|
||||
|
||||
/* Initialize for a new RCU-tasks-trace grace period. */
|
||||
static void rcu_tasks_trace_pregp_step(void)
|
||||
/*
|
||||
* Initialize for first-round processing for the specified task.
|
||||
* Return false if task is NULL or already taken care of, true otherwise.
|
||||
*/
|
||||
static bool rcu_tasks_trace_pertask_prep(struct task_struct *t, bool notself)
|
||||
{
|
||||
int cpu;
|
||||
// During early boot when there is only the one boot CPU, there
|
||||
// is no idle task for the other CPUs. Also, the grace-period
|
||||
// kthread is always in a quiescent state. In addition, just return
|
||||
// if this task is already on the list.
|
||||
if (unlikely(t == NULL) || (t == current && notself) || !list_empty(&t->trc_holdout_list))
|
||||
return false;
|
||||
|
||||
// Allow for fast-acting IPIs.
|
||||
atomic_set(&trc_n_readers_need_end, 1);
|
||||
rcu_st_need_qs(t, 0);
|
||||
t->trc_ipi_to_cpu = -1;
|
||||
return true;
|
||||
}
|
||||
|
||||
/* Do first-round processing for the specified task. */
|
||||
static void rcu_tasks_trace_pertask(struct task_struct *t, struct list_head *hop)
|
||||
{
|
||||
if (rcu_tasks_trace_pertask_prep(t, true))
|
||||
trc_wait_for_one_reader(t, hop);
|
||||
}
|
||||
|
||||
/* Initialize for a new RCU-tasks-trace grace period. */
|
||||
static void rcu_tasks_trace_pregp_step(struct list_head *hop)
|
||||
{
|
||||
LIST_HEAD(blkd_tasks);
|
||||
int cpu;
|
||||
unsigned long flags;
|
||||
struct rcu_tasks_percpu *rtpcp;
|
||||
struct task_struct *t;
|
||||
|
||||
// There shouldn't be any old IPIs, but...
|
||||
for_each_possible_cpu(cpu)
|
||||
WARN_ON_ONCE(per_cpu(trc_ipi_to_cpu, cpu));
|
||||
|
||||
// Disable CPU hotplug across the tasklist scan.
|
||||
// This also waits for all readers in CPU-hotplug code paths.
|
||||
// Disable CPU hotplug across the CPU scan for the benefit of
|
||||
// any IPIs that might be needed. This also waits for all readers
|
||||
// in CPU-hotplug code paths.
|
||||
cpus_read_lock();
|
||||
}
|
||||
|
||||
/* Do first-round processing for the specified task. */
|
||||
static void rcu_tasks_trace_pertask(struct task_struct *t,
|
||||
struct list_head *hop)
|
||||
{
|
||||
// During early boot when there is only the one boot CPU, there
|
||||
// is no idle task for the other CPUs. Just return.
|
||||
if (unlikely(t == NULL))
|
||||
return;
|
||||
// These rcu_tasks_trace_pertask_prep() calls are serialized to
|
||||
// allow safe access to the hop list.
|
||||
for_each_online_cpu(cpu) {
|
||||
rcu_read_lock();
|
||||
t = cpu_curr_snapshot(cpu);
|
||||
if (rcu_tasks_trace_pertask_prep(t, true))
|
||||
trc_add_holdout(t, hop);
|
||||
rcu_read_unlock();
|
||||
}
|
||||
|
||||
WRITE_ONCE(t->trc_reader_special.b.need_qs, false);
|
||||
WRITE_ONCE(t->trc_reader_checked, false);
|
||||
t->trc_ipi_to_cpu = -1;
|
||||
trc_wait_for_one_reader(t, hop);
|
||||
// Only after all running tasks have been accounted for is it
|
||||
// safe to take care of the tasks that have blocked within their
|
||||
// current RCU tasks trace read-side critical section.
|
||||
for_each_possible_cpu(cpu) {
|
||||
rtpcp = per_cpu_ptr(rcu_tasks_trace.rtpcpu, cpu);
|
||||
raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
|
||||
list_splice_init(&rtpcp->rtp_blkd_tasks, &blkd_tasks);
|
||||
while (!list_empty(&blkd_tasks)) {
|
||||
rcu_read_lock();
|
||||
t = list_first_entry(&blkd_tasks, struct task_struct, trc_blkd_node);
|
||||
list_del_init(&t->trc_blkd_node);
|
||||
list_add(&t->trc_blkd_node, &rtpcp->rtp_blkd_tasks);
|
||||
raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
|
||||
rcu_tasks_trace_pertask(t, hop);
|
||||
rcu_read_unlock();
|
||||
raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
|
||||
}
|
||||
raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
|
||||
}
|
||||
|
||||
// Re-enable CPU hotplug now that the holdout list is populated.
|
||||
cpus_read_unlock();
|
||||
}
|
||||
|
||||
/*
|
||||
* Do intermediate processing between task and holdout scans and
|
||||
* pick up the idle tasks.
|
||||
* Do intermediate processing between task and holdout scans.
|
||||
*/
|
||||
static void rcu_tasks_trace_postscan(struct list_head *hop)
|
||||
{
|
||||
int cpu;
|
||||
|
||||
for_each_possible_cpu(cpu)
|
||||
rcu_tasks_trace_pertask(idle_task(cpu), hop);
|
||||
|
||||
// Re-enable CPU hotplug now that the tasklist scan has completed.
|
||||
cpus_read_unlock();
|
||||
|
||||
// Wait for late-stage exiting tasks to finish exiting.
|
||||
// These might have passed the call to exit_tasks_rcu_finish().
|
||||
synchronize_rcu();
|
||||
// Any tasks that exit after this point will set ->trc_reader_checked.
|
||||
// Any tasks that exit after this point will set
|
||||
// TRC_NEED_QS_CHECKED in ->trc_reader_special.b.need_qs.
|
||||
}
|
||||
|
||||
/* Communicate task state back to the RCU tasks trace stall warning request. */
|
||||
@ -1429,11 +1549,11 @@ static int trc_check_slow_task(struct task_struct *t, void *arg)
|
||||
{
|
||||
struct trc_stall_chk_rdr *trc_rdrp = arg;
|
||||
|
||||
if (task_curr(t))
|
||||
if (task_curr(t) && cpu_online(task_cpu(t)))
|
||||
return false; // It is running, so decline to inspect it.
|
||||
trc_rdrp->nesting = READ_ONCE(t->trc_reader_nesting);
|
||||
trc_rdrp->ipi_to_cpu = READ_ONCE(t->trc_ipi_to_cpu);
|
||||
trc_rdrp->needqs = READ_ONCE(t->trc_reader_special.b.need_qs);
|
||||
trc_rdrp->needqs = rcu_ld_need_qs(t);
|
||||
return true;
|
||||
}
|
||||
|
||||
@ -1450,18 +1570,21 @@ static void show_stalled_task_trace(struct task_struct *t, bool *firstreport)
|
||||
}
|
||||
cpu = task_cpu(t);
|
||||
if (!task_call_func(t, trc_check_slow_task, &trc_rdr))
|
||||
pr_alert("P%d: %c\n",
|
||||
pr_alert("P%d: %c%c\n",
|
||||
t->pid,
|
||||
".I"[t->trc_ipi_to_cpu >= 0],
|
||||
".i"[is_idle_tsk]);
|
||||
else
|
||||
pr_alert("P%d: %c%c%c nesting: %d%c cpu: %d\n",
|
||||
pr_alert("P%d: %c%c%c%c nesting: %d%c%c cpu: %d%s\n",
|
||||
t->pid,
|
||||
".I"[trc_rdr.ipi_to_cpu >= 0],
|
||||
".i"[is_idle_tsk],
|
||||
".N"[cpu >= 0 && tick_nohz_full_cpu(cpu)],
|
||||
".B"[!!data_race(t->trc_reader_special.b.blocked)],
|
||||
trc_rdr.nesting,
|
||||
" N"[!!trc_rdr.needqs],
|
||||
cpu);
|
||||
" !CN"[trc_rdr.needqs & 0x3],
|
||||
" ?"[trc_rdr.needqs > 0x3],
|
||||
cpu, cpu_online(cpu) ? "" : "(offline)");
|
||||
sched_show_task(t);
|
||||
}
|
||||
|
||||
@ -1481,18 +1604,18 @@ static void check_all_holdout_tasks_trace(struct list_head *hop,
|
||||
{
|
||||
struct task_struct *g, *t;
|
||||
|
||||
// Disable CPU hotplug across the holdout list scan.
|
||||
// Disable CPU hotplug across the holdout list scan for IPIs.
|
||||
cpus_read_lock();
|
||||
|
||||
list_for_each_entry_safe(t, g, hop, trc_holdout_list) {
|
||||
// If safe and needed, try to check the current task.
|
||||
if (READ_ONCE(t->trc_ipi_to_cpu) == -1 &&
|
||||
!READ_ONCE(t->trc_reader_checked))
|
||||
!(rcu_ld_need_qs(t) & TRC_NEED_QS_CHECKED))
|
||||
trc_wait_for_one_reader(t, hop);
|
||||
|
||||
// If check succeeded, remove this task from the list.
|
||||
if (smp_load_acquire(&t->trc_ipi_to_cpu) == -1 &&
|
||||
READ_ONCE(t->trc_reader_checked))
|
||||
rcu_ld_need_qs(t) == TRC_NEED_QS_CHECKED)
|
||||
trc_del_holdout(t);
|
||||
else if (needreport)
|
||||
show_stalled_task_trace(t, firstreport);
|
||||
@ -1516,10 +1639,6 @@ static void rcu_tasks_trace_empty_fn(void *unused)
|
||||
static void rcu_tasks_trace_postgp(struct rcu_tasks *rtp)
|
||||
{
|
||||
int cpu;
|
||||
bool firstreport;
|
||||
struct task_struct *g, *t;
|
||||
LIST_HEAD(holdouts);
|
||||
long ret;
|
||||
|
||||
// Wait for any lingering IPI handlers to complete. Note that
|
||||
// if a CPU has gone offline or transitioned to userspace in the
|
||||
@ -1530,37 +1649,6 @@ static void rcu_tasks_trace_postgp(struct rcu_tasks *rtp)
|
||||
if (WARN_ON_ONCE(smp_load_acquire(per_cpu_ptr(&trc_ipi_to_cpu, cpu))))
|
||||
smp_call_function_single(cpu, rcu_tasks_trace_empty_fn, NULL, 1);
|
||||
|
||||
// Remove the safety count.
|
||||
smp_mb__before_atomic(); // Order vs. earlier atomics
|
||||
atomic_dec(&trc_n_readers_need_end);
|
||||
smp_mb__after_atomic(); // Order vs. later atomics
|
||||
|
||||
// Wait for readers.
|
||||
set_tasks_gp_state(rtp, RTGS_WAIT_READERS);
|
||||
for (;;) {
|
||||
ret = wait_event_idle_exclusive_timeout(
|
||||
trc_wait,
|
||||
atomic_read(&trc_n_readers_need_end) == 0,
|
||||
READ_ONCE(rcu_task_stall_timeout));
|
||||
if (ret)
|
||||
break; // Count reached zero.
|
||||
// Stall warning time, so make a list of the offenders.
|
||||
rcu_read_lock();
|
||||
for_each_process_thread(g, t)
|
||||
if (READ_ONCE(t->trc_reader_special.b.need_qs))
|
||||
trc_add_holdout(t, &holdouts);
|
||||
rcu_read_unlock();
|
||||
firstreport = true;
|
||||
list_for_each_entry_safe(t, g, &holdouts, trc_holdout_list) {
|
||||
if (READ_ONCE(t->trc_reader_special.b.need_qs))
|
||||
show_stalled_task_trace(t, &firstreport);
|
||||
trc_del_holdout(t); // Release task_struct reference.
|
||||
}
|
||||
if (firstreport)
|
||||
pr_err("INFO: rcu_tasks_trace detected stalls? (Counter/taskslist mismatch?)\n");
|
||||
show_stalled_ipi_trace();
|
||||
pr_err("\t%d holdouts\n", atomic_read(&trc_n_readers_need_end));
|
||||
}
|
||||
smp_mb(); // Caller's code must be ordered after wakeup.
|
||||
// Pairs with pretty much every ordering primitive.
|
||||
}
|
||||
@ -1568,11 +1656,14 @@ static void rcu_tasks_trace_postgp(struct rcu_tasks *rtp)
|
||||
/* Report any needed quiescent state for this exiting task. */
|
||||
static void exit_tasks_rcu_finish_trace(struct task_struct *t)
|
||||
{
|
||||
WRITE_ONCE(t->trc_reader_checked, true);
|
||||
union rcu_special trs = READ_ONCE(t->trc_reader_special);
|
||||
|
||||
rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS_CHECKED);
|
||||
WARN_ON_ONCE(READ_ONCE(t->trc_reader_nesting));
|
||||
WRITE_ONCE(t->trc_reader_nesting, 0);
|
||||
if (WARN_ON_ONCE(READ_ONCE(t->trc_reader_special.b.need_qs)))
|
||||
if (WARN_ON_ONCE(rcu_ld_need_qs(t) & TRC_NEED_QS || trs.b.blocked))
|
||||
rcu_read_unlock_trace_special(t);
|
||||
else
|
||||
WRITE_ONCE(t->trc_reader_nesting, 0);
|
||||
}
|
||||
|
||||
/**
|
||||
@ -1646,7 +1737,6 @@ static int __init rcu_spawn_tasks_trace_kthread(void)
|
||||
rcu_tasks_trace.init_fract = 1;
|
||||
}
|
||||
rcu_tasks_trace.pregp_func = rcu_tasks_trace_pregp_step;
|
||||
rcu_tasks_trace.pertask_func = rcu_tasks_trace_pertask;
|
||||
rcu_tasks_trace.postscan_func = rcu_tasks_trace_postscan;
|
||||
rcu_tasks_trace.holdouts_func = check_all_holdout_tasks_trace;
|
||||
rcu_tasks_trace.postgp_func = rcu_tasks_trace_postgp;
|
||||
@ -1659,7 +1749,8 @@ void show_rcu_tasks_trace_gp_kthread(void)
|
||||
{
|
||||
char buf[64];
|
||||
|
||||
sprintf(buf, "N%d h:%lu/%lu/%lu", atomic_read(&trc_n_readers_need_end),
|
||||
sprintf(buf, "N%lu h:%lu/%lu/%lu",
|
||||
data_race(n_trc_holdouts),
|
||||
data_race(n_heavy_reader_ofl_updates),
|
||||
data_race(n_heavy_reader_updates),
|
||||
data_race(n_heavy_reader_attempts));
|
||||
@ -1686,23 +1777,24 @@ struct rcu_tasks_test_desc {
|
||||
struct rcu_head rh;
|
||||
const char *name;
|
||||
bool notrun;
|
||||
unsigned long runstart;
|
||||
};
|
||||
|
||||
static struct rcu_tasks_test_desc tests[] = {
|
||||
{
|
||||
.name = "call_rcu_tasks()",
|
||||
/* If not defined, the test is skipped. */
|
||||
.notrun = !IS_ENABLED(CONFIG_TASKS_RCU),
|
||||
.notrun = IS_ENABLED(CONFIG_TASKS_RCU),
|
||||
},
|
||||
{
|
||||
.name = "call_rcu_tasks_rude()",
|
||||
/* If not defined, the test is skipped. */
|
||||
.notrun = !IS_ENABLED(CONFIG_TASKS_RUDE_RCU),
|
||||
.notrun = IS_ENABLED(CONFIG_TASKS_RUDE_RCU),
|
||||
},
|
||||
{
|
||||
.name = "call_rcu_tasks_trace()",
|
||||
/* If not defined, the test is skipped. */
|
||||
.notrun = !IS_ENABLED(CONFIG_TASKS_TRACE_RCU)
|
||||
.notrun = IS_ENABLED(CONFIG_TASKS_TRACE_RCU)
|
||||
}
|
||||
};
|
||||
|
||||
@ -1713,46 +1805,85 @@ static void test_rcu_tasks_callback(struct rcu_head *rhp)
|
||||
|
||||
pr_info("Callback from %s invoked.\n", rttd->name);
|
||||
|
||||
rttd->notrun = true;
|
||||
rttd->notrun = false;
|
||||
}
|
||||
|
||||
static void rcu_tasks_initiate_self_tests(void)
|
||||
{
|
||||
unsigned long j = jiffies;
|
||||
|
||||
pr_info("Running RCU-tasks wait API self tests\n");
|
||||
#ifdef CONFIG_TASKS_RCU
|
||||
tests[0].runstart = j;
|
||||
synchronize_rcu_tasks();
|
||||
call_rcu_tasks(&tests[0].rh, test_rcu_tasks_callback);
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_TASKS_RUDE_RCU
|
||||
tests[1].runstart = j;
|
||||
synchronize_rcu_tasks_rude();
|
||||
call_rcu_tasks_rude(&tests[1].rh, test_rcu_tasks_callback);
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_TASKS_TRACE_RCU
|
||||
tests[2].runstart = j;
|
||||
synchronize_rcu_tasks_trace();
|
||||
call_rcu_tasks_trace(&tests[2].rh, test_rcu_tasks_callback);
|
||||
#endif
|
||||
}
|
||||
|
||||
/*
|
||||
* Return: 0 - test passed
|
||||
* 1 - test failed, but have not timed out yet
|
||||
* -1 - test failed and timed out
|
||||
*/
|
||||
static int rcu_tasks_verify_self_tests(void)
|
||||
{
|
||||
int ret = 0;
|
||||
int i;
|
||||
unsigned long bst = rcu_task_stall_timeout;
|
||||
|
||||
if (bst <= 0 || bst > RCU_TASK_BOOT_STALL_TIMEOUT)
|
||||
bst = RCU_TASK_BOOT_STALL_TIMEOUT;
|
||||
for (i = 0; i < ARRAY_SIZE(tests); i++) {
|
||||
if (!tests[i].notrun) { // still hanging.
|
||||
pr_err("%s has been failed.\n", tests[i].name);
|
||||
ret = -1;
|
||||
while (tests[i].notrun) { // still hanging.
|
||||
if (time_after(jiffies, tests[i].runstart + bst)) {
|
||||
pr_err("%s has failed boot-time tests.\n", tests[i].name);
|
||||
ret = -1;
|
||||
break;
|
||||
}
|
||||
ret = 1;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if (ret)
|
||||
WARN_ON(1);
|
||||
WARN_ON(ret < 0);
|
||||
|
||||
return ret;
|
||||
}
|
||||
late_initcall(rcu_tasks_verify_self_tests);
|
||||
|
||||
/*
|
||||
* Repeat the rcu_tasks_verify_self_tests() call once every second until the
|
||||
* test passes or has timed out.
|
||||
*/
|
||||
static struct delayed_work rcu_tasks_verify_work;
|
||||
static void rcu_tasks_verify_work_fn(struct work_struct *work __maybe_unused)
|
||||
{
|
||||
int ret = rcu_tasks_verify_self_tests();
|
||||
|
||||
if (ret <= 0)
|
||||
return;
|
||||
|
||||
/* Test fails but not timed out yet, reschedule another check */
|
||||
schedule_delayed_work(&rcu_tasks_verify_work, HZ);
|
||||
}
|
||||
|
||||
static int rcu_tasks_verify_schedule_work(void)
|
||||
{
|
||||
INIT_DELAYED_WORK(&rcu_tasks_verify_work, rcu_tasks_verify_work_fn);
|
||||
rcu_tasks_verify_work_fn(NULL);
|
||||
return 0;
|
||||
}
|
||||
late_initcall(rcu_tasks_verify_schedule_work);
|
||||
#else /* #ifdef CONFIG_PROVE_RCU */
|
||||
static void rcu_tasks_initiate_self_tests(void) { }
|
||||
#endif /* #else #ifdef CONFIG_PROVE_RCU */
|
||||
|
@ -58,7 +58,7 @@ void rcu_qs(void)
|
||||
rcu_ctrlblk.donetail = rcu_ctrlblk.curtail;
|
||||
raise_softirq_irqoff(RCU_SOFTIRQ);
|
||||
}
|
||||
WRITE_ONCE(rcu_ctrlblk.gp_seq, rcu_ctrlblk.gp_seq + 1);
|
||||
WRITE_ONCE(rcu_ctrlblk.gp_seq, rcu_ctrlblk.gp_seq + 2);
|
||||
local_irq_restore(flags);
|
||||
}
|
||||
|
||||
@ -139,8 +139,10 @@ static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused
|
||||
/*
|
||||
* Wait for a grace period to elapse. But it is illegal to invoke
|
||||
* synchronize_rcu() from within an RCU read-side critical section.
|
||||
* Therefore, any legal call to synchronize_rcu() is a quiescent
|
||||
* state, and so on a UP system, synchronize_rcu() need do nothing.
|
||||
* Therefore, any legal call to synchronize_rcu() is a quiescent state,
|
||||
* and so on a UP system, synchronize_rcu() need do nothing, other than
|
||||
* let the polled APIs know that another grace period elapsed.
|
||||
*
|
||||
* (But Lai Jiangshan points out the benefits of doing might_sleep()
|
||||
* to reduce latency.)
|
||||
*
|
||||
@ -152,6 +154,7 @@ void synchronize_rcu(void)
|
||||
lock_is_held(&rcu_lock_map) ||
|
||||
lock_is_held(&rcu_sched_lock_map),
|
||||
"Illegal synchronize_rcu() in RCU read-side critical section");
|
||||
WRITE_ONCE(rcu_ctrlblk.gp_seq, rcu_ctrlblk.gp_seq + 2);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(synchronize_rcu);
|
||||
|
||||
@ -213,10 +216,24 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu);
|
||||
*/
|
||||
bool poll_state_synchronize_rcu(unsigned long oldstate)
|
||||
{
|
||||
return READ_ONCE(rcu_ctrlblk.gp_seq) != oldstate;
|
||||
return oldstate == RCU_GET_STATE_COMPLETED || READ_ONCE(rcu_ctrlblk.gp_seq) != oldstate;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
|
||||
|
||||
#ifdef CONFIG_KASAN_GENERIC
|
||||
void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
|
||||
{
|
||||
if (head) {
|
||||
void *ptr = (void *) head - (unsigned long) func;
|
||||
|
||||
kasan_record_aux_stack_noalloc(ptr);
|
||||
}
|
||||
|
||||
__kvfree_call_rcu(head, func);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvfree_call_rcu);
|
||||
#endif
|
||||
|
||||
void __init rcu_init(void)
|
||||
{
|
||||
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
|
||||
|
@ -62,6 +62,7 @@
|
||||
#include <linux/vmalloc.h>
|
||||
#include <linux/mm.h>
|
||||
#include <linux/kasan.h>
|
||||
#include <linux/context_tracking.h>
|
||||
#include "../time/tick-internal.h"
|
||||
|
||||
#include "tree.h"
|
||||
@ -75,9 +76,6 @@
|
||||
/* Data structures. */
|
||||
|
||||
static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
|
||||
.dynticks_nesting = 1,
|
||||
.dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE,
|
||||
.dynticks = ATOMIC_INIT(1),
|
||||
#ifdef CONFIG_RCU_NOCB_CPU
|
||||
.cblist.flags = SEGCBLIST_RCU_CORE,
|
||||
#endif
|
||||
@ -154,7 +152,11 @@ static void sync_sched_exp_online_cleanup(int cpu);
|
||||
static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
|
||||
static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
|
||||
|
||||
/* rcuc/rcub/rcuop kthread realtime priority */
|
||||
/*
|
||||
* rcuc/rcub/rcuop kthread realtime priority. The "rcuop"
|
||||
* real-time priority(enabling/disabling) is controlled by
|
||||
* the extra CONFIG_RCU_NOCB_CPU_CB_BOOST configuration.
|
||||
*/
|
||||
static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
|
||||
module_param(kthread_prio, int, 0444);
|
||||
|
||||
@ -262,56 +264,6 @@ void rcu_softirq_qs(void)
|
||||
rcu_tasks_qs(current, false);
|
||||
}
|
||||
|
||||
/*
|
||||
* Increment the current CPU's rcu_data structure's ->dynticks field
|
||||
* with ordering. Return the new value.
|
||||
*/
|
||||
static noinline noinstr unsigned long rcu_dynticks_inc(int incby)
|
||||
{
|
||||
return arch_atomic_add_return(incby, this_cpu_ptr(&rcu_data.dynticks));
|
||||
}
|
||||
|
||||
/*
|
||||
* Record entry into an extended quiescent state. This is only to be
|
||||
* called when not already in an extended quiescent state, that is,
|
||||
* RCU is watching prior to the call to this function and is no longer
|
||||
* watching upon return.
|
||||
*/
|
||||
static noinstr void rcu_dynticks_eqs_enter(void)
|
||||
{
|
||||
int seq;
|
||||
|
||||
/*
|
||||
* CPUs seeing atomic_add_return() must see prior RCU read-side
|
||||
* critical sections, and we also must force ordering with the
|
||||
* next idle sojourn.
|
||||
*/
|
||||
rcu_dynticks_task_trace_enter(); // Before ->dynticks update!
|
||||
seq = rcu_dynticks_inc(1);
|
||||
// RCU is no longer watching. Better be in extended quiescent state!
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && (seq & 0x1));
|
||||
}
|
||||
|
||||
/*
|
||||
* Record exit from an extended quiescent state. This is only to be
|
||||
* called from an extended quiescent state, that is, RCU is not watching
|
||||
* prior to the call to this function and is watching upon return.
|
||||
*/
|
||||
static noinstr void rcu_dynticks_eqs_exit(void)
|
||||
{
|
||||
int seq;
|
||||
|
||||
/*
|
||||
* CPUs seeing atomic_add_return() must see prior idle sojourns,
|
||||
* and we also must force ordering with the next RCU read-side
|
||||
* critical section.
|
||||
*/
|
||||
seq = rcu_dynticks_inc(1);
|
||||
// RCU is now watching. Better not be in an extended quiescent state!
|
||||
rcu_dynticks_task_trace_exit(); // After ->dynticks update!
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !(seq & 0x1));
|
||||
}
|
||||
|
||||
/*
|
||||
* Reset the current CPU's ->dynticks counter to indicate that the
|
||||
* newly onlined CPU is no longer in an extended quiescent state.
|
||||
@ -324,31 +276,19 @@ static noinstr void rcu_dynticks_eqs_exit(void)
|
||||
*/
|
||||
static void rcu_dynticks_eqs_online(void)
|
||||
{
|
||||
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
|
||||
|
||||
if (atomic_read(&rdp->dynticks) & 0x1)
|
||||
if (ct_dynticks() & RCU_DYNTICKS_IDX)
|
||||
return;
|
||||
rcu_dynticks_inc(1);
|
||||
}
|
||||
|
||||
/*
|
||||
* Is the current CPU in an extended quiescent state?
|
||||
*
|
||||
* No ordering, as we are sampling CPU-local information.
|
||||
*/
|
||||
static __always_inline bool rcu_dynticks_curr_cpu_in_eqs(void)
|
||||
{
|
||||
return !(arch_atomic_read(this_cpu_ptr(&rcu_data.dynticks)) & 0x1);
|
||||
ct_state_inc(RCU_DYNTICKS_IDX);
|
||||
}
|
||||
|
||||
/*
|
||||
* Snapshot the ->dynticks counter with full ordering so as to allow
|
||||
* stable comparison of this counter with past and future snapshots.
|
||||
*/
|
||||
static int rcu_dynticks_snap(struct rcu_data *rdp)
|
||||
static int rcu_dynticks_snap(int cpu)
|
||||
{
|
||||
smp_mb(); // Fundamental RCU ordering guarantee.
|
||||
return atomic_read_acquire(&rdp->dynticks);
|
||||
return ct_dynticks_cpu_acquire(cpu);
|
||||
}
|
||||
|
||||
/*
|
||||
@ -357,15 +297,13 @@ static int rcu_dynticks_snap(struct rcu_data *rdp)
|
||||
*/
|
||||
static bool rcu_dynticks_in_eqs(int snap)
|
||||
{
|
||||
return !(snap & 0x1);
|
||||
return !(snap & RCU_DYNTICKS_IDX);
|
||||
}
|
||||
|
||||
/* Return true if the specified CPU is currently idle from an RCU viewpoint. */
|
||||
bool rcu_is_idle_cpu(int cpu)
|
||||
{
|
||||
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||
|
||||
return rcu_dynticks_in_eqs(rcu_dynticks_snap(rdp));
|
||||
return rcu_dynticks_in_eqs(rcu_dynticks_snap(cpu));
|
||||
}
|
||||
|
||||
/*
|
||||
@ -375,7 +313,7 @@ bool rcu_is_idle_cpu(int cpu)
|
||||
*/
|
||||
static bool rcu_dynticks_in_eqs_since(struct rcu_data *rdp, int snap)
|
||||
{
|
||||
return snap != rcu_dynticks_snap(rdp);
|
||||
return snap != rcu_dynticks_snap(rdp->cpu);
|
||||
}
|
||||
|
||||
/*
|
||||
@ -384,19 +322,17 @@ static bool rcu_dynticks_in_eqs_since(struct rcu_data *rdp, int snap)
|
||||
*/
|
||||
bool rcu_dynticks_zero_in_eqs(int cpu, int *vp)
|
||||
{
|
||||
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||
int snap;
|
||||
|
||||
// If not quiescent, force back to earlier extended quiescent state.
|
||||
snap = atomic_read(&rdp->dynticks) & ~0x1;
|
||||
|
||||
snap = ct_dynticks_cpu(cpu) & ~RCU_DYNTICKS_IDX;
|
||||
smp_rmb(); // Order ->dynticks and *vp reads.
|
||||
if (READ_ONCE(*vp))
|
||||
return false; // Non-zero, so report failure;
|
||||
smp_rmb(); // Order *vp read and ->dynticks re-read.
|
||||
|
||||
// If still in the same extended quiescent state, we are good!
|
||||
return snap == atomic_read(&rdp->dynticks);
|
||||
return snap == ct_dynticks_cpu(cpu);
|
||||
}
|
||||
|
||||
/*
|
||||
@ -415,9 +351,9 @@ notrace void rcu_momentary_dyntick_idle(void)
|
||||
int seq;
|
||||
|
||||
raw_cpu_write(rcu_data.rcu_need_heavy_qs, false);
|
||||
seq = rcu_dynticks_inc(2);
|
||||
seq = ct_state_inc(2 * RCU_DYNTICKS_IDX);
|
||||
/* It is illegal to call this from idle state. */
|
||||
WARN_ON_ONCE(!(seq & 0x1));
|
||||
WARN_ON_ONCE(!(seq & RCU_DYNTICKS_IDX));
|
||||
rcu_preempt_deferred_qs(current);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(rcu_momentary_dyntick_idle);
|
||||
@ -442,13 +378,13 @@ static int rcu_is_cpu_rrupt_from_idle(void)
|
||||
lockdep_assert_irqs_disabled();
|
||||
|
||||
/* Check for counter underflows */
|
||||
RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
|
||||
RCU_LOCKDEP_WARN(ct_dynticks_nesting() < 0,
|
||||
"RCU dynticks_nesting counter underflow!");
|
||||
RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
|
||||
RCU_LOCKDEP_WARN(ct_dynticks_nmi_nesting() <= 0,
|
||||
"RCU dynticks_nmi_nesting counter underflow/zero!");
|
||||
|
||||
/* Are we at first interrupt nesting level? */
|
||||
nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
|
||||
nesting = ct_dynticks_nmi_nesting();
|
||||
if (nesting > 1)
|
||||
return false;
|
||||
|
||||
@ -458,7 +394,7 @@ static int rcu_is_cpu_rrupt_from_idle(void)
|
||||
WARN_ON_ONCE(!nesting && !is_idle_task(current));
|
||||
|
||||
/* Does CPU appear to be idle from an RCU standpoint? */
|
||||
return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
|
||||
return ct_dynticks_nesting() == 0;
|
||||
}
|
||||
|
||||
#define DEFAULT_RCU_BLIMIT (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ? 1000 : 10)
|
||||
@ -609,66 +545,7 @@ void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);
|
||||
|
||||
/*
|
||||
* Enter an RCU extended quiescent state, which can be either the
|
||||
* idle loop or adaptive-tickless usermode execution.
|
||||
*
|
||||
* We crowbar the ->dynticks_nmi_nesting field to zero to allow for
|
||||
* the possibility of usermode upcalls having messed up our count
|
||||
* of interrupt nesting level during the prior busy period.
|
||||
*/
|
||||
static noinstr void rcu_eqs_enter(bool user)
|
||||
{
|
||||
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
|
||||
|
||||
WARN_ON_ONCE(rdp->dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE);
|
||||
WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
|
||||
rdp->dynticks_nesting == 0);
|
||||
if (rdp->dynticks_nesting != 1) {
|
||||
// RCU will still be watching, so just do accounting and leave.
|
||||
rdp->dynticks_nesting--;
|
||||
return;
|
||||
}
|
||||
|
||||
lockdep_assert_irqs_disabled();
|
||||
instrumentation_begin();
|
||||
trace_rcu_dyntick(TPS("Start"), rdp->dynticks_nesting, 0, atomic_read(&rdp->dynticks));
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
|
||||
rcu_preempt_deferred_qs(current);
|
||||
|
||||
// instrumentation for the noinstr rcu_dynticks_eqs_enter()
|
||||
instrument_atomic_write(&rdp->dynticks, sizeof(rdp->dynticks));
|
||||
|
||||
instrumentation_end();
|
||||
WRITE_ONCE(rdp->dynticks_nesting, 0); /* Avoid irq-access tearing. */
|
||||
// RCU is watching here ...
|
||||
rcu_dynticks_eqs_enter();
|
||||
// ... but is no longer watching here.
|
||||
rcu_dynticks_task_enter();
|
||||
}
|
||||
|
||||
/**
|
||||
* rcu_idle_enter - inform RCU that current CPU is entering idle
|
||||
*
|
||||
* Enter idle mode, in other words, -leave- the mode in which RCU
|
||||
* read-side critical sections can occur. (Though RCU read-side
|
||||
* critical sections can occur in irq handlers in idle, a possibility
|
||||
* handled by irq_enter() and irq_exit().)
|
||||
*
|
||||
* If you add or remove a call to rcu_idle_enter(), be sure to test with
|
||||
* CONFIG_RCU_EQS_DEBUG=y.
|
||||
*/
|
||||
void rcu_idle_enter(void)
|
||||
{
|
||||
lockdep_assert_irqs_disabled();
|
||||
rcu_eqs_enter(false);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(rcu_idle_enter);
|
||||
|
||||
#ifdef CONFIG_NO_HZ_FULL
|
||||
|
||||
#if !defined(CONFIG_GENERIC_ENTRY) || !defined(CONFIG_KVM_XFER_TO_GUEST_WORK)
|
||||
#if defined(CONFIG_NO_HZ_FULL) && (!defined(CONFIG_GENERIC_ENTRY) || !defined(CONFIG_KVM_XFER_TO_GUEST_WORK))
|
||||
/*
|
||||
* An empty function that will trigger a reschedule on
|
||||
* IRQ tail once IRQs get re-enabled on userspace/guest resume.
|
||||
@ -690,7 +567,7 @@ static DEFINE_PER_CPU(struct irq_work, late_wakeup_work) =
|
||||
* last resort is to fire a local irq_work that will trigger a reschedule once IRQs
|
||||
* get re-enabled again.
|
||||
*/
|
||||
noinstr static void rcu_irq_work_resched(void)
|
||||
noinstr void rcu_irq_work_resched(void)
|
||||
{
|
||||
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
|
||||
|
||||
@ -706,114 +583,7 @@ noinstr static void rcu_irq_work_resched(void)
|
||||
}
|
||||
instrumentation_end();
|
||||
}
|
||||
|
||||
#else
|
||||
static inline void rcu_irq_work_resched(void) { }
|
||||
#endif
|
||||
|
||||
/**
|
||||
* rcu_user_enter - inform RCU that we are resuming userspace.
|
||||
*
|
||||
* Enter RCU idle mode right before resuming userspace. No use of RCU
|
||||
* is permitted between this call and rcu_user_exit(). This way the
|
||||
* CPU doesn't need to maintain the tick for RCU maintenance purposes
|
||||
* when the CPU runs in userspace.
|
||||
*
|
||||
* If you add or remove a call to rcu_user_enter(), be sure to test with
|
||||
* CONFIG_RCU_EQS_DEBUG=y.
|
||||
*/
|
||||
noinstr void rcu_user_enter(void)
|
||||
{
|
||||
lockdep_assert_irqs_disabled();
|
||||
|
||||
/*
|
||||
* Other than generic entry implementation, we may be past the last
|
||||
* rescheduling opportunity in the entry code. Trigger a self IPI
|
||||
* that will fire and reschedule once we resume in user/guest mode.
|
||||
*/
|
||||
rcu_irq_work_resched();
|
||||
rcu_eqs_enter(true);
|
||||
}
|
||||
|
||||
#endif /* CONFIG_NO_HZ_FULL */
|
||||
|
||||
/**
|
||||
* rcu_nmi_exit - inform RCU of exit from NMI context
|
||||
*
|
||||
* If we are returning from the outermost NMI handler that interrupted an
|
||||
* RCU-idle period, update rdp->dynticks and rdp->dynticks_nmi_nesting
|
||||
* to let the RCU grace-period handling know that the CPU is back to
|
||||
* being RCU-idle.
|
||||
*
|
||||
* If you add or remove a call to rcu_nmi_exit(), be sure to test
|
||||
* with CONFIG_RCU_EQS_DEBUG=y.
|
||||
*/
|
||||
noinstr void rcu_nmi_exit(void)
|
||||
{
|
||||
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
|
||||
|
||||
instrumentation_begin();
|
||||
/*
|
||||
* Check for ->dynticks_nmi_nesting underflow and bad ->dynticks.
|
||||
* (We are exiting an NMI handler, so RCU better be paying attention
|
||||
* to us!)
|
||||
*/
|
||||
WARN_ON_ONCE(rdp->dynticks_nmi_nesting <= 0);
|
||||
WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs());
|
||||
|
||||
/*
|
||||
* If the nesting level is not 1, the CPU wasn't RCU-idle, so
|
||||
* leave it in non-RCU-idle state.
|
||||
*/
|
||||
if (rdp->dynticks_nmi_nesting != 1) {
|
||||
trace_rcu_dyntick(TPS("--="), rdp->dynticks_nmi_nesting, rdp->dynticks_nmi_nesting - 2,
|
||||
atomic_read(&rdp->dynticks));
|
||||
WRITE_ONCE(rdp->dynticks_nmi_nesting, /* No store tearing. */
|
||||
rdp->dynticks_nmi_nesting - 2);
|
||||
instrumentation_end();
|
||||
return;
|
||||
}
|
||||
|
||||
/* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
|
||||
trace_rcu_dyntick(TPS("Startirq"), rdp->dynticks_nmi_nesting, 0, atomic_read(&rdp->dynticks));
|
||||
WRITE_ONCE(rdp->dynticks_nmi_nesting, 0); /* Avoid store tearing. */
|
||||
|
||||
// instrumentation for the noinstr rcu_dynticks_eqs_enter()
|
||||
instrument_atomic_write(&rdp->dynticks, sizeof(rdp->dynticks));
|
||||
instrumentation_end();
|
||||
|
||||
// RCU is watching here ...
|
||||
rcu_dynticks_eqs_enter();
|
||||
// ... but is no longer watching here.
|
||||
|
||||
if (!in_nmi())
|
||||
rcu_dynticks_task_enter();
|
||||
}
|
||||
|
||||
/**
|
||||
* rcu_irq_exit - inform RCU that current CPU is exiting irq towards idle
|
||||
*
|
||||
* Exit from an interrupt handler, which might possibly result in entering
|
||||
* idle mode, in other words, leaving the mode in which read-side critical
|
||||
* sections can occur. The caller must have disabled interrupts.
|
||||
*
|
||||
* This code assumes that the idle loop never does anything that might
|
||||
* result in unbalanced calls to irq_enter() and irq_exit(). If your
|
||||
* architecture's idle loop violates this assumption, RCU will give you what
|
||||
* you deserve, good and hard. But very infrequently and irreproducibly.
|
||||
*
|
||||
* Use things like work queues to work around this limitation.
|
||||
*
|
||||
* You have been warned.
|
||||
*
|
||||
* If you add or remove a call to rcu_irq_exit(), be sure to test with
|
||||
* CONFIG_RCU_EQS_DEBUG=y.
|
||||
*/
|
||||
void noinstr rcu_irq_exit(void)
|
||||
{
|
||||
lockdep_assert_irqs_disabled();
|
||||
rcu_nmi_exit();
|
||||
}
|
||||
#endif /* #if defined(CONFIG_NO_HZ_FULL) && (!defined(CONFIG_GENERIC_ENTRY) || !defined(CONFIG_KVM_XFER_TO_GUEST_WORK)) */
|
||||
|
||||
#ifdef CONFIG_PROVE_RCU
|
||||
/**
|
||||
@ -823,9 +593,9 @@ void rcu_irq_exit_check_preempt(void)
|
||||
{
|
||||
lockdep_assert_irqs_disabled();
|
||||
|
||||
RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) <= 0,
|
||||
RCU_LOCKDEP_WARN(ct_dynticks_nesting() <= 0,
|
||||
"RCU dynticks_nesting counter underflow/zero!");
|
||||
RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) !=
|
||||
RCU_LOCKDEP_WARN(ct_dynticks_nmi_nesting() !=
|
||||
DYNTICK_IRQ_NONIDLE,
|
||||
"Bad RCU dynticks_nmi_nesting counter\n");
|
||||
RCU_LOCKDEP_WARN(rcu_dynticks_curr_cpu_in_eqs(),
|
||||
@ -833,94 +603,7 @@ void rcu_irq_exit_check_preempt(void)
|
||||
}
|
||||
#endif /* #ifdef CONFIG_PROVE_RCU */
|
||||
|
||||
/*
|
||||
* Wrapper for rcu_irq_exit() where interrupts are enabled.
|
||||
*
|
||||
* If you add or remove a call to rcu_irq_exit_irqson(), be sure to test
|
||||
* with CONFIG_RCU_EQS_DEBUG=y.
|
||||
*/
|
||||
void rcu_irq_exit_irqson(void)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
local_irq_save(flags);
|
||||
rcu_irq_exit();
|
||||
local_irq_restore(flags);
|
||||
}
|
||||
|
||||
/*
|
||||
* Exit an RCU extended quiescent state, which can be either the
|
||||
* idle loop or adaptive-tickless usermode execution.
|
||||
*
|
||||
* We crowbar the ->dynticks_nmi_nesting field to DYNTICK_IRQ_NONIDLE to
|
||||
* allow for the possibility of usermode upcalls messing up our count of
|
||||
* interrupt nesting level during the busy period that is just now starting.
|
||||
*/
|
||||
static void noinstr rcu_eqs_exit(bool user)
|
||||
{
|
||||
struct rcu_data *rdp;
|
||||
long oldval;
|
||||
|
||||
lockdep_assert_irqs_disabled();
|
||||
rdp = this_cpu_ptr(&rcu_data);
|
||||
oldval = rdp->dynticks_nesting;
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && oldval < 0);
|
||||
if (oldval) {
|
||||
// RCU was already watching, so just do accounting and leave.
|
||||
rdp->dynticks_nesting++;
|
||||
return;
|
||||
}
|
||||
rcu_dynticks_task_exit();
|
||||
// RCU is not watching here ...
|
||||
rcu_dynticks_eqs_exit();
|
||||
// ... but is watching here.
|
||||
instrumentation_begin();
|
||||
|
||||
// instrumentation for the noinstr rcu_dynticks_eqs_exit()
|
||||
instrument_atomic_write(&rdp->dynticks, sizeof(rdp->dynticks));
|
||||
|
||||
trace_rcu_dyntick(TPS("End"), rdp->dynticks_nesting, 1, atomic_read(&rdp->dynticks));
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
|
||||
WRITE_ONCE(rdp->dynticks_nesting, 1);
|
||||
WARN_ON_ONCE(rdp->dynticks_nmi_nesting);
|
||||
WRITE_ONCE(rdp->dynticks_nmi_nesting, DYNTICK_IRQ_NONIDLE);
|
||||
instrumentation_end();
|
||||
}
|
||||
|
||||
/**
 * rcu_idle_exit - inform RCU that current CPU is leaving idle
 *
 * Exit idle mode, in other words, -enter- the mode in which RCU
 * read-side critical sections can occur.
 *
 * If you add or remove a call to rcu_idle_exit(), be sure to test with
 * CONFIG_RCU_EQS_DEBUG=y.
 */
void rcu_idle_exit(void)
{
	unsigned long flags;

	local_irq_save(flags);
	rcu_eqs_exit(false);
	local_irq_restore(flags);
}
EXPORT_SYMBOL_GPL(rcu_idle_exit);
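rcu_idle_enter() and rcu_idle_exit() bracket the window during which RCU stops watching a CPU. A hedged sketch of how an idle-loop-style caller uses the pair is shown below; demo_do_idle() and demo_wait_for_interrupt() are made-up names, and the real callers live in the kernel's idle/cpuidle path.

static void demo_wait_for_interrupt(void)
{
	/* Stand-in for an architecture's wfi/hlt-style low-power wait. */
}

static void demo_do_idle(void)
{
	local_irq_disable();
	rcu_idle_enter();		/* RCU stops watching this CPU. */
	demo_wait_for_interrupt();
	rcu_idle_exit();		/* RCU is watching again.       */
	local_irq_enable();
}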
|
||||
|
||||
#ifdef CONFIG_NO_HZ_FULL
|
||||
/**
|
||||
* rcu_user_exit - inform RCU that we are exiting userspace.
|
||||
*
|
||||
* Exit RCU idle mode while entering the kernel because it can
|
||||
* run a RCU read side critical section anytime.
|
||||
*
|
||||
* If you add or remove a call to rcu_user_exit(), be sure to test with
|
||||
* CONFIG_RCU_EQS_DEBUG=y.
|
||||
*/
|
||||
void noinstr rcu_user_exit(void)
|
||||
{
|
||||
rcu_eqs_exit(true);
|
||||
}
|
||||
|
||||
/**
|
||||
* __rcu_irq_enter_check_tick - Enable scheduler tick on CPU if RCU needs it.
|
||||
*
|
||||
@ -983,109 +666,6 @@ void __rcu_irq_enter_check_tick(void)
|
||||
}
|
||||
#endif /* CONFIG_NO_HZ_FULL */
|
||||
|
||||
/**
|
||||
* rcu_nmi_enter - inform RCU of entry to NMI context
|
||||
*
|
||||
* If the CPU was idle from RCU's viewpoint, update rdp->dynticks and
|
||||
* rdp->dynticks_nmi_nesting to let the RCU grace-period handling know
|
||||
* that the CPU is active. This implementation permits nested NMIs, as
|
||||
* long as the nesting level does not overflow an int. (You will probably
|
||||
* run out of stack space first.)
|
||||
*
|
||||
* If you add or remove a call to rcu_nmi_enter(), be sure to test
|
||||
* with CONFIG_RCU_EQS_DEBUG=y.
|
||||
*/
|
||||
noinstr void rcu_nmi_enter(void)
|
||||
{
|
||||
long incby = 2;
|
||||
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
|
||||
|
||||
/* Complain about underflow. */
|
||||
WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0);
|
||||
|
||||
/*
|
||||
* If idle from RCU viewpoint, atomically increment ->dynticks
|
||||
* to mark non-idle and increment ->dynticks_nmi_nesting by one.
|
||||
* Otherwise, increment ->dynticks_nmi_nesting by two. This means
|
||||
* if ->dynticks_nmi_nesting is equal to one, we are guaranteed
|
||||
* to be in the outermost NMI handler that interrupted an RCU-idle
|
||||
* period (observation due to Andy Lutomirski).
|
||||
*/
|
||||
if (rcu_dynticks_curr_cpu_in_eqs()) {
|
||||
|
||||
if (!in_nmi())
|
||||
rcu_dynticks_task_exit();
|
||||
|
||||
// RCU is not watching here ...
|
||||
rcu_dynticks_eqs_exit();
|
||||
// ... but is watching here.
|
||||
|
||||
instrumentation_begin();
|
||||
// instrumentation for the noinstr rcu_dynticks_curr_cpu_in_eqs()
|
||||
instrument_atomic_read(&rdp->dynticks, sizeof(rdp->dynticks));
|
||||
// instrumentation for the noinstr rcu_dynticks_eqs_exit()
|
||||
instrument_atomic_write(&rdp->dynticks, sizeof(rdp->dynticks));
|
||||
|
||||
incby = 1;
|
||||
} else if (!in_nmi()) {
|
||||
instrumentation_begin();
|
||||
rcu_irq_enter_check_tick();
|
||||
} else {
|
||||
instrumentation_begin();
|
||||
}
|
||||
|
||||
trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="),
|
||||
rdp->dynticks_nmi_nesting,
|
||||
rdp->dynticks_nmi_nesting + incby, atomic_read(&rdp->dynticks));
|
||||
instrumentation_end();
|
||||
WRITE_ONCE(rdp->dynticks_nmi_nesting, /* Prevent store tearing. */
|
||||
rdp->dynticks_nmi_nesting + incby);
|
||||
barrier();
|
||||
}
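The comment inside rcu_nmi_enter() above is the heart of the nesting scheme: ->dynticks uses even/odd parity for idle/non-idle, and ->dynticks_nmi_nesting grows by one when an NMI interrupts an RCU-idle CPU but by two otherwise, so a value of exactly one at exit time identifies the outermost NMI that interrupted an idle period. The stand-alone user-space model below (not kernel code; it omits the atomics, tracing, and instrumentation) walks through that bookkeeping.

#include <assert.h>
#include <stdio.h>

static long dynticks;		/* even: idle (EQS), odd: watching */
static long nmi_nesting;

static int in_eqs(void) { return !(dynticks & 1); }

static void model_nmi_enter(void)
{
	long incby = 2;

	if (in_eqs()) {
		dynticks++;	/* even -> odd: start watching */
		incby = 1;
	}
	nmi_nesting += incby;
}

static void model_nmi_exit(void)
{
	assert(nmi_nesting > 0);
	if (nmi_nesting != 1) {
		nmi_nesting -= 2;	/* still non-idle afterward */
		return;
	}
	nmi_nesting = 0;
	dynticks++;			/* odd -> even: back to idle */
}

int main(void)
{
	model_nmi_enter();		/* NMI from idle: nesting = 1      */
	model_nmi_enter();		/* nested NMI:    nesting = 3      */
	model_nmi_exit();		/* back to outermost: nesting = 1  */
	model_nmi_exit();		/* CPU is RCU-idle again           */
	printf("idle=%d nesting=%ld\n", in_eqs(), nmi_nesting);
	return 0;
}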
|
||||
|
||||
/**
|
||||
* rcu_irq_enter - inform RCU that current CPU is entering irq away from idle
|
||||
*
|
||||
* Enter an interrupt handler, which might possibly result in exiting
|
||||
* idle mode, in other words, entering the mode in which read-side critical
|
||||
* sections can occur. The caller must have disabled interrupts.
|
||||
*
|
||||
* Note that the Linux kernel is fully capable of entering an interrupt
|
||||
* handler that it never exits, for example when doing upcalls to user mode!
|
||||
* This code assumes that the idle loop never does upcalls to user mode.
|
||||
* If your architecture's idle loop does do upcalls to user mode (or does
|
||||
* anything else that results in unbalanced calls to the irq_enter() and
|
||||
* irq_exit() functions), RCU will give you what you deserve, good and hard.
|
||||
* But very infrequently and irreproducibly.
|
||||
*
|
||||
* Use things like work queues to work around this limitation.
|
||||
*
|
||||
* You have been warned.
|
||||
*
|
||||
* If you add or remove a call to rcu_irq_enter(), be sure to test with
|
||||
* CONFIG_RCU_EQS_DEBUG=y.
|
||||
*/
|
||||
noinstr void rcu_irq_enter(void)
|
||||
{
|
||||
lockdep_assert_irqs_disabled();
|
||||
rcu_nmi_enter();
|
||||
}
|
||||
|
||||
/*
 * Wrapper for rcu_irq_enter() where interrupts are enabled.
 *
 * If you add or remove a call to rcu_irq_enter_irqson(), be sure to test
 * with CONFIG_RCU_EQS_DEBUG=y.
 */
void rcu_irq_enter_irqson(void)
{
	unsigned long flags;

	local_irq_save(flags);
	rcu_irq_enter();
	local_irq_restore(flags);
}
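As the kerneldoc for rcu_irq_enter() explains, low-level entry code brackets an interrupt handler with rcu_irq_enter()/rcu_irq_exit() while interrupts are disabled, and the _irqson() wrappers above exist for callers that still have interrupts enabled. A hedged sketch of that bracketing; the demo_* names are illustrative, and real architectures do this in their irq-entry glue.

static void demo_handle_irq(void)
{
	/* Stand-in for the actual interrupt handler body. */
}

static void demo_irq_entry(void)
{
	lockdep_assert_irqs_disabled();

	rcu_irq_enter();	/* This CPU may now run RCU readers. */
	demo_handle_irq();
	rcu_irq_exit();		/* Possibly back into RCU-idle mode. */
}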
|
||||
|
||||
/*
|
||||
* Check to see if any future non-offloaded RCU-related work will need
|
||||
* to be done by the current CPU, even if none need be done immediately,
|
||||
@ -1223,7 +803,7 @@ static void rcu_gpnum_ovf(struct rcu_node *rnp, struct rcu_data *rdp)
|
||||
*/
|
||||
static int dyntick_save_progress_counter(struct rcu_data *rdp)
|
||||
{
|
||||
rdp->dynticks_snap = rcu_dynticks_snap(rdp);
|
||||
rdp->dynticks_snap = rcu_dynticks_snap(rdp->cpu);
|
||||
if (rcu_dynticks_in_eqs(rdp->dynticks_snap)) {
|
||||
trace_rcu_fqs(rcu_state.name, rdp->gp_seq, rdp->cpu, TPS("dti"));
|
||||
rcu_gpnum_ovf(rdp->mynode, rdp);
|
||||
@ -1775,6 +1355,79 @@ static void rcu_strict_gp_boundary(void *unused)
|
||||
invoke_rcu_core();
|
||||
}
|
||||
|
||||
// Has rcu_init() been invoked? This is used (for example) to determine
|
||||
// whether spinlocks may be acquired safely.
|
||||
static bool rcu_init_invoked(void)
|
||||
{
|
||||
return !!rcu_state.n_online_cpus;
|
||||
}
|
||||
|
||||
// Make the polled API aware of the beginning of a grace period.
|
||||
static void rcu_poll_gp_seq_start(unsigned long *snap)
|
||||
{
|
||||
struct rcu_node *rnp = rcu_get_root();
|
||||
|
||||
if (rcu_init_invoked())
|
||||
raw_lockdep_assert_held_rcu_node(rnp);
|
||||
|
||||
// If RCU was idle, note beginning of GP.
|
||||
if (!rcu_seq_state(rcu_state.gp_seq_polled))
|
||||
rcu_seq_start(&rcu_state.gp_seq_polled);
|
||||
|
||||
// Either way, record current state.
|
||||
*snap = rcu_state.gp_seq_polled;
|
||||
}
|
||||
|
||||
// Make the polled API aware of the end of a grace period.
|
||||
static void rcu_poll_gp_seq_end(unsigned long *snap)
|
||||
{
|
||||
struct rcu_node *rnp = rcu_get_root();
|
||||
|
||||
if (rcu_init_invoked())
|
||||
raw_lockdep_assert_held_rcu_node(rnp);
|
||||
|
||||
// If the previously noted GP is still in effect, record the
|
||||
// end of that GP. Either way, zero counter to avoid counter-wrap
|
||||
// problems.
|
||||
if (*snap && *snap == rcu_state.gp_seq_polled) {
|
||||
rcu_seq_end(&rcu_state.gp_seq_polled);
|
||||
rcu_state.gp_seq_polled_snap = 0;
|
||||
rcu_state.gp_seq_polled_exp_snap = 0;
|
||||
} else {
|
||||
*snap = 0;
|
||||
}
|
||||
}
|
||||
|
||||
// Make the polled API aware of the beginning of a grace period, but
// where caller does not hold the root rcu_node structure's lock.
static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
{
	struct rcu_node *rnp = rcu_get_root();

	if (rcu_init_invoked()) {
		lockdep_assert_irqs_enabled();
		raw_spin_lock_irq_rcu_node(rnp);
	}
	rcu_poll_gp_seq_start(snap);
	if (rcu_init_invoked())
		raw_spin_unlock_irq_rcu_node(rnp);
}

// Make the polled API aware of the end of a grace period, but where
// caller does not hold the root rcu_node structure's lock.
static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
{
	struct rcu_node *rnp = rcu_get_root();

	if (rcu_init_invoked()) {
		lockdep_assert_irqs_enabled();
		raw_spin_lock_irq_rcu_node(rnp);
	}
	rcu_poll_gp_seq_end(snap);
	if (rcu_init_invoked())
		raw_spin_unlock_irq_rcu_node(rnp);
}
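The polled machinery above is driven by a sequence counter whose two low-order bits record whether a grace period is in flight and whose remaining bits count completed grace periods. The stand-alone user-space model below mirrors the shape of the kernel's rcu_seq_start()/rcu_seq_end()/rcu_seq_snap()/rcu_seq_done() helpers (memory barriers and counter-wrap handling omitted); treat it as an illustration, not a reference implementation.

#include <assert.h>

#define SEQ_CTR_SHIFT	2
#define SEQ_STATE_MASK	((1UL << SEQ_CTR_SHIFT) - 1)

static void seq_start(unsigned long *sp) { *sp += 1; }	/* low bits: GP in progress */
static void seq_end(unsigned long *sp)
{
	*sp = (*sp & ~SEQ_STATE_MASK) + (1UL << SEQ_CTR_SHIFT);	/* count one completed GP */
}
static unsigned long seq_snap(const unsigned long *sp)
{
	/* Cookie that is "done" only after any current GP plus one full GP. */
	return (*sp + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
}
static int seq_done(const unsigned long *sp, unsigned long cookie)
{
	return *sp >= cookie;	/* the kernel uses wrap-safe ULONG_CMP_GE() */
}

int main(void)
{
	unsigned long gp_seq = 0, cookie;

	seq_start(&gp_seq);		/* a grace period begins ...      */
	cookie = seq_snap(&gp_seq);	/* ... and a poller takes a snap  */
	assert(!seq_done(&gp_seq, cookie));
	seq_end(&gp_seq);		/* current GP ends: not enough    */
	assert(!seq_done(&gp_seq, cookie));
	seq_start(&gp_seq);
	seq_end(&gp_seq);		/* one more full GP: now done     */
	assert(seq_done(&gp_seq, cookie));
	return 0;
}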
|
||||
|
||||
/*
|
||||
* Initialize a new grace period. Return false if no grace period required.
|
||||
*/
|
||||
@ -1810,6 +1463,7 @@ static noinline_for_stack bool rcu_gp_init(void)
|
||||
rcu_seq_start(&rcu_state.gp_seq);
|
||||
ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
|
||||
trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
|
||||
rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap);
|
||||
raw_spin_unlock_irq_rcu_node(rnp);
|
||||
|
||||
/*
|
||||
@ -1971,19 +1625,23 @@ static void rcu_gp_fqs(bool first_time)
|
||||
*/
|
||||
static noinline_for_stack void rcu_gp_fqs_loop(void)
|
||||
{
|
||||
bool first_gp_fqs;
|
||||
bool first_gp_fqs = true;
|
||||
int gf = 0;
|
||||
unsigned long j;
|
||||
int ret;
|
||||
struct rcu_node *rnp = rcu_get_root();
|
||||
|
||||
first_gp_fqs = true;
|
||||
j = READ_ONCE(jiffies_till_first_fqs);
|
||||
if (rcu_state.cbovld)
|
||||
gf = RCU_GP_FLAG_OVLD;
|
||||
ret = 0;
|
||||
for (;;) {
|
||||
if (!ret) {
|
||||
if (rcu_state.cbovld) {
|
||||
j = (j + 2) / 3;
|
||||
if (j <= 0)
|
||||
j = 1;
|
||||
}
|
||||
if (!ret || time_before(jiffies + j, rcu_state.jiffies_force_qs)) {
|
||||
WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
|
||||
/*
|
||||
* jiffies_force_qs before RCU_GP_WAIT_FQS state
|
||||
@ -2001,7 +1659,15 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
|
||||
rcu_gp_torture_wait();
|
||||
WRITE_ONCE(rcu_state.gp_state, RCU_GP_DOING_FQS);
|
||||
/* Locking provides needed memory barriers. */
|
||||
/* If grace period done, leave loop. */
|
||||
			/*
			 * Exit the loop if the root rcu_node structure indicates that
			 * the grace period has ended. The rcu_preempt_blocked_readers_cgp(rnp)
			 * check is required only for single-node rcu_node trees because
			 * readers blocking the current grace period are queued only on
			 * leaf rcu_node structures. For multi-node trees, checking the
			 * root node's ->qsmask suffices, because a given root node's
			 * ->qsmask bit is cleared only when all CPUs and tasks from the
			 * corresponding leaf nodes have passed through their quiescent state.
			 */
|
||||
if (!READ_ONCE(rnp->qsmask) &&
|
||||
!rcu_preempt_blocked_readers_cgp(rnp))
|
||||
break;
|
||||
@ -2069,6 +1735,7 @@ static noinline void rcu_gp_cleanup(void)
|
||||
* safe for us to drop the lock in order to mark the grace
|
||||
* period as completed in all of the rcu_node structures.
|
||||
*/
|
||||
rcu_poll_gp_seq_end(&rcu_state.gp_seq_polled_snap);
|
||||
raw_spin_unlock_irq_rcu_node(rnp);
|
||||
|
||||
/*
|
||||
@ -2530,7 +2197,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
|
||||
trace_rcu_batch_end(rcu_state.name, 0,
|
||||
!rcu_segcblist_empty(&rdp->cblist),
|
||||
need_resched(), is_idle_task(current),
|
||||
rcu_is_callbacks_kthread());
|
||||
rcu_is_callbacks_kthread(rdp));
|
||||
return;
|
||||
}
|
||||
|
||||
@ -2608,7 +2275,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
|
||||
rcu_nocb_lock_irqsave(rdp, flags);
|
||||
rdp->n_cbs_invoked += count;
|
||||
trace_rcu_batch_end(rcu_state.name, count, !!rcl.head, need_resched(),
|
||||
is_idle_task(current), rcu_is_callbacks_kthread());
|
||||
is_idle_task(current), rcu_is_callbacks_kthread(rdp));
|
||||
|
||||
/* Update counts and requeue any remaining callbacks. */
|
||||
rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl);
|
||||
@ -3211,7 +2878,6 @@ struct kfree_rcu_cpu_work {
|
||||
* @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
|
||||
* @lock: Synchronize access to this structure
|
||||
* @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
|
||||
* @monitor_todo: Tracks whether a @monitor_work delayed work is pending
|
||||
* @initialized: The @rcu_work fields have been initialized
|
||||
* @count: Number of objects for which GP not started
|
||||
* @bkvcache:
|
||||
@ -3236,7 +2902,6 @@ struct kfree_rcu_cpu {
|
||||
struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
|
||||
raw_spinlock_t lock;
|
||||
struct delayed_work monitor_work;
|
||||
bool monitor_todo;
|
||||
bool initialized;
|
||||
int count;
|
||||
|
||||
@ -3416,6 +3081,18 @@ static void kfree_rcu_work(struct work_struct *work)
|
||||
}
|
||||
}
|
||||
|
||||
static bool
need_offload_krc(struct kfree_rcu_cpu *krcp)
{
	int i;

	for (i = 0; i < FREE_N_CHANNELS; i++)
		if (krcp->bkvhead[i])
			return true;

	return !!krcp->head;
}
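need_offload_krc() reports whether the per-CPU kvfree_rcu() batching state still holds queued objects. For context, here is a hedged sketch of the caller-side interface that feeds those batches; struct demo_node and demo_release() are made-up names, but kvfree_rcu() itself is the real API.

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct demo_node {
	int value;
	struct rcu_head rh;	/* required by the two-argument kvfree_rcu() */
};

static void demo_release(struct demo_node *p)
{
	/* Defers the actual kvfree() until after a grace period. */
	kvfree_rcu(p, rh);
}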
|
||||
|
||||
/*
|
||||
* This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
|
||||
*/
|
||||
@ -3472,9 +3149,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
|
||||
// of the channels that is still busy we should rearm the
|
||||
// work to repeat an attempt. Because previous batches are
|
||||
// still in progress.
|
||||
if (!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head)
|
||||
krcp->monitor_todo = false;
|
||||
else
|
||||
if (need_offload_krc(krcp))
|
||||
schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
|
||||
|
||||
raw_spin_unlock_irqrestore(&krcp->lock, flags);
|
||||
@ -3662,11 +3337,8 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
|
||||
WRITE_ONCE(krcp->count, krcp->count + 1);
|
||||
|
||||
// Set timer to drain after KFREE_DRAIN_JIFFIES.
|
||||
if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
|
||||
!krcp->monitor_todo) {
|
||||
krcp->monitor_todo = true;
|
||||
if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
|
||||
schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
|
||||
}
|
||||
|
||||
unlock_return:
|
||||
krc_this_cpu_unlock(krcp, flags);
|
||||
@ -3741,14 +3413,8 @@ void __init kfree_rcu_scheduler_running(void)
|
||||
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
|
||||
|
||||
raw_spin_lock_irqsave(&krcp->lock, flags);
|
||||
if ((!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head) ||
|
||||
krcp->monitor_todo) {
|
||||
raw_spin_unlock_irqrestore(&krcp->lock, flags);
|
||||
continue;
|
||||
}
|
||||
krcp->monitor_todo = true;
|
||||
schedule_delayed_work_on(cpu, &krcp->monitor_work,
|
||||
KFREE_DRAIN_JIFFIES);
|
||||
if (need_offload_krc(krcp))
|
||||
schedule_delayed_work_on(cpu, &krcp->monitor_work, KFREE_DRAIN_JIFFIES);
|
||||
raw_spin_unlock_irqrestore(&krcp->lock, flags);
|
||||
}
|
||||
}
|
||||
@ -3837,8 +3503,18 @@ void synchronize_rcu(void)
|
||||
lock_is_held(&rcu_lock_map) ||
|
||||
lock_is_held(&rcu_sched_lock_map),
|
||||
"Illegal synchronize_rcu() in RCU read-side critical section");
|
||||
if (rcu_blocking_is_gp())
|
||||
if (rcu_blocking_is_gp()) {
|
||||
// Note well that this code runs with !PREEMPT && !SMP.
|
||||
// In addition, all code that advances grace periods runs at
|
||||
// process level. Therefore, this normal GP overlaps with
|
||||
// other normal GPs only by being fully nested within them,
|
||||
// which allows reuse of ->gp_seq_polled_snap.
|
||||
rcu_poll_gp_seq_start_unlocked(&rcu_state.gp_seq_polled_snap);
|
||||
rcu_poll_gp_seq_end_unlocked(&rcu_state.gp_seq_polled_snap);
|
||||
if (rcu_init_invoked())
|
||||
cond_resched_tasks_rcu_qs();
|
||||
return; // Context allows vacuous grace periods.
|
||||
}
|
||||
if (rcu_gp_is_expedited())
|
||||
synchronize_rcu_expedited();
|
||||
else
|
||||
@@ -3860,7 +3536,7 @@ unsigned long get_state_synchronize_rcu(void)
	 * before the load from ->gp_seq.
	 */
	smp_mb(); /* ^^^ */
	return rcu_seq_snap(&rcu_state.gp_seq);
	return rcu_seq_snap(&rcu_state.gp_seq_polled);
}
EXPORT_SYMBOL_GPL(get_state_synchronize_rcu);
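With this change the cookie returned by get_state_synchronize_rcu() is taken from ->gp_seq_polled, so it is satisfied by normal and expedited grace periods alike. A hedged sketch of the polled update-side pattern built on these exported functions; everything named demo_* is illustrative.

static unsigned long demo_cookie;

static void demo_retire_object(void)
{
	/* Record "any grace period after this point suffices". */
	demo_cookie = start_poll_synchronize_rcu();
}

static void demo_try_reclaim(void)
{
	if (poll_state_synchronize_rcu(demo_cookie)) {
		/* All pre-existing readers are done; reclaim here. */
	} else {
		/* Not yet; either retry later or block: */
		cond_synchronize_rcu(demo_cookie);
	}
}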
|
||||
|
||||
@ -3889,7 +3565,13 @@ unsigned long start_poll_synchronize_rcu(void)
|
||||
rdp = this_cpu_ptr(&rcu_data);
|
||||
rnp = rdp->mynode;
|
||||
raw_spin_lock_rcu_node(rnp); // irqs already disabled.
|
||||
needwake = rcu_start_this_gp(rnp, rdp, gp_seq);
|
||||
// Note it is possible for a grace period to have elapsed between
|
||||
// the above call to get_state_synchronize_rcu() and the below call
|
||||
// to rcu_seq_snap. This is OK, the worst that happens is that we
|
||||
// get a grace period that no one needed. These accesses are ordered
|
||||
// by smp_mb(), and we are accessing them in the opposite order
|
||||
// from which they are updated at grace-period start, as required.
|
||||
needwake = rcu_start_this_gp(rnp, rdp, rcu_seq_snap(&rcu_state.gp_seq));
|
||||
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||
if (needwake)
|
||||
rcu_gp_kthread_wake();
|
||||
@ -3911,7 +3593,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu);
|
||||
*
|
||||
* Yes, this function does not take counter wrap into account.
|
||||
* But counter wrap is harmless. If the counter wraps, we have waited for
|
||||
* more than 2 billion grace periods (and way more on a 64-bit system!).
|
||||
* more than a billion grace periods (and way more on a 64-bit system!).
|
||||
* Those needing to keep oldstate values for very long time periods
|
||||
* (many hours even on 32-bit systems) should check them occasionally
|
||||
* and either refresh them or set a flag indicating that the grace period
|
||||
@ -3924,7 +3606,8 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu);
|
||||
*/
|
||||
bool poll_state_synchronize_rcu(unsigned long oldstate)
|
||||
{
|
||||
if (rcu_seq_done(&rcu_state.gp_seq, oldstate)) {
|
||||
if (oldstate == RCU_GET_STATE_COMPLETED ||
|
||||
rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) {
|
||||
smp_mb(); /* Ensure GP ends before subsequent accesses. */
|
||||
return true;
|
||||
}
|
||||
@ -3935,20 +3618,20 @@ EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
|
||||
/**
|
||||
* cond_synchronize_rcu - Conditionally wait for an RCU grace period
|
||||
*
|
||||
* @oldstate: value from get_state_synchronize_rcu() or start_poll_synchronize_rcu()
|
||||
* @oldstate: value from get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or start_poll_synchronize_rcu_expedited()
|
||||
*
|
||||
* If a full RCU grace period has elapsed since the earlier call to
|
||||
* get_state_synchronize_rcu() or start_poll_synchronize_rcu(), just return.
|
||||
* Otherwise, invoke synchronize_rcu() to wait for a full grace period.
|
||||
*
|
||||
* Yes, this function does not take counter wrap into account. But
|
||||
* counter wrap is harmless. If the counter wraps, we have waited for
|
||||
* Yes, this function does not take counter wrap into account.
|
||||
* But counter wrap is harmless. If the counter wraps, we have waited for
|
||||
* more than 2 billion grace periods (and way more on a 64-bit system!),
|
||||
* so waiting for one additional grace period should be just fine.
|
||||
* so waiting for a couple of additional grace periods should be just fine.
|
||||
*
|
||||
* This function provides the same memory-ordering guarantees that
|
||||
* would be provided by a synchronize_rcu() that was invoked at the call
|
||||
* to the function that provided @oldstate, and that returned at the end
|
||||
* to the function that provided @oldstate and that returned at the end
|
||||
* of this function.
|
||||
*/
|
||||
void cond_synchronize_rcu(unsigned long oldstate)
|
||||
@ -4221,13 +3904,14 @@ static void rcu_init_new_rnp(struct rcu_node *rnp_leaf)
|
||||
static void __init
|
||||
rcu_boot_init_percpu_data(int cpu)
|
||||
{
|
||||
struct context_tracking *ct = this_cpu_ptr(&context_tracking);
|
||||
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||
|
||||
/* Set up local state, ensuring consistent view of global state. */
|
||||
rdp->grpmask = leaf_node_cpu_bit(rdp->mynode, cpu);
|
||||
INIT_WORK(&rdp->strict_work, strict_work_handler);
|
||||
WARN_ON_ONCE(rdp->dynticks_nesting != 1);
|
||||
WARN_ON_ONCE(rcu_dynticks_in_eqs(rcu_dynticks_snap(rdp)));
|
||||
WARN_ON_ONCE(ct->dynticks_nesting != 1);
|
||||
WARN_ON_ONCE(rcu_dynticks_in_eqs(rcu_dynticks_snap(cpu)));
|
||||
rdp->barrier_seq_snap = rcu_state.barrier_sequence;
|
||||
rdp->rcu_ofl_gp_seq = rcu_state.gp_seq;
|
||||
rdp->rcu_ofl_gp_flags = RCU_GP_CLEANED;
|
||||
@ -4251,6 +3935,7 @@ rcu_boot_init_percpu_data(int cpu)
|
||||
int rcutree_prepare_cpu(unsigned int cpu)
|
||||
{
|
||||
unsigned long flags;
|
||||
struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
|
||||
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||
struct rcu_node *rnp = rcu_get_root();
|
||||
|
||||
@ -4259,7 +3944,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
|
||||
rdp->qlen_last_fqs_check = 0;
|
||||
rdp->n_force_qs_snap = READ_ONCE(rcu_state.n_force_qs);
|
||||
rdp->blimit = blimit;
|
||||
rdp->dynticks_nesting = 1; /* CPU not up, no tearing. */
|
||||
ct->dynticks_nesting = 1; /* CPU not up, no tearing. */
|
||||
raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
|
||||
|
||||
/*
|
||||
@ -4441,6 +4126,7 @@ void rcu_report_dead(unsigned int cpu)
|
||||
rdp->rcu_ofl_gp_flags = READ_ONCE(rcu_state.gp_flags);
|
||||
if (rnp->qsmask & mask) { /* RCU waiting on outgoing CPU? */
|
||||
/* Report quiescent state -before- changing ->qsmaskinitnext! */
|
||||
rcu_disable_urgency_upon_qs(rdp);
|
||||
rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
|
||||
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
||||
}
|
||||
@ -4486,6 +4172,7 @@ void rcutree_migrate_callbacks(int cpu)
|
||||
needwake = needwake || rcu_advance_cbs(my_rnp, my_rdp);
|
||||
rcu_segcblist_disable(&rdp->cblist);
|
||||
WARN_ON_ONCE(rcu_segcblist_empty(&my_rdp->cblist) != !rcu_segcblist_n_cbs(&my_rdp->cblist));
|
||||
check_cb_ovld_locked(my_rdp, my_rnp);
|
||||
if (rcu_rdp_is_offloaded(my_rdp)) {
|
||||
raw_spin_unlock_rcu_node(my_rnp); /* irqs remain disabled. */
|
||||
__call_rcu_nocb_wake(my_rdp, true, flags);
|
||||
@@ -4701,6 +4388,9 @@ static void __init rcu_init_one(void)
			init_waitqueue_head(&rnp->exp_wq[3]);
			spin_lock_init(&rnp->exp_lock);
			mutex_init(&rnp->boost_kthread_mutex);
			raw_spin_lock_init(&rnp->exp_poll_lock);
			rnp->exp_seq_poll_rq = RCU_GET_STATE_COMPLETED;
			INIT_WORK(&rnp->exp_poll_wq, sync_rcu_do_polled_gp);
		}
	}
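The three added lines initialize, per rcu_node, the lock, cookie, and work item backing the polled expedited API; requests are later handed off with queue_work() and the handler recovers its rcu_node with container_of(). Below is a hedged, generic sketch of that deferral pattern; struct demo_ctx and the demo_* functions are illustrative only.

#include <linux/workqueue.h>

struct demo_ctx {
	unsigned long request;		/* written by the requester     */
	struct work_struct work;	/* runs demo_worker() later     */
};

static void demo_worker(struct work_struct *wp)
{
	struct demo_ctx *ctx = container_of(wp, struct demo_ctx, work);

	/* Act on ctx->request in process context. */
}

static void demo_init(struct demo_ctx *ctx)
{
	INIT_WORK(&ctx->work, demo_worker);
}

static void demo_request(struct demo_ctx *ctx, unsigned long r)
{
	ctx->request = r;
	queue_work(system_wq, &ctx->work);	/* handler runs asynchronously */
}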
|
||||
|
||||
@ -4926,6 +4616,10 @@ void __init rcu_init(void)
|
||||
qovld_calc = DEFAULT_RCU_QOVLD_MULT * qhimark;
|
||||
else
|
||||
qovld_calc = qovld;
|
||||
|
||||
// Kick-start any polled grace periods that started early.
|
||||
if (!(per_cpu_ptr(&rcu_data, cpu)->mynode->exp_seq_poll_rq & 0x1))
|
||||
(void)start_poll_synchronize_rcu_expedited();
|
||||
}
|
||||
|
||||
#include "tree_stall.h"
|
||||
|
@ -133,6 +133,10 @@ struct rcu_node {
|
||||
wait_queue_head_t exp_wq[4];
|
||||
struct rcu_exp_work rew;
|
||||
bool exp_need_flush; /* Need to flush workitem? */
|
||||
raw_spinlock_t exp_poll_lock;
|
||||
/* Lock and data for polled expedited grace periods. */
|
||||
unsigned long exp_seq_poll_rq;
|
||||
struct work_struct exp_poll_wq;
|
||||
} ____cacheline_internodealigned_in_smp;
|
||||
|
||||
/*
|
||||
@ -187,9 +191,6 @@ struct rcu_data {
|
||||
|
||||
/* 3) dynticks interface. */
|
||||
int dynticks_snap; /* Per-GP tracking for dynticks. */
|
||||
long dynticks_nesting; /* Track process nesting level. */
|
||||
long dynticks_nmi_nesting; /* Track irq/NMI nesting level. */
|
||||
atomic_t dynticks; /* Even value for idle, else odd. */
|
||||
bool rcu_need_heavy_qs; /* GP old, so heavy quiescent state! */
|
||||
bool rcu_urgent_qs; /* GP old need light quiescent state. */
|
||||
bool rcu_forced_tick; /* Forced tick to provide QS. */
|
||||
@ -235,6 +236,7 @@ struct rcu_data {
|
||||
* if rdp_gp.
|
||||
*/
|
||||
struct list_head nocb_entry_rdp; /* rcu_data node in wakeup chain. */
|
||||
struct rcu_data *nocb_toggling_rdp; /* rdp queued for (de-)offloading */
|
||||
|
||||
/* The following fields are used by CB kthread, hence new cacheline. */
|
||||
struct rcu_data *nocb_gp_rdp ____cacheline_internodealigned_in_smp;
|
||||
@ -323,6 +325,9 @@ struct rcu_state {
|
||||
short gp_state; /* GP kthread sleep state. */
|
||||
unsigned long gp_wake_time; /* Last GP kthread wake. */
|
||||
unsigned long gp_wake_seq; /* ->gp_seq at ^^^. */
|
||||
unsigned long gp_seq_polled; /* GP seq for polled API. */
|
||||
unsigned long gp_seq_polled_snap; /* ->gp_seq_polled at normal GP start. */
|
||||
unsigned long gp_seq_polled_exp_snap; /* ->gp_seq_polled at expedited GP start. */
|
||||
|
||||
/* End of fields guarded by root rcu_node's lock. */
|
||||
|
||||
@ -425,12 +430,11 @@ static void rcu_flavor_sched_clock_irq(int user);
|
||||
static void dump_blkd_tasks(struct rcu_node *rnp, int ncheck);
|
||||
static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);
|
||||
static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
|
||||
static bool rcu_is_callbacks_kthread(void);
|
||||
static bool rcu_is_callbacks_kthread(struct rcu_data *rdp);
|
||||
static void rcu_cpu_kthread_setup(unsigned int cpu);
|
||||
static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp);
|
||||
static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
|
||||
static bool rcu_preempt_need_deferred_qs(struct task_struct *t);
|
||||
static void rcu_preempt_deferred_qs(struct task_struct *t);
|
||||
static void zero_cpu_stall_ticks(struct rcu_data *rdp);
|
||||
static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
|
||||
static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
|
||||
@ -470,10 +474,6 @@ do { \
|
||||
|
||||
static void rcu_bind_gp_kthread(void);
|
||||
static bool rcu_nohz_full_cpu(void);
|
||||
static void rcu_dynticks_task_enter(void);
|
||||
static void rcu_dynticks_task_exit(void);
|
||||
static void rcu_dynticks_task_trace_enter(void);
|
||||
static void rcu_dynticks_task_trace_exit(void);
|
||||
|
||||
/* Forward declarations for tree_stall.h */
|
||||
static void record_gp_stall_check_time(void);
|
||||
@ -481,3 +481,6 @@ static void rcu_iw_handler(struct irq_work *iwp);
|
||||
static void check_cpu_stall(struct rcu_data *rdp);
|
||||
static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
|
||||
const unsigned long gpssdelay);
|
||||
|
||||
/* Forward declarations for tree_exp.h. */
|
||||
static void sync_rcu_do_polled_gp(struct work_struct *wp);
|
||||
|
@ -18,6 +18,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp);
|
||||
static void rcu_exp_gp_seq_start(void)
|
||||
{
|
||||
rcu_seq_start(&rcu_state.expedited_sequence);
|
||||
rcu_poll_gp_seq_start_unlocked(&rcu_state.gp_seq_polled_exp_snap);
|
||||
}
|
||||
|
||||
/*
|
||||
@ -34,6 +35,7 @@ static __maybe_unused unsigned long rcu_exp_gp_seq_endval(void)
|
||||
*/
|
||||
static void rcu_exp_gp_seq_end(void)
|
||||
{
|
||||
rcu_poll_gp_seq_end_unlocked(&rcu_state.gp_seq_polled_exp_snap);
|
||||
rcu_seq_end(&rcu_state.expedited_sequence);
|
||||
smp_mb(); /* Ensure that consecutive grace periods serialize. */
|
||||
}
|
||||
@ -356,7 +358,7 @@ static void __sync_rcu_exp_select_node_cpus(struct rcu_exp_work *rewp)
|
||||
!(rnp->qsmaskinitnext & mask)) {
|
||||
mask_ofl_test |= mask;
|
||||
} else {
|
||||
snap = rcu_dynticks_snap(rdp);
|
||||
snap = rcu_dynticks_snap(cpu);
|
||||
if (rcu_dynticks_in_eqs(snap))
|
||||
mask_ofl_test |= mask;
|
||||
else
|
||||
@ -621,7 +623,6 @@ static void synchronize_rcu_expedited_wait(void)
|
||||
return;
|
||||
if (rcu_stall_is_suppressed())
|
||||
continue;
|
||||
panic_on_rcu_stall();
|
||||
trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
|
||||
pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
|
||||
rcu_state.name);
|
||||
@ -636,10 +637,11 @@ static void synchronize_rcu_expedited_wait(void)
|
||||
continue;
|
||||
ndetected++;
|
||||
rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||
pr_cont(" %d-%c%c%c", cpu,
|
||||
pr_cont(" %d-%c%c%c%c", cpu,
|
||||
"O."[!!cpu_online(cpu)],
|
||||
"o."[!!(rdp->grpmask & rnp->expmaskinit)],
|
||||
"N."[!!(rdp->grpmask & rnp->expmaskinitnext)]);
|
||||
"N."[!!(rdp->grpmask & rnp->expmaskinitnext)],
|
||||
"D."[!!(rdp->cpu_no_qs.b.exp)]);
|
||||
}
|
||||
}
|
||||
pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
|
||||
@ -669,6 +671,7 @@ static void synchronize_rcu_expedited_wait(void)
|
||||
}
|
||||
}
|
||||
jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
|
||||
panic_on_rcu_stall();
|
||||
}
|
||||
}
|
||||
|
||||
@@ -913,8 +916,18 @@ void synchronize_rcu_expedited(void)
			 "Illegal synchronize_rcu_expedited() in RCU read-side critical section");

	/* Is the state such that the call is a grace period? */
	if (rcu_blocking_is_gp())
		return;
	if (rcu_blocking_is_gp()) {
		// Note well that this code runs with !PREEMPT && !SMP.
		// In addition, all code that advances grace periods runs
		// at process level. Therefore, this expedited GP overlaps
		// with other expedited GPs only by being fully nested within
		// them, which allows reuse of ->gp_seq_polled_exp_snap.
		rcu_poll_gp_seq_start_unlocked(&rcu_state.gp_seq_polled_exp_snap);
		rcu_poll_gp_seq_end_unlocked(&rcu_state.gp_seq_polled_exp_snap);
		if (rcu_init_invoked())
			cond_resched();
		return; // Context allows vacuous grace periods.
	}

	/* If expedited grace periods are prohibited, fall back to normal. */
	if (rcu_gp_is_normal()) {
@@ -950,3 +963,93 @@ void synchronize_rcu_expedited(void)
	synchronize_rcu_expedited_destroy_work(&rew);
}
EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
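synchronize_rcu_expedited() is the blocking, low-latency counterpart of synchronize_rcu(). For reference, the classic update-side pattern looks like the hedged sketch below; demo_item, demo_list, and demo_lock are made up, while the list and RCU calls are real kernel APIs.

#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct demo_item {
	struct list_head node;
	int payload;
};

static LIST_HEAD(demo_list);
static DEFINE_SPINLOCK(demo_lock);

static void demo_remove(struct demo_item *item)
{
	spin_lock(&demo_lock);
	list_del_rcu(&item->node);	/* unpublish */
	spin_unlock(&demo_lock);

	synchronize_rcu_expedited();	/* wait out pre-existing readers */
	kfree(item);			/* now safe to free */
}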
|
||||
|
||||
/*
|
||||
* Ensure that start_poll_synchronize_rcu_expedited() has the expedited
|
||||
* RCU grace periods that it needs.
|
||||
*/
|
||||
static void sync_rcu_do_polled_gp(struct work_struct *wp)
|
||||
{
|
||||
unsigned long flags;
|
||||
int i = 0;
|
||||
struct rcu_node *rnp = container_of(wp, struct rcu_node, exp_poll_wq);
|
||||
unsigned long s;
|
||||
|
||||
raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
|
||||
s = rnp->exp_seq_poll_rq;
|
||||
rnp->exp_seq_poll_rq = RCU_GET_STATE_COMPLETED;
|
||||
raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
|
||||
if (s == RCU_GET_STATE_COMPLETED)
|
||||
return;
|
||||
while (!poll_state_synchronize_rcu(s)) {
|
||||
synchronize_rcu_expedited();
|
||||
if (i == 10 || i == 20)
|
||||
pr_info("%s: i = %d s = %lx gp_seq_polled = %lx\n", __func__, i, s, READ_ONCE(rcu_state.gp_seq_polled));
|
||||
i++;
|
||||
}
|
||||
raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
|
||||
s = rnp->exp_seq_poll_rq;
|
||||
if (poll_state_synchronize_rcu(s))
|
||||
rnp->exp_seq_poll_rq = RCU_GET_STATE_COMPLETED;
|
||||
raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
|
||||
}
|
||||
|
||||
/**
|
||||
* start_poll_synchronize_rcu_expedited - Snapshot current RCU state and start expedited grace period
|
||||
*
|
||||
* Returns a cookie to pass to a call to cond_synchronize_rcu(),
|
||||
* cond_synchronize_rcu_expedited(), or poll_state_synchronize_rcu(),
|
||||
* allowing them to determine whether or not any sort of grace period has
|
||||
* elapsed in the meantime. If the needed expedited grace period is not
|
||||
* already slated to start, initiates that grace period.
|
||||
*/
|
||||
unsigned long start_poll_synchronize_rcu_expedited(void)
|
||||
{
|
||||
unsigned long flags;
|
||||
struct rcu_data *rdp;
|
||||
struct rcu_node *rnp;
|
||||
unsigned long s;
|
||||
|
||||
s = get_state_synchronize_rcu();
|
||||
rdp = per_cpu_ptr(&rcu_data, raw_smp_processor_id());
|
||||
rnp = rdp->mynode;
|
||||
if (rcu_init_invoked())
|
||||
raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
|
||||
if (!poll_state_synchronize_rcu(s)) {
|
||||
rnp->exp_seq_poll_rq = s;
|
||||
if (rcu_init_invoked())
|
||||
queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
|
||||
}
|
||||
if (rcu_init_invoked())
|
||||
raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
|
||||
|
||||
return s;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_expedited);
|
||||
|
||||
/**
 * cond_synchronize_rcu_expedited - Conditionally wait for an expedited RCU grace period
 *
 * @oldstate: value from get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or start_poll_synchronize_rcu_expedited()
 *
 * If any type of full RCU grace period has elapsed since the earlier
 * call to get_state_synchronize_rcu(), start_poll_synchronize_rcu(),
 * or start_poll_synchronize_rcu_expedited(), just return. Otherwise,
 * invoke synchronize_rcu_expedited() to wait for a full grace period.
 *
 * Yes, this function does not take counter wrap into account.
 * But counter wrap is harmless. If the counter wraps, we have waited for
 * more than 2 billion grace periods (and way more on a 64-bit system!),
 * so waiting for a couple of additional grace periods should be just fine.
 *
 * This function provides the same memory-ordering guarantees that
 * would be provided by a synchronize_rcu() that was invoked at the call
 * to the function that provided @oldstate and that returned at the end
 * of this function.
 */
void cond_synchronize_rcu_expedited(unsigned long oldstate)
{
	if (!poll_state_synchronize_rcu(oldstate))
		synchronize_rcu_expedited();
}
EXPORT_SYMBOL_GPL(cond_synchronize_rcu_expedited);
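Taken together, start_poll_synchronize_rcu_expedited() and cond_synchronize_rcu_expedited() give updaters a non-blocking way to request an expedited grace period and a conditional way to wait for it later. A hedged usage sketch follows; the demo_* names are illustrative.

static unsigned long demo_exp_cookie;

static void demo_begin_retire(void)
{
	/* Starts an expedited GP if one is needed; never blocks. */
	demo_exp_cookie = start_poll_synchronize_rcu_expedited();
}

static void demo_finish_retire(void)
{
	/* Returns immediately if a full GP has already elapsed, else waits. */
	cond_synchronize_rcu_expedited(demo_exp_cookie);
	/* Safe to reclaim here. */
}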
|
||||
|
@ -546,52 +546,51 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Check if we ignore this rdp.
|
||||
*
|
||||
* We check that without holding the nocb lock but
|
||||
* we make sure not to miss a freshly offloaded rdp
|
||||
* with the current ordering:
|
||||
*
|
||||
* rdp_offload_toggle() nocb_gp_enabled_cb()
|
||||
* ------------------------- ----------------------------
|
||||
* WRITE flags LOCK nocb_gp_lock
|
||||
* LOCK nocb_gp_lock READ/WRITE nocb_gp_sleep
|
||||
* READ/WRITE nocb_gp_sleep UNLOCK nocb_gp_lock
|
||||
* UNLOCK nocb_gp_lock READ flags
|
||||
*/
|
||||
static inline bool nocb_gp_enabled_cb(struct rcu_data *rdp)
|
||||
{
|
||||
u8 flags = SEGCBLIST_OFFLOADED | SEGCBLIST_KTHREAD_GP;
|
||||
|
||||
return rcu_segcblist_test_flags(&rdp->cblist, flags);
|
||||
}
|
||||
|
||||
static inline bool nocb_gp_update_state_deoffloading(struct rcu_data *rdp,
|
||||
bool *needwake_state)
|
||||
static int nocb_gp_toggle_rdp(struct rcu_data *rdp,
|
||||
bool *wake_state)
|
||||
{
|
||||
struct rcu_segcblist *cblist = &rdp->cblist;
|
||||
unsigned long flags;
|
||||
int ret;
|
||||
|
||||
if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED)) {
|
||||
if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) {
|
||||
rcu_segcblist_set_flags(cblist, SEGCBLIST_KTHREAD_GP);
|
||||
if (rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
|
||||
*needwake_state = true;
|
||||
}
|
||||
return false;
|
||||
rcu_nocb_lock_irqsave(rdp, flags);
|
||||
if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED) &&
|
||||
!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) {
|
||||
/*
|
||||
* Offloading. Set our flag and notify the offload worker.
|
||||
* We will handle this rdp until it ever gets de-offloaded.
|
||||
*/
|
||||
rcu_segcblist_set_flags(cblist, SEGCBLIST_KTHREAD_GP);
|
||||
if (rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
|
||||
*wake_state = true;
|
||||
ret = 1;
|
||||
} else if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED) &&
|
||||
rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) {
|
||||
/*
|
||||
* De-offloading. Clear our flag and notify the de-offload worker.
|
||||
* We will ignore this rdp until it ever gets re-offloaded.
|
||||
*/
|
||||
rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_GP);
|
||||
if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
|
||||
*wake_state = true;
|
||||
ret = 0;
|
||||
} else {
|
||||
WARN_ON_ONCE(1);
|
||||
ret = -1;
|
||||
}
|
||||
|
||||
/*
|
||||
* De-offloading. Clear our flag and notify the de-offload worker.
|
||||
* We will ignore this rdp until it ever gets re-offloaded.
|
||||
*/
|
||||
WARN_ON_ONCE(!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
|
||||
rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_GP);
|
||||
if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
|
||||
*needwake_state = true;
|
||||
return true;
|
||||
rcu_nocb_unlock_irqrestore(rdp, flags);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
static void nocb_gp_sleep(struct rcu_data *my_rdp, int cpu)
|
||||
{
|
||||
trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Sleep"));
|
||||
swait_event_interruptible_exclusive(my_rdp->nocb_gp_wq,
|
||||
!READ_ONCE(my_rdp->nocb_gp_sleep));
|
||||
trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("EndSleep"));
|
||||
}
|
||||
|
||||
/*
|
||||
* No-CBs GP kthreads come here to wait for additional callbacks to show up
|
||||
@ -609,7 +608,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
|
||||
bool needwait_gp = false; // This prevents actual uninitialized use.
|
||||
bool needwake;
|
||||
bool needwake_gp;
|
||||
struct rcu_data *rdp;
|
||||
struct rcu_data *rdp, *rdp_toggling = NULL;
|
||||
struct rcu_node *rnp;
|
||||
unsigned long wait_gp_seq = 0; // Suppress "use uninitialized" warning.
|
||||
bool wasempty = false;
|
||||
@ -634,19 +633,10 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
|
||||
* is added to the list, so the skipped-over rcu_data structures
|
||||
* won't be ignored for long.
|
||||
*/
|
||||
list_for_each_entry_rcu(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp, 1) {
|
||||
bool needwake_state = false;
|
||||
|
||||
if (!nocb_gp_enabled_cb(rdp))
|
||||
continue;
|
||||
list_for_each_entry(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp) {
|
||||
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
|
||||
rcu_nocb_lock_irqsave(rdp, flags);
|
||||
if (nocb_gp_update_state_deoffloading(rdp, &needwake_state)) {
|
||||
rcu_nocb_unlock_irqrestore(rdp, flags);
|
||||
if (needwake_state)
|
||||
swake_up_one(&rdp->nocb_state_wq);
|
||||
continue;
|
||||
}
|
||||
lockdep_assert_held(&rdp->nocb_lock);
|
||||
bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
|
||||
if (bypass_ncbs &&
|
||||
(time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
|
||||
@ -656,8 +646,6 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
|
||||
bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
|
||||
} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
|
||||
rcu_nocb_unlock_irqrestore(rdp, flags);
|
||||
if (needwake_state)
|
||||
swake_up_one(&rdp->nocb_state_wq);
|
||||
continue; /* No callbacks here, try next. */
|
||||
}
|
||||
if (bypass_ncbs) {
|
||||
@ -705,8 +693,6 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
|
||||
}
|
||||
if (needwake_gp)
|
||||
rcu_gp_kthread_wake();
|
||||
if (needwake_state)
|
||||
swake_up_one(&rdp->nocb_state_wq);
|
||||
}
|
||||
|
||||
my_rdp->nocb_gp_bypass = bypass;
|
||||
@ -723,13 +709,19 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
|
||||
/* Polling, so trace if first poll in the series. */
|
||||
if (gotcbs)
|
||||
trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Poll"));
|
||||
schedule_timeout_idle(1);
|
||||
if (list_empty(&my_rdp->nocb_head_rdp)) {
|
||||
raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
|
||||
if (!my_rdp->nocb_toggling_rdp)
|
||||
WRITE_ONCE(my_rdp->nocb_gp_sleep, true);
|
||||
raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
|
||||
/* Wait for any offloading rdp */
|
||||
nocb_gp_sleep(my_rdp, cpu);
|
||||
} else {
|
||||
schedule_timeout_idle(1);
|
||||
}
|
||||
} else if (!needwait_gp) {
|
||||
/* Wait for callbacks to appear. */
|
||||
trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Sleep"));
|
||||
swait_event_interruptible_exclusive(my_rdp->nocb_gp_wq,
|
||||
!READ_ONCE(my_rdp->nocb_gp_sleep));
|
||||
trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("EndSleep"));
|
||||
nocb_gp_sleep(my_rdp, cpu);
|
||||
} else {
|
||||
rnp = my_rdp->mynode;
|
||||
trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq, TPS("StartWait"));
|
||||
@ -739,15 +731,49 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
|
||||
!READ_ONCE(my_rdp->nocb_gp_sleep));
|
||||
trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq, TPS("EndWait"));
|
||||
}
|
||||
|
||||
if (!rcu_nocb_poll) {
|
||||
raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
|
||||
// (De-)queue an rdp to/from the group if its nocb state is changing
|
||||
rdp_toggling = my_rdp->nocb_toggling_rdp;
|
||||
if (rdp_toggling)
|
||||
my_rdp->nocb_toggling_rdp = NULL;
|
||||
|
||||
if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
|
||||
WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
|
||||
del_timer(&my_rdp->nocb_timer);
|
||||
}
|
||||
WRITE_ONCE(my_rdp->nocb_gp_sleep, true);
|
||||
raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
|
||||
} else {
|
||||
rdp_toggling = READ_ONCE(my_rdp->nocb_toggling_rdp);
|
||||
if (rdp_toggling) {
|
||||
			/*
			 * Paranoid locking to make sure nocb_toggling_rdp is well
			 * reset *before* we (re)set SEGCBLIST_KTHREAD_GP, or we could
			 * race with another round of nocb toggling for this rdp.
			 * Nocb locking should already prevent that, but we stick to
			 * paranoia, especially in this rare path.
			 */
|
||||
raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
|
||||
my_rdp->nocb_toggling_rdp = NULL;
|
||||
raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
|
||||
}
|
||||
}
|
||||
|
||||
if (rdp_toggling) {
|
||||
bool wake_state = false;
|
||||
int ret;
|
||||
|
||||
ret = nocb_gp_toggle_rdp(rdp_toggling, &wake_state);
|
||||
if (ret == 1)
|
||||
list_add_tail(&rdp_toggling->nocb_entry_rdp, &my_rdp->nocb_head_rdp);
|
||||
else if (ret == 0)
|
||||
list_del(&rdp_toggling->nocb_entry_rdp);
|
||||
if (wake_state)
|
||||
swake_up_one(&rdp_toggling->nocb_state_wq);
|
||||
}
|
||||
|
||||
my_rdp->nocb_gp_seq = -1;
|
||||
WARN_ON(signal_pending(current));
|
||||
}
|
||||
@ -966,16 +992,15 @@ static int rdp_offload_toggle(struct rcu_data *rdp,
|
||||
swake_up_one(&rdp->nocb_cb_wq);
|
||||
|
||||
raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
|
||||
// Queue this rdp for add/del to/from the list to iterate on rcuog
|
||||
WRITE_ONCE(rdp_gp->nocb_toggling_rdp, rdp);
|
||||
if (rdp_gp->nocb_gp_sleep) {
|
||||
rdp_gp->nocb_gp_sleep = false;
|
||||
wake_gp = true;
|
||||
}
|
||||
raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
|
||||
|
||||
if (wake_gp)
|
||||
wake_up_process(rdp_gp->nocb_gp_kthread);
|
||||
|
||||
return 0;
|
||||
return wake_gp;
|
||||
}
|
||||
|
||||
static long rcu_nocb_rdp_deoffload(void *arg)
|
||||
@ -983,9 +1008,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
|
||||
struct rcu_data *rdp = arg;
|
||||
struct rcu_segcblist *cblist = &rdp->cblist;
|
||||
unsigned long flags;
|
||||
int ret;
|
||||
int wake_gp;
|
||||
struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
|
||||
|
||||
WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
|
||||
/*
|
||||
* rcu_nocb_rdp_deoffload() may be called directly if
|
||||
* rcuog/o[p] spawn failed, because at this time the rdp->cpu
|
||||
* is not online yet.
|
||||
*/
|
||||
WARN_ON_ONCE((rdp->cpu != raw_smp_processor_id()) && cpu_online(rdp->cpu));
|
||||
|
||||
pr_info("De-offloading %d\n", rdp->cpu);
|
||||
|
||||
@ -1009,12 +1040,41 @@ static long rcu_nocb_rdp_deoffload(void *arg)
|
||||
*/
|
||||
rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
|
||||
invoke_rcu_core();
|
||||
ret = rdp_offload_toggle(rdp, false, flags);
|
||||
swait_event_exclusive(rdp->nocb_state_wq,
|
||||
!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
|
||||
SEGCBLIST_KTHREAD_GP));
|
||||
/* Stop nocb_gp_wait() from iterating over this structure. */
|
||||
list_del_rcu(&rdp->nocb_entry_rdp);
|
||||
wake_gp = rdp_offload_toggle(rdp, false, flags);
|
||||
|
||||
mutex_lock(&rdp_gp->nocb_gp_kthread_mutex);
|
||||
if (rdp_gp->nocb_gp_kthread) {
|
||||
if (wake_gp)
|
||||
wake_up_process(rdp_gp->nocb_gp_kthread);
|
||||
|
||||
/*
|
||||
* If rcuo[p] kthread spawn failed, directly remove SEGCBLIST_KTHREAD_CB.
|
||||
* Just wait SEGCBLIST_KTHREAD_GP to be cleared by rcuog.
|
||||
*/
|
||||
if (!rdp->nocb_cb_kthread) {
|
||||
rcu_nocb_lock_irqsave(rdp, flags);
|
||||
rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_KTHREAD_CB);
|
||||
rcu_nocb_unlock_irqrestore(rdp, flags);
|
||||
}
|
||||
|
||||
swait_event_exclusive(rdp->nocb_state_wq,
|
||||
!rcu_segcblist_test_flags(cblist,
|
||||
SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP));
|
||||
} else {
|
||||
/*
|
||||
* No kthread to clear the flags for us or remove the rdp from the nocb list
|
||||
* to iterate. Do it here instead. Locking doesn't look stricly necessary
|
||||
* but we stick to paranoia in this rare path.
|
||||
*/
|
||||
rcu_nocb_lock_irqsave(rdp, flags);
|
||||
rcu_segcblist_clear_flags(&rdp->cblist,
|
||||
SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP);
|
||||
rcu_nocb_unlock_irqrestore(rdp, flags);
|
||||
|
||||
list_del(&rdp->nocb_entry_rdp);
|
||||
}
|
||||
mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
|
||||
|
||||
/*
|
||||
* Lock one last time to acquire latest callback updates from kthreads
|
||||
* so we can later handle callbacks locally without locking.
|
||||
@ -1035,7 +1095,7 @@ static long rcu_nocb_rdp_deoffload(void *arg)
|
||||
WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
|
||||
|
||||
|
||||
return ret;
|
||||
return 0;
|
||||
}
|
||||
|
||||
int rcu_nocb_cpu_deoffload(int cpu)
|
||||
@ -1043,8 +1103,8 @@ int rcu_nocb_cpu_deoffload(int cpu)
|
||||
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||
int ret = 0;
|
||||
|
||||
mutex_lock(&rcu_state.barrier_mutex);
|
||||
cpus_read_lock();
|
||||
mutex_lock(&rcu_state.barrier_mutex);
|
||||
if (rcu_rdp_is_offloaded(rdp)) {
|
||||
if (cpu_online(cpu)) {
|
||||
ret = work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
|
||||
@ -1055,8 +1115,8 @@ int rcu_nocb_cpu_deoffload(int cpu)
|
||||
ret = -EINVAL;
|
||||
}
|
||||
}
|
||||
cpus_read_unlock();
|
||||
mutex_unlock(&rcu_state.barrier_mutex);
|
||||
cpus_read_unlock();
|
||||
|
||||
return ret;
|
||||
}
|
||||
@ -1067,7 +1127,8 @@ static long rcu_nocb_rdp_offload(void *arg)
|
||||
struct rcu_data *rdp = arg;
|
||||
struct rcu_segcblist *cblist = &rdp->cblist;
|
||||
unsigned long flags;
|
||||
int ret;
|
||||
int wake_gp;
|
||||
struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
|
||||
|
||||
WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
|
||||
/*
|
||||
@ -1077,17 +1138,10 @@ static long rcu_nocb_rdp_offload(void *arg)
|
||||
if (!rdp->nocb_gp_rdp)
|
||||
return -EINVAL;
|
||||
|
||||
pr_info("Offloading %d\n", rdp->cpu);
|
||||
if (WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread))
|
||||
return -EINVAL;
|
||||
|
||||
/*
|
||||
* Cause future nocb_gp_wait() invocations to iterate over
|
||||
* structure, resetting ->nocb_gp_sleep and waking up the related
|
||||
* "rcuog". Since nocb_gp_wait() in turn locks ->nocb_gp_lock
|
||||
* before setting ->nocb_gp_sleep again, we are guaranteed to
|
||||
* iterate this newly added structure before "rcuog" goes to
|
||||
* sleep again.
|
||||
*/
|
||||
list_add_tail_rcu(&rdp->nocb_entry_rdp, &rdp->nocb_gp_rdp->nocb_head_rdp);
|
||||
pr_info("Offloading %d\n", rdp->cpu);
|
||||
|
||||
/*
|
||||
* Can't use rcu_nocb_lock_irqsave() before SEGCBLIST_LOCKING
|
||||
@ -1111,7 +1165,9 @@ static long rcu_nocb_rdp_offload(void *arg)
|
||||
* WRITE flags READ callbacks
|
||||
* rcu_nocb_unlock() rcu_nocb_unlock()
|
||||
*/
|
||||
ret = rdp_offload_toggle(rdp, true, flags);
|
||||
wake_gp = rdp_offload_toggle(rdp, true, flags);
|
||||
if (wake_gp)
|
||||
wake_up_process(rdp_gp->nocb_gp_kthread);
|
||||
swait_event_exclusive(rdp->nocb_state_wq,
|
||||
rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB) &&
|
||||
rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
|
||||
@ -1124,7 +1180,7 @@ static long rcu_nocb_rdp_offload(void *arg)
|
||||
rcu_segcblist_clear_flags(cblist, SEGCBLIST_RCU_CORE);
|
||||
rcu_nocb_unlock_irqrestore(rdp, flags);
|
||||
|
||||
return ret;
|
||||
return 0;
|
||||
}
|
||||
|
||||
int rcu_nocb_cpu_offload(int cpu)
|
||||
@ -1132,8 +1188,8 @@ int rcu_nocb_cpu_offload(int cpu)
|
||||
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||
int ret = 0;
|
||||
|
||||
mutex_lock(&rcu_state.barrier_mutex);
|
||||
cpus_read_lock();
|
||||
mutex_lock(&rcu_state.barrier_mutex);
|
||||
if (!rcu_rdp_is_offloaded(rdp)) {
|
||||
if (cpu_online(cpu)) {
|
||||
ret = work_on_cpu(cpu, rcu_nocb_rdp_offload, rdp);
|
||||
@ -1144,8 +1200,8 @@ int rcu_nocb_cpu_offload(int cpu)
|
||||
ret = -EINVAL;
|
||||
}
|
||||
}
|
||||
cpus_read_unlock();
|
||||
mutex_unlock(&rcu_state.barrier_mutex);
|
||||
cpus_read_unlock();
|
||||
|
||||
return ret;
|
||||
}
|
||||
@ -1155,11 +1211,21 @@ void __init rcu_init_nohz(void)
|
||||
{
|
||||
int cpu;
|
||||
bool need_rcu_nocb_mask = false;
|
||||
bool offload_all = false;
|
||||
struct rcu_data *rdp;
|
||||
|
||||
#if defined(CONFIG_NO_HZ_FULL)
|
||||
if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask))
|
||||
#if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL)
|
||||
if (!rcu_state.nocb_is_setup) {
|
||||
need_rcu_nocb_mask = true;
|
||||
offload_all = true;
|
||||
}
|
||||
#endif /* #if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL) */
|
||||
|
||||
#if defined(CONFIG_NO_HZ_FULL)
|
||||
if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask)) {
|
||||
need_rcu_nocb_mask = true;
|
||||
offload_all = false; /* NO_HZ_FULL has its own mask. */
|
||||
}
|
||||
#endif /* #if defined(CONFIG_NO_HZ_FULL) */
|
||||
|
||||
if (need_rcu_nocb_mask) {
|
||||
@ -1180,6 +1246,9 @@ void __init rcu_init_nohz(void)
|
||||
cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
|
||||
#endif /* #if defined(CONFIG_NO_HZ_FULL) */
|
||||
|
||||
if (offload_all)
|
||||
cpumask_setall(rcu_nocb_mask);
|
||||
|
||||
if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) {
|
||||
pr_info("\tNote: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs.\n");
|
||||
cpumask_and(rcu_nocb_mask, cpu_possible_mask,
|
||||
@ -1246,7 +1315,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
|
||||
"rcuog/%d", rdp_gp->cpu);
|
||||
if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__)) {
|
||||
mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
|
||||
return;
|
||||
goto end;
|
||||
}
|
||||
WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
|
||||
if (kthread_prio)
|
||||
@ -1258,12 +1327,21 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
|
||||
t = kthread_run(rcu_nocb_cb_kthread, rdp,
|
||||
"rcuo%c/%d", rcu_state.abbr, cpu);
|
||||
if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
|
||||
return;
|
||||
goto end;
|
||||
|
||||
if (kthread_prio)
|
||||
if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_CB_BOOST) && kthread_prio)
|
||||
sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
|
||||
|
||||
WRITE_ONCE(rdp->nocb_cb_kthread, t);
|
||||
WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
|
||||
return;
|
||||
end:
|
||||
mutex_lock(&rcu_state.barrier_mutex);
|
||||
if (rcu_rdp_is_offloaded(rdp)) {
|
||||
rcu_nocb_rdp_deoffload(rdp);
|
||||
cpumask_clear_cpu(cpu, rcu_nocb_mask);
|
||||
}
|
||||
mutex_unlock(&rcu_state.barrier_mutex);
|
||||
}
|
||||
|
||||
/* How many CB CPU IDs per GP kthread? Default of -1 for sqrt(nr_cpu_ids). */
|
||||
|
@@ -460,7 +460,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
* be quite short, for example, in the case of the call from
* rcu_read_unlock_special().
*/
static void
static notrace void
rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
{
bool empty_exp;
@@ -581,7 +581,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
* is disabled. This function cannot be expected to understand these
* nuances, so the caller must handle them.
*/
static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
{
return (__this_cpu_read(rcu_data.cpu_no_qs.b.exp) ||
READ_ONCE(t->rcu_read_unlock_special.s)) &&
@@ -595,7 +595,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
* evaluate safety in terms of interrupt, softirq, and preemption
* disabling.
*/
static void rcu_preempt_deferred_qs(struct task_struct *t)
notrace void rcu_preempt_deferred_qs(struct task_struct *t)
{
unsigned long flags;

@@ -899,8 +899,8 @@ void rcu_note_context_switch(bool preempt)
this_cpu_write(rcu_data.rcu_urgent_qs, false);
if (unlikely(raw_cpu_read(rcu_data.rcu_need_heavy_qs)))
rcu_momentary_dyntick_idle();
rcu_tasks_qs(current, preempt);
out:
rcu_tasks_qs(current, preempt);
trace_rcu_utilization(TPS("End context switch"));
}
EXPORT_SYMBOL_GPL(rcu_note_context_switch);
@@ -926,7 +926,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
* Because there is no preemptible RCU, there can be no deferred quiescent
* states.
*/
static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
{
return false;
}
@@ -935,7 +935,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
// period for a quiescent state from this CPU. Note that requests from
// tasks are handled when removing the task from the blocked-tasks list
// below.
static void rcu_preempt_deferred_qs(struct task_struct *t)
notrace void rcu_preempt_deferred_qs(struct task_struct *t)
{
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);

@@ -1012,6 +1012,25 @@ static void rcu_cpu_kthread_setup(unsigned int cpu)
WRITE_ONCE(rdp->rcuc_activity, jiffies);
}

static bool rcu_is_callbacks_nocb_kthread(struct rcu_data *rdp)
{
#ifdef CONFIG_RCU_NOCB_CPU
return rdp->nocb_cb_kthread == current;
#else
return false;
#endif
}

/*
* Is the current CPU running the RCU-callbacks kthread?
* Caller must have preemption disabled.
*/
static bool rcu_is_callbacks_kthread(struct rcu_data *rdp)
{
return rdp->rcu_cpu_kthread_task == current ||
rcu_is_callbacks_nocb_kthread(rdp);
}

#ifdef CONFIG_RCU_BOOST

/*
@@ -1140,7 +1159,8 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
(rnp->gp_tasks != NULL &&
rnp->boost_tasks == NULL &&
rnp->qsmask == 0 &&
(!time_after(rnp->boost_time, jiffies) || rcu_state.cbovld))) {
(!time_after(rnp->boost_time, jiffies) || rcu_state.cbovld ||
IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)))) {
if (rnp->exp_tasks == NULL)
WRITE_ONCE(rnp->boost_tasks, rnp->gp_tasks);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
@@ -1151,15 +1171,6 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
}
}

/*
* Is the current CPU running the RCU-callbacks kthread?
* Caller must have preemption disabled.
*/
static bool rcu_is_callbacks_kthread(void)
{
return __this_cpu_read(rcu_data.rcu_cpu_kthread_task) == current;
}

#define RCU_BOOST_DELAY_JIFFIES DIV_ROUND_UP(CONFIG_RCU_BOOST_DELAY * HZ, 1000)

/*
@@ -1242,11 +1253,6 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}

static bool rcu_is_callbacks_kthread(void)
{
return false;
}

static void rcu_preempt_boost_start_gp(struct rcu_node *rnp)
{
}
@@ -1290,37 +1296,3 @@ static void rcu_bind_gp_kthread(void)
return;
housekeeping_affine(current, HK_TYPE_RCU);
}

/* Record the current task on dyntick-idle entry. */
static __always_inline void rcu_dynticks_task_enter(void)
{
#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
WRITE_ONCE(current->rcu_tasks_idle_cpu, smp_processor_id());
#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */
}

/* Record no current task on dyntick-idle exit. */
static __always_inline void rcu_dynticks_task_exit(void)
{
#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
WRITE_ONCE(current->rcu_tasks_idle_cpu, -1);
#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */
}

/* Turn on heavyweight RCU tasks trace readers on idle/user entry. */
static __always_inline void rcu_dynticks_task_trace_enter(void)
{
#ifdef CONFIG_TASKS_TRACE_RCU
if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB))
current->trc_reader_special.b.need_mb = true;
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
}

/* Turn off heavyweight RCU tasks trace readers on idle/user exit. */
static __always_inline void rcu_dynticks_task_trace_exit(void)
{
#ifdef CONFIG_TASKS_TRACE_RCU
if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB))
current->trc_reader_special.b.need_mb = false;
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
}
@@ -409,7 +409,19 @@ static bool rcu_is_gp_kthread_starving(unsigned long *jp)

static bool rcu_is_rcuc_kthread_starving(struct rcu_data *rdp, unsigned long *jp)
{
unsigned long j = jiffies - READ_ONCE(rdp->rcuc_activity);
int cpu;
struct task_struct *rcuc;
unsigned long j;

rcuc = rdp->rcu_cpu_kthread_task;
if (!rcuc)
return false;

cpu = task_cpu(rcuc);
if (cpu_is_offline(cpu) || idle_cpu(cpu))
return false;

j = jiffies - READ_ONCE(rdp->rcuc_activity);

if (jp)
*jp = j;
@@ -434,6 +446,9 @@ static void print_cpu_stall_info(int cpu)
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
char *ticks_title;
unsigned long ticks_value;
bool rcuc_starved;
unsigned long j;
char buf[32];

/*
* We could be printing a lot while holding a spinlock. Avoid
@@ -450,8 +465,11 @@ static void print_cpu_stall_info(int cpu)
}
delta = rcu_seq_ctr(rdp->mynode->gp_seq - rdp->rcu_iw_gp_seq);
falsepositive = rcu_is_gp_kthread_starving(NULL) &&
rcu_dynticks_in_eqs(rcu_dynticks_snap(rdp));
pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld/%#lx softirq=%u/%u fqs=%ld %s\n",
rcu_dynticks_in_eqs(rcu_dynticks_snap(cpu));
rcuc_starved = rcu_is_rcuc_kthread_starving(rdp, &j);
if (rcuc_starved)
sprintf(buf, " rcuc=%ld jiffies(starved)", j);
pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%04x/%ld/%#lx softirq=%u/%u fqs=%ld%s%s\n",
cpu,
"O."[!!cpu_online(cpu)],
"o."[!!(rdp->grpmask & rdp->mynode->qsmaskinit)],
@@ -460,36 +478,14 @@ static void print_cpu_stall_info(int cpu)
rdp->rcu_iw_pending ? (int)min(delta, 9UL) + '0' :
"!."[!delta],
ticks_value, ticks_title,
rcu_dynticks_snap(rdp) & 0xfff,
rdp->dynticks_nesting, rdp->dynticks_nmi_nesting,
rcu_dynticks_snap(cpu) & 0xffff,
ct_dynticks_nesting_cpu(cpu), ct_dynticks_nmi_nesting_cpu(cpu),
rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu),
data_race(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart,
rcuc_starved ? buf : "",
falsepositive ? " (false positive?)" : "");
}

static void rcuc_kthread_dump(struct rcu_data *rdp)
{
int cpu;
unsigned long j;
struct task_struct *rcuc;

rcuc = rdp->rcu_cpu_kthread_task;
if (!rcuc)
return;

cpu = task_cpu(rcuc);
if (cpu_is_offline(cpu) || idle_cpu(cpu))
return;

if (!rcu_is_rcuc_kthread_starving(rdp, &j))
return;

pr_err("%s kthread starved for %ld jiffies\n", rcuc->comm, j);
sched_show_task(rcuc);
if (!trigger_single_cpu_backtrace(cpu))
dump_cpu_task(cpu);
}

/* Complain about starvation of grace-period kthread. */
static void rcu_check_gp_kthread_starvation(void)
{
@@ -661,9 +657,6 @@ static void print_cpu_stall(unsigned long gps)
rcu_check_gp_kthread_expired_fqs_timer();
rcu_check_gp_kthread_starvation();

if (!use_softirq)
rcuc_kthread_dump(rdp);

rcu_dump_cpu_stacks();

raw_spin_lock_irqsave_rcu_node(rnp, flags);
@@ -85,7 +85,7 @@ module_param(rcu_normal_after_boot, int, 0444);
* and while lockdep is disabled.
*
* Note that if the CPU is in the idle loop from an RCU point of view (ie:
* that we are in the section between rcu_idle_enter() and rcu_idle_exit())
* that we are in the section between ct_idle_enter() and ct_idle_exit())
* then rcu_read_lock_held() sets ``*ret`` to false even if the CPU did an
* rcu_read_lock(). The reason for this is that RCU ignores CPUs that are
* in such a section, considering these as in extended quiescent state,
@@ -516,6 +516,19 @@ int rcu_cpu_stall_suppress_at_boot __read_mostly; // !0 = suppress boot stalls.
EXPORT_SYMBOL_GPL(rcu_cpu_stall_suppress_at_boot);
module_param(rcu_cpu_stall_suppress_at_boot, int, 0444);

/**
* get_completed_synchronize_rcu - Return a pre-completed polled state cookie
*
* Returns a value that will always be treated by functions like
* poll_state_synchronize_rcu() as a cookie whose grace period has already
* completed.
*/
unsigned long get_completed_synchronize_rcu(void)
{
return RCU_GET_STATE_COMPLETED;
}
EXPORT_SYMBOL_GPL(get_completed_synchronize_rcu);

#ifdef CONFIG_PROVE_RCU

/*
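The kerneldoc above describes the polled grace-period API. As a usage sketch (not from this series; the my_cache structure and its helpers are hypothetical, only the RCU calls are real kernel APIs), a caller can seed a cookie with get_completed_synchronize_rcu() so that the very first poll reports an already-completed grace period, then refresh it with get_state_synchronize_rcu():

/*
 * Hedged usage sketch, assuming a hypothetical cache that may only be
 * reused after a full RCU grace period has elapsed.
 */
#include <linux/rcupdate.h>

struct my_cache {
	unsigned long rcu_cookie;	/* polled grace-period cookie */
	/* ... cached data ... */
};

static void my_cache_init(struct my_cache *c)
{
	/* The first poll_state_synchronize_rcu() will see this as "done". */
	c->rcu_cookie = get_completed_synchronize_rcu();
}

static bool my_cache_reuse(struct my_cache *c)
{
	/* Not yet safe to reuse: the recorded grace period is still pending. */
	if (!poll_state_synchronize_rcu(c->rcu_cookie))
		return false;
	/* Record the current grace-period state before handing the data out. */
	c->rcu_cookie = get_state_synchronize_rcu();
	return true;
}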
@@ -4262,6 +4262,38 @@ int task_call_func(struct task_struct *p, task_call_f func, void *arg)
return ret;
}

/**
* cpu_curr_snapshot - Return a snapshot of the currently running task
* @cpu: The CPU on which to snapshot the task.
*
* Returns the task_struct pointer of the task "currently" running on
* the specified CPU. If the same task is running on that CPU throughout,
* the return value will be a pointer to that task's task_struct structure.
* If the CPU did any context switches even vaguely concurrently with the
* execution of this function, the return value will be a pointer to the
* task_struct structure of a randomly chosen task that was running on
* that CPU somewhere around the time that this function was executing.
*
* If the specified CPU was offline, the return value is whatever it
* is, perhaps a pointer to the task_struct structure of that CPU's idle
* task, but there is no guarantee. Callers wishing a useful return
* value must take some action to ensure that the specified CPU remains
* online throughout.
*
* This function executes full memory barriers before and after fetching
* the pointer, which permits the caller to confine this function's fetch
* with respect to the caller's accesses to other shared variables.
*/
struct task_struct *cpu_curr_snapshot(int cpu)
{
struct task_struct *t;

smp_mb(); /* Pairing determined by caller's synchronization design. */
t = rcu_dereference(cpu_curr(cpu));
smp_mb(); /* Pairing determined by caller's synchronization design. */
return t;
}

/**
* wake_up_process - Wake up a specific process
* @p: The process to be woken up.
@@ -6563,7 +6595,7 @@ void __sched schedule_idle(void)
} while (need_resched());
}

#if defined(CONFIG_CONTEXT_TRACKING) && !defined(CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK)
#if defined(CONFIG_CONTEXT_TRACKING_USER) && !defined(CONFIG_HAVE_CONTEXT_TRACKING_USER_OFFSTACK)
asmlinkage __visible void __sched schedule_user(void)
{
/*
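For the cpu_curr_snapshot() helper added in the hunk above, here is a hedged usage sketch. The scan loop and the handle_task() callback are hypothetical; the cpus_read_lock() pairing reflects the kerneldoc's requirement that the caller keep the CPU online, and rcu_read_lock() keeps the (possibly stale) task_struct from being freed while it is inspected:

/*
 * Hedged sketch of a caller: visit whatever task happened to be running
 * on each online CPU.  handle_task() is made up; as the kerneldoc above
 * explains, the snapshot is only a hint about what was running.
 */
#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>

static void scan_running_tasks(void (*handle_task)(struct task_struct *t))
{
	int cpu;

	cpus_read_lock();		/* keep CPUs from going offline mid-scan */
	for_each_online_cpu(cpu) {
		rcu_read_lock();	/* task_struct is freed only after a grace period */
		handle_task(cpu_curr_snapshot(cpu));
		rcu_read_unlock();
	}
	cpus_read_unlock();
}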
@@ -53,14 +53,14 @@ static noinline int __cpuidle cpu_idle_poll(void)
{
trace_cpu_idle(0, smp_processor_id());
stop_critical_timings();
rcu_idle_enter();
ct_idle_enter();
local_irq_enable();

while (!tif_need_resched() &&
(cpu_idle_force_poll || tick_check_broadcast_expired()))
cpu_relax();

rcu_idle_exit();
ct_idle_exit();
start_critical_timings();
trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());

@@ -98,12 +98,12 @@ void __cpuidle default_idle_call(void)
*
* Trace IRQs enable here, then switch off RCU, and have
* arch_cpu_idle() use raw_local_irq_enable(). Note that
* rcu_idle_enter() relies on lockdep IRQ state, so switch that
* ct_idle_enter() relies on lockdep IRQ state, so switch that
* last -- this is very similar to the entry code.
*/
trace_hardirqs_on_prepare();
lockdep_hardirqs_on_prepare();
rcu_idle_enter();
ct_idle_enter();
lockdep_hardirqs_on(_THIS_IP_);

arch_cpu_idle();
@@ -116,7 +116,7 @@ void __cpuidle default_idle_call(void)
*/
raw_local_irq_disable();
lockdep_hardirqs_off(_THIS_IP_);
rcu_idle_exit();
ct_idle_exit();
lockdep_hardirqs_on(_THIS_IP_);
raw_local_irq_enable();
@@ -27,6 +27,7 @@
#include <linux/capability.h>
#include <linux/cgroup_api.h>
#include <linux/cgroup.h>
#include <linux/context_tracking.h>
#include <linux/cpufreq.h>
#include <linux/cpumask_api.h>
#include <linux/ctype.h>
@@ -174,9 +174,9 @@ static int __init csdlock_debug(char *str)
if (val)
static_branch_enable(&csdlock_debug_enabled);

return 0;
return 1;
}
early_param("csdlock_debug", csdlock_debug);
__setup("csdlock_debug=", csdlock_debug);

static DEFINE_PER_CPU(call_single_data_t *, cur_csd);
static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
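The csdlock_debug change above switches the handler from early_param() to __setup(), and the return value changes with it: an early_param() handler returns 0 on success, while a __setup() handler returns nonzero to say the argument was consumed (otherwise it is passed on to init). A hedged sketch of that convention, with a made-up boot parameter name:

/*
 * Hedged illustration only; "demo_flag" is a hypothetical boot parameter,
 * not one defined by this series.
 */
#include <linux/init.h>
#include <linux/types.h>

static bool demo_flag_enabled;

static int __init demo_flag_setup(char *str)
{
	demo_flag_enabled = true;	/* presence of the option is enough here */
	return 1;			/* __setup(): nonzero means "handled" */
}
__setup("demo_flag", demo_flag_setup);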
@@ -620,7 +620,7 @@ void irq_enter_rcu(void)
*/
void irq_enter(void)
{
rcu_irq_enter();
ct_irq_enter();
irq_enter_rcu();
}

@@ -672,7 +672,7 @@ void irq_exit_rcu(void)
void irq_exit(void)
{
__irq_exit_rcu();
rcu_irq_exit();
ct_irq_exit();
/* must be last! */
lockdep_hardirq_exit();
}
@@ -73,6 +73,15 @@ config TIME_KUNIT_TEST

If unsure, say N.

config CONTEXT_TRACKING
bool

config CONTEXT_TRACKING_IDLE
bool
select CONTEXT_TRACKING
help
Tracks idle state on behalf of RCU.

if GENERIC_CLOCKEVENTS
menu "Timers subsystem"

@@ -111,7 +120,7 @@ config NO_HZ_FULL
# NO_HZ_COMMON dependency
# We need at least one periodic CPU for timekeeping
depends on SMP
depends on HAVE_CONTEXT_TRACKING
depends on HAVE_CONTEXT_TRACKING_USER
# VIRT_CPU_ACCOUNTING_GEN dependency
depends on HAVE_VIRT_CPU_ACCOUNTING_GEN
select NO_HZ_COMMON
@@ -137,31 +146,37 @@ config NO_HZ_FULL

endchoice

config CONTEXT_TRACKING
bool
config CONTEXT_TRACKING_USER
bool
depends on HAVE_CONTEXT_TRACKING_USER
select CONTEXT_TRACKING
help
Track transitions between kernel and user on behalf of RCU and
tickless cputime accounting. The former case relies on context
tracking to enter/exit RCU extended quiescent states.

config CONTEXT_TRACKING_FORCE
bool "Force context tracking"
depends on CONTEXT_TRACKING
config CONTEXT_TRACKING_USER_FORCE
bool "Force user context tracking"
depends on CONTEXT_TRACKING_USER
default y if !NO_HZ_FULL
help
The major pre-requirement for full dynticks to work is to
support the context tracking subsystem. But there are also
support the user context tracking subsystem. But there are also
other dependencies to provide in order to make the full
dynticks working.

This option stands for testing when an arch implements the
context tracking backend but doesn't yet fulfill all the
user context tracking backend but doesn't yet fulfill all the
requirements to make the full dynticks feature working.
Without the full dynticks, there is no way to test the support
for context tracking and the subsystems that rely on it: RCU
for user context tracking and the subsystems that rely on it: RCU
userspace extended quiescent state and tickless cputime
accounting. This option copes with the absence of the full
dynticks subsystem by forcing the context tracking on all
dynticks subsystem by forcing the user context tracking on all
CPUs in the system.

Say Y only if you're working on the development of an
architecture backend for the context tracking.
architecture backend for the user context tracking.

Say N otherwise, this option brings an overhead that you
don't want in production.

@@ -570,7 +570,7 @@ void __init tick_nohz_init(void)
}

for_each_cpu(cpu, tick_nohz_full_mask)
context_tracking_cpu_set(cpu);
ct_cpu_track_user(cpu);

ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
"kernel/nohz:predown", NULL,

@@ -3105,17 +3105,17 @@ void __trace_stack(struct trace_array *tr, unsigned int trace_ctx,
}

/*
* When an NMI triggers, RCU is enabled via rcu_nmi_enter(),
* When an NMI triggers, RCU is enabled via ct_nmi_enter(),
* but if the above rcu_is_watching() failed, then the NMI
* triggered someplace critical, and rcu_irq_enter() should
* triggered someplace critical, and ct_irq_enter() should
* not be called from NMI.
*/
if (unlikely(in_nmi()))
return;

rcu_irq_enter_irqson();
ct_irq_enter_irqson();
__ftrace_trace_stack(buffer, trace_ctx, skip, NULL);
rcu_irq_exit_irqson();
ct_irq_exit_irqson();
}

/**

@@ -35,7 +35,7 @@ then
exit 1
fi

# Remember where we started so that we can get back and the end.
# Remember where we started so that we can get back at the end.
curcommit="`git status | head -1 | awk '{ print $NF }'`"

nfail=0
@@ -73,15 +73,10 @@ do
# Test the specified commit.
git checkout $i > $resdir/$ds/$idir/git-checkout.out 2>&1
echo git checkout return code: $? "(Commit $ntry: $i)"
kvm.sh --allcpus --duration 3 --trust-make > $resdir/$ds/$idir/kvm.sh.out 2>&1
kvm.sh --allcpus --duration 3 --trust-make --datestamp "$ds/$idir" > $resdir/$ds/$idir/kvm.sh.out 2>&1
ret=$?
echo kvm.sh return code $ret for commit $i from branch $gitbr

# Move the build products to their resting place.
runresdir="`grep -m 1 '^Results directory:' < $resdir/$ds/$idir/kvm.sh.out | sed -e 's/^Results directory://'`"
mv $runresdir $resdir/$ds/$idir
rrd="`echo $runresdir | sed -e 's,^.*/,,'`"
echo Run results: $resdir/$ds/$idir/$rrd
echo Run results: $resdir/$ds/$idir
if test "$ret" -ne 0
then
# Failure, so leave all evidence intact.

@@ -262,6 +262,7 @@ echo All batches started. `date` | tee -a "$oldrun/remote-log"
# Wait for all remaining scenarios to complete and collect results.
for i in $systems
do
echo " ---" Waiting for $i `date` | tee -a "$oldrun/remote-log"
while checkremotefile "$i" "$resdir/$ds/remote.run"
do
sleep 30

@@ -164,7 +164,7 @@ do
shift
;;
--gdb)
TORTURE_KCONFIG_GDB_ARG="CONFIG_DEBUG_INFO=y"; export TORTURE_KCONFIG_GDB_ARG
TORTURE_KCONFIG_GDB_ARG="CONFIG_DEBUG_INFO_NONE=n CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y"; export TORTURE_KCONFIG_GDB_ARG
TORTURE_BOOT_GDB_ARG="nokaslr"; export TORTURE_BOOT_GDB_ARG
TORTURE_QEMU_GDB_ARG="-s -S"; export TORTURE_QEMU_GDB_ARG
;;
@@ -180,7 +180,7 @@ do
shift
;;
--kasan)
TORTURE_KCONFIG_KASAN_ARG="CONFIG_DEBUG_INFO=y CONFIG_KASAN=y"; export TORTURE_KCONFIG_KASAN_ARG
TORTURE_KCONFIG_KASAN_ARG="CONFIG_DEBUG_INFO_NONE=n CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y CONFIG_KASAN=y"; export TORTURE_KCONFIG_KASAN_ARG
if test -n "$torture_qemu_mem_default"
then
TORTURE_QEMU_MEM=2G
@@ -192,7 +192,7 @@ do
shift
;;
--kcsan)
TORTURE_KCONFIG_KCSAN_ARG="CONFIG_DEBUG_INFO=y CONFIG_KCSAN=y CONFIG_KCSAN_STRICT=y CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 CONFIG_KCSAN_VERBOSE=y CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y"; export TORTURE_KCONFIG_KCSAN_ARG
TORTURE_KCONFIG_KCSAN_ARG="CONFIG_DEBUG_INFO_NONE=n CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y CONFIG_KCSAN=y CONFIG_KCSAN_STRICT=y CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 CONFIG_KCSAN_VERBOSE=y CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y"; export TORTURE_KCONFIG_KCSAN_ARG
;;
--kmake-arg|--kmake-args)
checkarg --kmake-arg "(kernel make arguments)" $# "$2" '.*' '^error$'