ANDROID: rcu: Add a minimum time for marking boot as completed

On many systems, a great deal of boot (in userspace) happens after the
kernel thinks the boot has completed. It is difficult to determine from
the kernel side whether the system has really booted. Some features, like
lazy-RCU, risk slowing down boot if, say, a callback has been added that
the boot synchronously depends on. Further, expedited callbacks can get
unexpedited far earlier than they should be, thus slowing down boot (as
shown in the data below).

For these reasons, this commit adds a config option,
CONFIG_RCU_BOOT_END_DELAY, and a boot parameter, rcupdate.rcu_boot_end_delay.
Userspace can also mark the system as booted from RCU's point of view by
writing a time in milliseconds to:
/sys/module/rcupdate/parameters/rcu_boot_end_delay
or simply by writing a value of 0 to this sysfs node.
However, under no circumstances will the boot be allowed to end earlier
than just before init is launched.
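
A minimal userspace sketch of the write described above, assuming init code
written in C. The helper name rcu_write_boot_end_delay() is hypothetical; only
the sysfs path comes from this patch.

```c
#include <errno.h>
#include <stdio.h>

/* Path taken from the commit message above. */
#define RCU_BOOT_END_DELAY_PATH \
	"/sys/module/rcupdate/parameters/rcu_boot_end_delay"

/*
 * Write delay_ms (in milliseconds) to the parameter file at path.
 * Returns 0 on success or a negative errno value on failure.
 * Hypothetical helper for illustration, not part of the patch.
 */
static int rcu_write_boot_end_delay(const char *path, unsigned int delay_ms)
{
	FILE *f = fopen(path, "w");
	int err = 0;

	if (!f)
		return -errno;
	/* One decimal value followed by a newline, as "echo 0 > file" would do. */
	if (fprintf(f, "%u\n", delay_ms) < 0)
		err = -EIO;
	/* fclose() flushes the buffer, so a failed write can surface here. */
	if (fclose(f) != 0 && !err)
		err = -errno;
	return err;
}
```

Calling rcu_write_boot_end_delay(RCU_BOOT_END_DELAY_PATH, 0) once userspace
considers itself booted would request that the boot be marked complete. Writing
to the node normally requires root, since the parameter is created with mode
0644.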

The default value of CONFIG_RCU_BOOT_END_DELAY is chosen as 15s. This
suits ChromeOS and also a PREEMPT_RT system below very well; both need
no config or parameter changes, just a simple application of this patch. A
system designer can also choose a specific value here to keep RCU from marking
boot completion. As noted earlier, RCU will not mark the system as booted
until at least rcu_boot_end_delay milliseconds have passed
or an update is made by writing a small value (or 0) in milliseconds to:
/sys/module/rcupdate/parameters/rcu_boot_end_delay.
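
The timing rule above can be modelled in userspace as follows. This is only a
sketch of the check made in rcu_end_inkernel_boot_locked(); the struct and
function names are hypothetical, and the real code reschedules delayed work
rather than returning a value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Outcome of one attempt to end the in-kernel boot (userspace model only). */
struct boot_end_decision {
	bool end_now;		/* true: boot may be marked complete right away */
	uint64_t wait_ms;	/* otherwise: how long to defer the next attempt */
};

/*
 * boot_ms:  milliseconds elapsed since the start of boot
 * delay_ms: current value of rcu_boot_end_delay (0 means "no minimum")
 */
static struct boot_end_decision
model_boot_end(uint64_t boot_ms, uint64_t delay_ms)
{
	struct boot_end_decision d = { .end_now = true, .wait_ms = 0 };

	/* Too early: defer until the minimum time since boot has elapsed. */
	if (delay_ms != 0 && boot_ms < delay_ms) {
		d.end_now = false;
		d.wait_ms = delay_ms - boot_ms;
	}
	return d;
}
```

For example, with a 15000 ms delay and a first attempt 3000 ms into boot, the
model defers by 12000 ms. With a delay of 0, or once the delay has already
elapsed, it ends the boot immediately.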

One side effect of this patch is a risk that a real-time workload
launched just after the kernel boots will suffer interruptions due to expedited
RCU, which previously ended just before init was launched. To mitigate
such an issue (however unlikely), the user should either tune
CONFIG_RCU_BOOT_END_DELAY to a value smaller than 15 seconds or write a value
of 0 to /sys/module/rcupdate/parameters/rcu_boot_end_delay once userspace
boots and before launching the real-time workload.

Qiuxu also noted impressive boot-time improvements with an earlier version
of this patch. An excerpt from the data he shared:

1) Testing environment:
    OS        : CentOS Stream 8 (non-RT OS)
    Kernel    : v6.2
    Machine   : Intel Cascade Lake server (2 sockets, each with 44 logical threads)
    Qemu args : -cpu host -enable-kvm, -smp 88,threads=2,sockets=2, …

2) OS boot time definition:
    The time from the start of the kernel boot until the shell command
    line prompt is shown on the console. [ Different people may have
    different OS boot time definitions. ]

3) Measurement method (very rough method):
    A timer in the kernel prints the boot time every 100ms.
    As soon as the shell command line prompt is shown on the console,
    we record the boot time most recently printed by the timer and take
    that value as the OS boot time.

4) Measured OS boot time (in seconds)
   a) Measured 10 times w/o this patch:
        8.7s, 8.4s, 8.6s, 8.2s, 9.0s, 8.7s, 8.8s, 9.3s, 8.8s, 8.3s
        The average OS boot time was: ~8.7s

   b) Measured 10 times w/ this patch:
        8.5s, 8.2s, 7.6s, 8.2s, 8.7s, 8.2s, 7.8s, 8.2s, 9.3s, 8.4s
        The average OS boot time was: ~8.3s.
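
As a check on the averages quoted above, the per-run values can be averaged
directly. This is an illustrative sketch; the array and function names are
hypothetical, and the data is copied from the two lists above.

```c
#include <stddef.h>

/* Boot times in seconds, copied from the measurements above. */
static const double boot_without_patch[10] = {
	8.7, 8.4, 8.6, 8.2, 9.0, 8.7, 8.8, 9.3, 8.8, 8.3
};
static const double boot_with_patch[10] = {
	8.5, 8.2, 7.6, 8.2, 8.7, 8.2, 7.8, 8.2, 9.3, 8.4
};

/* Arithmetic mean of n samples; returns 0.0 for an empty array. */
static double mean_seconds(const double *samples, size_t n)
{
	double sum = 0.0;
	size_t i;

	if (n == 0)
		return 0.0;
	for (i = 0; i < n; i++)
		sum += samples[i];
	return sum / (double)n;
}
```

The means come out to 8.68 s without the patch and 8.31 s with it, a
difference of about 0.37 s, or roughly 4% of the unpatched boot time.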

(CHROMIUM tag rationale: Submitted upstream but got lots of pushback as
it may harm a PREEMPT_RT system -- the concern is VERY theoretical and
this improves things for ChromeOS. Plus we are not a PREEMPT_RT system.
So I am strongly suggesting this mostly simple change for ChromeOS.)

Bug: 258241771
Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4350228
Commit-Queue: Joel Fernandes <joelaf@google.com>
Commit-Queue: Vineeth Pillai <vineethrp@google.com>
Tested-by: Vineeth Pillai <vineethrp@google.com>
Tested-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4909180
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: Ibd262189d7f92dbcc57f1508efe90fcfba95a6cc
Joel Fernandes (Google) 2023-03-03 21:38:51 +00:00 committed by Todd Kjos
parent ffe09c06a8
commit a079cc5876
3 changed files with 109 additions and 2 deletions


@@ -5130,6 +5130,21 @@
rcutorture.verbose= [KNL]
Enable additional printk() statements.
rcupdate.rcu_boot_end_delay= [KNL]
Minimum time in milliseconds from the start of boot
that must elapse before the boot sequence can be marked
complete from RCU's perspective, after which RCU's
behavior becomes more relaxed. The default value is also
configurable via CONFIG_RCU_BOOT_END_DELAY.
Userspace can also mark the boot as completed
sooner by writing the time in milliseconds, say once
userspace considers the system as booted, to:
/sys/module/rcupdate/parameters/rcu_boot_end_delay
Or even just writing a value of 0 to this sysfs node.
The sysfs node can also be used to extend the delay
to be larger than the default, assuming the marking
of boot complete has not yet occurred.
rcupdate.rcu_cpu_stall_ftrace_dump= [KNL]
Dump ftrace buffer after reporting RCU CPU
stall warning.


@@ -319,4 +319,25 @@ config RCU_LAZY
To save power, batch RCU callbacks and flush after delay, memory
pressure, or callback list growing too big.
config RCU_BOOT_END_DELAY
int "Minimum time before RCU may consider in-kernel boot as completed"
range 0 120000
default 20000
help
Default value of the minimum time in milliseconds from the start of boot
that must elapse before the boot sequence can be marked complete from RCU's
perspective, after which RCU's behavior becomes more relaxed.
Userspace can also mark the boot as completed sooner than this default
by writing the time in milliseconds, say once userspace considers
the system as booted, to: /sys/module/rcupdate/parameters/rcu_boot_end_delay.
Or even just writing a value of 0 to this sysfs node. The sysfs node can
also be used to extend the delay to be larger than the default, assuming
the marking of boot completion has not yet occurred.
The actual delay for RCU's view of the system to be marked as booted can be
higher than this value if the kernel takes a long time to initialize but it
will never be smaller than this value.
Accept the default if unsure.
endmenu # "RCU Subsystem"


@@ -43,6 +43,7 @@
#include <linux/slab.h>
#include <linux/irq_work.h>
#include <linux/rcupdate_trace.h>
#include <linux/jiffies.h>
#define CREATE_TRACE_POINTS
@@ -224,13 +225,50 @@ void rcu_unexpedite_gp(void)
}
EXPORT_SYMBOL_GPL(rcu_unexpedite_gp);
/*
* Minimum time in milliseconds from the start of boot until RCU can consider
* in-kernel boot as completed. This can also be tuned at runtime to end the
* boot earlier, by userspace init code writing the time in milliseconds (even
* 0) to: /sys/module/rcupdate/parameters/rcu_boot_end_delay. The sysfs node
* can also be used to extend the delay to be larger than the default, assuming
* the marking of boot complete has not yet occurred.
*/
static int rcu_boot_end_delay = CONFIG_RCU_BOOT_END_DELAY;
static bool rcu_boot_ended __read_mostly;
static bool rcu_boot_end_called __read_mostly;
static DEFINE_MUTEX(rcu_boot_end_lock);
/*
* Inform RCU of the end of the in-kernel boot sequence.
* Inform RCU of the end of the in-kernel boot sequence. The boot sequence will
* not be marked ended until at least rcu_boot_end_delay milliseconds have passed.
*/
void rcu_end_inkernel_boot(void)
void rcu_end_inkernel_boot(void);
static void rcu_boot_end_work_fn(struct work_struct *work)
{
rcu_end_inkernel_boot();
}
static DECLARE_DELAYED_WORK(rcu_boot_end_work, rcu_boot_end_work_fn);
/* Must be called with rcu_boot_end_lock held. */
static void rcu_end_inkernel_boot_locked(void)
{
rcu_boot_end_called = true;
if (rcu_boot_ended)
return;
if (rcu_boot_end_delay) {
u64 boot_ms = div_u64(ktime_get_boot_fast_ns(), 1000000UL);
if (boot_ms < rcu_boot_end_delay) {
schedule_delayed_work(&rcu_boot_end_work,
msecs_to_jiffies(rcu_boot_end_delay - boot_ms));
return;
}
}
cancel_delayed_work(&rcu_boot_end_work);
rcu_unexpedite_gp();
rcu_async_relax();
if (rcu_normal_after_boot)
@@ -238,6 +276,39 @@ void rcu_end_inkernel_boot(void)
rcu_boot_ended = true;
}
void rcu_end_inkernel_boot(void)
{
mutex_lock(&rcu_boot_end_lock);
rcu_end_inkernel_boot_locked();
mutex_unlock(&rcu_boot_end_lock);
}
static int param_set_rcu_boot_end(const char *val, const struct kernel_param *kp)
{
uint end_ms;
int ret = kstrtouint(val, 0, &end_ms);
if (ret)
return ret;
/*
* rcu_end_inkernel_boot() should be called at least once during init
* before we can allow param changes to end the boot.
*/
mutex_lock(&rcu_boot_end_lock);
rcu_boot_end_delay = end_ms;
if (!rcu_boot_ended && rcu_boot_end_called) {
rcu_end_inkernel_boot_locked();
}
mutex_unlock(&rcu_boot_end_lock);
return ret;
}
static const struct kernel_param_ops rcu_boot_end_ops = {
.set = param_set_rcu_boot_end,
.get = param_get_uint,
};
module_param_cb(rcu_boot_end_delay, &rcu_boot_end_ops, &rcu_boot_end_delay, 0644);
/*
* Let rcutorture know when it is OK to turn it up to eleven.
*/