android_kernel_xiaomi_sm8450/kernel/time
Jiri Wiesner 8868106251 clocksource: Skip watchdog check for large watchdog intervals
commit 644649553508b9bacf0fc7a5bdc4f9e0165576a5 upstream.

There have been reports of the watchdog marking clocksources unstable on
machines with 8 NUMA nodes:

  clocksource: timekeeping watchdog on CPU373:
  Marking clocksource 'tsc' as unstable because the skew is too large:
  clocksource:   'hpet' wd_nsec: 14523447520
  clocksource:   'tsc'  cs_nsec: 14524115132

The measured clocksource skew - the absolute difference between cs_nsec
and wd_nsec - was 668 microseconds:

  cs_nsec - wd_nsec = 14524115132 - 14523447520 = 667612

The kernel used 200 microseconds for the uncertainty_margin of both the
clocksource and watchdog, resulting in a threshold of 400 microseconds (the
md variable). Both the cs_nsec and the wd_nsec value indicate that the
readout interval was circa 14.5 seconds.  The observed behaviour is that
watchdog checks failed for large readout intervals on 8 NUMA node
machines. This indicates that the size of the skew was directly proportinal
to the length of the readout interval on those machines. The measured
clocksource skew, 668 microseconds, was evaluated against a threshold (the
md variable) that is suited for readout intervals of roughly
WATCHDOG_INTERVAL, i.e. HZ >> 1, which is 0.5 second.

The intention of 2e27e793e280 ("clocksource: Reduce clocksource-skew
threshold") was to tighten the threshold for evaluating skew and set the
lower bound for the uncertainty_margin of clocksources to twice
WATCHDOG_MAX_SKEW. Later in c37e85c135ce ("clocksource: Loosen clocksource
watchdog constraints"), the WATCHDOG_MAX_SKEW constant was increased to
125 microseconds to fit the limit of NTP, which is able to use a
clocksource that suffers from up to 500 microseconds of skew per second.
Both the TSC and the HPET use default uncertainty_margin. When the
readout interval gets stretched the default uncertainty_margin is no
longer a suitable lower bound for evaluating skew - it imposes a limit
that is far stricter than the skew with which NTP can deal.

The root causes of the skew being directly proportinal to the length of
the readout interval are:

  * the inaccuracy of the shift/mult pairs of clocksources and the watchdog
  * the conversion to nanoseconds is imprecise for large readout intervals

Prevent this by skipping the current watchdog check if the readout
interval exceeds 2 * WATCHDOG_INTERVAL. Considering the maximum readout
interval of 2 * WATCHDOG_INTERVAL, the current default uncertainty margin
(of the TSC and HPET) corresponds to a limit on clocksource skew of 250
ppm (microseconds of skew per second).  To keep the limit imposed by NTP
(500 microseconds of skew per second) for all possible readout intervals,
the margins would have to be scaled so that the threshold value is
proportional to the length of the actual readout interval.

As for why the readout interval may get stretched: Since the watchdog is
executed in softirq context the expiration of the watchdog timer can get
severely delayed on account of a ksoftirqd thread not getting to run in a
timely manner. Surely, a system with such belated softirq execution is not
working well and the scheduling issue should be looked into but the
clocksource watchdog should be able to deal with it accordingly.

Fixes: 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold")
Suggested-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Jiri Wiesner <jwiesner@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Feng Tang <feng.tang@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240122172350.GA740@incl
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23 08:42:22 +01:00
..
alarmtimer.c alarmtimer: Prevent starvation by small intervals and SIG_IGN 2023-02-22 12:55:59 +01:00
clockevents.c tick: Remove outgoing CPU from broadcast masks 2019-03-23 18:26:43 +01:00
clocksource.c clocksource: Skip watchdog check for large watchdog intervals 2024-02-23 08:42:22 +01:00
hrtimer.c hrtimer: Report offline hrtimer enqueue 2024-02-23 08:42:22 +01:00
itimer.c time: Prevent undefined behaviour in timespec64_to_ns() 2020-10-26 11:48:11 +01:00
jiffies.c clocksource: Reduce clocksource-skew threshold 2022-01-27 10:54:05 +01:00
Kconfig posix-cpu-timers: Provide mechanisms to defer timer handling to task_work 2020-08-06 16:50:59 +02:00
Makefile ns: Introduce Time Namespace 2020-01-14 12:20:48 +01:00
namespace.c nsproxy: support CLONE_NEWTIME with setns() 2020-07-08 11:14:22 +02:00
ntp_internal.h ntp: Audit NTP parameters adjustment 2019-04-15 18:14:01 -04:00
ntp.c ntp/y2038: Remove incorrect time_t truncation 2019-11-12 08:13:44 +01:00
posix-clock.c posix-clocks: Rename the clock_get() callback to clock_get_timespec() 2020-01-14 12:20:49 +01:00
posix-cpu-timers.c posix-cpu-timers: Implement the missing timer_wait_running callback 2023-05-17 11:47:31 +02:00
posix-stubs.c timers: Prevent union confusion from unexpected restart_syscall() 2023-03-11 16:39:49 +01:00
posix-timers.c posix-timers: Ensure timer ID search-loop limit is valid 2023-07-27 08:44:36 +02:00
posix-timers.h posix-clocks: Introduce clock_get_ktime() callback 2020-01-14 12:20:51 +01:00
sched_clock.c time/sched_clock: Mark sched_clock_read_begin/retry() as notrace 2020-10-26 11:34:31 +01:00
test_udelay.c time/debug: Remove license boilerplate 2018-11-23 11:51:21 +01:00
tick-broadcast-hrtimer.c tick: broadcast-hrtimer: Fix a race in bc_set_next 2019-09-27 14:45:55 +02:00
tick-broadcast.c tick: Get rid of tick_period 2023-05-17 11:47:46 +02:00
tick-common.c tick/common: Align tick period during sched_timer setup 2023-06-28 10:28:06 +02:00
tick-internal.h tick: Get rid of tick_period 2023-05-17 11:47:46 +02:00
tick-oneshot.c hrtimers/tick/clockevents: Remove sloppy license references 2018-11-23 11:51:21 +01:00
tick-sched.c tick/sched: Preserve number of idle sleeps across CPU hotplug events 2024-02-23 08:42:01 +01:00
tick-sched.h tick: Detect and fix jiffies update stall 2023-08-30 16:23:17 +02:00
time.c y2038: remove unused time32 interfaces 2020-02-21 11:22:15 -08:00
timeconst.bc time: Add SPDX license identifiers 2018-11-23 11:51:20 +01:00
timeconv.c time: Add SPDX license identifiers 2018-11-23 11:51:20 +01:00
timecounter.c time: Remove license boilerplate 2018-11-23 11:51:21 +01:00
timekeeping_debug.c timekeeping/debug: No need to check return value of debugfs_create functions 2019-01-29 20:08:41 +01:00
timekeeping_internal.h timekeeping/vsyscall: Provide vdso_update_begin/end() 2020-08-06 10:57:30 +02:00
timekeeping.c timekeeping: contribute wall clock to rng on time change 2022-08-21 15:16:20 +02:00
timekeeping.h timekeeping: Split jiffies seqlock 2020-03-21 16:00:23 +01:00
timer_list.c timer_list: Guard procfs specific code 2019-06-23 00:08:52 +02:00
timer.c timers: Fix warning condition in __run_timers() 2022-04-20 09:23:30 +02:00
vsyscall.c timekeeping/vsyscall: Provide vdso_update_begin/end() 2020-08-06 10:57:30 +02:00