android_kernel_xiaomi_sm8450/kernel
Jiri Wiesner 8868106251 clocksource: Skip watchdog check for large watchdog intervals
commit 644649553508b9bacf0fc7a5bdc4f9e0165576a5 upstream.

There have been reports of the watchdog marking clocksources unstable on
machines with 8 NUMA nodes:

  clocksource: timekeeping watchdog on CPU373:
  Marking clocksource 'tsc' as unstable because the skew is too large:
  clocksource:   'hpet' wd_nsec: 14523447520
  clocksource:   'tsc'  cs_nsec: 14524115132

The measured clocksource skew - the absolute difference between cs_nsec
and wd_nsec - was 668 microseconds:

  cs_nsec - wd_nsec = 14524115132 - 14523447520 = 667612

The kernel used 200 microseconds for the uncertainty_margin of both the
clocksource and watchdog, resulting in a threshold of 400 microseconds (the
md variable). Both the cs_nsec and the wd_nsec value indicate that the
readout interval was circa 14.5 seconds.  The observed behaviour is that
watchdog checks failed for large readout intervals on 8 NUMA node
machines. This indicates that the size of the skew was directly proportinal
to the length of the readout interval on those machines. The measured
clocksource skew, 668 microseconds, was evaluated against a threshold (the
md variable) that is suited for readout intervals of roughly
WATCHDOG_INTERVAL, i.e. HZ >> 1, which is 0.5 second.

The intention of 2e27e793e280 ("clocksource: Reduce clocksource-skew
threshold") was to tighten the threshold for evaluating skew and set the
lower bound for the uncertainty_margin of clocksources to twice
WATCHDOG_MAX_SKEW. Later in c37e85c135ce ("clocksource: Loosen clocksource
watchdog constraints"), the WATCHDOG_MAX_SKEW constant was increased to
125 microseconds to fit the limit of NTP, which is able to use a
clocksource that suffers from up to 500 microseconds of skew per second.
Both the TSC and the HPET use default uncertainty_margin. When the
readout interval gets stretched the default uncertainty_margin is no
longer a suitable lower bound for evaluating skew - it imposes a limit
that is far stricter than the skew with which NTP can deal.

The root causes of the skew being directly proportinal to the length of
the readout interval are:

  * the inaccuracy of the shift/mult pairs of clocksources and the watchdog
  * the conversion to nanoseconds is imprecise for large readout intervals

Prevent this by skipping the current watchdog check if the readout
interval exceeds 2 * WATCHDOG_INTERVAL. Considering the maximum readout
interval of 2 * WATCHDOG_INTERVAL, the current default uncertainty margin
(of the TSC and HPET) corresponds to a limit on clocksource skew of 250
ppm (microseconds of skew per second).  To keep the limit imposed by NTP
(500 microseconds of skew per second) for all possible readout intervals,
the margins would have to be scaled so that the threshold value is
proportional to the length of the actual readout interval.

As for why the readout interval may get stretched: Since the watchdog is
executed in softirq context the expiration of the watchdog timer can get
severely delayed on account of a ksoftirqd thread not getting to run in a
timely manner. Surely, a system with such belated softirq execution is not
working well and the scheduling issue should be looked into but the
clocksource watchdog should be able to deal with it accordingly.

Fixes: 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold")
Suggested-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Jiri Wiesner <jwiesner@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Feng Tang <feng.tang@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240122172350.GA740@incl
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23 08:42:22 +01:00
..
bpf bpf: Set uattr->batch.count as zero before batched update or deletion 2024-02-23 08:42:07 +01:00
cgroup cgroup: Remove duplicates in cgroup v1 tasks file 2023-10-25 11:54:16 +02:00
configs
debug kdb: Fix a potential buffer overflow in kdb_local() 2024-01-25 14:37:56 -08:00
dma dma-mapping: clear dev->dma_mem to NULL after freeing it 2024-01-25 14:37:45 -08:00
entry entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set 2023-01-04 11:39:22 +01:00
events perf: Fix the nr_addr_filters fix 2024-02-23 08:42:15 +01:00
futex futex: Don't include process MM in futex key on no-MMU 2023-11-20 11:06:44 +01:00
gcov gcov: add support for checksum field 2023-01-14 10:16:24 +01:00
irq genirq/generic_chip: Make irq_remove_generic_chip() irqdomain aware 2023-11-28 16:54:58 +00:00
kcsan kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures 2023-07-27 08:43:57 +02:00
livepatch livepatch: Fix missing newline character in klp_resolve_symbols() 2023-11-20 11:06:52 +01:00
locking lockdep: Fix block chain corruption 2023-12-08 08:46:09 +01:00
power PM: hibernate: Enforce ordering during image compression/decompression 2024-02-23 08:41:52 +01:00
printk printk: ringbuffer: Fix truncating buffer size min_t cast 2023-09-19 12:20:21 +02:00
rcu rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects 2023-11-28 16:54:57 +00:00
sched sched/uclamp: Ignore (util == 0) optimization in feec() when p_util_max = 0 2023-11-20 11:06:43 +01:00
time clocksource: Skip watchdog check for large watchdog intervals 2024-02-23 08:42:22 +01:00
trace tracing: Ensure visibility when inserting an element into tracing_map 2024-02-23 08:41:56 +01:00
.gitignore kbuild: update config_data.gz only when the content of .config is changed 2021-05-11 14:47:37 +02:00
acct.c acct: fix potential integer overflow in encode_comp_t() 2023-01-14 10:16:14 +01:00
async.c async: Introduce async_schedule_dev_nocall() 2024-02-23 08:41:53 +01:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-08-31 17:15:13 +02:00
audit_tree.c audit: move put_tree() to avoid trim_trees refcount underflow and UAF 2021-09-03 10:09:31 +02:00
audit_watch.c audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare() 2023-11-28 16:54:56 +00:00
audit.c audit: Send netlink ACK before setting connection in auditd_set 2024-02-23 08:42:03 +01:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:40:00 +02:00
auditfilter.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
auditsc.c audit: fix possible soft lockup in __audit_inode_child() 2023-09-19 12:20:13 +02:00
backtracetest.c treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() 2020-07-30 11:15:58 -07:00
bounds.c
capability.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-04-05 11:23:45 +02:00
configs.c
context_tracking.c context_tracking: Ensure that the critical path cannot be instrumented 2020-06-11 15:14:36 +02:00
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-09-15 09:50:40 +02:00
cpu.c hrtimers: Push pending hrtimers away from outgoing CPU earlier 2023-12-13 18:26:56 +01:00
crash_core.c crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo 2021-06-23 14:42:52 +02:00
crash_dump.c
cred.c cred: switch to using atomic_long_t 2023-12-20 15:44:30 +01:00
delayacct.c
dma.c
exec_domain.c
exit.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-02-01 08:23:21 +01:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-03-11 16:40:18 +01:00
fork.c kernel/fork: beware of __put_task_struct() calling context 2023-09-23 11:01:05 +02:00
freezer.c Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing" 2021-04-07 15:00:14 +02:00
gen_kheaders.sh kbuild: add variables for compression tools 2020-06-06 23:42:01 +09:00
groups.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
hung_task.c kernel/hung_task.c: make type annotations consistent 2020-11-02 12:14:19 -08:00
iomem.c
irq_work.c
jump_label.c jump_label: Fix jump_label_text_reserved() vs __init 2021-07-20 16:05:58 +02:00
kallsyms.c treewide: Convert macro and uses of __section(foo) to __section("foo") 2020-10-25 14:51:49 -07:00
kcmp.c exec: Transform exec_update_mutex into a rw_semaphore 2021-01-09 13:46:24 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: make some symbols static 2020-08-12 10:58:02 -07:00
kexec_core.c kexec: fix a memory leak in crash_shrink_memory() 2023-07-27 08:43:40 +02:00
kexec_elf.c
kexec_file.c kexec: support purgatories with .text.hot sections 2023-06-21 15:45:37 +02:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2023-04-20 12:10:29 +02:00
kexec.c panic, kexec: make __crash_kexec() NMI safe 2023-04-20 12:10:29 +02:00
kheaders.c kheaders: Use array declaration instead of char 2023-05-17 11:47:33 +02:00
kmod.c kmod: remove redundant "be an" in the comment 2020-08-12 10:58:01 -07:00
kprobes.c kprobes: Fix to handle forcibly unoptimized kprobes on freeing_list 2024-01-25 14:37:51 -08:00
ksysfs.c kexec: turn all kexec_mutex acquisitions into trylocks 2023-04-20 12:10:29 +02:00
kthread.c kthread: Fix PF_KTHREAD vs to_kthread() race 2021-09-03 10:09:31 +02:00
latencytop.c
Makefile futex: Move to kernel/futex/ 2023-01-14 10:15:20 +01:00
module_signature.c module: harden ELF info handling 2021-03-25 09:04:11 +01:00
module_signing.c module: harden ELF info handling 2021-03-25 09:04:11 +01:00
module-internal.h
module.c modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules 2023-09-19 12:20:02 +02:00
notifier.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
nsproxy.c nsproxy: support CLONE_NEWTIME with setns() 2020-07-08 11:14:22 +02:00
padata.c crypto: pcrypt - Fix hungtask for PADATA_RESET 2023-11-28 16:54:51 +00:00
panic.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-02-01 08:23:21 +01:00
params.c params: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
pid_namespace.c rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes() 2023-03-11 16:39:19 +01:00
pid.c exec: Transform exec_update_mutex into a rw_semaphore 2021-01-09 13:46:24 +01:00
profile.c profiling: fix shift too large makes kernel panic 2022-08-21 15:16:05 +02:00
ptrace.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2022-06-09 10:20:49 +02:00
range.c kernel.h: split out min()/max() et al. helpers 2020-10-16 11:11:19 -07:00
reboot.c kernel/reboot: emergency_restart: Set correct system_state 2023-11-28 16:54:58 +00:00
regset.c regset: kill ->get() 2020-07-27 14:31:12 -04:00
relay.c relayfs: fix out-of-bounds access in relay_file_read 2023-05-17 11:47:34 +02:00
resource.c dax/kmem: Fix leak of memory-hotplug resources 2023-03-11 16:40:04 +01:00
rseq.c rseq: Remove broken uapi field layout on 32-bit little endian 2022-04-08 14:40:03 +02:00
scftorture.c scftorture: Forgive memory-allocation failure if KASAN 2023-09-23 11:01:05 +02:00
scs.c mm: memcontrol: account kernel stack per node 2020-08-07 11:33:25 -07:00
seccomp.c seccomp: Fix setting loaded filter count during TSYNC 2021-08-18 08:59:06 +02:00
signal.c task_work: unconditionally run task_work from get_signal() 2023-01-04 11:39:23 +01:00
smp.c smp: Fix offline cpu check in flush_smp_call_function_queue() 2022-04-20 09:23:29 +02:00
smpboot.c sched/core: Initialize the idle task with preemption disabled 2021-07-14 16:55:50 +02:00
smpboot.h
softirq.c softirq: Add debug check to __raise_softirq_irqoff() 2020-09-16 15:18:56 +02:00
stackleak.c gcc-plugins/stackleak: Use noinstr in favor of notrace 2022-02-23 12:01:00 +01:00
stacktrace.c stacktrace: Remove reliable argument from arch_stack_walk() callback 2020-09-18 14:24:16 +01:00
static_call.c static_call: Fix unused variable warn w/o MODULE 2021-09-08 08:49:00 +02:00
stop_machine.c stop_machine, rcu: Mark functions as notrace 2020-10-26 12:12:27 +01:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2022-08-31 17:15:13 +02:00
sys.c kernel/sys.c: fix and improve control flow in __sys_setres[ug]id() 2023-04-26 11:27:38 +02:00
sysctl-test.c
sysctl.c sysctl: move some boundary constants from sysctl.c to sysctl_vals 2023-06-28 10:28:09 +02:00
task_work.c task_work: add helper for more targeted task_work canceling 2023-01-04 11:39:23 +01:00
taskstats.c taskstats: move specifying netlink policy back to ops 2020-10-02 19:11:12 -07:00
test_kprobes.c
torture.c torture: Fix hang during kthread shutdown phase 2023-08-30 16:23:17 +02:00
tracepoint.c tracepoint: Use rcu get state and cond sync for static call updates 2021-09-03 10:09:30 +02:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-01-27 10:54:33 +01:00
ucount.c Revert "Add a reference to ucounts for each cred" 2021-09-08 08:49:00 +02:00
uid16.c
uid16.h
umh.c usermodehelper: reset umask to default before executing user process 2020-10-06 10:31:52 -07:00
up.c smp: Fix smp_call_function_single_async prototype 2021-05-14 09:50:46 +02:00
user_namespace.c Revert "Add a reference to ucounts for each cred" 2021-09-08 08:49:00 +02:00
user-return-notifier.c
user.c user.c: make uidhash_table static 2020-06-04 19:06:24 -07:00
usermode_driver.c bpf: Fix umd memory leak in copy_process() 2021-03-30 14:32:03 +02:00
utsname_sysctl.c
utsname.c
watch_queue.c watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths 2023-03-17 08:45:13 +01:00
watchdog_hld.c watchdog/perf: more properly prevent false positives with turbo modes 2023-07-27 08:43:40 +02:00
watchdog.c watchdog: move softlockup_panic back to early_param 2023-11-28 16:54:56 +00:00
workqueue_internal.h
workqueue.c Revert "workqueue: remove unused cancel_work()" 2023-12-08 08:46:13 +01:00