Commit Graph

854 Commits

Author SHA1 Message Date
Zhiguo Niu
1472dd897f lockdep: fix deadlock issue between lockdep and rcu
commit a6f88ac32c6e63e69c595bfae220d8641704c9b7 upstream.

There is a deadlock scenario between lockdep and rcu when
rcu nocb feature is enabled, just as following call stack:

     rcuop/x
-000|queued_spin_lock_slowpath(lock = 0xFFFFFF817F2A8A80, val = ?)
-001|queued_spin_lock(inline) // try to hold nocb_gp_lock
-001|do_raw_spin_lock(lock = 0xFFFFFF817F2A8A80)
-002|__raw_spin_lock_irqsave(inline)
-002|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F2A8A80)
-003|wake_nocb_gp_defer(inline)
-003|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F30B680)
-004|__call_rcu_common(inline)
-004|call_rcu(head = 0xFFFFFFC082EECC28, func = ?)
-005|call_rcu_zapped(inline)
-005|free_zapped_rcu(ch = ?)// hold graph lock
-006|rcu_do_batch(rdp = 0xFFFFFF817F245680)
-007|nocb_cb_wait(inline)
-007|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F245680)
-008|kthread(_create = 0xFFFFFF80803122C0)
-009|ret_from_fork(asm)

     rcuop/y
-000|queued_spin_lock_slowpath(lock = 0xFFFFFFC08291BBC8, val = 0)
-001|queued_spin_lock()
-001|lockdep_lock()
-001|graph_lock() // try to hold graph lock
-002|lookup_chain_cache_add()
-002|validate_chain()
-003|lock_acquire
-004|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F211D80)
-005|lock_timer_base(inline)
-006|mod_timer(inline)
-006|wake_nocb_gp_defer(inline)// hold nocb_gp_lock
-006|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F2A8680)
-007|__call_rcu_common(inline)
-007|call_rcu(head = 0xFFFFFFC0822E0B58, func = ?)
-008|call_rcu_hurry(inline)
-008|rcu_sync_call(inline)
-008|rcu_sync_func(rhp = 0xFFFFFFC0822E0B58)
-009|rcu_do_batch(rdp = 0xFFFFFF817F266680)
-010|nocb_cb_wait(inline)
-010|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F266680)
-011|kthread(_create = 0xFFFFFF8080363740)
-012|ret_from_fork(asm)

rcuop/x and rcuop/y are rcu nocb threads with the same nocb gp thread.
This patch release the graph lock before lockdep call_rcu.

Fixes: a0b0fd53e1 ("locking/lockdep: Free lock classes that are no longer in use")
Cc: stable@vger.kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20240620225436.3127927-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-17 15:08:05 +02:00
Roland Xu
1401da1486 rtmutex: Drop rt_mutex::wait_lock before scheduling
commit d33d26036a0274b472299d7dcdaa5fb34329f91b upstream.

rt_mutex_handle_deadlock() is called with rt_mutex::wait_lock held.  In the
good case it returns with the lock held and in the deadlock case it emits a
warning and goes into an endless scheduling loop with the lock held, which
triggers the 'scheduling in atomic' warning.

Unlock rt_mutex::wait_lock in the dead lock case before issuing the warning
and dropping into the schedule for ever loop.

[ tglx: Moved unlock before the WARN(), removed the pointless comment,
  	massaged changelog, added Fixes tag ]

Fixes: 3d5c9340d1 ("rtmutex: Handle deadlock detection smarter")
Signed-off-by: Roland Xu <mu001999@outlook.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/ME0P300MB063599BEF0743B8FA339C2CECC802@ME0P300MB0635.AUSP300.PROD.OUTLOOK.COM
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-12 11:06:51 +02:00
Peter Zijlstra
1301467cbe lockdep: Fix block chain corruption
[ Upstream commit bca4104b00fec60be330cd32818dd5c70db3d469 ]

Kent reported an occasional KASAN splat in lockdep. Mark then noted:

> I suspect the dodgy access is to chain_block_buckets[-1], which hits the last 4
> bytes of the redzone and gets (incorrectly/misleadingly) attributed to
> nr_large_chain_blocks.

That would mean @size == 0, at which point size_to_bucket() returns -1
and the above happens.

alloc_chain_hlocks() has 'size - req', for the first with the
precondition 'size >= rq', which allows the 0.

This code is trying to split a block, del_chain_block() takes what we
need, and add_chain_block() puts back the remainder, except in the
above case the remainder is 0 sized and things go sideways.

Fixes: 810507fe6f ("locking/lockdep: Reuse freed chain_hlocks entries")
Reported-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lkml.kernel.org/r/20231121114126.GH8262@noisy.programming.kicks-ass.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-08 08:46:09 +01:00
John Stultz
9ed2d68b39 locking/ww_mutex/test: Fix potential workqueue corruption
[ Upstream commit bccdd808902f8c677317cec47c306e42b93b849e ]

In some cases running with the test-ww_mutex code, I was seeing
odd behavior where sometimes it seemed flush_workqueue was
returning before all the work threads were finished.

Often this would cause strange crashes as the mutexes would be
freed while they were being used.

Looking at the code, there is a lifetime problem as the
controlling thread that spawns the work allocates the
"struct stress" structures that are passed to the workqueue
threads. Then when the workqueue threads are finished,
they free the stress struct that was passed to them.

Unfortunately the workqueue work_struct node is in the stress
struct. Which means the work_struct is freed before the work
thread returns and while flush_workqueue is waiting.

It seems like a better idea to have the controlling thread
both allocate and free the stress structures, so that we can
be sure we don't corrupt the workqueue by freeing the structure
prematurely.

So this patch reworks the test to do so, and with this change
I no longer see the early flush_workqueue returns.

Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230922043616.19282-3-jstultz@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28 16:54:49 +00:00
Peter Zijlstra
4cea839177 locking/lockdep: Fix lockdep_init_map_*() confusion
[ Upstream commit eae6d58d67d9739be5f7ae2dbead1d0ef6528243 ]

Commit dfd5e3f5fe27 ("locking/lockdep: Mark local_lock_t") added yet
another lockdep_init_map_*() variant, but forgot to update all the
existing users of the most complicated version.

This could lead to a loss of lock_type and hence an incorrect report.
Given the relative rarity of both local_lock and these annotations,
this is unlikely to happen in practise, still, best fix things.

Fixes: dfd5e3f5fe27 ("locking/lockdep: Mark local_lock_t")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/YqyEDtoan20K0CVD@worktop.programming.kicks-ass.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-21 15:15:33 +02:00
Tetsuo Handa
19bf7199c3 lockdep: Allow tuning tracing capacity constants.
commit 5dc33592e95534dc8455ce3e9baaaf3dae0fff82 upstream.

Since syzkaller continues various test cases until the kernel crashes,
syzkaller tends to examine more locking dependencies than normal systems.
As a result, syzbot is reporting that the fuzz testing was terminated
due to hitting upper limits lockdep can track [1] [2] [3]. Since analysis
via /proc/lockdep* did not show any obvious culprit [4] [5], we have no
choice but allow tuning tracing capacity constants.

[1] https://syzkaller.appspot.com/bug?id=3d97ba93fb3566000c1c59691ea427370d33ea1b
[2] https://syzkaller.appspot.com/bug?id=381cb436fe60dc03d7fd2a092b46d7f09542a72a
[3] https://syzkaller.appspot.com/bug?id=a588183ac34c1437fc0785e8f220e88282e5a29f
[4] https://lkml.kernel.org/r/4b8f7a57-fa20-47bd-48a0-ae35d860f233@i-love.sakura.ne.jp
[5] https://lkml.kernel.org/r/1c351187-253b-2d49-acaf-4563c63ae7d2@i-love.sakura.ne.jp

References: https://lkml.kernel.org/r/1595640639-9310-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-21 15:15:27 +02:00
Waiman Long
ecd384c436 locking/lockdep: Iterate lock_classes directly when reading lockdep files
[ Upstream commit fb7275acd6fb988313dddd8d3d19efa70d9015ad ]

When dumping lock_classes information via /proc/lockdep, we can't take
the lockdep lock as the lock hold time is indeterminate. Iterating
over all_lock_classes without holding lock can be dangerous as there
is a slight chance that it may branch off to other lists leading to
infinite loop or even access invalid memory if changes are made to
all_lock_classes list in parallel.

To avoid this problem, iteration of lock classes is now done directly
on the lock_classes array itself. The lock_classes_in_use bitmap is
checked to see if the lock class is being used. To avoid iterating
the full array all the times, a new max_lock_class_idx value is added
to track the maximum lock_class index that is currently being used.

We can theoretically take the lockdep lock for iterating all_lock_classes
when other lockdep files (lockdep_stats and lock_stat) are accessed as
the lock hold time will be shorter for them. For consistency, they are
also modified to iterate the lock_classes array directly.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220211035526.1329503-2-longman@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:40:32 +02:00
Waiman Long
68c80088f5 locking/lockdep: Avoid potential access of invalid memory in lock_class
commit 61cc4534b6550997c97a03759ab46b29d44c0017 upstream.

It was found that reading /proc/lockdep after a lockdep splat may
potentially cause an access to freed memory if lockdep_unregister_key()
is called after the splat but before access to /proc/lockdep [1]. This
is due to the fact that graph_lock() call in lockdep_unregister_key()
fails after the clearing of debug_locks by the splat process.

After lockdep_unregister_key() is called, the lock_name may be freed
but the corresponding lock_class structure still have a reference to
it. That invalid memory pointer will then be accessed when /proc/lockdep
is read by a user and a use-after-free (UAF) error will be reported if
KASAN is enabled.

To fix this problem, lockdep_unregister_key() is now modified to always
search for a matching key irrespective of the debug_locks state and
zap the corresponding lock class if a matching one is found.

[1] https://lore.kernel.org/lkml/77f05c15-81b6-bddd-9650-80d5f23fe330@i-love.sakura.ne.jp/

Fixes: 8b39adbee8 ("locking/lockdep: Make lockdep_unregister_key() honor 'debug_locks' again")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Cc: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Link: https://lkml.kernel.org/r/20220103023558.1377055-1-longman@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08 14:39:48 +02:00
Cheng Jui Wang
6062d1267f lockdep: Correct lock_classes index mapping
commit 28df029d53a2fd80c1b8674d47895648ad26dcfb upstream.

A kernel exception was hit when trying to dump /proc/lockdep_chains after
lockdep report "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!":

Unable to handle kernel paging request at virtual address 00054005450e05c3
...
00054005450e05c3] address between user and kernel address ranges
...
pc : [0xffffffece769b3a8] string+0x50/0x10c
lr : [0xffffffece769ac88] vsnprintf+0x468/0x69c
...
 Call trace:
  string+0x50/0x10c
  vsnprintf+0x468/0x69c
  seq_printf+0x8c/0xd8
  print_name+0x64/0xf4
  lc_show+0xb8/0x128
  seq_read_iter+0x3cc/0x5fc
  proc_reg_read_iter+0xdc/0x1d4

The cause of the problem is the function lock_chain_get_class() will
shift lock_classes index by 1, but the index don't need to be shifted
anymore since commit 01bb6f0af9 ("locking/lockdep: Change the range
of class_idx in held_lock struct") already change the index to start
from 0.

The lock_classes[-1] located at chain_hlocks array. When printing
lock_classes[-1] after the chain_hlocks entries are modified, the
exception happened.

The output of lockdep_chains are incorrect due to this problem too.

Fixes: f611e8cf98 ("lockdep: Take read/write status in consideration when generate chainkey")
Signed-off-by: Cheng Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20220210105011.21712-1-cheng-jui.wang@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23 12:01:08 +01:00
Randy Dunlap
cffed7e631 kernel: delete repeated words in comments
[ Upstream commit c034f48e99907d5be147ac8f0f3e630a9307c2be ]

Drop repeated words in kernel/events/.
{if, the, that, with, time}

Drop repeated words in kernel/locking/.
{it, no, the}

Drop repeated words in kernel/sched/.
{in, not}

Link: https://lkml.kernel.org/r/20210127023412.26292-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Will Deacon <will@kernel.org>	[kernel/locking/]
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-01 17:25:45 +01:00
Sebastian Andrzej Siewior
fb4a58f519 lockdep: Let lock_is_held_type() detect recursive read as read
[ Upstream commit 2507003a1d10917c9158077bf6030719d02c941e ]

lock_is_held_type(, 1) detects acquired read locks. It only recognized
locks acquired with lock_acquire_shared(). Read locks acquired with
lock_acquire_shared_recursive() are not recognized because a `2' is
stored as the read value.

Rework the check to additionally recognise lock's read value one and two
as a read held lock.

Fixes: e918188611 ("locking: More accurate annotations for read_lock()")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://lkml.kernel.org/r/20210903084001.lblecrvz4esl4mrr@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 14:04:02 +01:00
Peter Zijlstra
28118dcc87 locking/lockdep: Avoid RCU-induced noinstr fail
[ Upstream commit ce0b9c805dd66d5e49fd53ec5415ae398f4c56e6 ]

vmlinux.o: warning: objtool: look_up_lock_class()+0xc7: call to rcu_read_lock_any_held() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210624095148.311980536@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 14:03:53 +01:00
Peter Zijlstra
d5462a630f locking/lockdep: Mark local_lock_t
[ Upstream commit dfd5e3f5fe27bda91d5cc028c86ffbb7a0614489 ]

The local_lock_t's are special, because they cannot form IRQ
inversions, make sure we can tell them apart from the rest of the
locks.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-15 09:50:41 +02:00
Peter Zijlstra
97bc540bfb locking/mutex: Fix HANDOFF condition
[ Upstream commit 048661a1f963e9517630f080687d48af79ed784c ]

Yanfei reported that setting HANDOFF should not depend on recomputing
@first, only on @first state. Which would then give:

  if (ww_ctx || !first)
    first = __mutex_waiter_is_first(lock, &waiter);
  if (first)
    __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);

But because 'ww_ctx || !first' is basically 'always' and the test for
first is relatively cheap, omit that first branch entirely.

Reported-by: Yanfei Xu <yanfei.xu@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Yanfei Xu <yanfei.xu@windriver.com>
Link: https://lore.kernel.org/r/20210630154114.896786297@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-15 09:50:23 +02:00
Peter Zijlstra
fca9e784a3 lockdep: Fix wait-type for empty stack
[ Upstream commit f8b298cc39f0619544c607eaef09fd0b2afd10f3 ]

Even the very first lock can violate the wait-context check, consider
the various IRQ contexts.

Fixes: de8f5e4f2d ("lockdep: Introduce wait-type checks")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20210617190313.256987481@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:10 +02:00
Boqun Feng
963baea02d lockding/lockdep: Avoid to find wrong lock dep path in check_irq_usage()
[ Upstream commit 7b1f8c6179769af6ffa055e1169610b51d71edd5 ]

In the step #3 of check_irq_usage(), we seach backwards to find a lock
whose usage conflicts the usage of @target_entry1 on safe/unsafe.
However, we should only keep the irq-unsafe usage of @target_entry1 into
consideration, because it could be a case where a lock is hardirq-unsafe
but soft-safe, and in check_irq_usage() we find it because its
hardirq-unsafe could result into a hardirq-safe-unsafe deadlock, but
currently since we don't filter out the other usage bits, so we may find
a lock dependency path softirq-unsafe -> softirq-safe, which in fact
doesn't cause a deadlock. And this may cause misleading lockdep splats.

Fix this by only keeping LOCKF_ENABLED_IRQ_ALL bits when we try the
backwards search.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210618170110.3699115-4-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:55:56 +02:00
Boqun Feng
93cc59d8d0 locking/lockdep: Fix the dep path printing for backwards BFS
[ Upstream commit 69c7a5fb2482636f525f016c8333fdb9111ecb9d ]

We use the same code to print backwards lock dependency path as the
forwards lock dependency path, and this could result into incorrect
printing because for a backwards lock_list ->trace is not the call trace
where the lock of ->class is acquired.

Fix this by introducing a separate function on printing the backwards
dependency path. Also add a few comments about the printing while we are
at it.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210618170110.3699115-2-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:55:56 +02:00
Peter Zijlstra
ca2acbd548 locking/lockdep: Improve noinstr vs errors
[ Upstream commit 49faa77759b211fff344898edc23bb780707fff5 ]

Better handle the failure paths.

  vmlinux.o: warning: objtool: debug_locks_off()+0x23: call to console_verbose() leaves .noinstr.text section
  vmlinux.o: warning: objtool: debug_locks_off()+0x19: call to __kasan_check_write() leaves .noinstr.text section

  debug_locks_off+0x19/0x40:
  instrument_atomic_write at include/linux/instrumented.h:86
  (inlined by) __debug_locks_off at include/linux/debug_locks.h:17
  (inlined by) debug_locks_off at lib/debug_locks.c:41

Fixes: 6eebad1ad3 ("lockdep: __always_inline more for noinstr")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210621120120.784404944@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-30 08:47:18 -04:00
Zqiang
e354e3744b locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to signal
[ Upstream commit 3a010c493271f04578b133de977e0e5dd2848cea ]

When a interruptible mutex locker is interrupted by a signal
without acquiring this lock and removed from the wait queue.
if the mutex isn't contended enough to have a waiter
put into the wait queue again, the setting of the WAITER
bit will force mutex locker to go into the slowpath to
acquire the lock every time, so if the wait queue is empty,
the WAITER bit need to be clear.

Fixes: 040a0a3710 ("mutex: Add support for wound/wait style locks")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210517034005.30828-1-qiang.zhang@windriver.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-26 12:06:50 +02:00
Leo Yan
5dfed1be0e locking/lockdep: Correct calling tracepoints
[ Upstream commit 89e70d5c583c55088faa2201d397ee30a15704aa ]

The commit eb1f00237a ("lockdep,trace: Expose tracepoints") reverses
tracepoints for lock_contended() and lock_acquired(), thus the ftrace
log shows the wrong locking sequence that "acquired" event is prior to
"contended" event:

  <idle>-0       [001] d.s3 20803.501685: lock_acquire: 0000000008b91ab4 &sg_policy->update_lock
  <idle>-0       [001] d.s3 20803.501686: lock_acquired: 0000000008b91ab4 &sg_policy->update_lock
  <idle>-0       [001] d.s3 20803.501689: lock_contended: 0000000008b91ab4 &sg_policy->update_lock
  <idle>-0       [001] d.s3 20803.501690: lock_release: 0000000008b91ab4 &sg_policy->update_lock

This patch fixes calling tracepoints for lock_contended() and
lock_acquired().

Fixes: eb1f00237a ("lockdep,trace: Expose tracepoints")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210512120937.90211-1-leo.yan@linaro.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-26 12:06:50 +02:00
Ali Saidi
82fa9ced35 locking/qrwlock: Fix ordering in queued_write_lock_slowpath()
[ Upstream commit 84a24bf8c52e66b7ac89ada5e3cfbe72d65c1896 ]

While this code is executed with the wait_lock held, a reader can
acquire the lock without holding wait_lock.  The writer side loops
checking the value with the atomic_cond_read_acquire(), but only truly
acquires the lock when the compare-and-exchange is completed
successfully which isn’t ordered. This exposes the window between the
acquire and the cmpxchg to an A-B-A problem which allows reads
following the lock acquisition to observe values speculatively before
the write lock is truly acquired.

We've seen a problem in epoll where the reader does a xchg while
holding the read lock, but the writer can see a value change out from
under it.

  Writer                                | Reader
  --------------------------------------------------------------------------------
  ep_scan_ready_list()                  |
  |- write_lock_irq()                   |
      |- queued_write_lock_slowpath()   |
	|- atomic_cond_read_acquire()   |
				        | read_lock_irqsave(&ep->lock, flags);
     --> (observes value before unlock) |  chain_epi_lockless()
     |                                  |    epi->next = xchg(&ep->ovflist, epi);
     |                                  | read_unlock_irqrestore(&ep->lock, flags);
     |                                  |
     |     atomic_cmpxchg_relaxed()     |
     |-- READ_ONCE(ep->ovflist);        |

A core can order the read of the ovflist ahead of the
atomic_cmpxchg_relaxed(). Switching the cmpxchg to use acquire
semantics addresses this issue at which point the atomic_cond_read can
be switched to use relaxed semantics.

Fixes: b519b56e37 ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock")
Signed-off-by: Ali Saidi <alisaidi@amazon.com>
[peterz: use try_cmpxchg()]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Tested-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-28 13:40:00 +02:00
Tetsuo Handa
f12e8cf6b1 lockdep: Add a missing initialization hint to the "INFO: Trying to register non-static key" message
[ Upstream commit 3a85969e9d912d5dd85362ee37b5f81266e00e77 ]

Since this message is printed when dynamically allocated spinlocks (e.g.
kzalloc()) are used without initialization (e.g. spin_lock_init()),
suggest to developers to check whether initialization functions for objects
were called, before making developers wonder what annotation is missing.

[ mingo: Minor tweaks to the message. ]

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210321064913.4619-1-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-21 13:00:51 +02:00
Arnd Bergmann
c65a000a23 lockdep: Address clang -Wformat warning printing for %hd
commit 6d48b7912cc72275dc7c59ff961c8bac7ef66a92 upstream.

Clang doesn't like format strings that truncate a 32-bit
value to something shorter:

  kernel/locking/lockdep.c:709:4: error: format specifies type 'short' but the argument has type 'int' [-Werror,-Wformat]

In this case, the warning is a slightly questionable, as it could realize
that both class->wait_type_outer and class->wait_type_inner are in fact
8-bit struct members, even though the result of the ?: operator becomes an
'int'.

However, there is really no point in printing the number as a 16-bit
'short' rather than either an 8-bit or 32-bit number, so just change
it to a normal %d.

Fixes: de8f5e4f2d ("lockdep: Introduce wait-type checks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210322115531.3987555-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14 08:42:12 +02:00
Waiman Long
905ef030bd locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling
[ Upstream commit 5de2055d31ea88fd9ae9709ac95c372a505a60fa ]

The use_ww_ctx flag is passed to mutex_optimistic_spin(), but the
function doesn't use it. The frequent use of the (use_ww_ctx && ww_ctx)
combination is repetitive.

In fact, ww_ctx should not be used at all if !use_ww_ctx.  Simplify
ww_mutex code by dropping use_ww_ctx from mutex_optimistic_spin() an
clear ww_ctx if !use_ww_ctx. In this way, we can replace (use_ww_ctx &&
ww_ctx) by just (ww_ctx).

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Link: https://lore.kernel.org/r/20210316153119.13802-2-longman@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-07 15:00:06 +02:00
Peter Zijlstra
d48f03f6b2 locking/lockdep: Avoid unmatched unlock
[ Upstream commit 7f82e631d236cafd28518b998c6d4d8dc2ef68f6 ]

Commit f6f48e1804 ("lockdep: Teach lockdep about "USED" <- "IN-NMI"
inversions") overlooked that print_usage_bug() releases the graph_lock
and called it without the graph lock held.

Fixes: f6f48e1804 ("lockdep: Teach lockdep about "USED" <- "IN-NMI" inversions")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://lkml.kernel.org/r/YBfkuyIfB1+VRxXP@hirez.programming.kicks-ass.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04 11:37:47 +01:00
Peter Zijlstra
d2c880ed2c locking/lockdep: Avoid noinstr warning for DEBUG_LOCKDEP
[ Upstream commit 77ca93a6b1223e210e58e1000c09d8d420403c94 ]

  vmlinux.o: warning: objtool: lock_is_held_type()+0x60: call to check_flags.part.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210106144017.652218215@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-07 15:37:14 +01:00
Thomas Gleixner
6d28ac502f rtmutex: Remove unused argument from rt_mutex_proxy_unlock()
commit 2156ac1934166d6deb6cd0f6ffc4c1076ec63697 upstream

Nothing uses the argument. Remove it as preparation to use
pi_state_update_owner().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-30 13:55:17 +01:00
Peter Zijlstra
ba548335c8 locking/lockdep: Cure noinstr fail
commit 0afda3a888dccf12557b41ef42eee942327d122b upstream.

When the compiler doesn't feel like inlining, it causes a noinstr
fail:

  vmlinux.o: warning: objtool: lock_is_held_type()+0xb: call to lockdep_enabled() leaves .noinstr.text section

Fixes: 4d004099a6 ("lockdep: Fix lockdep recursion")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210106144017.592595176@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:55:21 +01:00
Eric W. Biederman
933b7cc860 rwsem: Implement down_read_interruptible
[ Upstream commit 31784cff7ee073b34d6eddabb95e3be2880a425c ]

In preparation for converting exec_update_mutex to a rwsem so that
multiple readers can execute in parallel and not deadlock, add
down_read_interruptible.  This is needed for perf_event_open to be
converted (with no semantic changes) from working on a mutex to
wroking on a rwsem.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/87k0tybqfy.fsf@x220.int.ebiederm.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-09 13:46:24 +01:00
Eric W. Biederman
27bae39e4f rwsem: Implement down_read_killable_nested
[ Upstream commit 0f9368b5bf6db0c04afc5454b1be79022a681615 ]

In preparation for converting exec_update_mutex to a rwsem so that
multiple readers can execute in parallel and not deadlock, add
down_read_killable_nested.  This is needed so that kcmp_lock
can be converted from working on a mutexes to working on rw_semaphores.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/87o8jabqh3.fsf@x220.int.ebiederm.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-09 13:46:24 +01:00
Boqun Feng
43be4388e9 lockdep: Put graph lock/unlock under lock_recursion protection
A warning was hit when running xfstests/generic/068 in a Hyper-V guest:

[...] ------------[ cut here ]------------
[...] DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
[...] WARNING: CPU: 2 PID: 1350 at kernel/locking/lockdep.c:5280 check_flags.part.0+0x165/0x170
[...] ...
[...] Workqueue: events pwq_unbound_release_workfn
[...] RIP: 0010:check_flags.part.0+0x165/0x170
[...] ...
[...] Call Trace:
[...]  lock_is_held_type+0x72/0x150
[...]  ? lock_acquire+0x16e/0x4a0
[...]  rcu_read_lock_sched_held+0x3f/0x80
[...]  __send_ipi_one+0x14d/0x1b0
[...]  hv_send_ipi+0x12/0x30
[...]  __pv_queued_spin_unlock_slowpath+0xd1/0x110
[...]  __raw_callee_save___pv_queued_spin_unlock_slowpath+0x11/0x20
[...]  .slowpath+0x9/0xe
[...]  lockdep_unregister_key+0x128/0x180
[...]  pwq_unbound_release_workfn+0xbb/0xf0
[...]  process_one_work+0x227/0x5c0
[...]  worker_thread+0x55/0x3c0
[...]  ? process_one_work+0x5c0/0x5c0
[...]  kthread+0x153/0x170
[...]  ? __kthread_bind_mask+0x60/0x60
[...]  ret_from_fork+0x1f/0x30

The cause of the problem is we have call chain lockdep_unregister_key()
-> <irq disabled by raw_local_irq_save()> lockdep_unlock() ->
arch_spin_unlock() -> __pv_queued_spin_unlock_slowpath() -> pv_kick() ->
__send_ipi_one() -> trace_hyperv_send_ipi_one().

Although this particular warning is triggered because Hyper-V has a
trace point in ipi sending, but in general arch_spin_unlock() may call
another function having a trace point in it, so put the arch_spin_lock()
and arch_spin_unlock() after lock_recursion protection to fix this
problem and avoid similiar problems.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201113110512.1056501-1-boqun.feng@gmail.com
2020-11-17 13:15:35 +01:00
Boqun Feng
d61fc96a37 lockdep: Avoid to modify chain keys in validate_chain()
Chris Wilson reported a problem spotted by check_chain_key(): a chain
key got changed in validate_chain() because we modify the ->read in
validate_chain() to skip checks for dependency adding, and ->read is
taken into calculation for chain key since commit f611e8cf98
("lockdep: Take read/write status in consideration when generate
chainkey").

Fix this by avoiding to modify ->read in validate_chain() based on two
facts: a) since we now support recursive read lock detection, there is
no need to skip checks for dependency adding for recursive readers, b)
since we have a), there is only one case left (nest_lock) where we want
to skip checks in validate_chain(), we simply remove the modification
for ->read and rely on the return value of check_deadlock() to skip the
dependency adding.

Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201102053743.450459-1-boqun.feng@gmail.com
2020-11-10 18:38:38 +01:00
Linus Torvalds
8d99084efc A couple of locking fixes:
- Fix incorrect failure injection handling on the fuxtex code
 
  - Prevent a preemption warning in lockdep when tracking local_irq_enable()
    and interrupts are already enabled
 
  - Remove more raw_cpu_read() usage from lockdep which causes state
    corruption on !X86 architectures.
 
  - Make the nr_unused_locks accounting in lockdep correct again.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+evEUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYod93EADG90GmRBYQxn6y2eQUKE/9f5SMiMFJ
 KdvuqNBqHOhYm3iUPbZJcb0P/JZi1NHP0fBFMishdESGi96tD96K7T04WD0gmjtm
 ArWFroe8uzEYtY9atlEwM0Nrvq0w8ZBLv9x1adXzJ59vB8/8Uq+wzYioSWn9yMcv
 ye3jfVyAlM7ouFHDQAA36s/nhvZfxms4C0t+6S3gjVTIp/6riGuYh5t7dbXUMlnu
 nGLiIJFjU+ekurweVDGpqD/nAxYfqf3UxebWnrosf7iu6suwYwaPFZGZ/kxlbr5e
 qWx0B1RuhjAoefVJlPTkHmuhd0SnH/Gm/tTNkQ3LidJhPTIhLJlb7zffwyZlc510
 VdaUipfZ6bNqDD6/dK6fKJtdKSE4w/z3pT53954NUD5zw/jIcHlgnaQieh72DH+F
 1EKqmsNrwHAxYfMndQxLGdIoBScUAFzHzDnzsY9KKS2cfhChljzLa2nDIfMsDfKQ
 aROugzEbZPQEb1iWUEOF3XopcuZzZQCaPlLDLvAnsBeYEPm0gdmbKFPFsDjOyBVX
 /Qc41O7DyHKcoiLX2zM2c7CxnV5J6YEZz3jQSZLFlpH9Ih7jwAl9/6VirggNUNvV
 YVsgM/myhYQtJBqHHojNppZFFW3KdgfxWuY7+qt7Ox5w/ck5qYQwRnoB4FROwVHV
 pzcYTBE5qkQnIw==
 =S01o
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "A couple of locking fixes:

   - Fix incorrect failure injection handling in the fuxtex code

   - Prevent a preemption warning in lockdep when tracking
     local_irq_enable() and interrupts are already enabled

   - Remove more raw_cpu_read() usage from lockdep which causes state
     corruption on !X86 architectures.

   - Make the nr_unused_locks accounting in lockdep correct again"

* tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep: Fix nr_unused_locks accounting
  locking/lockdep: Remove more raw_cpu_read() usage
  futex: Fix incorrect should_fail_futex() handling
  lockdep: Fix preemption WARN for spurious IRQ-enable
2020-11-01 11:08:17 -08:00
Peter Zijlstra
1a39340865 lockdep: Fix nr_unused_locks accounting
Chris reported that commit 24d5a3bffef1 ("lockdep: Fix
usage_traceoverflow") breaks the nr_unused_locks validation code
triggered by /proc/lockdep_stats.

By fully splitting LOCK_USED and LOCK_USED_READ it becomes a bad
indicator for accounting nr_unused_locks; simplyfy by using any first
bit.

Fixes: 24d5a3bffef1 ("lockdep: Fix usage_traceoverflow")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://lkml.kernel.org/r/20201027124834.GL2628@hirez.programming.kicks-ass.net
2020-10-30 17:07:18 +01:00
Peter Zijlstra
d48e385003 locking/lockdep: Remove more raw_cpu_read() usage
I initially thought raw_cpu_read() was OK, since if it is !0 we have
IRQs disabled and can't get migrated, so if we get migrated both CPUs
must have 0 and it doesn't matter which 0 we read.

And while that is true; it isn't the whole store, on pretty much all
architectures (except x86) this can result in computing the address for
one CPU, getting migrated, the old CPU continuing execution with another
task (possibly setting recursion) and then the new CPU reading the value
of the old CPU, which is no longer 0.

Similer to:

  baffd723e4 ("lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"")

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201026152256.GB2651@hirez.programming.kicks-ass.net
2020-10-30 17:07:18 +01:00
Peter Zijlstra
f8e48a3dca lockdep: Fix preemption WARN for spurious IRQ-enable
It is valid (albeit uncommon) to call local_irq_enable() without first
having called local_irq_disable(). In this case we enter
lockdep_hardirqs_on*() with IRQs enabled and trip a preemption warning
for using __this_cpu_read().

Use this_cpu_read() instead to avoid the warning.

Fixes: 4d004099a6 ("lockdep: Fix lockdep recursion")
Reported-by: syzbot+53f8ce8bbc07924b6417@syzkaller.appspotmail.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2020-10-22 12:37:22 +02:00
Linus Torvalds
41eea65e2a Merge tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU changes from Ingo Molnar:

 - Debugging for smp_call_function()

 - RT raw/non-raw lock ordering fixes

 - Strict grace periods for KASAN

 - New smp_call_function() torture test

 - Torture-test updates

 - Documentation updates

 - Miscellaneous fixes

[ This doesn't actually pull the tag - I've dropped the last merge from
  the RCU branch due to questions about the series.   - Linus ]

* tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
  smp: Make symbol 'csd_bug_count' static
  kernel/smp: Provide CSD lock timeout diagnostics
  smp: Add source and destination CPUs to __call_single_data
  rcu: Shrink each possible cpu krcp
  rcu/segcblist: Prevent useless GP start if no CBs to accelerate
  torture: Add gdb support
  rcutorture: Allow pointer leaks to test diagnostic code
  rcutorture: Hoist OOM registry up one level
  refperf: Avoid null pointer dereference when buf fails to allocate
  rcutorture: Properly synchronize with OOM notifier
  rcutorture: Properly set rcu_fwds for OOM handling
  torture: Add kvm.sh --help and update help message
  rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
  torture: Update initrd documentation
  rcutorture: Replace HTTP links with HTTPS ones
  locktorture: Make function torture_percpu_rwsem_init() static
  torture: document --allcpus argument added to the kvm.sh script
  rcutorture: Output number of elapsed grace periods
  rcutorture: Remove KCSAN stubs
  rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
  ...
2020-10-18 14:34:50 -07:00
Ingo Molnar
e705d39796 Merge branch 'locking/urgent' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-10-09 08:55:17 +02:00
Peter Zijlstra
4d004099a6 lockdep: Fix lockdep recursion
Steve reported that lockdep_assert*irq*(), when nested inside lockdep
itself, will trigger a false-positive.

One example is the stack-trace code, as called from inside lockdep,
triggering tracing, which in turn calls RCU, which then uses
lockdep_assert_irqs_disabled().

Fixes: a21ee6055c ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-10-09 08:53:30 +02:00
Peter Zijlstra
2bb8945bcc lockdep: Fix usage_traceoverflow
Basically print_lock_class_header()'s for loop is out of sync with the
the size of of ->usage_traces[].

Also clean things up a bit while at it, to avoid such mishaps in the future.

Fixes: 23870f1227 ("locking/lockdep: Fix "USED" <- "IN-NMI" inversions")
Reported-by: Qian Cai <cai@redhat.com>
Debugged-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Qian Cai <cai@redhat.com>
Link: https://lkml.kernel.org/r/20200930094937.GE2651@hirez.programming.kicks-ass.net
2020-10-09 08:53:08 +02:00
Ingo Molnar
b36c830f8c Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull v5.10 RCU changes from Paul E. McKenney:

- Debugging for smp_call_function().

- Strict grace periods for KASAN.  The point of this series is to find
  RCU-usage bugs, so the corresponding new RCU_STRICT_GRACE_PERIOD
  Kconfig option depends on both DEBUG_KERNEL and RCU_EXPERT, and is
  further disabled by dfefault.  Finally, the help text includes
  a goodly list of scary caveats.

- New smp_call_function() torture test.

- Torture-test updates.

- Documentation updates.

- Miscellaneous fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-10-09 08:21:56 +02:00
Boqun Feng
6d1823ccc4 lockdep: Optimize the memory usage of circular queue
Qian Cai reported a BFS_EQUEUEFULL warning [1] after read recursive
deadlock detection merged into tip tree recently. Unlike the previous
lockep graph searching, which iterate every lock class (every node in
the graph) exactly once, the graph searching for read recurisve deadlock
detection needs to iterate every lock dependency (every edge in the
graph) once, as a result, the maximum memory cost of the circular queue
changes from O(V), where V is the number of lock classes (nodes or
vertices) in the graph, to O(E), where E is the number of lock
dependencies (edges), because every lock class or dependency gets
enqueued once in the BFS. Therefore we hit the BFS_EQUEUEFULL case.

However, actually we don't need to enqueue all dependencies for the BFS,
because every time we enqueue a dependency, we almostly enqueue all
other dependencies in the same dependency list ("almostly" is because
we currently check before enqueue, so if a dependency doesn't pass the
check stage we won't enqueue it, however, we can always do in reverse
ordering), based on this, we can only enqueue the first dependency from
a dependency list and every time we want to fetch a new dependency to
work, we can either:

  1)	fetch the dependency next to the current dependency in the
	dependency list
or

  2)	if the dependency in 1) doesn't exist, fetch the dependency from
	the queue.

With this approach, the "max bfs queue depth" for a x86_64_defconfig +
lockdep and selftest config kernel can get descreased from:

        max bfs queue depth:                   201

to (after apply this patch)

        max bfs queue depth:                   61

While I'm at it, clean up the code logic a little (e.g. directly return
other than set a "ret" value and goto the "exit" label).

[1]: https://lore.kernel.org/lkml/17343f6f7f2438fc376125384133c5ba70c2a681.camel@redhat.com/

Reported-by: Qian Cai <cai@redhat.com>
Reported-by: syzbot+62ebe501c1ce9a91f68c@syzkaller.appspotmail.com
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200917080210.108095-1-boqun.feng@gmail.com
2020-09-29 09:56:59 +02:00
Hou Tao
e6b1a44ecc locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_count
The __this_cpu*() accessors are (in general) IRQ-unsafe which, given
that percpu-rwsem is a blocking primitive, should be just fine.

However, file_end_write() is used from IRQ context and will cause
load-store issues on architectures where the per-cpu accessors are not
natively irq-safe.

Fix it by using the IRQ-safe this_cpu_*() for operations on
read_count. This will generate more expensive code on a number of
platforms, which might cause a performance regression for some of the
other percpu-rwsem users.

If any such is reported, we can consider alternative solutions.

Fixes: 70fe2f4815 ("aio: fix freeze protection of aio writes")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20200915140750.137881-1-houtao1@huawei.com
2020-09-16 16:26:56 +02:00
peterz@infradead.org
23870f1227 locking/lockdep: Fix "USED" <- "IN-NMI" inversions
During the LPC RCU BoF Paul asked how come the "USED" <- "IN-NMI"
detector doesn't trip over rcu_read_lock()'s lockdep annotation.

Looking into this I found a very embarrasing typo in
verify_lock_unused():

	-	if (!(class->usage_mask & LOCK_USED))
	+	if (!(class->usage_mask & LOCKF_USED))

fixing that will indeed cause rcu_read_lock() to insta-splat :/

The above typo means that instead of testing for: 0x100 (1 <<
LOCK_USED), we test for 8 (LOCK_USED), which corresponds to (1 <<
LOCK_ENABLED_HARDIRQ).

So instead of testing for _any_ used lock, it will only match any lock
used with interrupts enabled.

The rcu_read_lock() annotation uses .check=0, which means it will not
set any of the interrupt bits and will thus never match.

In order to properly fix the situation and allow rcu_read_lock() to
correctly work, split LOCK_USED into LOCK_USED and LOCK_USED_READ and by
having .read users set USED_READ and test USED, pure read-recursive
locks are permitted.

Fixes: f6f48e1804 ("lockdep: Teach lockdep about "USED" <- "IN-NMI" inversions")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20200902160323.GK1362448@hirez.programming.kicks-ass.net
2020-09-03 11:19:42 +02:00
Boqun Feng
f611e8cf98 lockdep: Take read/write status in consideration when generate chainkey
Currently, the chainkey of a lock chain is a hash sum of the class_idx
of all the held locks, the read/write status are not taken in to
consideration while generating the chainkey. This could result into a
problem, if we have:

	P1()
	{
		read_lock(B);
		lock(A);
	}

	P2()
	{
		lock(A);
		read_lock(B);
	}

	P3()
	{
		lock(A);
		write_lock(B);
	}

, and P1(), P2(), P3() run one by one. And when running P2(), lockdep
detects such a lock chain A -> B is not a deadlock, then it's added in
the chain cache, and then when running P3(), even if it's a deadlock, we
could miss it because of the hit of chain cache. This could be confirmed
by self testcase "chain cached mixed R-L/L-W ".

To resolve this, we use concept "hlock_id" to generate the chainkey, the
hlock_id is a tuple (hlock->class_idx, hlock->read), which fits in a u16
type. With this, the chainkeys are different is the lock sequences have
the same locks but different read/write status.

Besides, since we use "hlock_id" to generate chainkeys, the chain_hlocks
array now store the "hlock_id"s rather than lock_class indexes.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-15-boqun.feng@gmail.com
2020-08-26 12:42:06 +02:00
Boqun Feng
621c9dac0e lockdep: Add recursive read locks into dependency graph
Since we have all the fundamental to handle recursive read locks, we now
add them into the dependency graph.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-13-boqun.feng@gmail.com
2020-08-26 12:42:06 +02:00
Boqun Feng
f08e388857 lockdep: Fix recursive read lock related safe->unsafe detection
Currently, in safe->unsafe detection, lockdep misses the fact that a
LOCK_ENABLED_IRQ_*_READ usage and a LOCK_USED_IN_IRQ_*_READ usage may
cause deadlock too, for example:

	P1                          P2
	<irq disabled>
	write_lock(l1);             <irq enabled>
				    read_lock(l2);
	write_lock(l2);
				    <in irq>
				    read_lock(l1);

Actually, all of the following cases may cause deadlocks:

	LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*
	LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*
	LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*_READ
	LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*_READ

To fix this, we need to 1) change the calculation of exclusive_mask() so
that READ bits are not dropped and 2) always call usage() in
mark_lock_irq() to check usage deadlocks, even when the new usage of the
lock is READ.

Besides, adjust usage_match() and usage_acculumate() to recursive read
lock changes.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-12-boqun.feng@gmail.com
2020-08-26 12:42:05 +02:00
Boqun Feng
68e3056785 lockdep: Adjust check_redundant() for recursive read change
check_redundant() will report redundancy if it finds a path could
replace the about-to-add dependency in the BFS search. With recursive
read lock changes, we certainly need to change the match function for
the check_redundant(), because the path needs to match not only the lock
class but also the dependency kinds. For example, if the about-to-add
dependency @prev -> @next is A -(SN)-> B, and we find a path A -(S*)->
.. -(*R)->B in the dependency graph with __bfs() (for simplicity, we can
also say we find an -(SR)-> path from A to B), we can not replace the
dependency with that path in the BFS search. Because the -(SN)->
dependency can make a strong path with a following -(S*)-> dependency,
however an -(SR)-> path cannot.

Further, we can replace an -(SN)-> dependency with a -(EN)-> path, that
means if we find a path which is stronger than or equal to the
about-to-add dependency, we can report the redundancy. By "stronger", it
means both the start and the end of the path are not weaker than the
start and the end of the dependency (E is "stronger" than S and N is
"stronger" than R), so that we can replace the dependency with that
path.

To make sure we find a path whose start point is not weaker than the
about-to-add dependency, we use a trick: the ->only_xr of the root
(start point) of __bfs() is initialized as @prev-> == 0, therefore if
@prev is E, __bfs() will pick only -(E*)-> for the first dependency,
otherwise, __bfs() can pick -(E*)-> or -(S*)-> for the first dependency.

To make sure we find a path whose end point is not weaker than the
about-to-add dependency, we replace the match function for __bfs()
check_redundant(), we check for the case that either @next is R
(anything is not weaker than it) or the end point of the path is N
(which is not weaker than anything).

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-11-boqun.feng@gmail.com
2020-08-26 12:42:05 +02:00
Boqun Feng
9de0c9bbce lockdep: Support deadlock detection for recursive read locks in check_noncircular()
Currently, lockdep only has limit support for deadlock detection for
recursive read locks.

This patch support deadlock detection for recursive read locks. The
basic idea is:

We are about to add dependency B -> A in to the dependency graph, we use
check_noncircular() to find whether we have a strong dependency path
A -> .. -> B so that we have a strong dependency circle (a closed strong
dependency path):

	 A -> .. -> B -> A

, which doesn't have two adjacent dependencies as -(*R)-> L -(S*)->.

Since A -> .. -> B is already a strong dependency path, so if either
B -> A is -(E*)-> or A -> .. -> B is -(*N)->, the circle A -> .. -> B ->
A is strong, otherwise not. So we introduce a new match function
hlock_conflict() to replace the class_equal() for the deadlock check in
check_noncircular().

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-10-boqun.feng@gmail.com
2020-08-26 12:42:05 +02:00
Boqun Feng
61775ed243 lockdep: Make __bfs(.match) return bool
The "match" parameter of __bfs() is used for checking whether we hit a
match in the search, therefore it should return a boolean value rather
than an integer for better readability.

This patch then changes the return type of the function parameter and the
match functions to bool.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-9-boqun.feng@gmail.com
2020-08-26 12:42:05 +02:00