Commit Graph

1150025 Commits

Paul Lawrence
6aef06abba ANDROID: fuse-bpf: Check inode not null
fuse_iget_backing() returns an inode or NULL, not an ERR_PTR, so check
that it is not NULL.

Also make sure we put the inode if d_splice_alias() fails.
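
A minimal sketch of the corrected pattern (not the actual fuse-bpf code;
the call arguments and errno are illustrative, and the iput() on failure
follows this commit's description):

  struct inode *inode;
  struct dentry *alias;

  inode = fuse_iget_backing(sb, nodeid, backing_inode);
  if (!inode)                     /* plain NULL check, not IS_ERR() */
          return ERR_PTR(-EIO);   /* errno illustrative */

  alias = d_splice_alias(inode, entry);
  if (IS_ERR(alias)) {
          iput(inode);            /* put the inode if d_splice_alias fails */
          return alias;
  }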

Bug: 293349757
Test: fuse_test runs
Signed-off-by: Paul Lawrence <paullawrence@google.com>

Change-Id: I1eadad32f80bab6730e461412b4b7ab4d6c56bf2
2023-07-31 23:09:25 +00:00
Paul Lawrence
4bbda90bd8 ANDROID: fuse-bpf: Fix flock test compile error
Bug: 293161755
Test: fuse_test compiles
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Change-Id: I249672bab85966e20a26018f65f135fe15c6eff5
2023-07-31 23:09:25 +00:00
Daniel Rosenberg
84ac22a0d3 ANDROID: fuse-bpf: Add partial ioctl support
This adds passthrough-only support for ioctls with fuse-bpf.
compat ioctls will return -ENOTTY.

Bug: 279519292
Test: F2fsMiscTest#testAtomicWrite
Change-Id: Ia3052e465d87dc1d15ae13955fba8a7f93bc387b
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2023-07-31 23:09:25 +00:00
xieliujie
e341d2312c ANDROID: ABI: Update oplus symbol list
3 function symbol(s) added
  'int __traceiter_android_rvh_rtmutex_force_update(void*, struct task_struct*, struct task_struct*, int*)'
  'int __traceiter_android_vh_rtmutex_waiter_prio(void*, struct task_struct*, int*)'
  'int __traceiter_android_vh_task_blocks_on_rtmutex(void*, struct rt_mutex_base*, struct rt_mutex_waiter*, struct task_struct*, struct ww_acquire_ctx*, unsigned int*)'

3 variable symbol(s) added
  'struct tracepoint __tracepoint_android_rvh_rtmutex_force_update'
  'struct tracepoint __tracepoint_android_vh_rtmutex_waiter_prio'
  'struct tracepoint __tracepoint_android_vh_task_blocks_on_rtmutex'

Bug: 290585456
Change-Id: I4af3d1c8df44822b7f5fd5d5682e65d7c6c4dcc3
Signed-off-by: xieliujie <xieliujie@oppo.com>
2023-07-31 22:47:04 +00:00
Jann Horn
f5c707dc65 UPSTREAM: mm/mempolicy: Take VMA lock before replacing policy
mbind() calls down into vma_replace_policy() without taking the per-VMA
locks, replaces the VMA's vma->vm_policy pointer, and frees the old
policy.  That's bad; a concurrent page fault might still be using the
old policy (in vma_alloc_folio()), resulting in use-after-free.

Normally this will manifest as a use-after-free read first, but it can
result in memory corruption, including because vma_alloc_folio() can
call mpol_cond_put() on the freed policy, which conditionally changes
the policy's refcount member.

This bug is specific to CONFIG_NUMA, but it also affects non-NUMA
systems whenever the kernel was built with CONFIG_NUMA.
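
A minimal sketch of the shape of the fix, assuming the per-VMA locking
primitive vma_start_write(); the real vma_replace_policy() also handles
policy duplication and vm_ops->set_policy():

  static int vma_replace_policy(struct vm_area_struct *vma,
                                struct mempolicy *new)
  {
          struct mempolicy *old;

          vma_start_write(vma);   /* block concurrent per-VMA-lock faults */
          old = vma->vm_policy;
          vma->vm_policy = new;
          mpol_put(old);          /* old policy can no longer be in use */
          return 0;
  }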

Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 293665307
(cherry picked from commit 6c21e066f9256ea1df6f88768f6ae1080b7cf509)
Change-Id: I2e3a4ee8bad97457ee3e127694f0609e7a240a2f
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-07-29 07:25:37 +00:00
Jann Horn
890b1aabb1 BACKPORT: mm: lock_vma_under_rcu() must check vma->anon_vma under vma lock
lock_vma_under_rcu() tries to guarantee that __anon_vma_prepare() can't
be called in the VMA-locked page fault path by ensuring that
vma->anon_vma is set.

However, this check happens before the VMA is locked, which means a
concurrent move_vma() can concurrently call unlink_anon_vmas(), which
disassociates the VMA's anon_vma.

This means we can get UAF in the following scenario:

  THREAD 1                   THREAD 2
  ========                   ========
  <page fault>
    lock_vma_under_rcu()
      rcu_read_lock()
      mas_walk()
      check vma->anon_vma

                             mremap() syscall
                               move_vma()
                                vma_start_write()
                                 unlink_anon_vmas()
                             <syscall end>

    handle_mm_fault()
      __handle_mm_fault()
        handle_pte_fault()
          do_pte_missing()
            do_anonymous_page()
              anon_vma_prepare()
                __anon_vma_prepare()
                  find_mergeable_anon_vma()
                    mas_walk() [looks up VMA X]

                             munmap() syscall (deletes VMA X)

                    reusable_anon_vma() [called on freed VMA X]

This is a security bug if you can hit it, although an attacker would
have to win two races at once where the first race window is only a few
instructions wide.

This patch is based on some previous discussion with Linus Torvalds on
the security list.
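
A minimal sketch of the reordering in lock_vma_under_rcu(), with the
surrounding details elided:

  vma = mas_walk(&mas);
  if (!vma || !vma_start_read(vma))
          goto inval;

  /* Check anon_vma *after* taking the VMA read lock, so a concurrent
   * unlink_anon_vmas() can no longer slip in between check and lock. */
  if (vma_is_anonymous(vma) && !vma->anon_vma)
          goto inval_end_read;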

Cc: stable@vger.kernel.org
Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 293665307
(cherry picked from commit 657b5146955eba331e01b9a6ae89ce2e716ba306)
[surenb: removed vma_is_tcp() call not present in 6.1]
Change-Id: I4bd91e1db337ff35eb7c1d436f4372944556dd7d
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-07-29 06:57:25 +00:00
Lorenzo Pieralisi
d3b37a712a BACKPORT: FROMGIT: irqchip/gic-v3: Workaround for GIC-700 erratum 2941627
GIC-700 erratum 2941627 may cause the GIC-700 to miss SPI wake
requests when SPIs are deactivated while targeting a
sleeping CPU - ie a CPU for which the redistributor reports:

GICR_WAKER.ProcessorSleep == 1

This runtime situation can happen if an SPI that has been
activated on a core is retargeted to a different core, becomes
pending, and the target core subsequently enters a power state
that quiesces the respective redistributor.

When this situation is hit, the de-activation carried out
on the core that activated the SPI (through either ICC_EOIR1_EL1
or ICC_DIR_EL1 register writes) does not trigger a wake
request for the sleeping GIC redistributor, even if the SPI
is pending.

Work around the erratum by de-activating the SPI using the
GICD_ICACTIVER register if the runtime conditions require it
(ie the IRQ was retargeted between activation and de-activation).
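
A rough sketch of the workaround's shape; dist_base and
irq_retargeted_since_activation() are hypothetical stand-ins for the
driver's actual state and bookkeeping:

  static void gic_deactivate_spi_sketch(struct irq_data *d)
  {
          u32 hwirq = d->hwirq;

          if (irq_retargeted_since_activation(d))
                  /* erratum 2941627: deactivate through GICD_ICACTIVER so
                   * the sleeping redistributor still raises a wake request */
                  writel_relaxed(BIT(hwirq % 32),
                                 dist_base + GICD_ICACTIVER + (hwirq / 32) * 4);
          else
                  gic_write_dir(hwirq);   /* normal ICC_DIR_EL1 path */
  }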

Bug: 292459437
Change-Id: Ide915b8c925a631a7fc9ccebca19d9175def162e
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704155034.148262-1-lpieralisi@kernel.org
(cherry picked from commit 6fe5c68ee6a1aae0ef291a56001e7888de547fa2 https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/irqchip-fixes)
Signed-off-by: Carlos Galo <carlosgalo@google.com>
2023-07-27 19:40:08 +00:00
wangshuai12
a89e2cbbc0 ANDROID: GKI: update xiaomi symbol list
1 function symbol(s) added
  'int __blk_mq_debugfs_rq_show(struct seq_file*, struct request*)'

Bug: 290730657
Change-Id: Ib3711e9e875e3d6ccc809a87c607fae149159a58
Signed-off-by: wangshuai12 <wangshuai12@xiaomi.corp-partner.google.com>
2023-07-27 15:11:16 +00:00
Hugh Dickins
371f8d901a UPSTREAM: mm: lock newly mapped VMA with corrected ordering
Lockdep is certainly right to complain about

  (&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
                 but task is already holding lock:
  (&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db

Invert those to the usual ordering.

Fixes: 33313a747e81 ("mm: lock newly mapped VMA which can be modified after it becomes visible")
Cc: stable@vger.kernel.org
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1c7873e3364570ec89343ff4877e0f27a7b21a61)
Change-Id: I85f9cfb6ee8f3d9fefda5518c5637a7dff64bac3
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 12:19:09 +00:00
Suren Baghdasaryan
0d9960403c UPSTREAM: fork: lock VMAs of the parent process when forking
When forking a child process, the parent write-protects anonymous pages
and COW-shares them with the child being forked using copy_present_pte().

We must not take any concurrent page faults on the source vmas as they
are being processed, as we expect both the vma and the ptes behind it
to be stable.  For example, anon_vma_fork() expects the parent's
vma->anon_vma not to change during the vma copy.

A concurrent page fault on a page newly marked read-only by the page
copy might trigger wp_page_copy() and an anon_vma_prepare(vma) on the
source vma, defeating the anon_vma_clone() that wasn't done because the
parent vma originally didn't have an anon_vma, but we now might end up
copying a pte entry for a page that has one.

Before the per-vma lock based changes, the mmap_lock guaranteed
exclusion with concurrent page faults.  But now we need to do a
vma_start_write() to make sure no concurrent faults happen on this vma
while it is being processed.

This fix can potentially regress some fork-heavy workloads.  Kernel
build time did not show a noticeable regression on a 56-core machine,
while a stress test mapping 10000 VMAs and forking 5000 times in a
tight loop shows a ~5% regression.  If such a fork time regression is
unacceptable, disabling CONFIG_PER_VMA_LOCK should restore the
performance.  Further optimizations are possible if this regression
proves to be problematic.
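
A minimal sketch of where the fix lands in dup_mmap(), with everything
else elided:

  for_each_vma(vmi, mpnt) {
          /* Write-lock the parent VMA so no page fault can run on it
           * through the per-VMA lock path while it is being copied. */
          vma_start_write(mpnt);

          /* ... existing anon_vma_fork()/copy_page_range() logic ... */
  }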

Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit fb49c455323ff8319a123dd312be9082c49a23a5)
Change-Id: Ic5aa9dc51a888b5b0319ec4ec6d2941424573ca0
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 12:19:09 +00:00
Suren Baghdasaryan
e3601b25ae UPSTREAM: mm: lock newly mapped VMA which can be modified after it becomes visible
mmap_region adds a newly created VMA into VMA tree and might modify it
afterwards before dropping the mmap_lock.  This poses a problem for page
faults handled under per-VMA locks because they don't take the mmap_lock
and can stumble on this VMA while it's still being modified.  Currently
this does not pose a problem since post-addition modifications are done
only for file-backed VMAs, which are not handled under per-VMA lock.
However, once support for handling file-backed page faults with per-VMA
locks is added, this will become a race.

Fix this by write-locking the VMA before inserting it into the VMA tree.
Other places where a new VMA is added into VMA tree do not modify it
after the insertion, so do not need the same locking.
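
A minimal sketch of the ordering, assuming the VMA iterator store used
by mmap_region():

  /* Lock the new VMA before it becomes reachable by lockless page
   * faults, since mmap_region() keeps modifying it afterwards. */
  vma_start_write(vma);
  vma_iter_store(&vmi, vma);      /* VMA is now visible in the tree */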

Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 33313a747e81af9f31d0d45de78c9397fa3655eb)
Change-Id: I3bb6a7bc8dd579e11f9c18cbc8e4a6e7279bbfb2
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 12:19:09 +00:00
Suren Baghdasaryan
05f7c7fe72 UPSTREAM: mm: lock a vma before stack expansion
With recent changes necessitating mmap_lock to be held for write while
expanding a stack, per-VMA locks should follow the same rules and be
write-locked to prevent page faults into the VMA being expanded. Add
the necessary locking.

Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c137381f71aec755fbf47cd4e9bd4dce752c054c)
Change-Id: I3e6a8c89c1fb7c0669e1232176bb04ea6b09bc0a
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 12:19:09 +00:00
Greg Kroah-Hartman
c0ba567af1 ANDROID: GKI: bring back find_extend_vma()
In commit 8d7071af8907 ("mm: always expand the stack with the mmap write
lock held"), find_extend_vma() was no longer being used in the tree, so
it was removed.  Unfortunately some GKI external module is using this,
so bring it back to allow things to continue to work.

Bug: 161946584
Fixes: 8d7071af8907 ("mm: always expand the stack with the mmap write lock held")
Change-Id: I6f1fb1fd8193625fe3dac0bbc5b0aff653b3d879
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 11:47:21 +00:00
Linus Torvalds
188ce9572f BACKPORT: mm: always expand the stack with the mmap write lock held
commit 8d7071af890768438c14db6172cc8f9f4d04e184 upstream

This finishes the job of always holding the mmap write lock when
extending the user stack vma, and removes the 'write_locked' argument
from the vm helper functions again.

For some cases, we just avoid expanding the stack at all: drivers and
page pinning really shouldn't be extending any stacks.  Let's see if any
strange users really wanted that.

It's worth noting that architectures that weren't converted to the new
lock_mm_and_find_vma() helper function are left using the legacy
"expand_stack()" function, but it has been changed to drop the mmap_lock
and take it for writing while expanding the vma.  This makes it fairly
straightforward to convert the remaining architectures.

As a result of dropping and re-taking the lock, the calling conventions
for this function have also changed, since the old vma may no longer be
valid.  So it will now return the new vma if successful, and NULL - and
the lock dropped - if the area could not be extended.
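
A minimal sketch of the new calling convention from a caller's point of
view (errno illustrative):

  vma = expand_stack(mm, addr);   /* may drop and retake mmap_lock */
  if (!vma)
          return -EFAULT;         /* not extended; lock already dropped */
  /* 'vma' is the re-looked-up VMA; any old VMA pointer is now stale */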

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[6.1: Patch drivers/iommu/io-pgfault.c instead]
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[surenb: change in io-pgfault.c was done in iommu-sva.c]
Change-Id: Icdcdded08d7ad4eda8fae1120a3c8b3d957516c1
(cherry picked from commit 8d7071af890768438c14db6172cc8f9f4d04e184)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 11:47:21 +00:00
Linus Torvalds
74efdc0966 BACKPORT: execve: expand new process stack manually ahead of time
commit f313c51d26aa87e69633c9b46efb37a930faca71 upstream.

This is a small step towards a model where GUP itself would not expand
the stack, and any user that needs GUP to not look up existing mappings,
but actually expand on them, would have to do so manually before-hand,
and with the mm lock held for writing.

It turns out that execve() already did almost exactly that, except it
didn't take the mm lock at all (it's single-threaded so no locking
technically needed, but it could cause lockdep errors).  And it only did
it for the CONFIG_STACK_GROWSUP case, since in that case GUP has
obviously never expanded the stack downwards.

So just make that CONFIG_STACK_GROWSUP case do the right thing with
locking, and enable it generally.  This will eventually help GUP, and in
the meantime avoids a special case and the lockdep issue.
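
A minimal sketch of the idea in setup_arg_pages(), using the
two-argument expand_stack_locked() form this series converges on (the
in-flight version carried an extra write_locked flag):

  if (mmap_write_lock_killable(mm))
          return -EINTR;

  stack_base = vma->vm_start - stack_expand;    /* stack grows down */

  /* Grow the stack up front, under the mmap write lock, instead of
   * relying on GUP to expand it implicitly later. */
  if (expand_stack_locked(vma, stack_base))
          ret = -EFAULT;

  mmap_write_unlock(mm);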

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[6.1 Minor context from still having FOLL_FORCE flags set]
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I24c652740dcfc674b0aef8e09ef72f09ad61254c
(cherry picked from commit f313c51d26aa87e69633c9b46efb37a930faca71)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 11:47:21 +00:00
jianzhou
c8ad906849 ANDROID: abi_gki_aarch64_qcom: ufshcd_mcq_poll_cqe_lock
Symbols added:
   ufshcd_mcq_poll_cqe_lock

Bug: 292490611
Change-Id: I0e26f360c56d302f9f980c9d43b7a3cc80d3a616
Signed-off-by: jianzhou <quic_jianzhou@quicinc.com>
2023-07-27 10:40:33 +00:00
Liam R. Howlett
1afccd4255 UPSTREAM: mm: make find_extend_vma() fail if write lock not held
commit f440fa1ac955e2898893f9301568435eb5cdfc4b upstream.

Make calls to extend_vma() and find_extend_vma() fail if the write lock
is required.

To avoid making this a flag-day event, this still allows the old
read-locking case for the trivial situations, and passes in a flag to
say "is it write-locked".  That way write-lockers can say "yes, I'm
being careful", and legacy users will continue to work in all the common
cases until they have been fully converted to the new world order.

Co-Developed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: If12d2d68429b6d71393f02d5ed7e6939c3cd5405
(cherry picked from commit f440fa1ac955e2898893f9301568435eb5cdfc4b)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:05:44 +00:00
Linus Torvalds
4087cac574 UPSTREAM: powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma()
commit 2cd76c50d0b41cec5c87abfcdf25b236a2793fb6 upstream.

This is one of the simple cases, except there's no pt_regs pointer.
Which is fine, as lock_mm_and_find_vma() is set up to work fine with a
NULL pt_regs.

Powerpc already enabled LOCK_MM_AND_FIND_VMA for the main CPU faulting,
so we can just use the helper without any extra work.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I5736f498b2f45625e46554520d3aeb679e680907
(cherry picked from commit 2cd76c50d0b41cec5c87abfcdf25b236a2793fb6)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:05:22 +00:00
Linus Torvalds
6c33246824 UPSTREAM: mm/fault: convert remaining simple cases to lock_mm_and_find_vma()
commit a050ba1e7422f2cc60ff8bfde3f96d34d00cb585 upstream.

This does the simple pattern conversion of alpha, arc, csky, hexagon,
loongarch, nios2, sh, sparc32, and xtensa to the lock_mm_and_find_vma()
helper.  They all have the regular fault handling pattern without odd
special cases.

The remaining architectures all have something that keeps us from a
straightforward conversion: ia64 and parisc have stacks that can grow
both up as well as down (and ia64 has special address region checks).

And m68k, microblaze, openrisc, sparc64, and um end up having extra
rules about only expanding the stack down a limited amount below the
user space stack pointer.  That is something that x86 used to do too
(long long ago), and it probably could just be skipped, but it still
makes the conversion less than trivial.

Note that this conversion was done manually and with the exception of
alpha without any build testing, because I have a fairly limited
cross-building environment.  The cases are all simple, and I went
through the changes several times, but...
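
The mechanical pattern each conversion follows, sketched for a generic
arch fault handler (bad_area_nosemaphore() stands in for the arch's
error path; retry handling elided):

  /* Replaces mmap_read_lock() + find_vma() + open-coded stack
   * expansion; returns a usable VMA or NULL with the lock dropped. */
  vma = lock_mm_and_find_vma(mm, address, regs);
  if (unlikely(!vma)) {
          bad_area_nosemaphore(regs, address);
          return;
  }

  fault = handle_mm_fault(vma, address, flags, regs);
  mmap_read_unlock(mm);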

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I93e4ce3cb077329e202699a16db576be3a40285b
(cherry picked from commit a050ba1e7422f2cc60ff8bfde3f96d34d00cb585)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:04:57 +00:00
Ben Hutchings
add0a1ea04 UPSTREAM: arm/mm: Convert to using lock_mm_and_find_vma()
commit 8b35ca3e45e35a26a21427f35d4093606e93ad0a upstream.

arm has an additional check for address < FIRST_USER_ADDRESS before
expanding the stack.  Since FIRST_USER_ADDRESS is defined everywhere
(generally as 0), move that check to the generic expand_downwards().
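
A minimal sketch of where the check lands in the generic code:

  int expand_downwards(struct vm_area_struct *vma, unsigned long address)
  {
          /* was arm-specific; FIRST_USER_ADDRESS is defined everywhere */
          address &= PAGE_MASK;
          if (address < FIRST_USER_ADDRESS)
                  return -EPERM;

          /* ... existing expansion logic ... */
  }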

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ie1090f587090ef16de4bce224bbc52334bfe78fa
(cherry picked from commit 8b35ca3e45e35a26a21427f35d4093606e93ad0a)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:04:33 +00:00
Ben Hutchings
9f136450af UPSTREAM: riscv/mm: Convert to using lock_mm_and_find_vma()
commit 7267ef7b0b77f4ed23b7b3c87d8eca7bd9c2d007 upstream.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[6.1: Kconfig context]
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I601c5e4625e0357be7043026359aa85e5a63ade1
(cherry picked from commit 7267ef7b0b77f4ed23b7b3c87d8eca7bd9c2d007)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:03:27 +00:00
Ben Hutchings
053053fc68 UPSTREAM: mips/mm: Convert to using lock_mm_and_find_vma()
commit 4bce37a68ff884e821a02a731897a8119e0c37b7 upstream.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ie1ec8bd98c52086790adcd691370a76d135a333e
(cherry picked from commit 4bce37a68ff884e821a02a731897a8119e0c37b7)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:03:08 +00:00
Michael Ellerman
9cdce804c0 UPSTREAM: powerpc/mm: Convert to using lock_mm_and_find_vma()
commit e6fe228c4ffafdfc970cf6d46883a1f481baf7ea upstream.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ifeaee70ad1bdb9e583aaba137526cc49e2ecf8be
(cherry picked from commit e6fe228c4ffafdfc970cf6d46883a1f481baf7ea)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:02:41 +00:00
SeongJae Park
1016faf509 BACKPORT: arch/arm64/mm/fault: Fix undeclared variable error in do_page_fault()
commit 24be4d0b46bb0c3c1dc7bacd30957d6144a70dfc upstream.

Commit ae870a68b5d1 ("arm64/mm: Convert to using
lock_mm_and_find_vma()") made do_page_fault() to use 'vma' even if
CONFIG_PER_VMA_LOCK is not defined, but the declaration is still in the
ifdef.

As a result, building kernel without the config fails with undeclared
variable error as below:

    arch/arm64/mm/fault.c: In function 'do_page_fault':
    arch/arm64/mm/fault.c:624:2: error: 'vma' undeclared (first use in this function); did you mean 'vmap'?
      624 |  vma = lock_mm_and_find_vma(mm, addr, regs);
          |  ^~~
          |  vmap

Fix it by moving the declaration out of the ifdef.

Fixes: ae870a68b5d1 ("arm64/mm: Convert to using lock_mm_and_find_vma()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[surenb: this one is taken from 6.4.y stable branch]
Change-Id: Iba3153aa67f2dab347e4bc04a09c566b47cf4f63
(cherry picked from commit 24be4d0b46bb0c3c1dc7bacd30957d6144a70dfc)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:01:53 +00:00
Linus Torvalds
89298b8b3c BACKPORT: arm64/mm: Convert to using lock_mm_and_find_vma()
commit ae870a68b5d13d67cf4f18d47bb01ee3fee40acb upstream.

This converts arm64 to use the new page fault helper.  It was very
straightforward, but still needed a fix for the "obvious" conversion I
initially did.  Thanks to Suren for the fix and testing.

Fixed-and-tested-by: Suren Baghdasaryan <surenb@google.com>
Unnecessary-code-removal-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[surenb: this one is taken from 6.4.y stable branch]
Change-Id: Ibda94ca9b3893b8961e1d6536c854c0aee559a6b
(cherry picked from commit ae870a68b5d13d67cf4f18d47bb01ee3fee40acb)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:01:26 +00:00
Linus Torvalds
cf70cb4f1f UPSTREAM: mm: make the page fault mmap locking killable
commit eda0047296a16d65a7f2bc60a408f70d178b2014 upstream.

This is done as a separate patch from introducing the new
lock_mm_and_find_vma() helper, because while it's an obvious change,
it's not what x86 used to do in this area.

We already abort the page fault on fatal signals anyway, so why should
we wait for the mmap lock only to then abort later? With the new helper
function that returns without the lock held on failure anyway, this is
particularly easy and straightforward.
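
A minimal sketch of the change inside the helper:

  /* Abort while waiting for the lock on a fatal signal, instead of
   * acquiring the lock first and aborting the fault afterwards. */
  if (mmap_read_lock_killable(mm))
          return NULL;    /* caller treats this as an aborted fault */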

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I9730b4543265a20253cbfc02de135cc77927f821
(cherry picked from commit eda0047296a16d65a7f2bc60a408f70d178b2014)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 05:01:22 +00:00
xieliujie
544ae28cf6 ANDROID: Inherit "user-aware property" across rtmutex.
Since upstream commit 715f7f9ece ("locking/rtmutex: Squash !RT
tasks to DEFAULT_PRIO"), non-rt tasks do not inherit the
nice-priority values across rt_mutexes. This removes the minor
(and indirect) priority-inheritance that rt-mutexes provided for
CFS tasks.

However, without priority inheritance, time-bounded priority
inversion can occur between CFS tasks of different nice
priorities / cgroup limitations.  The proxy-execution efforts
are a work-in-progress to resolve this upstream, but in the
meantime it is left to vendor hooks to provide a near-term
solution to avoid priority inversion between CFS tasks.

In our oem scheduler, if a CFS thread has a "user-aware
property", we will always pick it even if its vruntime is
bigger than the smallest one in the runqueue. That's why the
trace_android_rvh_replace_next_task_fair vendorhook was added
previously in commit 53e809978443 ("ANDROID: vendor_hooks: Add
hooks for scheduler").

Thus for our oem scheduler, important CFS tasks (like
RenderThread) are marked with the "user-aware property" in their
struct task_struct. If those tasks are blocked on an rtmutex, we
want the "user-aware property" to be inherited by the lock
owner, so it will be selected to run immediately and release the
lock.

To support this, we need new hooks to map the "user-aware
property" into a different rtmutex_waiter prio and to update the
owner's "user-aware property" when needed; hence these additional
vendor hooks.

In the future, once a generalized upstream solution for CFS
priority inheritance is in place, this will no longer be needed.
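
A rough sketch of how a vendor module might wire one of these hooks,
matching the __traceiter signatures in the symbol-list update above;
oem_task_user_aware() and the prio mapping are hypothetical:

  static void oem_rtmutex_waiter_prio(void *unused,
                                      struct task_struct *task,
                                      int *waiter_prio)
  {
          /* map the "user-aware property" to a boosted waiter prio */
          if (oem_task_user_aware(task))
                  *waiter_prio = MAX_RT_PRIO - 1;
  }

  static int __init oem_sched_hooks_init(void)
  {
          return register_trace_android_vh_rtmutex_waiter_prio(
                          oem_rtmutex_waiter_prio, NULL);
  }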

Bug: 290585456
Change-Id: I6521ed2086b147400a54da6b84a324baf16bc649
Signed-off-by: xieliujie <xieliujie@oppo.com>
2023-07-27 00:04:07 +00:00
Eric Biggers
5e4a5dc820 BACKPORT: blk-crypto: use dynamic lock class for blk_crypto_profile::lock
When a device-mapper device is passing through the inline encryption
support of an underlying device, calls to blk_crypto_evict_key() take
the blk_crypto_profile::lock of the device-mapper device, then take the
blk_crypto_profile::lock of the underlying device (nested).  This isn't
a real deadlock, but it causes a lockdep report because there is only
one lock class for all instances of this lock.

Lockdep subclasses don't really work here because the hierarchy of block
devices is dynamic and could have more than 2 levels.

Instead, register a dynamic lock class for each blk_crypto_profile, and
associate that with the lock.
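
A minimal sketch of the pattern, assuming lockdep's dynamic-key API
(lockdep_register_key()/lockdep_set_class()); structure simplified:

  struct blk_crypto_profile_sketch {
          struct rw_semaphore lock;
          struct lock_class_key lockdep_key;  /* one class per instance */
  };

  static void profile_init_sketch(struct blk_crypto_profile_sketch *p)
  {
          /* Tie this rwsem to its own key, so device-mapper-over-device
           * nesting no longer looks recursive to lockdep. */
          lockdep_register_key(&p->lockdep_key);
          init_rwsem(&p->lock);
          lockdep_set_class(&p->lock, &p->lockdep_key);
  }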

This avoids false-positive lockdep reports like the following:

    ============================================
    WARNING: possible recursive locking detected
    6.4.0-rc5 #2 Not tainted
    --------------------------------------------
    fscryptctl/1421 is trying to acquire lock:
    ffffff80829ca418 (&profile->lock){++++}-{3:3}, at: __blk_crypto_evict_key+0x44/0x1c0

                   but task is already holding lock:
    ffffff8086b68ca8 (&profile->lock){++++}-{3:3}, at: __blk_crypto_evict_key+0xc8/0x1c0

                   other info that might help us debug this:
     Possible unsafe locking scenario:

           CPU0
           ----
      lock(&profile->lock);
      lock(&profile->lock);

                    *** DEADLOCK ***

     May be due to missing lock nesting notation

Fixes: 1b26283970 ("block: Keyslot Manager for Inline Encryption")
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230610061139.212085-1-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Bug: 286427075
(cherry picked from commit 2fb48d88e77f29bf9d278f25bcfe82cf59a0e09b)
(added '#ifdef CONFIG_LOCKDEP' to keep the KMI tooling happy)
Change-Id: I21c0f941a36663c956a5c89324813bbaac0633ef
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-07-27 00:02:13 +00:00
Kyongho Cho
db2c29e53d ANDROID: ABI: update symbol list for Xclipse GPU
1 function symbol(s) added
  'void ttm_tt_unpopulate(struct ttm_device*, struct ttm_tt*)'

Bug: 291101811
Change-Id: I0be29227b37734304f00fc7b8e2612a0fa6c3fff
Signed-off-by: Kyongho Cho <pullip.cho@samsung.com>
2023-07-26 21:11:54 +00:00
Kyongho Cho
7edb035c79 ANDROID: drm/ttm: export ttm_tt_unpopulate()
Xclipse GPU driver depends on TTM for graphics buffer allocation and
management. It is required by customers to add graphics memory swap
to improve overall memory efficiency. However, TTM's swap feature
can't be used since it selects the victim buffer by LRU and we can't
choose a specific buffer to swap.
Xclipse GPU driver implements its own swap feature by means of the
APIs of TTM. But the problem is TTM's buffer allocation statistics in
ttm_tt.c, which are local to that file. Whenever a graphics buffer is
swapped out, the size of the total page allocation should be
decreased, but that is not possible from outside of ttm_tt.c.
If the statistics are not maintained well, TTM ends up swapping out
TTM buffers globally, which is unexpected.

Bug: 291101811
Change-Id: I143c705834bcc196432c3ef59b49c9ec31f2e971
Signed-off-by: Kyongho Cho <pullip.cho@samsung.com>
2023-07-26 21:11:54 +00:00
lambert wang
b61f298c0d ANDROID: GKI: Add ABI symbol list(devlink) for MTK
17 function symbol(s) added
  'bool device_remove_file_self(struct device*, const struct device_attribute*)'
  'struct devlink* devlink_alloc_ns(const struct devlink_ops*, size_t, struct net*, struct device*)'
  'void devlink_flash_update_status_notify(struct devlink*, const char*, const char*, unsigned long, unsigned long)'
  'int devlink_fmsg_binary_pair_nest_end(struct devlink_fmsg*)'
  'int devlink_fmsg_binary_pair_nest_start(struct devlink_fmsg*, const char*)'
  'int devlink_fmsg_binary_put(struct devlink_fmsg*, const void*, u16)'
  'void devlink_free(struct devlink*)'
  'int devlink_health_report(struct devlink_health_reporter*, const char*, void*)'
  'struct devlink_health_reporter* devlink_health_reporter_create(struct devlink*, const struct devlink_health_reporter_ops*, u64, void*)'
  'void devlink_health_reporter_destroy(struct devlink_health_reporter*)'
  'void* devlink_health_reporter_priv(struct devlink_health_reporter*)'
  'void devlink_health_reporter_state_update(struct devlink_health_reporter*, enum devlink_health_reporter_state)'
  'void* devlink_priv(struct devlink*)'
  'struct devlink_region* devlink_region_create(struct devlink*, const struct devlink_region_ops*, u32, u64)'
  'void devlink_region_destroy(struct devlink_region*)'
  'void devlink_register(struct devlink*)'
  'void devlink_unregister(struct devlink*)'

type 'struct devlink' changed
  was only declared, is now fully defined

type 'struct devlink_linecard' changed
  was only declared, is now fully defined

Bug: 283707518
Change-Id: I686fd14c13863c27b3dfdb29cd7c6b6d5a0a3127
Signed-off-by: lambert wang <lambert.wang@mediatek.com>
Signed-off-by: iven yang <iven.yang@mediatek.com>
Signed-off-by: michael cai <michael.cai@mediatek.com>
2023-07-26 20:55:37 +00:00
lambert wang
ec419af28f ANDROID: devlink: Select CONFIG_NET_DEVLINK in Kconfig.gki
Select the hidden Kconfig symbol NET_DEVLINK.

Required by device drivers to provide unified interface to expose
device info, capture coredump and perform device flash.

Bug: 283707518

Change-Id: I1cc5b7dce36c79549cd7f1d9b755f7bab3973f0e
Signed-off-by: michael cai <michael.cai@mediatek.com>
Signed-off-by: lambert wang <lambert.wang@mediatek.com>
2023-07-26 20:55:37 +00:00
Vincent Donnefort
1e114e6efa ANDROID: KVM: arm64: Fix memory ordering for pKVM module callbacks
Registration of module callbacks for the pKVM hypervisor is lockless
thanks to the use of a cmpxchg.

The problem: a CPU can speculatively execute an indirect branch and
speculatively read variables used in that branch. We then need to order
the memory accesses between variables potentially set in the driver init
(before the callback registration happens) and the call to that
registered callback.

e.g. in the case of the serial.

 CPU0:                                   CPU1:

   driver_init():                        hyp_serial_enabled()
     base_addr = 0xdeadbeef;               enabled = __hyp_putc
     barrier();                            barrier();
     ops->register_serial_driver(putc);    if (enabled)
                                                __hyp_putc(); /* read base_addr */

This is the same for the SMC and PSCI handler callbacks. The abort and
fault callbacks are not impacted: the driver init can only happen before
the kernel is deprivileged i.e. before the host stage-2 is in place and
then before any of those callbacks can be triggered.

Instead of a full barrier, we can use acquire/release semantics:
relax cmpxchg to cmpxchg_release in the registration path and use a
load_acquire in hyp_serial_enabled().
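
A minimal sketch of the pairing, with hypothetical names standing in
for the actual pKVM symbols:

  typedef void (*hyp_putc_fn)(char c);
  static hyp_putc_fn hyp_putc_cb;

  int register_serial_driver_sketch(hyp_putc_fn putc)
  {
          /* release: driver-init stores (e.g. base_addr) are visible
           * before the callback pointer can be observed */
          return cmpxchg_release(&hyp_putc_cb, NULL, putc) ? -EBUSY : 0;
  }

  static void hyp_putc_sketch(char c)
  {
          /* acquire: pairs with the release above */
          hyp_putc_fn putc = smp_load_acquire(&hyp_putc_cb);

          if (putc)
                  putc(c);        /* driver state is safe to read now */
  }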

Bug: 292470326
Change-Id: I4b5fe3713fe40cc5ab42ea0e9cdf54e8315dfb44
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
2023-07-26 15:00:04 +00:00
Linus Torvalds
3803ae4a28 BACKPORT: mm: introduce new 'lock_mm_and_find_vma()' page fault helper
commit c2508ec5a58db67093f4fb8bf89a9a7c53a109e9 upstream.

.. and make x86 use it.

This basically extracts the existing x86 "find and expand faulting vma"
code, but extends it to also take the mmap lock for writing in case we
actually do need to expand the vma.

We've historically short-circuited that case, and have some rather ugly
special logic to serialize the stack segment expansion (since we only
hold the mmap lock for reading) that doesn't match the normal VM
locking.

That slight violation of locking worked well, right up until it didn't:
the maple tree code really does want proper locking even for simple
extension of an existing vma.

So extract the code for "look up the vma of the fault" from x86, fix it
up to do the necessary write locking, and make it available as a helper
function for other architectures that can use the common helper.

Note: I say "common helper", but it really only handles the normal
stack-grows-down case.  Which is all architectures except for PA-RISC
and IA64.  So some rare architectures can't use the helper, but if they
care they'll just need to open-code this logic.

It's also worth pointing out that this code really would like to have an
optimistic "mmap_upgrade_trylock()" to make it quicker to go from a
read-lock (for the common case) to taking the write lock (for having to
extend the vma) in the normal single-threaded situation where there is
no other locking activity.

But that _is_ all the very uncommon special case, so while it would be
nice to have such an operation, it probably doesn't matter in reality.
I did put in the skeleton code for such a possible future expansion,
even if it only acts as pseudo-documentation for what we're doing.
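
A condensed sketch of the helper's shape, using the post-cleanup
expand_stack_locked() name:

  struct vm_area_struct *lock_mm_and_find_vma_sketch(struct mm_struct *mm,
                  unsigned long addr, struct pt_regs *regs)
  {
          struct vm_area_struct *vma;

          if (mmap_read_lock_killable(mm))
                  return NULL;

          vma = find_vma(mm, addr);
          if (likely(vma && vma->vm_start <= addr))
                  return vma;             /* common case: read lock held */

          if (!vma || !(vma->vm_flags & VM_GROWSDOWN)) {
                  mmap_read_unlock(mm);
                  return NULL;
          }

          /* must expand the stack: retry under the write lock */
          mmap_read_unlock(mm);
          if (mmap_write_lock_killable(mm))
                  return NULL;

          vma = find_vma(mm, addr);       /* re-lookup after relocking */
          if (!vma || (vma->vm_start > addr &&
                       (!(vma->vm_flags & VM_GROWSDOWN) ||
                        expand_stack_locked(vma, addr)))) {
                  mmap_write_unlock(mm);
                  return NULL;
          }
          mmap_write_downgrade(mm);       /* back to the read lock */
          return vma;
  }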

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[surenb: this one is taken from 6.4.y stable branch]
Change-Id: I6e16e6751245ac24adcbe78114bc57c726463acb
(cherry-picked from commit d6a5c7a1a6)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-07-26 09:57:34 +00:00
Peng Zhang
66b5ad3507 BACKPORT: maple_tree: fix potential out-of-bounds access in mas_wr_end_piv()
commit cd00dd2585c4158e81fdfac0bbcc0446afbad26d upstream.

Check the write offset end bounds before using it as the offset into the
pivot array.  This avoids a possible out-of-bounds access on the pivot
array if the write extends to the last slot in the node, in which case the
node maximum should be used as the end pivot.

akpm: this doesn't affect any current callers, but new users of mapletree
may encounter this problem if backported into earlier kernels, so let's
fix it in -stable kernels in case of this.
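
A rough sketch of the corrected bound handling (field names
approximate):

  /* If the write runs to the last slot, the pivot array has no entry
   * for it; use the node maximum instead of reading past the array. */
  if (wr_mas->offset_end < wr_mas->node_end)
          wr_mas->end_piv = wr_mas->pivots[wr_mas->offset_end];
  else
          wr_mas->end_piv = wr_mas->mas->max;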

Link: https://lkml.kernel.org/r/20230506024752.2550-1-zhangpeng.00@bytedance.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I992549af25fa9c22f587893d004002d2e004d317
(cherry-picked from commit 4e2ad53aba)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-07-26 09:57:29 +00:00
Thomas Gleixner
19dd4101e0 UPSTREAM: x86/smp: Cure kexec() vs. mwait_play_dead() breakage
commit d7893093a7417527c0d73c9832244e65c9d0114f upstream.

TLDR: It's a mess.

When kexec() is executed on a system with offline CPUs, which are parked in
mwait_play_dead() it can end up in a triple fault during the bootup of the
kexec kernel or cause hard to diagnose data corruption.

The reason is that kexec() eventually overwrites the previous kernel's text,
page tables, data and stack. If it writes to the cache line which is
monitored by a previously offlined CPU, MWAIT resumes execution and ends
up executing the wrong text, dereferencing overwritten page tables or
corrupting the kexec kernels data.

Cure this by bringing the offlined CPUs out of MWAIT into HLT.

Write to the monitored cache line of each offline CPU, which makes MWAIT
resume execution. The written control word tells the offlined CPUs to issue
HLT, which does not have the MWAIT problem.

That does not help if a stray NMI, MCE or SMI hits the offlined CPUs, as
those make them come out of HLT.

A follow-up change will put them into INIT, which protects at least against
NMI and SMI.
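
A rough sketch of the wake-up side (the control-word value and the mask
of parked CPUs are kept abstract here):

  static DEFINE_PER_CPU_ALIGNED(unsigned int, mwait_dead_control);

  #define CPUDEAD_MWAIT_KEXEC_HLT 1       /* value illustrative */

  static void kick_mwait_play_dead_sketch(const struct cpumask *parked)
  {
          unsigned int cpu;

          for_each_cpu(cpu, parked)
                  /* the store hits the monitored line and wakes MWAIT;
                   * the woken CPU reads the word and switches to HLT */
                  WRITE_ONCE(per_cpu(mwait_dead_control, cpu),
                             CPUDEAD_MWAIT_KEXEC_HLT);
  }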

Fixes: ea53069231 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case")
Reported-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.492257119@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I80035e671b55732ac3d56c71dc53364e82238fe2
(cherry-picked from commit 0af4750eaa)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-07-26 09:57:29 +00:00
Thomas Gleixner
26260c4bd1 UPSTREAM: x86/smp: Use dedicated cache-line for mwait_play_dead()
commit f9c9987bf52f4e42e940ae217333ebb5a4c3b506 upstream.

Monitoring idletask::thread_info::flags in mwait_play_dead() has been an
obvious choice as all that is needed is a cache line which is not written
by other CPUs.

But there is a use case where a "dead" CPU needs to be brought out of
MWAIT: kexec().

This is required as kexec() can overwrite text, pagetables, stacks and the
monitored cacheline of the original kernel. The latter causes MWAIT to
resume execution which obviously causes havoc on the kexec kernel which
results usually in triple faults.

Use a dedicated per CPU storage to prepare for that.
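
A minimal sketch of the monitor side, assuming the __monitor()/__mwait()
primitives from <asm/mwait.h>:

  struct mwait_cpu_dead {
          unsigned int control;   /* written by a remote CPU to wake us */
  };
  static DEFINE_PER_CPU_ALIGNED(struct mwait_cpu_dead, mwait_cpu_dead);

  static void mwait_play_dead_sketch(unsigned int eax_hint)
  {
          struct mwait_cpu_dead *md = this_cpu_ptr(&mwait_cpu_dead);

          while (1) {
                  __monitor(md, 0, 0);    /* arm MONITOR on our own line */
                  mb();
                  __mwait(eax_hint, 0);   /* resumes when *md is written */
          }
  }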

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.434553750@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I7cbfcec2d4e1bde18a9c45a7ccb7897ccaad7bd3
(cherry-picked from commit 6d3b2e0aef)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-07-26 09:57:27 +00:00
Thomas Gleixner
d8cb0365cb UPSTREAM: x86/smp: Remove pointless wmb()s from native_stop_other_cpus()
commit 2affa6d6db28855e6340b060b809c23477aa546e upstream.

The wmb()s before sending the IPIs are not synchronizing anything.

If at all then the apic IPI functions have to provide or act as appropriate
barriers.

Remove these cargo cult barriers which have no explanation of what they are
synchronizing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.378358382@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I7541e4c7c65f9bed9b1f28d6c858473986dd50b4
(cherry-picked from commit 50a1abc677)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-07-26 09:57:22 +00:00
Tony Battersby
6744547e95 UPSTREAM: x86/smp: Dont access non-existing CPUID leaf
commit 9b040453d4440659f33dc6f0aa26af418ebfe70b upstream.

stop_this_cpu() tests CPUID leaf 0x8000001f::EAX unconditionally. Intel
CPUs return the content of the highest supported leaf when a non-existing
leaf is read, while AMD CPUs return all zeros for unsupported leaves.

So the result of the test on Intel CPUs is a lottery.

While harmless, it's incorrect and causes the conditional wbinvd() to be
issued where not required.

Check whether the leaf is supported before reading it.
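
A minimal sketch of the guard:

  static void stop_this_cpu_wbinvd_sketch(void)
  {
          /* The highest extended leaf must reach 0x8000001f before the
           * encryption-features leaf may be read; an Intel CPU would
           * just echo its highest leaf for the out-of-range request. */
          if (cpuid_eax(0x80000000) < 0x8000001f)
                  return;

          if (cpuid_eax(0x8000001f) & BIT(0))     /* SME supported */
                  native_wbinvd();
  }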

[ tglx: Adjusted changelog ]

Fixes: 08f253ec37 ("x86/cpu: Clear SME feature flag when not in use")
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com
Link: https://lore.kernel.org/r/20230615193330.322186388@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Idc8aa8137c9044642f02ec157d18d035359f88ea
(cherry-picked from commit e47037d28b)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-07-26 09:57:22 +00:00
Thomas Gleixner
ba2ccba863 UPSTREAM: x86/smp: Make stop_other_cpus() more robust
commit 1f5e7eb7868e42227ac426c96d437117e6e06e8e upstream.

Tony reported intermittent lockups on poweroff. His analysis identified the
wbinvd() in stop_this_cpu() as the culprit. This was added to ensure that
on SME enabled machines a kexec() does not leave any stale data in the
caches when switching from encrypted to non-encrypted mode or vice versa.

That wbinvd() is conditional on the SME feature bit which is read directly
from CPUID. But that readout does not check whether the CPUID leaf is
available or not. If it's not available the CPU will return the value of
the highest supported leaf instead. Depending on the content the "SME" bit
might be set or not.

That's incorrect but harmless. Making the CPUID readout conditional makes
the observed hangs go away, but it does not fix the underlying problem:

CPU0					CPU1

 stop_other_cpus()
   send_IPIs(REBOOT);			stop_this_cpu()
   while (num_online_cpus() > 1);         set_online(false);
   proceed... -> hang
				          wbinvd()

WBINVD is an expensive operation and if multiple CPUs issue it at the same
time the resulting delays are even larger.

But CPU0 already observed num_online_cpus() going down to 1 and proceeds
which causes the system to hang.

This issue exists independent of WBINVD, but the delays caused by WBINVD
make it more prominent.

Make this more robust by adding a cpumask which is initialized to the
online CPU mask before sending the IPIs and CPUs clear their bit in
stop_this_cpu() after the WBINVD completed. Check for that cpumask to
become empty in stop_other_cpus() instead of watching num_online_cpus().

The cpumask cannot plug all holes either, but it's better than a raw
counter and allows restricting the NMI fallback IPI to be sent only to
the CPUs which have not reported within the timeout window.
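
A rough sketch of the handshake (IPI plumbing elided; the unconditional
wbinvd() stands in for the real conditional one):

  static struct cpumask cpus_stop_mask;

  static void stop_this_cpu_sketch(unsigned int cpu)
  {
          set_cpu_online(cpu, false);
          native_wbinvd();        /* slow; many CPUs may run it at once */

          /* only now tell the initiator this CPU is really finished */
          cpumask_clear_cpu(cpu, &cpus_stop_mask);

          for (;;)
                  native_halt();
  }

  static void stop_other_cpus_sketch(void)
  {
          cpumask_copy(&cpus_stop_mask, cpu_online_mask);
          cpumask_clear_cpu(smp_processor_id(), &cpus_stop_mask);

          /* ... send REBOOT IPIs ... then wait on the mask rather than
           * num_online_cpus(), so WBINVD completion is accounted for */
          while (!cpumask_empty(&cpus_stop_mask))
                  cpu_relax();
  }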

Fixes: 08f253ec37 ("x86/cpu: Clear SME feature flag when not in use")
Reported-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com
Link: https://lore.kernel.org/r/87h6r770bv.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I7154624285f081ac2f54617fb7b9f9cdd6b4f2e0
(cherry-picked from commit edadebb349)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-07-26 09:57:22 +00:00
Borislav Petkov (AMD)
5c9836e66d UPSTREAM: x86/microcode/AMD: Load late on both threads too
commit a32b0f0db3f396f1c9be2fe621e77c09ec3d8e7d upstream.

Do the same as early loading - load on both threads.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230605141332.25948-1-bp@alien8.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I857794a1b78974200aad02098a31c41576aed562
(cherry-picked from commit 94a69d6999)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-07-26 09:57:22 +00:00
Tony Luck
53048f151c BACKPORT: mm, hwpoison: when copy-on-write hits poison, take page offline
commit d302c2398ba269e788a4f37ae57c07a7fcabaa42 upstream.

Cannot call memory_failure() directly from the fault handler because
mmap_lock (and others) are held.

It is important, but not urgent, to mark the source page as h/w poisoned
and unmap it from other tasks.

Use memory_failure_queue() to request a call to memory_failure() for the
page with the error.

Also provide a stub version for CONFIG_MEMORY_FAILURE=n
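
A minimal sketch of the deferral, assuming memory_failure_queue()'s
(pfn, flags) signature:

  /* Called from the COW path once poison was consumed during the copy.
   * mmap_lock is held, so memory_failure() cannot be called directly;
   * queue it to run later in process context. */
  static void queue_poisoned_source_sketch(struct page *src)
  {
          memory_failure_queue(page_to_pfn(src), 0);
  }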

Link: https://lkml.kernel.org/r/20221021200120.175753-3-tony.luck@intel.com
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Due to missing commits
  e591ef7d96d6e ("mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage")
  5033091de814a ("mm/hwpoison: introduce per-memory_block hwpoison counter")
  The impact of e591ef7d96d6e is its introduction of an additional flag in
  __get_huge_page_for_hwpoison() that serves as an indication a hwpoisoned
  hugetlb page should have its migratable bit cleared.
  The impact of 5033091de814a is contexual.
  Resolve by ignoring both missing commits. - jane]
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ica2c1970fe3cdfa9dc7d3f288e1e6a90378a9764
(cherry-picked from commit 84f077802e)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-26 09:56:58 +00:00
Tony Luck
a2dff37b0c UPSTREAM: mm, hwpoison: try to recover from copy-on write faults
commit a873dfe1032a132bf89f9e19a6ac44f5a0b78754 upstream.

Patch series "Copy-on-write poison recovery", v3.

Part 1 deals with the process that triggered the copy on write fault with
a store to a shared read-only page.  That process is send a SIGBUS with
the usual machine check decoration to specify the virtual address of the
lost page, together with the scope.

Part 2 sets up to asynchronously take the page with the uncorrected error
offline to prevent additional machine check faults.  H/t to Miaohe Lin
<linmiaohe@huawei.com> and Shuai Xue <xueshuai@linux.alibaba.com> for
pointing me to the existing function to queue a call to memory_failure().

On x86 there is some duplicate reporting (because the error is also
signalled by the memory controller as well as by the core that triggered
the machine check).  Console logs look like this:

This patch (of 2):

If the kernel is copying a page as the result of a copy-on-write
fault and runs into an uncorrectable error, Linux will crash because
it does not have recovery code for this case where poison is consumed
by the kernel.

It is easy to set up a test case. Just inject an error into a private
page, fork(2), and have the child process write to the page.

I wrapped that neatly into a test at:

  git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git

just enable ACPI error injection and run:

  # ./einj_mem-uc -f copy-on-write

Add a new copy_user_highpage_mc() function that uses copy_mc_to_kernel()
on architectures where that is available (currently x86 and powerpc).
When an error is detected during the page copy, return VM_FAULT_HWPOISON
to caller of wp_page_copy(). This propagates up the call stack. Both x86
and powerpc have code in their fault handler to deal with this code by
sending a SIGBUS to the application.
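
A minimal sketch of such a helper, assuming copy_mc_to_kernel()'s
bytes-not-copied return convention (kmsan handling and the non-MC
fallback elided):

  static int copy_user_highpage_mc_sketch(struct page *dst,
                                          struct page *src)
  {
          unsigned long left;
          void *vfrom = kmap_local_page(src);
          void *vto = kmap_local_page(dst);

          left = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);

          kunmap_local(vto);
          kunmap_local(vfrom);

          /* non-zero means poison was consumed mid-copy; the caller
           * turns this into VM_FAULT_HWPOISON and the arch fault code
           * raises SIGBUS */
          return left ? -EHWPOISON : 0;
  }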

Note that this patch avoids a system crash and signals the process that
triggered the copy-on-write action. It does not take any action for the
memory error that is still in the shared page. To handle that a call to
memory_failure() is needed. But this cannot be done from wp_page_copy()
because it holds mmap_lock(). Perhaps the architecture fault handlers
can deal with this loose end in a subsequent patch?

On Intel/x86 this loose end will often be handled automatically because
the memory controller provides an additional notification of the h/w
poison in memory, the handler for this will call memory_failure(). This
isn't a 100% solution. If there are multiple errors, not all may be
logged in this way.

[tony.luck@intel.com: add call to kmsan_unpoison_memory(), per Miaohe Lin]
  Link: https://lkml.kernel.org/r/20221031201029.102123-2-tony.luck@intel.com
Link: https://lkml.kernel.org/r/20221021200120.175753-1-tony.luck@intel.com
Link: https://lkml.kernel.org/r/20221021200120.175753-2-tony.luck@intel.com
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I7c35cd47de59611fcc0550b0a7fd4e3911bbb110
(cherry-picked from commit 4af5960d7c)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-26 09:56:57 +00:00
David Woodhouse
466448f55f BACKPORT: mm/mmap: Fix error return in do_vmi_align_munmap()
commit 6c26bd4384da24841bac4f067741bbca18b0fb74 upstream.

If mas_store_gfp() in the gather loop failed, the 'error' variable that
ultimately gets returned was not being set. In many cases, its original
value of -ENOMEM was still in place, and that was fine. But if VMAs had
been split at the start or end of the range, then 'error' could be zero.

Change to the 'error = foo(); if (error) goto …' idiom to fix the bug.
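
The idiom, sketched on the gather-loop call in question (names
approximate):

  error = mas_store_gfp(&mas_detach, NULL, GFP_KERNEL);
  if (error)
          goto munmap_gather_failed;      /* 'error' is now always set */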

Also clean up a later case which avoided the same bug by *explicitly*
setting error = -ENOMEM right before calling the function that might
return -ENOMEM.

In a final cosmetic change, move the 'Point of no return' comment to
*after* the goto. That's been in the wrong place since the preallocation
was removed, and this new error path was added.

Fixes: 606c812eb1d5 ("mm/mmap: Fix error path in do_vmi_align_munmap()")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 42a018a796)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5da7b1e126968e174e733d45ff24439089de60af
2023-07-26 09:56:53 +00:00
Liam R. Howlett
41b30362e9 BACKPORT: mm/mmap: Fix error path in do_vmi_align_munmap()
commit 606c812eb1d5b5fb0dd9e330ca94b52d7c227830 upstream

The error unrolling was leaving the VMAs detached in many cases and
leaving the locked_vm statistic altered, and skipping the unrolling
entirely in the case of the vma tree write failing.

Fix the error path by re-attaching the detached VMAs and adding the
necessary goto for the failed vma tree write, and fix the locked_vm
statistic by only updating after the vma tree write succeeds.

Fixes: 763ecb0350 ("mm: remove the vma linked list")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ dwmw2: Strictly, the original patch wasn't *re-attaching* the
         detached VMAs. They *were* still attached but just had
         the 'detached' flag set, which is an optimisation. Which
         doesn't exist in 6.3, so drop that. Also drop the call
         to vma_start_write() which came in with the per-VMA
         locking in 6.4. ]
[ dwmw2 (6.1): It's do_mas_align_munmap() here. And has two call
         sites for the now-removed munmap_sidetree() function.
         Inline them both rather then trying to backport various
         dependencies with potentially subtle interactions. ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[surenb: added needed vma_start_write and vma_vma_mark_detached calls]
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I1e42347ecf9eb46077739a267ac00264f94fa59a
2023-07-26 09:56:51 +00:00
Mike Hommey
d45a054f9c UPSTREAM: HID: logitech-hidpp: add HIDPP_QUIRK_DELAYED_INIT for the T651.
commit 5fe251112646d8626818ea90f7af325bab243efa upstream.

commit 498ba2069035 ("HID: logitech-hidpp: Don't restart communication if
not necessary") put restarting communication behind that flag, and this
was apparently necessary on the T651, but the flag was not set for it.

Fixes: 498ba2069035 ("HID: logitech-hidpp: Don't restart communication if not necessary")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Hommey <mh@glandium.org>
Link: https://lore.kernel.org/r/20230617230957.6mx73th4blv7owqk@glandium.org
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a536383ef0)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic57d1d450ee4474cff51efca3d9b9607de6693d7
2023-07-26 09:56:44 +00:00
Ludvig Michaelsson
0e477a82e6 UPSTREAM: HID: hidraw: fix data race on device refcount
commit 944ee77dc6ec7b0afd8ec70ffc418b238c92f12b upstream.

The hidraw_open() function increments the hidraw device reference
counter. The counter has no dedicated synchronization mechanism,
resulting in a potential data race when concurrently opening a device.

The race is a regression introduced by commit 8590222e4b ("HID:
hidraw: Replace hidraw device table mutex with a rwsem"). While
minors_rwsem is intended to protect the hidraw_table itself, by instead
acquiring the lock for writing, the reference counter is also protected.
This is symmetrical to hidraw_release().
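
A minimal sketch of the change in hidraw_open() (structure simplified):

  static DECLARE_RWSEM(minors_rwsem);

  static int hidraw_open_sketch(struct hidraw *dev)
  {
          down_write(&minors_rwsem);      /* was down_read() */
          dev->open++;                    /* refcount bump now exclusive */
          up_write(&minors_rwsem);
          return 0;
  }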

Link: https://github.com/systemd/systemd/issues/27947
Fixes: 8590222e4b ("HID: hidraw: Replace hidraw device table mutex with a rwsem")
Cc: stable@vger.kernel.org
Signed-off-by: Ludvig Michaelsson <ludvig.michaelsson@yubico.com>
Link: https://lore.kernel.org/r/20230621-hidraw-race-v1-1-a58e6ac69bab@yubico.com
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I312349145e8f2d55ea2182b94a3b3293b839818d
(cherry picked from commit 879e79c3ae)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-26 09:56:38 +00:00
Oliver Hartkopp
af2d741bf3 UPSTREAM: can: isotp: isotp_sendmsg(): fix return error fix on TX path
commit e38910c0072b541a91954682c8b074a93e57c09b upstream.

With commit d674a8f123 ("can: isotp: isotp_sendmsg(): fix return
error on FC timeout on TX path") the missing correct return value in
the case of a protocol error was introduced.

But the way the error value has been read and sent to user space does
not follow the common scheme of clearing the error after reading, which
is provided by the sock_error() function. This leads to an error report
at the following write() attempt although everything should be working.
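
A minimal sketch of the read-and-clear scheme the fix switches to:

  /* sock_error() does xchg(&sk->sk_err, 0) internally, so a past
   * protocol error is reported exactly once instead of failing every
   * subsequent write(). */
  int err = sock_error(sk);

  if (err)
          return err;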

Fixes: d674a8f123 ("can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX path")
Reported-by: Carsten Schmidt <carsten.schmidt-achim@t-online.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20230607072708.38809-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I6cb85ee1e6fdc609991c383e4f6fc71ea3c68c3a
(cherry picked from commit e38910c0072b541a91954682c8b074a93e57c09b)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-26 09:56:38 +00:00
Zhang Shurong
5887040491 UPSTREAM: fbdev: fix potential OOB read in fast_imageblit()
commit c2d22806aecb24e2de55c30a06e5d6eb297d161d upstream.

There is a potential OOB read at fast_imageblit, for
"colortab[(*src >> 4)]" can become a negative value due to
"const char *s = image->data, *src".
This change makes sure the index for colortab is always positive
or zero.
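
A minimal sketch of the sign issue:

  /* With 'const char *src', *src >> 4 sign-extends for bytes >= 0x80
   * and can index colortab with a negative value; an unsigned type
   * (plus masking) keeps the index within [0, 15]. */
  static u32 colortab_lookup_sketch(const u8 *src, const u32 *colortab)
  {
          return colortab[(*src >> 4) & 0xf];
  }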

Similar commit:
https://patchwork.kernel.org/patch/11746067

Potential bug report:
https://groups.google.com/g/syzkaller-bugs/c/9ubBXKeKXf4/m/k-QXy4UgAAAJ

Signed-off-by: Zhang Shurong <zhang_shurong@foxmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I8ae18dbee926cc8dcf5bac4dec584071e7bdb739
(cherry picked from commit c2d22806aecb24e2de55c30a06e5d6eb297d161d)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-26 09:56:38 +00:00
Cixi Geng
6c48edb9c9 ANDROID: GKI: add function symbols for unisoc
INFO: 10 function symbol(s) added
  'void drm_send_event_timestamp_locked(struct drm_device*, struct drm_pending_event*, ktime_t)'
  'int mipi_dsi_set_maximum_return_packet_size(struct mipi_dsi_device*, u16)'
  'int of_get_drm_display_mode(struct device_node*, struct drm_display_mode*, u32*, int)'
  'int regmap_get_reg_stride(struct regmap*)'
  'struct regulator_dev* regulator_register(struct device*, const struct regulator_desc*, const struct regulator_config*)'
  'struct snd_kcontrol* snd_ctl_find_id(struct snd_card*, struct snd_ctl_elem_id*)'
  'int snd_info_get_line(struct snd_info_buffer*, char*, int)'
  'unsigned int snd_pcm_rate_bit_to_rate(unsigned int)'
  'unsigned int snd_pcm_rate_to_rate_bit(unsigned int)'
  'void tty_port_link_device(struct tty_port*, struct tty_driver*, unsigned int)'

Bug: 292812341
Change-Id: Ibaed96732ac53f824d4d12fb6ecad7bd63fcea8f
Signed-off-by: Cixi Geng <cixi.geng1@unisoc.com>
2023-07-26 04:29:22 +00:00