Commit Graph

15821 Commits

Author SHA1 Message Date
Andrey Konovalov
7d8cffdedc FROMGIT: kasan: prefix global functions with kasan_
Patch series "kasan: HW_TAGS tests support and fixes", v4.

This patchset adds support for running KASAN-KUnit tests with the
hardware tag-based mode and also contains a few fixes.

This patch (of 15):

There are a number of internal KASAN functions that are used across multiple
source code files and therefore aren't marked as static inline.  To avoid
littering the kernel function names list with generic function names,
prefix all such KASAN functions with kasan_.

As a part of this change:

- Rename internal (un)poison_range() to kasan_(un)poison() (no _range)
  to avoid name collision with a public kasan_unpoison_range().

- Rename check_memory_region() to kasan_check_range(), as it's a more
  fitting name.
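
A minimal sketch of the resulting internal declarations (the signatures
are assumptions for illustration; the real ones live in mm/kasan/kasan.h):

    void kasan_poison(const void *address, size_t size, u8 value);
    void kasan_unpoison(const void *address, size_t size);
    bool kasan_check_range(unsigned long addr, size_t size, bool write,
                           unsigned long ret_ip);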

Link: https://lkml.kernel.org/r/cover.1610733117.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/I719cc93483d4ba288a634dba80ee6b7f2809cd26
Link: https://lkml.kernel.org/r/13777aedf8d3ebbf35891136e1f2287e2f34aaba.1610733117.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Suggested-by: Marco Elver <elver@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit bbd022cfe987e0ab2637fd8383750b729b6c0330
 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Bug: 172318110
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Change-Id: Idb4978fb29289123157057f0ee7477885f5d5c6f
2021-02-07 13:41:41 -08:00
Alexander Potapenko
94a6873e86 FROMGIT: kasan: use error_report_end tracepoint
Make it possible to trace KASAN error reporting.  A good use case is
watching for trace events from userspace to detect and process memory
corruption reports from the kernel.
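
As a sketch, the end of the KASAN report path can fire the event like
this (the enum value and call site are assumptions based on the
tracepoint's description):

    /* mm/kasan/report.c (sketch) */
    static void end_report(unsigned long *flags, unsigned long addr)
    {
            trace_error_report_end(ERROR_DETECTOR_KASAN, addr);
            spin_unlock_irqrestore(&report_lock, *flags);
    }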

Link: https://lkml.kernel.org/r/20210121131915.1331302-4-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Suggested-by: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 99bbc9fba9b0c8956b817dd21ecb510da8b676dc
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I60ca062373d13f08ed4ed05bee341a8c28619407
2021-02-05 09:20:54 -08:00
Alexander Potapenko
fc3ec019c8 FROMGIT: kfence: use error_report_end tracepoint
Make it possible to trace KFENCE error reporting.  A good use case is
watching for trace events from userspace to detect and process memory
corruption reports from the kernel.

Link: https://lkml.kernel.org/r/20210121131915.1331302-3-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Suggested-by: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 2eb9559621418b844b29d2b56d9962ac019f600b
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I991cd1eee48d107bcdf5e1c0d9194e6a386b9bdf
2021-02-05 09:20:54 -08:00
Marco Elver
2da503f43b FROMGIT: kfence: show access type in report
Show the access type in KFENCE reports by plumbing through read/write
information from the page fault handler.  Update the documentation and
test accordingly.
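
The plumbed-through entry point then looks roughly as follows (a sketch
based on the description above):

    /* Called from the arch page fault handler for faults in the KFENCE
     * pool; is_write distinguishes read from write in the report. */
    bool kfence_handle_page_fault(unsigned long addr, bool is_write,
                                  struct pt_regs *regs);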

Link: https://lkml.kernel.org/r/20210111091544.3287013-2-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Suggested-by: Jörn Engel <joern@purestorage.com>
Reviewed-by: Jörn Engel <joern@purestorage.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit e29117c1fbf30d27d5afe41cf34263e1fd8e4f04
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I2e9bb224292cf92ac828232c51cd57024ac56d7d
2021-02-05 09:20:54 -08:00
Marco Elver
be3b5ae235 FROMGIT: kfence: fix typo in test
Fix a typo/accidental copy-paste that resulted in the obviously
incorrect 'GFP_KERNEL * 2' expression.

Link: https://lkml.kernel.org/r/X9lHQExmHGvETxY4@elver.google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 161c8770c371ca3992565d8f1db1ad4a88a562e9
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I4c26617c64fd2e6d410e6e793bb0eebf8fc87e55
2021-02-05 09:20:54 -08:00
Marco Elver
d15b326fe3 FROMGIT: kfence: add test suite
Add KFENCE test suite, testing various error detection scenarios. Makes
use of KUnit for test organization. Since KFENCE's interface to obtain
error reports is via the console, the test verifies that KFENCE outputs
expected reports to the console.
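
A test case then takes roughly this shape (a sketch; test_alloc() and
report_matches() stand in for the suite's actual helpers):

    static void test_out_of_bounds_read(struct kunit *test)
    {
            char *buf = test_alloc(test, 32);   /* hypothetical helper */

            READ_ONCE(buf[32]);                 /* out-of-bounds read */
            KUNIT_EXPECT_TRUE(test, report_matches("out-of-bounds"));
    }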

Link: https://lkml.kernel.org/r/20201103175841.3495947-9-elver@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit d6364119849bb0432e9a46e9699519ea9ff1bb77
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I733090d4109a795c078fe8090c46b19cdfe9413f
2021-02-05 09:20:54 -08:00
Alexander Potapenko
263969e007 FROMGIT: kfence, kasan: make KFENCE compatible with KASAN
Make KFENCE compatible with KASAN. Currently this helps test KFENCE
itself, where KASAN can catch potential corruptions to KFENCE state, or
other corruptions that may be a result of freepointer corruptions in the
main allocators.
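
Compatibility largely amounts to KASAN's hooks skipping objects that
KFENCE manages, roughly (a sketch; the exact hook sites differ):

    /* e.g. in a KASAN free hook: KFENCE objects keep their own checks */
    if (is_kfence_address(object))
            return false;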

Link: https://lkml.kernel.org/r/20201103175841.3495947-7-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Co-developed-by: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 8ab944ae627dc9fb165bff68acc465751a0b8de2
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I2f862c2e514e7fcff50a019048c8f0d22f46e6c4
2021-02-05 09:20:53 -08:00
Alexander Potapenko
2019e66b4e FROMGIT: mm, kfence: insert KFENCE hooks for SLUB
Inserts KFENCE hooks into the SLUB allocator.

To pass the originally requested size to KFENCE, add an argument
'orig_size' to slab_alloc*(). The additional argument is required to
preserve the requested original size for kmalloc() allocations, which
use size classes (e.g. an allocation of 272 bytes will return an object
of size 512). Therefore, kmem_cache::size does not represent the
kmalloc-caller's requested size, and we must introduce the argument
'orig_size' to propagate the originally requested size to KFENCE.

Without the originally requested size, we would not be able to detect
out-of-bounds accesses for objects placed at the end of a KFENCE object
page if that object is not equal to the kmalloc-size class it was
bucketed into.

When KFENCE is disabled, there is no additional overhead, since
slab_alloc*() functions are __always_inline.
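
The hook placement is roughly the following (a sketch, not the exact
diff):

    static __always_inline void *slab_alloc_node(struct kmem_cache *s,
                    gfp_t gfpflags, int node, unsigned long addr,
                    size_t orig_size)
    {
            void *object = kfence_alloc(s, orig_size, gfpflags);

            if (unlikely(object))
                    goto out;   /* bypass the regular fast path */
            /* ... regular SLUB fast path ... */
    }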

Link: https://lkml.kernel.org/r/20201103175841.3495947-6-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Co-developed-by: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 5aa4d4f541f500cd63f814ae39bea55024ed5c6b
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Iacfc22088087cb920107a595f71b09e21d5f2f47
2021-02-05 09:20:53 -08:00
Alexander Potapenko
6ba57f3a0c BACKPORT: mm, kfence: insert KFENCE hooks for SLAB
Inserts KFENCE hooks into the SLAB allocator.

To pass the originally requested size to KFENCE, add an argument
'orig_size' to slab_alloc*(). The additional argument is required to
preserve the requested original size for kmalloc() allocations, which
use size classes (e.g. an allocation of 272 bytes will return an object
of size 512). Therefore, kmem_cache::size does not represent the
kmalloc-caller's requested size, and we must introduce the argument
'orig_size' to propagate the originally requested size to KFENCE.

Without the originally requested size, we would not be able to detect
out-of-bounds accesses for objects placed at the end of a KFENCE object
page if that object is not equal to the kmalloc-size class it was
bucketed into.

When KFENCE is disabled, there is no additional overhead, since
slab_alloc*() functions are __always_inline.

Link: https://lkml.kernel.org/r/20201103175841.3495947-5-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Marco Elver <elver@google.com>

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

[glider: resolved minor API change in mm/slab_common.c]
Bug: 177201466
(cherry picked from commit 840c0553e89413319d67971a321bcc07114da9b8
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Iab2ba9c7b06b9a234d93ba892be639941861f8ab
2021-02-05 09:20:53 -08:00
Alexander Popov
1e95fcd132 FROMGIT: mm/slab: perform init_on_free earlier
Currently in CONFIG_SLAB init_on_free happens too late, and heap objects
go to the heap quarantine without being erased.

Let's move the init_on_free clearing before the call to
kasan_slab_free().  In that case the heap quarantine will store erased
objects, similarly to the CONFIG_SLUB=y behavior.
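
In mm/slab.c the free path ordering then becomes, roughly (sketch):

    if (unlikely(slab_want_init_on_free(cachep)))
            memset(objp, 0, cachep->object_size);   /* erase first */
    /* KASAN may put objp into the quarantine, delaying its reuse */
    if (kasan_slab_free(cachep, objp, _RET_IP_))
            return;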

Link: https://lkml.kernel.org/r/20201210183729.1261524-1-alex.popov@linux.com
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 177201466
(cherry picked from commit a32d654db543843a5ffb248feaec1a909718addd
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I2bf5e70c1524619526efd792bbdd959b813af1e4
2021-02-05 09:20:53 -08:00
Marco Elver
1ac855fd1f FROMGIT: kfence: use pt_regs to generate stack trace on faults
Instead of removing the fault handling portion of the stack trace based on
the fault handler's name, just use struct pt_regs directly.

Change kfence_handle_page_fault() to take a struct pt_regs, and plumb it
through to kfence_report_error() for out-of-bounds, use-after-free, or
invalid access errors, where pt_regs is used to generate the stack trace.

If the kernel is a DEBUG_KERNEL, also show registers for more information.

Link: https://lkml.kernel.org/r/20201105092133.2075331-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 54a5abe9b5d542ee71836439cc662efe178c8211
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I3a60060b24f0efb4faee2e6c953973bc1263e8d1
2021-02-05 09:20:53 -08:00
Marco Elver
752081e03f FROMGIT: kfence: add missing copyright and description headers
Add missing copyright and description headers to KFENCE source files.

Link: https://lkml.kernel.org/r/20210118092159.145934-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit b86d1ed1155ce1d2420057bfbdcc62b9fd53c1d6
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I95eb6756baaa0d8ca1dbc656708667372cff546d
2021-02-05 09:20:53 -08:00
Marco Elver
33ad66179a FROMGIT: kfence: add option to use KFENCE without static keys
For certain use cases, specifically where the sample interval is always
set to a very low value such as 1ms, it can make sense to use a dynamic
branch instead of static branches due to the overhead of toggling a
static branch.

Therefore, add a new Kconfig option to remove the static branches and
instead check kfence_allocation_gate to decide whether a KFENCE
allocation should be set up.

Link: https://lkml.kernel.org/r/20210111091544.3287013-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Suggested-by: Jörn Engel <joern@purestorage.com>
Reviewed-by: Jörn Engel <joern@purestorage.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit c01761611b325c1e4ec7d3e236cc9db003cb82fd
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I68a112a8ff68fa24742b198e036f130a9757c27f
2021-02-05 09:20:52 -08:00
Marco Elver
97d9142889 FROMGIT: kfence: fix potential deadlock due to wake_up()
Lockdep reports that we may deadlock when calling wake_up() in
__kfence_alloc(), because we may already hold base->lock.  This can happen
if debug objects are enabled:

    ...
    __kfence_alloc+0xa0/0xbc0 mm/kfence/core.c:710
    kfence_alloc include/linux/kfence.h:108 [inline]
    ...
    kmem_cache_zalloc include/linux/slab.h:672 [inline]
    fill_pool+0x264/0x5c0 lib/debugobjects.c:171
    __debug_object_init+0x7a/0xd10 lib/debugobjects.c:560
    debug_object_init lib/debugobjects.c:615 [inline]
    debug_object_activate+0x32c/0x3e0 lib/debugobjects.c:701
    debug_timer_activate kernel/time/timer.c:727 [inline]
    __mod_timer+0x77d/0xe30 kernel/time/timer.c:1048
    ...

Therefore, switch to an open-coded wait loop.  The difference to before is
that the waiter wakes up and rechecks the condition after 1 jiffy;
however, given the infrequency of kfence allocations, the difference is
insignificant.
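
The waiter side becomes roughly the following (a sketch; the predicate
name is illustrative):

    /* Poll and recheck every jiffy instead of wait_event(), so the
     * allocation path no longer needs to call wake_up(). */
    while (!kfence_allocation_observed())       /* hypothetical */
            schedule_timeout_idle(1);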

Link: https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com
Link: https://lkml.kernel.org/r/20210104130749.1768991-1-elver@google.com
Reported-by: syzbot+8983d6d4f7df556be565@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <elver@google.com>
Suggested-by: Hillf Danton <hdanton@sina.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit c5fb1ab1a3c6d0ee02d1054a10d51ffcac57aed5
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Iee40e9f216afbc3fce8e43c0e2a4bc807fdddf39
2021-02-05 09:20:52 -08:00
Marco Elver
ad318a7b61 FROMGIT: kfence: avoid stalling work queue task without allocations
To toggle the allocation gates, we set up a delayed work that calls
toggle_allocation_gate().  Here we use wait_event() to await an allocation
and subsequently disable the static branch again.  However, if the kernel
has stopped doing allocations entirely, we'd wait indefinitely, and stall
the worker task.  This may also result in the appropriate warnings if
CONFIG_DETECT_HUNG_TASK=y.

Therefore, introduce a 1 second timeout and use wait_event_timeout().  If
the timeout is reached, the static branch is disabled and a new delayed
work is scheduled to try setting up an allocation at a later time.
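
Sketched with names taken from the description above (assumed):

    if (!wait_event_timeout(allocation_wait,
                            atomic_read(&kfence_allocation_gate), HZ)) {
            /* No allocation within 1 second: disable the static branch
             * and let a new delayed work try again later. */
            static_branch_disable(&kfence_allocation_key);
            schedule_delayed_work(&kfence_timer, 0);
    }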

Note that this scenario is very unlikely during normal workloads once the
kernel has booted and user space tasks are running.  It can, however,
happen during early boot after KFENCE has been enabled, when e.g.  running
tests that do not result in any allocations.

Link: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@mail.gmail.com
Link: https://lkml.kernel.org/r/20201110135320.3309507-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 177201466
(cherry picked from commit 80d4693491f6f20de01437319b081fdda2079e67
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I2332ff8144b8bce5c4574b01ea2863e0e71e6124
2021-02-05 09:20:52 -08:00
Alexander Potapenko
adb54c78ab BACKPORT: mm: add Kernel Electric-Fence infrastructure
Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7.

This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors.  This
series enables KFENCE for the x86 and arm64 architectures, and adds
KFENCE hooks to the SLAB and SLUB allocators.

KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades performance
for precision. The main motivation behind KFENCE's design is that with
enough total uptime KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly achieve a
large enough total uptime is when the tool is deployed across a large
fleet of machines.

KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error.

Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval,
the next allocation through the main allocator (SLAB or SLUB) returns a
guarded allocation from the KFENCE object pool. At this point, the timer
is reset, and the next allocation is set up after the expiration of the
interval.

To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE.

The KFENCE memory pool is of fixed size, and if the pool is exhausted no
further KFENCE allocations occur. The default config is conservative
with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB
pages).
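
The arithmetic behind that number (assuming, in addition to the layout
described above, one extra leading guard-page pair for the pool):

    (255 + 1) slots * 2 pages per slot (object + guard) * 4 KiB = 2 MiB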

We have verified by running synthetic benchmarks (sysbench I/O,
hackbench) and production server-workload benchmarks that a kernel with
KFENCE (using sample intervals 100-500ms) is performance-neutral
compared to a non-KFENCE baseline kernel.

KFENCE is inspired by GWP-ASan [1], a userspace tool with similar
properties. The name "KFENCE" is a homage to the Electric Fence Malloc
Debugger [2].

For more details, see Documentation/dev-tools/kfence.rst added in the
series -- also viewable here:

	https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst

[1] http://llvm.org/docs/GwpAsan.html
[2] https://linux.die.net/man/3/efence

This patch (of 9):

This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors.

KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades performance
for precision. The main motivation behind KFENCE's design is that with
enough total uptime KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly achieve a
large enough total uptime is when the tool is deployed across a large
fleet of machines.

KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error. To detect out-of-bounds
writes to memory within the object's page itself, KFENCE also uses
pattern-based redzones. The following figure illustrates the page
layout:

  ---+-----------+-----------+-----------+-----------+-----------+---
     | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
     | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
     | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
     | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
     | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
     | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
  ---+-----------+-----------+-----------+-----------+-----------+---

Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval, a
guarded allocation from the KFENCE object pool is returned to the main
allocator (SLAB or SLUB). At this point, the timer is reset, and the
next allocation is set up after the expiration of the interval.

To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE. To date, we have verified by running synthetic
benchmarks (sysbench I/O, hackbench) that a kernel compiled with KFENCE
is performance-neutral compared to the non-KFENCE baseline.

For more details, see Documentation/dev-tools/kfence.rst (added later in
the series).

Link: https://lkml.kernel.org/r/20201103175841.3495947-2-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: SeongJae Park <sjpark@amazon.de>
Co-developed-by: Marco Elver <elver@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

[glider: resolved minor conflict in init/main.c]
Bug: 177201466
(cherry picked from commit 2a8dede73c3496bbd917644657f3735a4f508cb9
    https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I6b474675cc9732c31118df53fa06c3997f577218
2021-02-05 09:20:52 -08:00
Suren Baghdasaryan
9e4d84273c ANDROID: Fix sparse warning in __handle_speculative_fault caused by SPF
The SPF patchset introduced a sparse warning caused by a mismatch in the
__handle_speculative_fault() function's return type.  Fix the return
type.

Fixes: 1c53717440 ("FROMLIST: mm: provide speculative fault infrastructure")

Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I27c75f2b3729fa5d4b610f4a1829c9beba29735c
2021-02-04 23:48:50 +00:00
Suren Baghdasaryan
aef918d19a ANDROID: mm, oom: Fix select_bad_process customization
Patch 'ANDROID: mm, oom: Avoid killing tasks with negative ADJ scores'
does not handle a special case when oom_evaluate_task is aborted and
sets oc->chosen to -1. Check for this condition to avoid invalid memory
access.

Bug: 179177151
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Id9a3f1b824c6a81d157782b8cb18115b3c577a50
2021-02-04 23:28:45 +00:00
Vinayak Menon
cf397c6c26 ANDROID: mm: sync rss in speculative page fault path
The speculative page fault path does not sync the
rss in task_struct to mm_struct, leading to large
variance in the RSS values observed by userspace
tools and also in the OOM task dump.

Change-Id: Id45f1b9b0a51a9afffbaf8e65f5ef747d409d0d7
Bug: 179217427
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2021-02-04 22:40:58 +00:00
Vinayak Menon
4f9d16a68d ANDROID: zram: allow zram to allocate CMA pages
Though zram pages are movable, they aren't allowed to enter
MIGRATE_CMA pageblocks.  zram has not been seen to pin pages for
long, which is what could cause an issue.  Moreover, allowing zram
to pick CMA pages can be helpful in observed cases where zram
order-0 allocations fail while there are lots of free CMA pages,
resulting in kswapd or direct reclaim not making enough progress.

Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>

Bug: 158645321
Link: https://lore.kernel.org/linux-mm/4c77bb100706b714213ff840d827a48e40ac9177.1604282969.git.cgoldswo@codeaurora.org/
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I406f92a4175367caec38ef8b8eaca7020ae09917
2021-02-03 22:54:39 +00:00
Greg Kroah-Hartman
39564d70ad This is the 5.10.12 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmAVV8MACgkQONu9yGCS
 aT5EyxAAkIKzfWDIgtxBBui9zNb4af0nik/4Fv+0ynvvMFIJ+9OEh8vrrzASze3E
 6w8E5c1TxP2iXiW0/NQqU2UWmdVzO85zAeGMjZGSgzn4AtbZrBd8FIk3g5aNzGEJ
 xuqlVm+VOmdQ30Lr+yIOE/xwGDhGy+4cCMBQqGdMWk3Bnsk2QHBzSzyLZOJiK8M1
 9qTyMvtUdIVDFw5rqWQgtfNkcCfk7dMfjmD1bFVSFiJCnJbHE2Yr8y2MscSeLZ1V
 csBmg6K/JgEZFJFVamFKfGkAKQp2nI6YIUm3K0oJhp9BYYECJaH0irnkrT5F8rU8
 RBvxW+9E+SOmrHoEo9RTfGDnvU0hOrZolmPmj71puT6vHzw/S2npoAanWX+nWD6j
 dVTT77TKaSovmqp7+Lt9djsb3E9WzKHlIBJIcgcy/uyMpsllmHt6GROYBIa5gFJk
 LZY6zFrG9l04RYICBuuD6XNcqP56H/WnhBB8us3X5ui5x/3fI+RFBhf/UOXzxUnB
 KcBzRLCUFugvPdKeXGmjn0FCrj1vpj1/cbqLbDvETq9nF8qp/sXjPHbDpvNHyBOR
 MpzFgWnNrg2pYlJHidxpj2gog8jvEEdtOHeVW16HpVsvwMClJVcgaBF3US5mT8Zy
 nNohKtYPx6XjdddDb41NZsWxPHizN7FGnFeJOTZpH0YjNpTNS6c=
 =etoA
 -----END PGP SIGNATURE-----

Merge 5.10.12 into android12-5.10

Changes in 5.10.12
	gpio: mvebu: fix pwm .get_state period calculation
	Revert "mm/slub: fix a memory leak in sysfs_slab_add()"
	futex: Ensure the correct return value from futex_lock_pi()
	futex: Replace pointless printk in fixup_owner()
	futex: Provide and use pi_state_update_owner()
	rtmutex: Remove unused argument from rt_mutex_proxy_unlock()
	futex: Use pi_state_update_owner() in put_pi_state()
	futex: Simplify fixup_pi_state_owner()
	futex: Handle faults correctly for PI futexes
	HID: wacom: Correct NULL dereference on AES pen proximity
	HID: multitouch: Apply MT_QUIRK_CONFIDENCE quirk for multi-input devices
	media: Revert "media: videobuf2: Fix length check for single plane dmabuf queueing"
	media: v4l2-subdev.h: BIT() is not available in userspace
	RDMA/vmw_pvrdma: Fix network_hdr_type reported in WC
	iwlwifi: dbg: Don't touch the tlv data
	kernel/io_uring: cancel io_uring before task works
	io_uring: inline io_uring_attempt_task_drop()
	io_uring: add warn_once for io_uring_flush()
	io_uring: stop SQPOLL submit on creator's death
	io_uring: fix null-deref in io_disable_sqo_submit
	io_uring: do sqo disable on install_fd error
	io_uring: fix false positive sqo warning on flush
	io_uring: fix uring_flush in exit_files() warning
	io_uring: fix skipping disabling sqo on exec
	io_uring: dont kill fasync under completion_lock
	io_uring: fix sleeping under spin in __io_clean_op
	objtool: Don't fail on missing symbol table
	mm/page_alloc: add a missing mm_page_alloc_zone_locked() tracepoint
	mm: fix a race on nr_swap_pages
	tools: Factor HOSTCC, HOSTLD, HOSTAR definitions
	printk: fix buffer overflow potential for print_text()
	printk: fix string termination for record_print_text()
	Linux 5.10.12

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6d96ec78494ebbc0daf4fdecfc13e522c6bd6b42
2021-01-30 14:29:02 +01:00
Zhaoyang Huang
f472a59aa1 mm: fix a race on nr_swap_pages
commit b50da6e9f42ade19141f6cf8870bb2312b055aa3 upstream.

The scenario in which "Free swap = -4kB" happens on my system is caused
by several get_swap_pages() calls racing with each other while
show_swap_cache_info() runs simultaneously.  There is no need to add a
lock in get_swap_page_of_type(), as we remove the "Presub/PosAdd"
pattern here.

ProcessA			ProcessB			ProcessC
ngoals = 1			ngoals = 1
avail = nr_swap_pages(1)	avail = nr_swap_pages(1)
nr_swap_pages(1) -= ngoals
				nr_swap_pages(0) -= ngoals
								nr_swap_pages = -1
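
The fix amounts to clamping the goal to what is available and then
subtracting once, roughly (a sketch of the upstream change):

    avail_pgs = atomic_long_read(&nr_swap_pages) / size;
    if (avail_pgs <= 0)
            goto noswap;
    n_goal = min3(n_goal, (long)SWAP_BATCH, avail_pgs);
    atomic_long_sub(n_goal * size, &nr_swap_pages);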

Link: https://lkml.kernel.org/r/1607050340-4535-1-git-send-email-zhaoyang.huang@unisoc.com
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-30 13:55:19 +01:00
Hailong liu
c11f7749f1 mm/page_alloc: add a missing mm_page_alloc_zone_locked() tracepoint
commit ce8f86ee94fabcc98537ddccd7e82cfd360a4dc5 upstream.

The trace point *trace_mm_page_alloc_zone_locked()* in __rmqueue() does
not currently cover all branches.  Add the missing tracepoint and check
the page before doing so.

[akpm@linux-foundation.org: use IS_ENABLED() to suppress warning]

Link: https://lkml.kernel.org/r/20201228132901.41523-1-carver4lio@163.com
Signed-off-by: Hailong liu <liu.hailong6@zte.com.cn>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-30 13:55:19 +01:00
Wang Hai
bf5eb7d21a Revert "mm/slub: fix a memory leak in sysfs_slab_add()"
commit 757fed1d0898b893d7daa84183947c70f27632f3 upstream.

This reverts commit dde3c6b72a.

syzbot reported a double-free bug. The following case can cause this bug.

 - mm/slab_common.c: create_cache(): if the __kmem_cache_create() fails,
   it does:

	out_free_cache:
		kmem_cache_free(kmem_cache, s);

 - but __kmem_cache_create() - at least for slub() - will have done

	sysfs_slab_add(s)
		-> sysfs_create_group() .. fails ..
		-> kobject_del(&s->kobj); .. which frees s ...

We can't remove the kmem_cache_free() in create_cache(), because other
error cases of __kmem_cache_create() do not free this.

So, revert the commit dde3c6b72a ("mm/slub: fix a memory leak in
sysfs_slab_add()") to fix this.

Reported-by: syzbot+d0bd96b4696c1ef67991@syzkaller.appspotmail.com
Fixes: dde3c6b72a ("mm/slub: fix a memory leak in sysfs_slab_add()")
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-30 13:55:16 +01:00
Andrey Konovalov
385eb1fe10 UPSTREAM: kasan, mm: fix resetting page_alloc tags for HW_TAGS
[ Upstream commit acb35b177c71d3d39b9a3b9ea213d926235066e3 ]

A previous commit added resetting KASAN page tags to
kernel_init_free_pages() to avoid false-positives due to accesses to
metadata with the hardware tag-based mode.

That commit did reset page tags before the metadata access, but didn't
restore them after.  As a result, KASAN fails to detect bad accesses
to page_alloc allocations on some configurations.

Fix this by recovering the tag after the metadata access.
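
Sketched with the page tag helpers (the shape of the fix, simplified to
a single page):

    u8 tag = page_kasan_tag(page);

    page_kasan_tag_reset(page);
    clear_highpage(page);               /* the metadata access */
    page_kasan_tag_set(page, tag);      /* recover the tag */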

Link: https://lkml.kernel.org/r/02b5bcd692e912c27d484030f666b350ad7e4ae4.1611074450.git.andreyknvl@google.com
Fixes: aa1ef4d7b3f6 ("kasan, mm: reset tags when accessing metadata")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 172318110
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Change-Id: I7bf87862c7524cb0a8178c584e238d6e3d84bac0
2021-01-29 20:03:57 +00:00
Andrey Konovalov
5c96cfe8b9 UPSTREAM: kasan, mm: fix conflicts with init_on_alloc/free
[ Upstream commit ce5716c618524241a3cea821e18ee1e0d16f6c70 ]

A few places where SLUB accesses an object's data or metadata were missed
in a previous patch.  This leads to false positives with hardware
tag-based KASAN when bulk allocations are used with init_on_alloc/free.

Fix the false-positives by resetting pointer tags during these accesses.

(The kasan_reset_tag call is removed from slab_alloc_node, as it's added
 into maybe_wipe_obj_freeptr.)
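
For instance, the freepointer wipe ends up looking roughly like this
(sketch):

    static __always_inline void maybe_wipe_obj_freeptr(struct kmem_cache *s,
                                                       void *obj)
    {
            if (unlikely(slab_want_init_on_free(s)) && obj)
                    /* reset the pointer tag before touching metadata */
                    memset((void *)((char *)kasan_reset_tag(obj) + s->offset),
                           0, sizeof(void *));
    }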

Link: https://linux-review.googlesource.com/id/I50dd32838a666e173fe06c3c5c766f2c36aae901
Link: https://lkml.kernel.org/r/093428b5d2ca8b507f4a79f92f9929b35f7fada7.1610731872.git.andreyknvl@google.com
Fixes: aa1ef4d7b3f67 ("kasan, mm: reset tags when accessing metadata")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 172318110
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Change-Id: I48c4d12a10e92e3e6a4bcd1c3a72c5903bfdbc71
2021-01-29 20:03:50 +00:00
Andrey Konovalov
5ec78398b3 UPSTREAM: kasan: fix HW_TAGS boot parameters
[ Upstream commit 76bc99e81a7cb78a78e058107e4b5b1d8ed3c874 ]

The initially proposed KASAN command line parameters are redundant.

This change drops the complex "kasan.mode=off/prod/full" parameter and
adds a simpler kill switch "kasan=off/on" instead.  The new parameter
together with the already existing ones provides a cleaner way to
express the same set of features.

The full set of parameters with this change:

  kasan=off/on             - whether KASAN is enabled
  kasan.fault=report/panic - whether to only print a report or also panic
  kasan.stacktrace=off/on  - whether to collect alloc/free stack traces

Default values:

  kasan=on
  kasan.fault=report
  kasan.stacktrace=on  (if CONFIG_DEBUG_KERNEL=y)
  kasan.stacktrace=off (otherwise)
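
For example, a deployment that wants reports to be fatal but does not
want the stack trace collection overhead could boot with (illustrative):

  kasan=on kasan.fault=panic kasan.stacktrace=off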

Link: https://linux-review.googlesource.com/id/Ib3694ed90b1e8ccac6cf77dfd301847af4aba7b8
Link: https://lkml.kernel.org/r/4e9c4a4bdcadc168317deb2419144582a9be6e61.1610736745.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 172318110
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Change-Id: I21e006ed49fca128ed18e4580af31b3666e74775
2021-01-29 20:03:43 +00:00
Kalesh Singh
62098d36eb UPSTREAM: mm/mremap.c: fix extent calculation
When `next < old_addr`, the `next - old_addr` arithmetic underflows,
causing `extent` to be incorrect.

Make `extent` the smaller of `next - old_addr` or `old_end - old_addr`.
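
The fix is a straightforward clamp (a sketch of the cherry-picked
change):

    extent = next - old_addr;
    if (extent > old_end - old_addr)
            extent = old_end - old_addr;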

Link: https://lkml.kernel.org/r/20201219170433.2418867-1-kaleshsingh@google.com
Fixes: c49dd34018026 ("mm: speedup mremap on 1GB or larger regions")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e05986ee7a5814bec0e0075d813daca3d46e4a9e)

Bug: 151772539
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I86af1f63d76455632e2e0ba2892bac14d29c360e
2021-01-29 18:27:32 +00:00
Kalesh Singh
dcceb19998 UPSTREAM: mm: speedup mremap on 1GB or larger regions
Android needs to move large memory regions for garbage collection.  The GC
requires moving physical pages of a multi-gigabyte heap using mremap.
During this move, the application threads have to be paused for
correctness.  It is critical to keep this pause as short as possible to
avoid jitters during user interaction.

Optimize mremap for >= 1GB-sized regions by moving at the PUD/PGD level if
the source and destination addresses are PUD-aligned.  For
CONFIG_PGTABLE_LEVELS == 3, moving at the PUD level in effect moves PGD
entries, since the PUD entry is “folded back” onto the PGD entry.  Add
HAVE_MOVE_PUD so that architectures where moving at the PUD level isn't
supported/tested can turn this off by not selecting the config.
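
Inside the move_page_tables() copy loop, the new level is then tried
roughly as follows (a sketch; helper names are assumptions):

    if (IS_ENABLED(CONFIG_HAVE_MOVE_PUD) && extent == PUD_SIZE) {
            /* both addresses PUD-aligned: move the whole PUD entry */
            if (move_pgt_entry(NORMAL_PUD, vma, old_addr, new_addr,
                               old_pud, new_pud, true))
                    continue;
    }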

Link: https://lkml.kernel.org/r/20201014005320.2233162-4-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Hassan Naveed <hnaveed@wavecomp.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c49dd340180260c6239e453263a9a244da9a7c85)

Bug: 151772539
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Ied000832cd646124bf7507ea5ee7bbabc214015d
2021-01-29 18:27:06 +00:00
Greg Kroah-Hartman
ba152773be This is the 5.10.11 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmARRrcACgkQONu9yGCS
 aT7n8xAA0lDCM7V1v/EAYw+7ZN1HNXCZhkOZzwrbFz2SSRnu0UGRbA6B9tcdHWA4
 FX3ZgNcEaut3HgT4TYa3xZfhq6tB4w1vfHXyvcuDlgVv+0+LGbPaZiH/pq6wIE00
 s3buqbtasRIQw4UEokos6ZJCiXHr9EmbCauQYViSf8YWR5tDjbIaqUB2TI/Cw4Io
 xYYqpnHJ51KItNvOPaFoUWYA6zrD7R963aAf+vmvBfDrgcs0UgH5OTO0XY6FCYKR
 ZgVMX/1pioWTAF8TMq2pzU55FP7SlohlcISC8NQrygq8JrKHVmd9c5U/MTd4ytaz
 nBaEizEA8Ewz3eTSSED173jJ1OMA06prfx1NQ4fO90U11eTj+IG9qkLePsB7UBe8
 7FEJ8lTKgWDDwC7UzctHaI7mIyzAX5EiItRjjlln++bnQZNjjd/TBVGqXu4FAGBP
 0yBamdBrNVLVmjJQ7Wudt2iZU/gMkt9R+frq24IZLjW+5XB3L3G3S3vAm+8IYthV
 OVh23SfD/w2kPJcvw4z8ZYeOigyDd7CgoLeVwrdH6jM9N1gEEElkt6gsJugPhnxH
 odRRfUaRp+3j6M6ZCf8ai2QzsE0l3Uu+mJHdRaWiXOMmbsIuDFrYW/iNlLyHy8i8
 EZ01/chEyjs7ek8TNdkbhpHow3wI3Uvpczt/BG7OxH/F+DpTmkE=
 =tvv2
 -----END PGP SIGNATURE-----

Merge 5.10.11 into android12-5.10

Changes in 5.10.11
	scsi: target: tcmu: Fix use-after-free of se_cmd->priv
	mtd: rawnand: gpmi: fix dst bit offset when extracting raw payload
	mtd: rawnand: nandsim: Fix the logic when selecting Hamming soft ECC engine
	i2c: tegra: Wait for config load atomically while in ISR
	i2c: bpmp-tegra: Ignore unknown I2C_M flags
	platform/x86: i2c-multi-instantiate: Don't create platform device for INT3515 ACPI nodes
	platform/x86: ideapad-laptop: Disable touchpad_switch for ELAN0634
	ALSA: seq: oss: Fix missing error check in snd_seq_oss_synth_make_info()
	ALSA: hda/realtek - Limit int mic boost on Acer Aspire E5-575T
	ALSA: hda/via: Add minimum mute flag
	crypto: xor - Fix divide error in do_xor_speed()
	dm crypt: fix copy and paste bug in crypt_alloc_req_aead
	ACPI: scan: Make acpi_bus_get_device() clear return pointer on error
	btrfs: don't get an EINTR during drop_snapshot for reloc
	btrfs: do not double free backref nodes on error
	btrfs: fix lockdep splat in btrfs_recover_relocation
	btrfs: don't clear ret in btrfs_start_dirty_block_groups
	btrfs: send: fix invalid clone operations when cloning from the same file and root
	fs: fix lazytime expiration handling in __writeback_single_inode()
	pinctrl: ingenic: Fix JZ4760 support
	mmc: core: don't initialize block size from ext_csd if not present
	mmc: sdhci-of-dwcmshc: fix rpmb access
	mmc: sdhci-xenon: fix 1.8v regulator stabilization
	mmc: sdhci-brcmstb: Fix mmc timeout errors on S5 suspend
	dm: avoid filesystem lookup in dm_get_dev_t()
	dm integrity: fix a crash if "recalculate" used without "internal_hash"
	dm integrity: conditionally disable "recalculate" feature
	drm/atomic: put state on error path
	drm/syncobj: Fix use-after-free
	drm/amdgpu: remove gpu info firmware of green sardine
	drm/amd/display: DCN2X Find Secondary Pipe properly in MPO + ODM Case
	drm/i915/gt: Prevent use of engine->wa_ctx after error
	drm/i915: Check for rq->hwsp validity after acquiring RCU lock
	ASoC: Intel: haswell: Add missing pm_ops
	ASoC: rt711: mutex between calibration and power state changes
	SUNRPC: Handle TCP socket sends with kernel_sendpage() again
	HID: multitouch: Enable multi-input for Synaptics pointstick/touchpad device
	HID: sony: select CONFIG_CRC32
	dm integrity: select CRYPTO_SKCIPHER
	x86/hyperv: Fix kexec panic/hang issues
	scsi: ufs: Relax the condition of UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
	scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback
	scsi: qedi: Correct max length of CHAP secret
	scsi: scsi_debug: Fix memleak in scsi_debug_init()
	scsi: sd: Suppress spurious errors when WRITE SAME is being disabled
	riscv: Fix kernel time_init()
	riscv: Fix sifive serial driver
	riscv: Enable interrupts during syscalls with M-Mode
	HID: logitech-dj: add the G602 receiver
	HID: Ignore battery for Elan touchscreen on ASUS UX550
	clk: tegra30: Add hda clock default rates to clock driver
	ALSA: hda/tegra: fix tegra-hda on tegra30 soc
	riscv: cacheinfo: Fix using smp_processor_id() in preemptible
	arm64: make atomic helpers __always_inline
	xen: Fix event channel callback via INTX/GSI
	x86/xen: Add xen_no_vector_callback option to test PCI INTX delivery
	x86/xen: Fix xen_hvm_smp_init() when vector callback not available
	dts: phy: fix missing mdio device and probe failure of vsc8541-01 device
	dts: phy: add GPIO number and active state used for phy reset
	riscv: defconfig: enable gpio support for HiFive Unleashed
	drm/amdgpu/psp: fix psp gfx ctrl cmds
	drm/amd/display: disable dcn10 pipe split by default
	HID: logitech-hidpp: Add product ID for MX Ergo in Bluetooth mode
	drm/amd/display: Fix to be able to stop crc calculation
	drm/nouveau/bios: fix issue shadowing expansion ROMs
	drm/nouveau/privring: ack interrupts the same way as RM
	drm/nouveau/i2c/gm200: increase width of aux semaphore owner fields
	drm/nouveau/mmu: fix vram heap sizing
	drm/nouveau/kms/nv50-: fix case where notifier buffer is at offset 0
	io_uring: flush timeouts that should already have expired
	libperf tests: If a test fails return non-zero
	libperf tests: Fail when failing to get a tracepoint id
	RISC-V: Set current memblock limit
	RISC-V: Fix maximum allowed physical memory for RV32
	x86/xen: fix 'nopvspin' build error
	nfsd: Fixes for nfsd4_encode_read_plus_data()
	nfsd: Don't set eof on a truncated READ_PLUS
	gpiolib: cdev: fix frame size warning in gpio_ioctl()
	pinctrl: aspeed: g6: Fix PWMG0 pinctrl setting
	pinctrl: mediatek: Fix fallback call path
	RDMA/ucma: Do not miss ctx destruction steps in some cases
	btrfs: print the actual offset in btrfs_root_name
	scsi: megaraid_sas: Fix MEGASAS_IOC_FIRMWARE regression
	scsi: ufs: ufshcd-pltfrm depends on HAS_IOMEM
	scsi: ufs: Fix tm request when non-fatal error happens
	crypto: omap-sham - Fix link error without crypto-engine
	bpf: Prevent double bpf_prog_put call from bpf_tracing_prog_attach
	powerpc: Use the common INIT_DATA_SECTION macro in vmlinux.lds.S
	powerpc: Fix alignment bug within the init sections
	arm64: entry: remove redundant IRQ flag tracing
	bpf: Reject too big ctx_size_in for raw_tp test run
	drm/amdkfd: Fix out-of-bounds read in kdf_create_vcrat_image_cpu()
	RDMA/umem: Avoid undefined behavior of rounddown_pow_of_two()
	RDMA/cma: Fix error flow in default_roce_mode_store
	printk: ringbuffer: fix line counting
	printk: fix kmsg_dump_get_buffer length calculations
	iov_iter: fix the uaccess area in copy_compat_iovec_from_user
	i2c: octeon: check correct size of maximum RECV_LEN packet
	drm/vc4: Unify PCM card's driver_name
	platform/x86: intel-vbtn: Drop HP Stream x360 Convertible PC 11 from allow-list
	platform/x86: hp-wmi: Don't log a warning on HPWMI_RET_UNKNOWN_COMMAND errors
	gpio: sifive: select IRQ_DOMAIN_HIERARCHY rather than depend on it
	ALSA: hda: Balance runtime/system PM if direct-complete is disabled
	xsk: Clear pool even for inactive queues
	selftests: net: fib_tests: remove duplicate log test
	can: dev: can_restart: fix use after free bug
	can: vxcan: vxcan_xmit: fix use after free bug
	can: peak_usb: fix use after free bugs
	perf evlist: Fix id index for heterogeneous systems
	i2c: sprd: depend on COMMON_CLK to fix compile tests
	iio: common: st_sensors: fix possible infinite loop in st_sensors_irq_thread
	iio: ad5504: Fix setting power-down state
	drivers: iio: temperature: Add delay after the addressed reset command in mlx90632.c
	iio: adc: ti_am335x_adc: remove omitted iio_kfifo_free()
	counter: ti-eqep: remove floor
	powerpc/64s: fix scv entry fallback flush vs interrupt
	cifs: do not fail __smb_send_rqst if non-fatal signals are pending
	irqchip/mips-cpu: Set IPI domain parent chip
	x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
	x86/topology: Make __max_die_per_package available unconditionally
	x86/mmx: Use KFPU_387 for MMX string operations
	x86/setup: don't remove E820_TYPE_RAM for pfn 0
	proc_sysctl: fix oops caused by incorrect command parameters
	mm: memcg/slab: optimize objcg stock draining
	mm: memcg: fix memcg file_dirty numa stat
	mm: fix numa stats for thp migration
	io_uring: iopoll requests should also wake task ->in_idle state
	io_uring: fix SQPOLL IORING_OP_CLOSE cancelation state
	io_uring: fix short read retries for non-reg files
	intel_th: pci: Add Alder Lake-P support
	stm class: Fix module init return on allocation failure
	serial: mvebu-uart: fix tx lost characters at power off
	ehci: fix EHCI host controller initialization sequence
	USB: ehci: fix an interrupt calltrace error
	usb: gadget: aspeed: fix stop dma register setting.
	USB: gadget: dummy-hcd: Fix errors in port-reset handling
	usb: udc: core: Use lock when write to soft_connect
	usb: bdc: Make bdc pci driver depend on BROKEN
	usb: cdns3: imx: fix writing read-only memory issue
	usb: cdns3: imx: fix can't create core device the second time issue
	xhci: make sure TRB is fully written before giving it to the controller
	xhci: tegra: Delay for disabling LFPS detector
	drivers core: Free dma_range_map when driver probe failed
	driver core: Fix device link device name collision
	driver core: Extend device_is_dependent()
	drm/i915: s/intel_dp_sink_dpms/intel_dp_set_power/
	drm/i915: Only enable DFP 4:4:4->4:2:0 conversion when outputting YCbCr 4:4:4
	x86/entry: Fix noinstr fail
	x86/cpu/amd: Set __max_die_per_package on AMD
	cls_flower: call nla_ok() before nla_next()
	netfilter: rpfilter: mask ecn bits before fib lookup
	tools: gpio: fix %llu warning in gpio-event-mon.c
	tools: gpio: fix %llu warning in gpio-watch.c
	drm/i915/hdcp: Update CP property in update_pipe
	sh: dma: fix kconfig dependency for G2_DMA
	sh: Remove unused HAVE_COPY_THREAD_TLS macro
	locking/lockdep: Cure noinstr fail
	ASoC: SOF: Intel: fix page fault at probe if i915 init fails
	octeontx2-af: Fix missing check bugs in rvu_cgx.c
	net: dsa: mv88e6xxx: also read STU state in mv88e6250_g1_vtu_getnext
	selftests/powerpc: Fix exit status of pkey tests
	sh_eth: Fix power down vs. is_opened flag ordering
	nvme-pci: refactor nvme_unmap_data
	nvme-pci: fix error unwind in nvme_map_data
	cachefiles: Drop superfluous readpages aops NULL check
	lightnvm: fix memory leak when submit fails
	skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too
	kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow
	kasan: fix incorrect arguments passing in kasan_add_zero_shadow
	tcp: fix TCP socket rehash stats mis-accounting
	net_sched: gen_estimator: support large ewma log
	udp: mask TOS bits in udp_v4_early_demux()
	ipv6: create multicast route with RTPROT_KERNEL
	net_sched: avoid shift-out-of-bounds in tcindex_set_parms()
	net_sched: reject silly cell_log in qdisc_get_rtab()
	ipv6: set multicast flag on the multicast route
	net: mscc: ocelot: allow offloading of bridge on top of LAG
	net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled
	net: dsa: b53: fix an off by one in checking "vlan->vid"
	tcp: do not mess with cloned skbs in tcp_add_backlog()
	tcp: fix TCP_USER_TIMEOUT with zero window
	net: mscc: ocelot: Fix multicast to the CPU port
	net: core: devlink: use right genl user_ptr when handling port param get/set
	pinctrl: qcom: Allow SoCs to specify a GPIO function that's not 0
	pinctrl: qcom: No need to read-modify-write the interrupt status
	pinctrl: qcom: Properly clear "intr_ack_high" interrupts when unmasking
	pinctrl: qcom: Don't clear pending interrupts when enabling
	x86/sev: Fix noinstr violation
	tty: implement write_iter
	tty: fix up hung_up_tty_write() conversion
	net: systemport: free dev on error path
	x86/sev-es: Handle string port IO to kernel memory properly
	tcp: Fix potential use-after-free due to double kfree()
	ASoC: SOF: Intel: hda: Avoid checking jack on system suspend
	drm/i915/hdcp: Get conn while content_type changed
	bpf: Local storage helpers should check nullness of owner ptr passed
	kernfs: implement ->read_iter
	kernfs: implement ->write_iter
	kernfs: wire up ->splice_read and ->splice_write
	interconnect: imx8mq: Use icc_sync_state
	fs/pipe: allow sendfile() to pipe again
	Commit 9bb48c82aced ("tty: implement write_iter") converted the tty layer to use write_iter. Fix the redirected_tty_write declaration also in n_tty and change the comparisons to use write_iter instead of write.
	mm: fix initialization of struct page for holes in memory layout
	Revert "mm: fix initialization of struct page for holes in memory layout"
	Linux 5.10.11

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I502239d06f9bcd68b59149376f3c796c64de5942
2021-01-27 12:12:33 +01:00
Linus Torvalds
1daa298a04 Revert "mm: fix initialization of struct page for holes in memory layout"
commit 377bf660d07a47269510435d11f3b65d53edca20 upstream.

This reverts commit d3921cb8be29ce5668c64e23ffdaeec5f8c69399.

Chris Wilson reports that it causes boot problems:

 "We have half a dozen or so different machines in CI that are silently
  failing to boot, that we believe is bisected to this patch"

and the CI team confirmed that a revert fixed the issues.

The cause is unknown for now, so let's revert it.

Link: https://lore.kernel.org/lkml/161160687463.28991.354987542182281928@build.alporthouse.com/
Reported-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:55:30 +01:00
Mike Rapoport
f2a79851c7 mm: fix initialization of struct page for holes in memory layout
commit d3921cb8be29ce5668c64e23ffdaeec5f8c69399 upstream.

There could be struct pages that are not backed by actual physical
memory.  This can happen when the actual memory bank is not a multiple
of SECTION_SIZE or when an architecture does not register memory holes
reserved by the firmware as memblock.memory.

Such pages are currently initialized by the init_unavailable_mem()
function, which iterates through the PFNs in holes in memblock.memory
and, if there is a struct page corresponding to a PFN, sets the fields
of this page to default values and marks the page as Reserved.

init_unavailable_mem() does not take into account the zone and node the
page belongs to, and sets both the zone and node links in struct page
to zero.

On a system that has firmware reserved holes in a zone above ZONE_DMA,
for instance in a configuration below:

	# grep -A1 E820 /proc/iomem
	7a17b000-7a216fff : Unknown E820 type
	7a217000-7bffffff : System RAM

the unset zone link in struct page will trigger

	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);

because there are pages in both ZONE_DMA32 and ZONE_DMA (unset zone link
in struct page) in the same pageblock.

Update init_unavailable_mem() to use the zone constraints defined by
the architecture to properly set up the zone link, and use the node ID
of the adjacent range in memblock.memory to set the node link.
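
A minimal sketch of the described approach, assuming simplified helper
names; the zone and node are computed by the caller from the
architecture's zone PFN limits and the adjacent memblock region:

	/*
	 * Sketch: initialize struct pages in a hole with the zone/node
	 * computed by the caller, instead of hardcoding zone 0 / node 0.
	 */
	static void __init init_unavailable_range(unsigned long spfn,
						  unsigned long epfn,
						  int zone, int node)
	{
		unsigned long pfn;

		for (pfn = spfn; pfn < epfn; pfn++) {
			__init_single_page(pfn_to_page(pfn), pfn, zone, node);
			__SetPageReserved(pfn_to_page(pfn));
		}
	}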

Link: https://lkml.kernel.org/r/20210111194017.22696-3-rppt@kernel.org
Fixes: 73a6e474cb ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:55:30 +01:00
Lecopzer Chen
fee5a83dfc kasan: fix incorrect arguments passing in kasan_add_zero_shadow
commit 5dabd1712cd056814f9ab15f1d68157ceb04e741 upstream.

kasan_remove_zero_shadow() shall use the original virtual address
(start and size) instead of the shadow address.
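
A sketch of the corrected error path, assuming the function's
surrounding shape; the point is that the cleanup call takes the
original range, not the shadow-translated one:

	/* Sketch: on failure, clean up using the original (unshadowed) range. */
	int kasan_add_zero_shadow(void *start, unsigned long size)
	{
		int ret;
		void *shadow_start, *shadow_end;

		shadow_start = kasan_mem_to_shadow(start);
		shadow_end = kasan_mem_to_shadow(start + size);

		ret = kasan_populate_early_shadow(shadow_start, shadow_end);
		if (ret)
			kasan_remove_zero_shadow(start, size); /* was: shadow range */
		return ret;
	}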

Link: https://lkml.kernel.org/r/20210103063847.5963-1-lecopzer@gmail.com
Fixes: 0207df4fa1 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:55:23 +01:00
Lecopzer Chen
ecd63f04e7 kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow
commit a11a496ee6e2ab6ed850233c96b94caf042af0b9 upstream.

While testing kasan_populate_early_shadow and kasan_remove_zero_shadow,
if the shadow start and end addresses in kasan_remove_zero_shadow() are
not aligned to PMD_SIZE, the remaining unaligned PTEs won't be removed.

In the test case for kasan_remove_zero_shadow():

    shadow_start: 0xffffffb802000000, shadow end: 0xffffffbfbe000000

    3-level page table:
      PUD_SIZE: 0x40000000 PMD_SIZE: 0x200000 PAGE_SIZE: 4K

0xffffffbf80000000 ~ 0xffffffbfbdf80000 will not be removed because in
kasan_remove_pud_table(), kasan_pmd_table(*pud) is true but the next
address is 0xffffffbfbdf80000, which is not aligned to PUD_SIZE.

In the correct flow, this should fall back to the next-level
kasan_remove_pmd_table(), but the condition always continues and skips
the unaligned part.

Fix this by correcting the condition for the case where neither next
nor addr is aligned.
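
A sketch of the corrected check in kasan_remove_pud_table(), under the
assumption that the loop variables follow the usual mm/kasan/init.c
shape; the whole-entry clear only happens when both boundaries are
PUD-aligned, otherwise the walk descends a level:

	if (kasan_pmd_table(*pud)) {
		if (IS_ALIGNED(addr, PUD_SIZE) &&
		    IS_ALIGNED(next, PUD_SIZE)) {
			pud_clear(pud);
			continue;
		}
		/* unaligned: fall through and descend to the PMD level */
	}
	pmd = pmd_offset(pud, addr);
	kasan_remove_pmd_table(pmd, addr, next);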

Link: https://lkml.kernel.org/r/20210103135621.83129-1-lecopzer@gmail.com
Fixes: 0207df4fa1 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: YJ Chiang <yj.chiang@mediatek.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:55:23 +01:00
Shakeel Butt
371f3fbf4f mm: fix numa stats for thp migration
commit 5c447d274f3746fbed6e695e7b9a2d7bd8b31b71 upstream.

Currently the kernel is not correctly updating the numa stats for
NR_FILE_PAGES and NR_SHMEM on THP migration.  Fix that.

For NR_FILE_DIRTY and NR_ZONE_WRITE_PENDING there is no need to handle
THP migration at the moment, as the kernel does not yet have write
support for file THP, but to be more future proof this patch adds the
THP support for those stats as well.
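
A sketch of the accounting idea in migrate_page_move_mapping(), hedged
since the stable backport may differ slightly: the delta is the full
THP page count rather than 1:

	int nr = thp_nr_pages(page);	/* 1 for base pages, HPAGE_PMD_NR for THP */

	__mod_lruvec_state(old_lruvec, NR_FILE_PAGES, -nr);
	__mod_lruvec_state(new_lruvec, NR_FILE_PAGES, nr);
	if (PageSwapBacked(page) && !PageSwapCache(page)) {
		__mod_lruvec_state(old_lruvec, NR_SHMEM, -nr);
		__mod_lruvec_state(new_lruvec, NR_SHMEM, nr);
	}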

Link: https://lkml.kernel.org/r/20210108155813.2914586-2-shakeelb@google.com
Fixes: e71769ae52 ("mm: enable thp migration for shmem thp")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:55:14 +01:00
Shakeel Butt
0dc3a130cc mm: memcg: fix memcg file_dirty numa stat
commit 8a8792f600abacd7e1b9bb667759dca1c153f64c upstream.

The kernel updates the per-node NR_FILE_DIRTY stats on page migration
but not the memcg numa stats.

That was not an issue until the recent commit 5f9a4f4a70 ("mm:
memcontrol: add the missing numa_stat interface for cgroup v2") exposed
numa stats for the memcg.

So fix the file_dirty per-memcg numa stat.
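
Along the same lines, a sketch of the dirty-page half of the fix:
switching to the lruvec helper updates the per-memcg numa counter
together with the per-node one (the writeback-capability check on the
mapping is elided, as its helper name varies across kernel versions):

	if (PageDirty(page)) {
		__mod_lruvec_state(old_lruvec, NR_FILE_DIRTY, -nr);
		__mod_lruvec_state(new_lruvec, NR_FILE_DIRTY, nr);
	}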

Link: https://lkml.kernel.org/r/20210108155813.2914586-1-shakeelb@google.com
Fixes: 5f9a4f4a70 ("mm: memcontrol: add the missing numa_stat interface for cgroup v2")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:55:14 +01:00
Roman Gushchin
26f54dac15 mm: memcg/slab: optimize objcg stock draining
commit 3de7d4f25a7438f09fef4e71ef111f1805cd8e7c upstream.

Imran Khan reported a 16% regression in hackbench results caused by the
commit f2fe7b09a5 ("mm: memcg/slab: charge individual slab objects
instead of pages").  The regression is noticeable in the case of a
consecutive allocation of several relatively large slab objects, e.g.
skb's.  As soon as the amount of stocked bytes exceeds PAGE_SIZE,
drain_obj_stock() and __memcg_kmem_uncharge() are called, which leads
to a number of atomic operations in page_counter_uncharge().

The corresponding call graph is below (provided by Imran Khan):

  |__alloc_skb
  |    |
  |    |__kmalloc_reserve.isra.61
  |    |    |
  |    |    |__kmalloc_node_track_caller
  |    |    |    |
  |    |    |    |slab_pre_alloc_hook.constprop.88
  |    |    |     obj_cgroup_charge
  |    |    |    |    |
  |    |    |    |    |__memcg_kmem_charge
  |    |    |    |    |    |
  |    |    |    |    |    |page_counter_try_charge
  |    |    |    |    |
  |    |    |    |    |refill_obj_stock
  |    |    |    |    |    |
  |    |    |    |    |    |drain_obj_stock.isra.68
  |    |    |    |    |    |    |
  |    |    |    |    |    |    |__memcg_kmem_uncharge
  |    |    |    |    |    |    |    |
  |    |    |    |    |    |    |    |page_counter_uncharge
  |    |    |    |    |    |    |    |    |
  |    |    |    |    |    |    |    |    |page_counter_cancel
  |    |    |    |
  |    |    |    |
  |    |    |    |__slab_alloc
  |    |    |    |    |
  |    |    |    |    |___slab_alloc
  |    |    |    |    |
  |    |    |    |slab_post_alloc_hook

Instead of directly uncharging the accounted kernel memory, it's
possible to refill the generic page-sized per-cpu stock.  That is a
much faster operation, especially on a default hierarchy.  As a bonus,
__memcg_kmem_uncharge_page() will also get faster, so the freeing of
page-sized kernel allocations (e.g. large kmallocs) will become faster.

A similar change has been done earlier for the socket memory by the
commit 475d0487a2 ("mm: memcontrol: use per-cpu stocks for socket
memory uncharging").
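
A sketch of the change in drain_obj_stock(), assuming the 5.10-era
structure (memcg derived from the drained objcg): whole pages are
handed back to the per-cpu page stock rather than uncharged one
counter operation at a time:

	unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;

	if (nr_pages)
		refill_stock(memcg, nr_pages);	/* was: __memcg_kmem_uncharge() */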

Link: https://lkml.kernel.org/r/20210106042239.2860107-1-guro@fb.com
Fixes: f2fe7b09a5 ("mm: memcg/slab: charge individual slab objects instead of pages")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Imran Khan <imran.f.khan@oracle.com>
Tested-by: Imran Khan <imran.f.khan@oracle.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:55:14 +01:00
Suren Baghdasaryan
b55d8223ca ANDROID: Fix sparse warning in wp_page_copy caused by SPF patchset
The SPF patchset introduced a sparse warning caused by a type mismatch
in wp_page_copy. Fix the variable type.

Fixes: 'FROMLIST: mm: prepare for FAULT_FLAG_SPECULATIVE'

Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Id0db15db82bce0808d58f27bd72a1db0f63c72b7
2021-01-26 18:57:51 +00:00
Minchan Kim
20512940b8 FROMLIST: mm: failfast mode with __GFP_NORETRY in alloc_contig_range
Contiguous memory allocation can stall while waiting on page writeback
and/or page locks, which causes unpredictable delay. That is an
unavoidable cost for a requestor that needs *big* contiguous memory,
but it's expensive for *small* contiguous memory (e.g., order-4)
because the caller could instead retry the request in a different
range that may hold easily migratable pages, without stalling.

This patch introduces __GFP_NORETRY as a compaction gfp_mask in
alloc_contig_range so it will fail fast, without blocking, when it
encounters pages that would require waiting.
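
A hypothetical caller-side sketch of the resulting policy: try a
failfast pass first, then fall back to the blocking mode for the same
range (the retry strategy here is illustrative, not part of the patch):

	ret = alloc_contig_range(start, end, MIGRATE_CMA,
				 GFP_KERNEL | __GFP_NORETRY);
	if (ret == -EBUSY)
		/* failfast pass hit busy pages; retry in blocking mode */
		ret = alloc_contig_range(start, end, MIGRATE_CMA, GFP_KERNEL);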

Bug: 170340257
Bug: 120293424
Link: https://lore.kernel.org/linux-mm/YAnM5PbNJZlk%2F%2FiX@google.com/T/#m1362218ebb69e6e10c20d9361008b079745c4e6f
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I42ba8dd5aeb065d936978ab205e4baf84bf9a321
2021-01-25 12:21:02 -08:00
Minchan Kim
23ba990a3e FROMLIST: mm: cma: introduce gfp flag in cma_alloc instead of no_warn
The upcoming patch will introduce __GFP_NORETRY semantics in
alloc_contig_range, which is a failfast mode of the API. Instead of
adding an additional gfp parameter, replace no_warn with a gfp flag.

To keep old behaviors, it follows the rule below.

  no_warn 			gfp_flags

  false         		GFP_KERNEL
  true          		GFP_KERNEL|__GFP_NOWARN
  gfp & __GFP_NOWARN		GFP_KERNEL | (gfp & __GFP_NOWARN)
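
At a call site the conversion looks roughly like this (sketch, based on
the rule table above):

	page = cma_alloc(cma, count, align, false);		/* old: bool no_warn */
	page = cma_alloc(cma, count, align, GFP_KERNEL);	/* new: gfp flags */
	page = cma_alloc(cma, count, align,
			 GFP_KERNEL | __GFP_NOWARN);		/* old no_warn=true */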

Bug: 170340257
Bug: 120293424
Link: https://lore.kernel.org/linux-mm/YAnM5PbNJZlk%2F%2FiX@google.com/T/#m36b144ff81fe0a8f0ecaf6813de4819ecc41f8fe
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I1ce020ab5d5fff34eb6464be4632ddef72fb43eb
2021-01-25 12:21:02 -08:00
Vinayak Menon
c9201630e8 ANDROID: mm: use raw seqcount variants in vm_write_*
write_seqcount_begin expects to be called from a non-preemptible
context to avoid preemption by a read section that can spin due to an
odd value. But the readers of vm_sequence never retry, and thus writers
need not disable preemption. Use the non-lockdep variant, as lockdep
checks are now built into write_seqcount_begin.
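
A sketch of the resulting helpers, assuming the patchset's vm_sequence
seqcount field:

	static inline void vm_write_begin(struct vm_area_struct *vma)
	{
		raw_write_seqcount_begin(&vma->vm_sequence);
	}

	static inline void vm_write_end(struct vm_area_struct *vma)
	{
		raw_write_seqcount_end(&vma->vm_sequence);
	}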

Bug: 161210518
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Change-Id: If4f0cddd7f0a79136495060d4acc1702abb46817
2021-01-22 18:02:43 +00:00
Patrick Daly
531f65ae67 ANDROID: mm: Fix sleeping while atomic during speculative page fault
A speculative page fault may race with a call to free_pagetables. If
free_pagetables is called first, *(vmf->pmd) may be empty.

__might_sleep()
__alloc_pages_nodemask()
pte_alloc_one(inline)
__pte_alloc()
pte_alloc_one_map(inline)
alloc_set_pte()
filemap_map_pages()
do_fault_around(inline)
do_read_fault(inline)
do_fault(inline)
handle_pte_fault()
mem_cgroup_exit_user_fault(inline)
__handle_speculative_fault()
do_page_fault()

As filemap_map_pages() holds the RCU read lock, this triggers a
sleeping-while-atomic BUG(). As free_pagetables has already been
called, it is also a memory leak.

Fix this by skipping to pte_map_lock(), allowing SPF to detect that
the VMA has changed so that a normal page fault is taken instead.

Change-Id: I121ca4be99c908656db3a1dc88cfb3b64f01e2fb
Bug: 161210518
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2021-01-22 18:02:32 +00:00
Laurent Dufour
14624d3dc3 FROMLIST: mm: don't do swap readahead during speculative page fault
Vinayak Menon faced a panic because one thread was page faulting a
page in swap, while another one was mprotecting a part of the VMA,
leading to a VMA split.
This raised a panic in swap_vma_readahead() because the VMA's
boundaries no longer matched the faulting address.

To avoid this, if the page is not found in the swap cache, the
speculative page fault is aborted and a regular page fault is retried.
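
A sketch of the described bail-out in do_swap_page(), with flag and
field names as used by the SPF patchset: if the page is not already in
the swap cache, give up on the speculative path:

	if (vmf->flags & FAULT_FLAG_SPECULATIVE) {
		page = lookup_swap_cache(entry, vma, vmf->address);
		if (!page)
			/* no readahead here; retry as a regular fault */
			return VM_FAULT_RETRY;
	}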

Change-Id: Ia9d99fb5fde7bd89f38966838d115b6c8c15c9db
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Link: https://lore.kernel.org/patchwork/patch/1062665/
Bug: 161210518
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2021-01-22 18:02:21 +00:00
Laurent Dufour
caf05c003b FROMLIST: mm: add speculative page fault vmstats
Add a speculative_pgfault vmstat counter to count successful
speculative page fault handling.

Also fix a minor typo in include/linux/vm_event_item.h.
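
The counter hookup itself is a one-liner at the point where a
speculative fault completes successfully (enum name assumed from the
counter name in the subject; sketch):

	count_vm_event(SPECULATIVE_PGFAULT);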

Change-Id: I0d3f3dc5195e1156d4b8edf83aff9d8d85904e8e
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Link: https://lore.kernel.org/lkml/1523975611-15978-24-git-send-email-ldufour@linux.vnet.ibm.com/
Bug: 161210518
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2021-01-22 18:01:43 +00:00
Laurent Dufour
99e15a0799 FROMLIST: mm: speculative page fault handler return VMA
When the speculative page fault handler returns VM_RETRY, there is a
chance that the VMA fetched without grabbing the mmap_sem can be reused
by the legacy page fault handler.  By reusing it, we avoid calling
find_vma() again. To achieve that, we must ensure that the VMA
structure will not be freed behind our back. This is done by taking a
reference on it (get_vma()) and by assuming that the caller will call
the new service can_reuse_spf_vma() once it has grabbed the mmap_sem.

can_reuse_spf_vma() first checks that the VMA is still in the RB tree,
then checks that the VMA's boundaries match the passed address, and
releases the reference on the VMA so that it can be freed if needed.

In the case the VMA has been freed, can_reuse_spf_vma() will have
returned false, as the VMA is no longer in the RB tree.

In the architecture page fault handler, the call to the new service
reuse_spf_or_find_vma() should be made in place of find_vma(); it will
handle the check on the spf_vma and, if needed, call find_vma().
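
A sketch of the arch-side usage described above, using the helper names
from this patch:

	/* After grabbing mmap_sem: reuse the speculatively fetched VMA
	 * when it is still valid, otherwise fall back to find_vma(). */
	if (spf_vma) {
		if (can_reuse_spf_vma(spf_vma, address))
			vma = spf_vma;	/* reference already held */
		else
			vma = find_vma(mm, address);
	} else {
		vma = find_vma(mm, address);
	}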

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Change-Id: Ia56dcf807e8bddf6788fd696dd80372db35476f0
Link: https://lore.kernel.org/lkml/1523975611-15978-23-git-send-email-ldufour@linux.vnet.ibm.com/
Bug: 161210518
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2021-01-22 18:01:34 +00:00
Laurent Dufour
736ae8bde8 FROMLIST: mm: adding speculative page fault failure trace events
This patch adds a set of new trace events to collect the speculative
page fault event failures.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Change-Id: I9dafba293edf40bdad4ae241d105ecdfb42579c1
Link: https://lore.kernel.org/lkml/1523975611-15978-20-git-send-email-ldufour@linux.vnet.ibm.com/
Bug: 161210518
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2021-01-22 18:01:25 +00:00
Peter Zijlstra
1c53717440 FROMLIST: mm: provide speculative fault infrastructure
Provide infrastructure to do a speculative fault (not holding
mmap_sem).

The not holding of mmap_sem means we can race against VMA
change/removal and page-table destruction. We use the SRCU VMA freeing
to keep the VMA around. We use the VMA seqcount to detect change
(including unmapping / page-table deletion) and we use gup_fast()-style
page-table walking to deal with page-table races.

Once we've obtained the page and are ready to update the PTE, we
validate if the state we started the fault with is still valid, if
not, we'll fail the fault with VM_FAULT_RETRY, otherwise we update the
PTE and we're done.
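
The overall pattern, reduced to a sketch (names follow the patchset;
the page-table walk itself is elided):

	unsigned int seq;

	seq = raw_read_seqcount(&vma->vm_sequence);
	if (seq & 1)
		return VM_FAULT_RETRY;	/* writer in progress */

	/* ... gup_fast()-style walk and page preparation ... */

	if (raw_read_seqcount(&vma->vm_sequence) != seq)
		return VM_FAULT_RETRY;	/* VMA changed behind our back */
	/* state still valid: safe to update the PTE */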

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

[Manage the newly introduced pte_spinlock() for speculative page
 fault to fail if the VMA is touched in our back]
[Rename vma_is_dead() to vma_has_changed() and declare it here]
[Fetch p4d and pud]
[Set vmd.sequence in __handle_mm_fault()]
[Abort speculative path when handle_userfault() has to be called]
[Add additional VMA's flags checks in handle_speculative_fault()]
[Clear FAULT_FLAG_ALLOW_RETRY in handle_speculative_fault()]
[Don't set vmf->pte and vmf->ptl if pte_map_lock() failed]
[Remove warning comment about waiting for !seq&1 since we don't want
 to wait]
[Remove warning about no huge page support, mention it explicitly]
[Don't call do_fault() in the speculative path as __do_fault() calls
 vma->vm_ops->fault() which may want to release mmap_sem]
[Only vm_fault pointer argument for vma_has_changed()]
[Fix check against huge page, calling pmd_trans_huge()]
[Use READ_ONCE() when reading VMA's fields in the speculative path]
[Explicitly check for __HAVE_ARCH_PTE_SPECIAL as we can't support for
 processing done in vm_normal_page()]
[Check that vma->anon_vma is already set when starting the speculative
 path]
[Check for memory policy as we can't support MPOL_INTERLEAVE case due to
 the processing done in mpol_misplaced()]
[Don't support VMA growing up or down]
[Move check on vm_sequence just before calling handle_pte_fault()]
[Don't build SPF services if !CONFIG_SPECULATIVE_PAGE_FAULT]
[Add mem cgroup oom check]
[Use READ_ONCE to access p*d entries]
[Replace deprecated ACCESS_ONCE() by READ_ONCE() in vma_has_changed()]
[Don't fetch pte again in handle_pte_fault() when running the speculative
 path]
[Check PMD against concurrent collapsing operation]
[Try spin lock the pte during the speculative path to avoid deadlock with
 other CPU's invalidating the TLB and requiring this CPU to catch the
 inter processor's interrupt]
[Move define of FAULT_FLAG_SPECULATIVE here]
[Introduce __handle_speculative_fault() and add a check against
 mm->mm_users in handle_speculative_fault() defined in mm.h]
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Link: https://lore.kernel.org/lkml/1523975611-15978-19-git-send-email-ldufour@linux.vnet.ibm.com/
Bug: 161210518
Change-Id: I6a29e6edd9779bd34a9f7f4f6034e041a8487f30
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
2021-01-22 18:01:16 +00:00
Laurent Dufour
6d8fc36d9e FROMLIST: mm: protect mm_rb tree with a rwlock
This change is inspired by Peter's proposal patch [1], which was
protecting the VMA using SRCU. Unfortunately, SRCU does not scale well
in that particular case, and it introduces major performance
degradation due to excessive scheduling operations.

To allow access to the mm_rb tree without grabbing the mmap_sem, this
patch protects access to it using a rwlock.  As the mm_rb tree is an
O(log n) search structure, it is safe to protect it with such a lock.
The VMA cache is not protected by the new rwlock and should not be used
without holding the mmap_sem.

To allow the picked VMA structure to be used once the rwlock is
released, a use count is added to the VMA structure. When the VMA is
allocated, it is set to 1.  Each time the VMA is picked with the rwlock
held, its use count is incremented. Each time the VMA is released, it
is decremented. When the use count hits zero, the VMA is no longer used
and should be freed.

This patch prepares for 2 kinds of VMA access:
 - as usual, under the control of the mmap_sem,
 - without holding the mmap_sem, for the speculative page fault handler.

Access done under the control of the mmap_sem doesn't require grabbing
the rwlock to protect read access to the mm_rb tree, but write access
must be done under the protection of the rwlock too. This affects
inserting and removing elements in the RB tree.

The patch introduces 2 new functions:
 - vma_get() to find a VMA based on an address, holding the new rwlock.
 - vma_put() to release the VMA when it is no longer used.
These services are designed to be used when accesses are made to the RB
tree without holding the mmap_sem; see the sketch below.
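
A sketch of vma_get(), with the lock and refcount field names assumed
from the patch description:

	struct vm_area_struct *vma_get(struct mm_struct *mm, unsigned long addr)
	{
		struct vm_area_struct *vma;

		read_lock(&mm->mm_rb_lock);
		vma = __find_vma(mm, addr);	/* rb-tree lookup, lock held */
		if (vma)
			atomic_inc(&vma->vm_ref_count);
		read_unlock(&mm->mm_rb_lock);

		return vma;
	}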

When a VMA is removed from the RB tree, its vma->vm_rb field is
cleared, and we rely on the WMB done when releasing the rwlock to
serialize the write with the RMB done in a later patch to check for the
VMA's validity.

When free_vma is called, the file associated with the VMA is closed
immediately, but the policy and the file structure remain in use until
the VMA's use count reaches 0, which may happen later when exiting an
in-progress speculative page fault.

[1] https://patchwork.kernel.org/patch/5108281/

Change-Id: I9ecc922b8efa4b28975cc6a8e9531284c24ac14e
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Link: https://lore.kernel.org/lkml/1523975611-15978-18-git-send-email-ldufour@linux.vnet.ibm.com/
Bug: 161210518
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
2021-01-22 18:00:57 +00:00
Laurent Dufour
a1dbf20e8e FROMLIST: mm: introduce __page_add_new_anon_rmap()
When dealing with the speculative page fault handler, we may race with
the VMA being split or merged. In this case the vma->vm_start and
vma->vm_end fields may not match the address at which the page fault is
occurring.

This can only happen when the VMA is split, but in that case the
anon_vma pointer of the new VMA will be the same as the original one,
because in __split_vma the new->anon_vma is set to src->anon_vma when
*new = *vma.

So even if the VMA boundaries are not correct, the anon_vma pointer is
still valid.

If the VMA has been merged, then the VMA into which it has been merged
must have the same anon_vma pointer, otherwise the merge can't be done.

So in all cases we know that the anon_vma is valid: we have checked,
before starting the speculative page fault, that the anon_vma pointer
is valid for this VMA, and since there is an anon_vma this means that
at one time a page has been backed by it. Before the VMA is cleaned,
the page table lock would have to be grabbed to clean the PTE, and the
anon_vma field is checked once the PTE is locked.

This patch introduces a new __page_add_new_anon_rmap() service which
doesn't check the VMA boundaries, and creates a new inline one which
does the check.

When called from a page fault handler, if it is not a speculative one,
there is a guarantee that vm_start and vm_end match the faulting
address, so this check is useless. In the context of the speculative
page fault handler, this check may be wrong, but the anon_vma is still
valid, as explained above.
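
A sketch of the resulting split, following the 5.10-era signature of
page_add_new_anon_rmap(): the inline wrapper keeps the boundary check,
the __ variant skips it for the speculative path:

	void __page_add_new_anon_rmap(struct page *page,
				      struct vm_area_struct *vma,
				      unsigned long address, bool compound);

	static inline void page_add_new_anon_rmap(struct page *page,
						  struct vm_area_struct *vma,
						  unsigned long address,
						  bool compound)
	{
		VM_BUG_ON_VMA(address < vma->vm_start ||
			      address >= vma->vm_end, vma);
		__page_add_new_anon_rmap(page, vma, address, compound);
	}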

Change-Id: I72c47830181579f8c9618df879077d321653b5f1
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Link: https://lore.kernel.org/lkml/1523975611-15978-17-git-send-email-ldufour@linux.vnet.ibm.com/
Bug: 161210518
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2021-01-22 18:00:48 +00:00
Laurent Dufour
10a5eb6be8 FROMLIST: mm: introduce __vm_normal_page()
When dealing with the speculative fault path, we should use the VMA's
cached field values stored in the vm_fault structure.

Currently vm_normal_page() is using the pointer to the VMA to fetch
the vm_flags value. This patch provides a new __vm_normal_page() which
receives the vm_flags value as a parameter.

Note: The speculative path is turned on for architectures providing
support for the special PTE flag, so only the first block of
vm_normal_page is used during the speculative path.
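
A sketch of the split, hedged on the exact parameter list: the
speculative path passes the vm_flags value it cached in vm_fault, while
the regular inline reads it from the VMA:

	struct page *__vm_normal_page(struct vm_area_struct *vma,
				      unsigned long addr, pte_t pte,
				      unsigned long vm_flags);

	static inline struct page *vm_normal_page(struct vm_area_struct *vma,
						  unsigned long addr, pte_t pte)
	{
		return __vm_normal_page(vma, addr, pte, vma->vm_flags);
	}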

Change-Id: I0f2c4ab1212fbca449bdf6e7993dafa0d41044bc
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Link: https://lore.kernel.org/lkml/1523975611-15978-16-git-send-email-ldufour@linux.vnet.ibm.com/
Bug: 161210518
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
2021-01-22 18:00:40 +00:00