android_kernel_xiaomi_sm8450/mm/Kconfig.debug
Minchan Kim 6e12c5b7d4 ANDROID: mm: introduce page_pinner
For CMA allocation, it's really critical to migrate a page, but
sometimes migration fails. One of the reasons is that some driver holds
a page refcount for a long time, so the VM can't migrate the page at
that time.

The concern here is that there is no effective way to find out who
holds the refcount of the page. This patch introduces a feature to
keep track of a page's pinner. Any get_page site can pin a page for a
long time, but the cost of tracking all of them would be significant
since get_page is the most frequent kernel operation. Furthermore, the
page could be a kernel page rather than a user page, which is not
related to the page migration failure. So, this patch tracks only
get_user_pages/follow_page with (FOLL_GET|PIN) and friends, because
they are the most common APIs for pinning user pages that could cause
migration failure, and they are less frequent than get_page, so the
runtime cost shouldn't be that big while still covering many cases
effectively.

This patch also introduces the put_user_page API. It records that
"the pinner releases the page from now on" while it releases the
page refcount. Thus, any user of get_user_pages/follow_page(FOLL_GET)
must use put_user_page as the pair of those functions. Otherwise,
page_pinner will treat them as long-term pinners, which is a false
positive, but nothing should affect stability.
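
The pairing rule above can be illustrated with a small user-space model.
This is only a sketch under stated assumptions, not the kernel
implementation: struct model_page and every model_* helper are
hypothetical names, and the moment at which a stale attribution gets
reported is simplified to a single check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for a page; not the kernel's struct page. */
struct model_page {
	int refcount;          /* outstanding references */
	bool pinner_recorded;  /* true while a pinner is attributed to the page */
	uint64_t pinned_ts_us; /* time the pin was taken, in microseconds */
};

/* Models a tracked pin (get_user_pages/follow_page with FOLL_GET). */
static void model_get_user_page(struct model_page *p, uint64_t now_us)
{
	p->refcount++;
	p->pinner_recorded = true;
	p->pinned_ts_us = now_us;
}

/* Models put_user_page: drop the pinner attribution, then the refcount. */
static void model_put_user_page(struct model_page *p)
{
	assert(p->refcount > 0);
	p->pinner_recorded = false;
	p->refcount--;
}

/* Models a plain put_page: the refcount drops but the attribution stays. */
static void model_put_page(struct model_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* A page looks long-term pinned if its attribution outlives the threshold. */
static bool model_looks_longterm(const struct model_page *p,
				 uint64_t now_us, uint64_t threshold_us)
{
	return p->pinner_recorded && now_us - p->pinned_ts_us > threshold_us;
}
```

In this model, a pin released with model_put_user_page never looks
long-term, while one released with model_put_page keeps its attribution
and is flagged once the threshold passes, which corresponds to the false
positive described above.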

* $debugfs/page_pinner/threshold

It indicates the threshold (in microseconds) used to flag long-term
pinning. It's configurable (default is 300000us). Once you write a new
value to the threshold, the old data is cleared.

* $debugfs/page_pinner/longterm_pinner

It shows call sites where the duration of pinning was greater than
the threshold. Internally, it uses a static array that keeps 4096
elements and overwrites old ones once it overflows. Therefore,
you could lose some information.

example)
Page pinned ts 76953865787 us count 1
PFN 9856945 Block 9625 type Movable Flags 0x8000000000080014(uptodate|lru|swapbacked)
 __set_page_pinner+0x34/0xcc
 try_grab_page+0x19c/0x1a0
 follow_page_pte+0x1c0/0x33c
 follow_page_mask+0xc0/0xc8
 __get_user_pages+0x178/0x414
 __gup_longterm_locked+0x80/0x148
 internal_get_user_pages_fast+0x140/0x174
 pin_user_pages_fast+0x24/0x40
 CCC
 BBB
 AAA
 __arm64_sys_ioctl+0x94/0xd0
 el0_svc_common+0xa4/0x180
 do_el0_svc+0x28/0x88
 el0_svc+0x14/0x24

note: page_pinner doesn't guarantee that attributing/unattributing is
atomic if they happen at the same time. It's just best effort, so
false positives could happen.
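
The threshold check and the fixed-size record array described above can
be sketched as a user-space model. Everything here is an assumption made
for illustration: the lt_* names and struct layout are hypothetical, and
the commit message only says that old entries are overwritten on
overflow, so the oldest-first wraparound shown here is one plausible
reading rather than the actual kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LT_CAPACITY             4096     /* array size stated in the commit message */
#define LT_DEFAULT_THRESHOLD_US 300000   /* default threshold stated in the commit message */

/* Hypothetical record of one long-term pin. */
struct lt_record {
	uint64_t pfn;
	uint64_t pinned_ts_us;
	uint64_t duration_us;
};

/* Hypothetical fixed-size log; wraps around when full. */
struct lt_log {
	struct lt_record records[LT_CAPACITY];
	size_t next;           /* slot the next record is written to */
	size_t count;          /* number of valid records, capped at LT_CAPACITY */
	uint64_t threshold_us; /* pins longer than this are recorded */
};

static void lt_log_init(struct lt_log *log)
{
	log->next = 0;
	log->count = 0;
	log->threshold_us = LT_DEFAULT_THRESHOLD_US;
}

/* Writing a new threshold clears previously collected records. */
static void lt_log_set_threshold(struct lt_log *log, uint64_t threshold_us)
{
	log->threshold_us = threshold_us;
	log->next = 0;
	log->count = 0;
}

/* Called when a pin is released; returns 1 if it was recorded as long-term. */
static int lt_log_unpin(struct lt_log *log, uint64_t pfn,
			uint64_t pinned_ts_us, uint64_t now_us)
{
	uint64_t duration_us;

	assert(now_us >= pinned_ts_us);
	duration_us = now_us - pinned_ts_us;
	if (duration_us <= log->threshold_us)
		return 0; /* not greater than the threshold: ignore */

	log->records[log->next].pfn = pfn;
	log->records[log->next].pinned_ts_us = pinned_ts_us;
	log->records[log->next].duration_us = duration_us;
	log->next = (log->next + 1) % LT_CAPACITY; /* wrap and overwrite on overflow */
	if (log->count < LT_CAPACITY)
		log->count++;
	return 1;
}
```

With this layout, the 4097th long-term record lands in slot 0 again, so
the earliest record is lost, which is the information loss the
description of longterm_pinner warns about.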

Bug: 183414571
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ife37ec360eef993d390b9c131732218a4dfd2f04
2021-04-30 09:13:34 -07:00


# SPDX-License-Identifier: GPL-2.0-only
config PAGE_EXTENSION
	bool "Extend memmap on extra space for more information on page"
	help
	  Extend memmap on extra space for more information on page. This
	  could be used for debugging features that need to insert extra
	  field for every page. This extension enables us to save memory
	  by not allocating this extra memory according to boottime
	  configuration.

config DEBUG_PAGEALLOC
	bool "Debug page memory allocations"
	depends on DEBUG_KERNEL
	depends on !HIBERNATION || ARCH_SUPPORTS_DEBUG_PAGEALLOC && !PPC && !SPARC
	select PAGE_POISONING if !ARCH_SUPPORTS_DEBUG_PAGEALLOC
	help
	  Unmap pages from the kernel linear mapping after free_pages().
	  Depending on runtime enablement, this results in a small or large
	  slowdown, but helps to find certain types of memory corruption.

	  Also, the state of page tracking structures is checked more often as
	  pages are being allocated and freed, as unexpected state changes
	  often happen for same reasons as memory corruption (e.g. double free,
	  use-after-free). The error reports for these checks can be augmented
	  with stack traces of last allocation and freeing of the page, when
	  PAGE_OWNER is also selected and enabled on boot.

	  For architectures which don't enable ARCH_SUPPORTS_DEBUG_PAGEALLOC,
	  fill the pages with poison patterns after free_pages() and verify
	  the patterns before alloc_pages(). Additionally, this option cannot
	  be enabled in combination with hibernation as that would result in
	  incorrect warnings of memory corruption after a resume because free
	  pages are not saved to the suspend image.

	  By default this option will have a small overhead, e.g. by not
	  allowing the kernel mapping to be backed by large pages on some
	  architectures. Even bigger overhead comes when the debugging is
	  enabled by DEBUG_PAGEALLOC_ENABLE_DEFAULT or the debug_pagealloc
	  command line parameter.

config DEBUG_PAGEALLOC_ENABLE_DEFAULT
	bool "Enable debug page memory allocations by default?"
	depends on DEBUG_PAGEALLOC
	help
	  Enable debug page memory allocations by default? This value
	  can be overridden by debug_pagealloc=off|on.

config PAGE_OWNER
	bool "Track page owner"
	depends on DEBUG_KERNEL && STACKTRACE_SUPPORT
	select DEBUG_FS
	select STACKTRACE
	select STACKDEPOT
	select PAGE_EXTENSION
	help
	  This keeps track of what call chain is the owner of a page, and may
	  help to find bare alloc_page(s) leaks. Even if you include this
	  feature in your build, it is disabled by default. You should pass
	  "page_owner=on" as a boot parameter in order to enable it. Eats
	  a fair amount of memory if enabled. See tools/vm/page_owner_sort.c
	  for user-space helper.

	  If unsure, say N.

config PAGE_PINNER
	bool "Track page pinner"
	depends on DEBUG_KERNEL && STACKTRACE_SUPPORT
	select DEBUG_FS
	select STACKTRACE
	select STACKDEPOT
	select PAGE_EXTENSION
	help
	  This keeps track of what call chain is the pinner of a page, and may
	  help to diagnose page migration failures. Even if you include this
	  feature in your build, it is disabled by default. You should pass
	  "page_pinner=on" as a boot parameter in order to enable it. Eats
	  a fair amount of memory if enabled.

	  If unsure, say N.

config PAGE_POISONING
	bool "Poison pages after freeing"
	help
	  Fill the pages with poison patterns after free_pages() and verify
	  the patterns before alloc_pages(). The filling of the memory helps
	  reduce the risk of information leaks from freed data. This does
	  have a potential performance impact if enabled with the
	  "page_poison=1" kernel boot option.

	  Note that "poison" here is not the same thing as the "HWPoison"
	  for CONFIG_MEMORY_FAILURE. This is software poisoning only.

	  If you are only interested in sanitization of freed pages without
	  checking the poison pattern on alloc, you can boot the kernel with
	  "init_on_free=1" instead of enabling this.

	  If unsure, say N.

config DEBUG_PAGE_REF
	bool "Enable tracepoint to track down page reference manipulation"
	depends on DEBUG_KERNEL
	depends on TRACEPOINTS
	help
	  This is a feature to add tracepoint for tracking down page reference
	  manipulation. This tracking is useful to diagnose functional failure
	  due to migration failures caused by page reference mismatches. Be
	  careful when enabling this feature because it adds about 30 KB to the
	  kernel code. However the runtime performance overhead is virtually
	  nil until the tracepoints are actually enabled.

config DEBUG_RODATA_TEST
	bool "Testcase for the marking rodata read-only"
	depends on STRICT_KERNEL_RWX
	help
	  This option enables a testcase for the setting rodata read-only.

config ARCH_HAS_DEBUG_WX
	bool

config DEBUG_WX
	bool "Warn on W+X mappings at boot"
	depends on ARCH_HAS_DEBUG_WX
	depends on MMU
	select PTDUMP_CORE
	help
	  Generate a warning if any W+X mappings are found at boot.

	  This is useful for discovering cases where the kernel is leaving W+X
	  mappings after applying NX, as such mappings are a security risk.

	  Look for a message in dmesg output like this:

	    <arch>/mm: Checked W+X mappings: passed, no W+X pages found.

	  or like this, if the check failed:

	    <arch>/mm: Checked W+X mappings: failed, <N> W+X pages found.

	  Note that even if the check fails, your kernel is possibly
	  still fine, as W+X mappings are not a security hole in
	  themselves, what they do is that they make the exploitation
	  of other unfixed kernel bugs easier.

	  There is no runtime or memory usage effect of this option
	  once the kernel has booted up - it's a one time check.

	  If in doubt, say "Y".

config GENERIC_PTDUMP
	bool

config PTDUMP_CORE
	bool

config PTDUMP_DEBUGFS
	bool "Export kernel pagetable layout to userspace via debugfs"
	depends on DEBUG_KERNEL
	depends on DEBUG_FS
	depends on GENERIC_PTDUMP
	select PTDUMP_CORE
	help
	  Say Y here if you want to show the kernel pagetable layout in a
	  debugfs file. This information is only useful for kernel developers
	  who are working in architecture specific areas of the kernel.
	  It is probably not a good idea to enable this feature in a production
	  kernel.

	  If in doubt, say N.