License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX license identifier should be
applied to a file was done in a spreadsheet of side-by-side results from
the output of two independent scanners (ScanCode & Windriver) producing
SPDX tag:value files, created by Philippe Ombredanne. Philippe prepared
the base worksheet and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file-by-file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
should be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source.
- The file already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note"; otherwise it was "GPL-2.0". The results were:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, three files were found to
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally, Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version, with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types). Finally, Greg ran the script using the .csv files to
generate the patches.
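The csv-driven tagging step above could be sketched roughly as follows. This is a hypothetical illustration only: the actual script by Thomas and Greg is not included here, and the function names (`spdx_line`, `tags_from_csv`) and the exact comment-style rules are assumptions for the sketch.

```python
# Hypothetical sketch of a csv-driven SPDX tagging step. The real
# script is not shown in this log; names and rules here are illustrative.
import csv
import io


def spdx_line(path, license_id):
    """Format an SPDX tag in the comment style the file type expects."""
    if path.endswith(".c"):
        return "// SPDX-License-Identifier: %s" % license_id
    if path.endswith(".h"):
        return "/* SPDX-License-Identifier: %s */" % license_id
    # Makefiles, Kconfig, scripts: '#' comments
    return "# SPDX-License-Identifier: %s" % license_id


def tags_from_csv(csv_text):
    """Read 'path,license' rows and yield (path, formatted tag) pairs."""
    for path, license_id in csv.reader(io.StringIO(csv_text)):
        yield path, spdx_line(path, license_id)


rows = "mm/khugepaged.c,GPL-2.0\ninclude/linux/khugepaged.h,GPL-2.0\n"
for path, tag in tags_from_csv(rows):
    print(path, tag)
```

The tag would then be prepended as the first line of each file (after a shebang, where one exists).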
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 23:07:57 +09:00
// SPDX-License-Identifier: GPL-2.0
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/sched/mm.h>
#include <linux/sched/coredump.h>
#include <linux/mmu_notifier.h>
#include <linux/rmap.h>
#include <linux/swap.h>
#include <linux/mm_inline.h>
#include <linux/kthread.h>
#include <linux/khugepaged.h>
#include <linux/freezer.h>
#include <linux/mman.h>
#include <linux/hashtable.h>
#include <linux/userfaultfd_k.h>
#include <linux/page_idle.h>
#include <linux/swapops.h>
#include <linux/shmem_fs.h>

#include <asm/tlb.h>
#include <asm/pgalloc.h>
#include "internal.h"

enum scan_result {
	SCAN_FAIL,
	SCAN_SUCCEED,
	SCAN_PMD_NULL,
	SCAN_EXCEED_NONE_PTE,
	SCAN_EXCEED_SWAP_PTE,
	SCAN_EXCEED_SHARED_PTE,
	SCAN_PTE_NON_PRESENT,
	SCAN_PTE_UFFD_WP,
	SCAN_PAGE_RO,
	SCAN_LACK_REFERENCED_PAGE,
	SCAN_PAGE_NULL,
	SCAN_SCAN_ABORT,
	SCAN_PAGE_COUNT,
	SCAN_PAGE_LRU,
	SCAN_PAGE_LOCK,
	SCAN_PAGE_ANON,
	SCAN_PAGE_COMPOUND,
	SCAN_ANY_PROCESS,
	SCAN_VMA_NULL,
	SCAN_VMA_CHECK,
	SCAN_ADDRESS_RANGE,
	SCAN_SWAP_CACHE_PAGE,
	SCAN_DEL_PAGE_LRU,
	SCAN_ALLOC_HUGE_PAGE_FAIL,
	SCAN_CGROUP_CHARGE_FAIL,
	SCAN_TRUNCATED,
	SCAN_PAGE_HAS_PRIVATE,
};

#define CREATE_TRACE_POINTS
#include <trace/events/huge_memory.h>

static struct task_struct *khugepaged_thread __read_mostly;
static DEFINE_MUTEX(khugepaged_mutex);

/* default scan 8*512 pte (or vmas) every 30 second */
static unsigned int khugepaged_pages_to_scan __read_mostly;
static unsigned int khugepaged_pages_collapsed;
static unsigned int khugepaged_full_scans;
static unsigned int khugepaged_scan_sleep_millisecs __read_mostly = 10000;
/* during fragmentation poll the hugepage allocator once every minute */
static unsigned int khugepaged_alloc_sleep_millisecs __read_mostly = 60000;
static unsigned long khugepaged_sleep_expire;
static DEFINE_SPINLOCK(khugepaged_mm_lock);
static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
/*
 * default collapse hugepages if there is at least one pte mapped like
 * it would have happened if the vma was large enough during page
 * fault.
 */
static unsigned int khugepaged_max_ptes_none __read_mostly;
static unsigned int khugepaged_max_ptes_swap __read_mostly;
static unsigned int khugepaged_max_ptes_shared __read_mostly;

#define MM_SLOTS_HASH_BITS 10
static __read_mostly DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);

static struct kmem_cache *mm_slot_cache __read_mostly;

#define MAX_PTE_MAPPED_THP 8

/**
 * struct mm_slot - hash lookup from mm to mm_slot
 * @hash: hash collision list
 * @mm_node: khugepaged scan list headed in khugepaged_scan.mm_head
 * @mm: the mm that this information is valid for
 */
struct mm_slot {
	struct hlist_node hash;
	struct list_head mm_node;
	struct mm_struct *mm;

	/* pte-mapped THP in this mm */
	int nr_pte_mapped_thp;
	unsigned long pte_mapped_thp[MAX_PTE_MAPPED_THP];
};

/**
 * struct khugepaged_scan - cursor for scanning
 * @mm_head: the head of the mm list to scan
 * @mm_slot: the current mm_slot we are scanning
 * @address: the next address inside that to be scanned
 *
 * There is only the one khugepaged_scan instance of this cursor structure.
 */
struct khugepaged_scan {
	struct list_head mm_head;
	struct mm_slot *mm_slot;
	unsigned long address;
};

static struct khugepaged_scan khugepaged_scan = {
	.mm_head = LIST_HEAD_INIT(khugepaged_scan.mm_head),
};

#ifdef CONFIG_SYSFS
static ssize_t scan_sleep_millisecs_show(struct kobject *kobj,
					 struct kobj_attribute *attr,
					 char *buf)
{
	return sprintf(buf, "%u\n", khugepaged_scan_sleep_millisecs);
}

static ssize_t scan_sleep_millisecs_store(struct kobject *kobj,
					  struct kobj_attribute *attr,
					  const char *buf, size_t count)
{
	unsigned long msecs;
	int err;

	err = kstrtoul(buf, 10, &msecs);
	if (err || msecs > UINT_MAX)
		return -EINVAL;

	khugepaged_scan_sleep_millisecs = msecs;
	khugepaged_sleep_expire = 0;
	wake_up_interruptible(&khugepaged_wait);

	return count;
}
static struct kobj_attribute scan_sleep_millisecs_attr =
	__ATTR(scan_sleep_millisecs, 0644, scan_sleep_millisecs_show,
	       scan_sleep_millisecs_store);

static ssize_t alloc_sleep_millisecs_show(struct kobject *kobj,
					  struct kobj_attribute *attr,
					  char *buf)
{
	return sprintf(buf, "%u\n", khugepaged_alloc_sleep_millisecs);
}

static ssize_t alloc_sleep_millisecs_store(struct kobject *kobj,
					   struct kobj_attribute *attr,
					   const char *buf, size_t count)
{
	unsigned long msecs;
	int err;

	err = kstrtoul(buf, 10, &msecs);
	if (err || msecs > UINT_MAX)
		return -EINVAL;

	khugepaged_alloc_sleep_millisecs = msecs;
	khugepaged_sleep_expire = 0;
	wake_up_interruptible(&khugepaged_wait);

	return count;
}
static struct kobj_attribute alloc_sleep_millisecs_attr =
	__ATTR(alloc_sleep_millisecs, 0644, alloc_sleep_millisecs_show,
	       alloc_sleep_millisecs_store);

static ssize_t pages_to_scan_show(struct kobject *kobj,
				  struct kobj_attribute *attr,
				  char *buf)
{
	return sprintf(buf, "%u\n", khugepaged_pages_to_scan);
}
static ssize_t pages_to_scan_store(struct kobject *kobj,
				   struct kobj_attribute *attr,
				   const char *buf, size_t count)
{
	int err;
	unsigned long pages;

	err = kstrtoul(buf, 10, &pages);
	if (err || !pages || pages > UINT_MAX)
		return -EINVAL;

	khugepaged_pages_to_scan = pages;

	return count;
}
static struct kobj_attribute pages_to_scan_attr =
	__ATTR(pages_to_scan, 0644, pages_to_scan_show,
	       pages_to_scan_store);

static ssize_t pages_collapsed_show(struct kobject *kobj,
				    struct kobj_attribute *attr,
				    char *buf)
{
	return sprintf(buf, "%u\n", khugepaged_pages_collapsed);
}
static struct kobj_attribute pages_collapsed_attr =
	__ATTR_RO(pages_collapsed);

static ssize_t full_scans_show(struct kobject *kobj,
			       struct kobj_attribute *attr,
			       char *buf)
{
	return sprintf(buf, "%u\n", khugepaged_full_scans);
}
static struct kobj_attribute full_scans_attr =
	__ATTR_RO(full_scans);

static ssize_t khugepaged_defrag_show(struct kobject *kobj,
				      struct kobj_attribute *attr, char *buf)
{
	return single_hugepage_flag_show(kobj, attr, buf,
				TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG);
}
static ssize_t khugepaged_defrag_store(struct kobject *kobj,
				       struct kobj_attribute *attr,
				       const char *buf, size_t count)
{
	return single_hugepage_flag_store(kobj, attr, buf, count,
				 TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG);
}
static struct kobj_attribute khugepaged_defrag_attr =
	__ATTR(defrag, 0644, khugepaged_defrag_show,
	       khugepaged_defrag_store);

/*
 * max_ptes_none controls if khugepaged should collapse hugepages over
 * any unmapped ptes in turn potentially increasing the memory
 * footprint of the vmas. When max_ptes_none is 0 khugepaged will not
 * reduce the available free memory in the system as it
 * runs. Increasing max_ptes_none will instead potentially reduce the
 * free memory in the system during the khugepaged scan.
 */
static ssize_t khugepaged_max_ptes_none_show(struct kobject *kobj,
					     struct kobj_attribute *attr,
					     char *buf)
{
	return sprintf(buf, "%u\n", khugepaged_max_ptes_none);
}
static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
					      struct kobj_attribute *attr,
					      const char *buf, size_t count)
{
	int err;
	unsigned long max_ptes_none;

	err = kstrtoul(buf, 10, &max_ptes_none);
	if (err || max_ptes_none > HPAGE_PMD_NR-1)
		return -EINVAL;

	khugepaged_max_ptes_none = max_ptes_none;

	return count;
}
static struct kobj_attribute khugepaged_max_ptes_none_attr =
	__ATTR(max_ptes_none, 0644, khugepaged_max_ptes_none_show,
	       khugepaged_max_ptes_none_store);

static ssize_t khugepaged_max_ptes_swap_show(struct kobject *kobj,
					     struct kobj_attribute *attr,
					     char *buf)
{
	return sprintf(buf, "%u\n", khugepaged_max_ptes_swap);
}

static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
					      struct kobj_attribute *attr,
					      const char *buf, size_t count)
{
	int err;
	unsigned long max_ptes_swap;

	err = kstrtoul(buf, 10, &max_ptes_swap);
	if (err || max_ptes_swap > HPAGE_PMD_NR-1)
		return -EINVAL;

	khugepaged_max_ptes_swap = max_ptes_swap;

	return count;
}

static struct kobj_attribute khugepaged_max_ptes_swap_attr =
	__ATTR(max_ptes_swap, 0644, khugepaged_max_ptes_swap_show,
	       khugepaged_max_ptes_swap_store);

static ssize_t khugepaged_max_ptes_shared_show(struct kobject *kobj,
					       struct kobj_attribute *attr,
					       char *buf)
{
	return sprintf(buf, "%u\n", khugepaged_max_ptes_shared);
}

static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
						struct kobj_attribute *attr,
						const char *buf, size_t count)
{
	int err;
	unsigned long max_ptes_shared;

	err = kstrtoul(buf, 10, &max_ptes_shared);
	if (err || max_ptes_shared > HPAGE_PMD_NR-1)
		return -EINVAL;

	khugepaged_max_ptes_shared = max_ptes_shared;

	return count;
}

static struct kobj_attribute khugepaged_max_ptes_shared_attr =
	__ATTR(max_ptes_shared, 0644, khugepaged_max_ptes_shared_show,
	       khugepaged_max_ptes_shared_store);

static struct attribute *khugepaged_attr[] = {
	&khugepaged_defrag_attr.attr,
	&khugepaged_max_ptes_none_attr.attr,
	&khugepaged_max_ptes_swap_attr.attr,
	&khugepaged_max_ptes_shared_attr.attr,
	&pages_to_scan_attr.attr,
	&pages_collapsed_attr.attr,
	&full_scans_attr.attr,
	&scan_sleep_millisecs_attr.attr,
	&alloc_sleep_millisecs_attr.attr,
	NULL,
};

struct attribute_group khugepaged_attr_group = {
	.attrs = khugepaged_attr,
	.name = "khugepaged",
};
#endif /* CONFIG_SYSFS */

int hugepage_madvise(struct vm_area_struct *vma,
		     unsigned long *vm_flags, int advice)
{
	switch (advice) {
	case MADV_HUGEPAGE:
#ifdef CONFIG_S390
		/*
		 * qemu blindly sets MADV_HUGEPAGE on all allocations, but s390
		 * can't handle this properly after s390_enable_sie, so we simply
		 * ignore the madvise to prevent qemu from causing a SIGSEGV.
		 */
		if (mm_has_pgste(vma->vm_mm))
			return 0;
#endif
		*vm_flags &= ~VM_NOHUGEPAGE;
		*vm_flags |= VM_HUGEPAGE;
		/*
		 * If the vma become good for khugepaged to scan,
		 * register it here without waiting a page fault that
		 * may not happen any time soon.
		 */
		if (!(*vm_flags & VM_NO_KHUGEPAGED) &&
				khugepaged_enter_vma_merge(vma, *vm_flags))
			return -ENOMEM;
		break;
	case MADV_NOHUGEPAGE:
		*vm_flags &= ~VM_HUGEPAGE;
		*vm_flags |= VM_NOHUGEPAGE;
		/*
		 * Setting VM_NOHUGEPAGE will prevent khugepaged from scanning
		 * this vma even if we leave the mm registered in khugepaged if
		 * it got registered before VM_NOHUGEPAGE was set.
		 */
		break;
	}

	return 0;
}

int __init khugepaged_init(void)
{
	mm_slot_cache = kmem_cache_create("khugepaged_mm_slot",
					  sizeof(struct mm_slot),
					  __alignof__(struct mm_slot), 0, NULL);
	if (!mm_slot_cache)
		return -ENOMEM;

	khugepaged_pages_to_scan = HPAGE_PMD_NR * 8;
	khugepaged_max_ptes_none = HPAGE_PMD_NR - 1;
	khugepaged_max_ptes_swap = HPAGE_PMD_NR / 8;
	khugepaged_max_ptes_shared = HPAGE_PMD_NR / 2;

	return 0;
}

void __init khugepaged_destroy(void)
{
	kmem_cache_destroy(mm_slot_cache);
}

static inline struct mm_slot *alloc_mm_slot(void)
{
	if (!mm_slot_cache)	/* initialization failed */
		return NULL;
	return kmem_cache_zalloc(mm_slot_cache, GFP_KERNEL);
}

static inline void free_mm_slot(struct mm_slot *mm_slot)
{
	kmem_cache_free(mm_slot_cache, mm_slot);
}

static struct mm_slot *get_mm_slot(struct mm_struct *mm)
{
	struct mm_slot *mm_slot;

	hash_for_each_possible(mm_slots_hash, mm_slot, hash, (unsigned long)mm)
		if (mm == mm_slot->mm)
			return mm_slot;

	return NULL;
}

static void insert_to_mm_slots_hash(struct mm_struct *mm,
				    struct mm_slot *mm_slot)
{
	mm_slot->mm = mm;
	hash_add(mm_slots_hash, &mm_slot->hash, (long)mm);
}

static inline int khugepaged_test_exit(struct mm_struct *mm)
{
	return atomic_read(&mm->mm_users) == 0;
}

mm: thp: pass correct vm_flags to hugepage_vma_check()
khugepaged_enter_vma_merge() passes a stale vma->vm_flags to
hugepage_vma_check(). The argument vm_flags contains the latest value.
Therefore, it is necessary to pass this vm_flags into
hugepage_vma_check().
With this bug, madvise(MADV_HUGEPAGE) for mmap files in shmem fails to
put memory in huge pages. Here is an example of failed madvise():
/* mount /dev/shm with huge=advise:
* mount -o remount,huge=advise /dev/shm */
/* create file /dev/shm/huge */
#define HUGE_FILE "/dev/shm/huge"
fd = open(HUGE_FILE, O_RDONLY);
ptr = mmap(NULL, FILE_SIZE, PROT_READ, MAP_PRIVATE, fd, 0);
ret = madvise(ptr, FILE_SIZE, MADV_HUGEPAGE);
madvise() will return 0, but this memory region is never put in huge
page (check from /proc/meminfo: ShmemHugePages).
Link: http://lkml.kernel.org/r/20180629181752.792831-1-songliubraving@fb.com
Fixes: 02b75dc8160d ("mm: thp: register mm for khugepaged when merging vma for shmem")
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-18 07:47:00 +09:00
static bool hugepage_vma_check(struct vm_area_struct *vma,
			       unsigned long vm_flags)
{
	if (!transhuge_vma_enabled(vma, vm_flags))
		return false;

	if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
				vma->vm_pgoff, HPAGE_PMD_NR))
		return false;

	/* Enabled via shmem mount options or sysfs settings. */
	if (shmem_file(vma->vm_file))
		return shmem_huge_enabled(vma);

	/* THP settings require madvise. */
	if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
		return false;

	/* Only regular file is valid */
	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
FROMLIST: mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs
Transparent huge pages are supported for read-only non-shmem files,
but are only used for vmas with VM_DENYWRITE. This condition ensures that
file THPs are protected from writes while an application is running
(ETXTBSY). Any existing file THPs are then dropped from the page cache
when a file is opened for write in do_dentry_open(). Since sys_mmap
ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
produced by execve().
Systems that make heavy use of shared libraries (e.g. Android) are unable
to apply VM_DENYWRITE through the dynamic linker, preventing them from
benefiting from the resultant reduced contention on the TLB.
This patch relaxes the constraint on file THPs, allowing use with any
executable mapping from a file not opened for write (see
inode_is_open_for_write()). It also introduces additional conditions to
ensure that files opened for write will never be backed by file THPs.
Restricting the use of THPs to executable mappings eliminates the risk that
a read-only file later opened for write would encounter significant
latencies due to page cache truncation.
The ld linker flag '-z max-page-size=(hugepage size)' can be used to
produce executables with the necessary layout. The dynamic linker must
map these file's segments at a hugepage size aligned vma for the mapping to
be backed with THPs.
Comparison of the performance characteristics of 4KB and 2MB-backed
libraries follows; the Android dex2oat tool was used to AOT compile an
example application on a single ARM core.
4KB Pages:
==========
count event_name # count / runtime
598,995,035,942 cpu-cycles # 1.800861 GHz
81,195,620,851 raw-stall-frontend # 244.112 M/sec
347,754,466,597 iTLB-loads # 1.046 G/sec
2,970,248,900 iTLB-load-misses # 0.854122% miss rate
Total test time: 332.854998 seconds.
2MB Pages:
==========
count event_name # count / runtime
592,872,663,047 cpu-cycles # 1.800358 GHz
76,485,624,143 raw-stall-frontend # 232.261 M/sec
350,478,413,710 iTLB-loads # 1.064 G/sec
803,233,322 iTLB-load-misses # 0.229182% miss rate
Total test time: 329.826087 seconds
A check of /proc/$(pidof dex2oat64)/smaps shows THPs in use:
/apex/com.android.art/lib64/libart.so
FilePmdMapped: 4096 kB
/apex/com.android.art/lib64/libart-compiler.so
FilePmdMapped: 2048 kB
Bug: 158135888
Link: https://lore.kernel.org/patchwork/patch/1408266/
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Collin Fijalkovich <cfijalkovich@google.com>
Change-Id: I75c693a4b4e7526d374ef2c010bde3094233eef2
2021-03-24 08:29:26 +09:00
	    !inode_is_open_for_write(vma->vm_file->f_inode) &&
	    (vm_flags & VM_EXEC)) {
		struct inode *inode = vma->vm_file->f_inode;

		return S_ISREG(inode->i_mode);
	}

	if (!vma->anon_vma || vma->vm_ops)
		return false;
	if (vma_is_temporary_stack(vma))
		return false;
	return !(vm_flags & VM_NO_KHUGEPAGED);
}

int __khugepaged_enter(struct mm_struct *mm)
{
	struct mm_slot *mm_slot;
	int wakeup;

	mm_slot = alloc_mm_slot();
	if (!mm_slot)
		return -ENOMEM;

	/* __khugepaged_exit() must not run from under us */
	VM_BUG_ON_MM(atomic_read(&mm->mm_users) == 0, mm);
	if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags))) {
		free_mm_slot(mm_slot);
		return 0;
	}

	spin_lock(&khugepaged_mm_lock);
	insert_to_mm_slots_hash(mm, mm_slot);
	/*
	 * Insert just behind the scanning cursor, to let the area settle
	 * down a little.
	 */
	wakeup = list_empty(&khugepaged_scan.mm_head);
	list_add_tail(&mm_slot->mm_node, &khugepaged_scan.mm_head);
	spin_unlock(&khugepaged_mm_lock);

	mmgrab(mm);
	if (wakeup)
		wake_up_interruptible(&khugepaged_wait);

	return 0;
}

int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
			       unsigned long vm_flags)
{
	unsigned long hstart, hend;

	/*
	 * khugepaged only supports read-only files for non-shmem files.
	 * khugepaged does not yet work on special mappings. And
	 * file-private shmem THP is not supported.
	 */
|
|
|
if (!hugepage_vma_check(vma, vm_flags))
|
2016-07-27 07:26:24 +09:00
|
|
|
return 0;
|
2018-08-18 07:45:26 +09:00
|
|
|
|
2016-07-27 07:26:24 +09:00
|
|
|
hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
|
|
|
|
hend = vma->vm_end & HPAGE_PMD_MASK;
|
|
|
|
if (hstart < hend)
|
|
|
|
return khugepaged_enter(vma, vm_flags);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
void __khugepaged_exit(struct mm_struct *mm)
{
	struct mm_slot *mm_slot;
	int free = 0;

	spin_lock(&khugepaged_mm_lock);
	mm_slot = get_mm_slot(mm);
	if (mm_slot && khugepaged_scan.mm_slot != mm_slot) {
		hash_del(&mm_slot->hash);
		list_del(&mm_slot->mm_node);
		free = 1;
	}
	spin_unlock(&khugepaged_mm_lock);

	if (free) {
		clear_bit(MMF_VM_HUGEPAGE, &mm->flags);
		free_mm_slot(mm_slot);
		mmdrop(mm);
	} else if (mm_slot) {
		/*
		 * This is required to serialize against
		 * khugepaged_test_exit() (which is guaranteed to run
		 * under mmap sem read mode). Stop here (after we
		 * return all pagetables will be destroyed) until
		 * khugepaged has finished working on the pagetables
		 * under the mmap_lock.
		 */
		mmap_write_lock(mm);
		mmap_write_unlock(mm);
	}
}

static void release_pte_page(struct page *page)
{
	mod_node_page_state(page_pgdat(page),
			NR_ISOLATED_ANON + page_is_file_lru(page),
			-compound_nr(page));
	unlock_page(page);
	putback_lru_page(page);
}

static void release_pte_pages(pte_t *pte, pte_t *_pte,
		struct list_head *compound_pagelist)
{
	struct page *page, *tmp;

	while (--_pte >= pte) {
		pte_t pteval = *_pte;

		page = pte_page(pteval);
		if (!pte_none(pteval) && !is_zero_pfn(pte_pfn(pteval)) &&
				!PageCompound(page))
			release_pte_page(page);
	}

	list_for_each_entry_safe(page, tmp, compound_pagelist, lru) {
		list_del(&page->lru);
		release_pte_page(page);
	}
}

static bool is_refcount_suitable(struct page *page)
{
	int expected_refcount;

	expected_refcount = total_mapcount(page);
	if (PageSwapCache(page))
		expected_refcount += compound_nr(page);

	return page_count(page) == expected_refcount;
}

static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
					unsigned long address,
					pte_t *pte,
					struct list_head *compound_pagelist)
{
	struct page *page = NULL;
	pte_t *_pte;
	int none_or_zero = 0, shared = 0, result = 0, referenced = 0;
	bool writable = false;

	for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
	     _pte++, address += PAGE_SIZE) {
		pte_t pteval = *_pte;
		if (pte_none(pteval) || (pte_present(pteval) &&
				is_zero_pfn(pte_pfn(pteval)))) {
			if (!userfaultfd_armed(vma) &&
			    ++none_or_zero <= khugepaged_max_ptes_none) {
				continue;
			} else {
				result = SCAN_EXCEED_NONE_PTE;
				goto out;
			}
		}
		if (!pte_present(pteval)) {
			result = SCAN_PTE_NON_PRESENT;
			goto out;
		}
		if (pte_uffd_wp(pteval)) {
			result = SCAN_PTE_UFFD_WP;
			goto out;
		}
		page = vm_normal_page(vma, address, pteval);
		if (unlikely(!page)) {
			result = SCAN_PAGE_NULL;
			goto out;
		}

		VM_BUG_ON_PAGE(!PageAnon(page), page);

		if (page_mapcount(page) > 1 &&
				++shared > khugepaged_max_ptes_shared) {
			result = SCAN_EXCEED_SHARED_PTE;
			goto out;
		}

		if (PageCompound(page)) {
			struct page *p;
			page = compound_head(page);

			/*
			 * Check if we have dealt with the compound page
			 * already
			 */
			list_for_each_entry(p, compound_pagelist, lru) {
				if (page == p)
					goto next;
			}
		}

		/*
		 * We can do it before isolate_lru_page because the
		 * page can't be freed from under us. NOTE: PG_lock
		 * is needed to serialize against split_huge_page
		 * when invoked from the VM.
		 */
		if (!trylock_page(page)) {
			result = SCAN_PAGE_LOCK;
			goto out;
		}

		/*
		 * Check if the page has any GUP (or other external) pins.
		 *
		 * The page table that maps the page has been already unlinked
		 * from the page table tree and this process cannot get
		 * an additional pin on the page.
		 *
		 * New pins can come later if the page is shared across fork,
		 * but not from this process. The other process cannot write to
		 * the page, only trigger CoW.
		 */
		if (!is_refcount_suitable(page)) {
			unlock_page(page);
			result = SCAN_PAGE_COUNT;
			goto out;
		}
		if (!pte_write(pteval) && PageSwapCache(page) &&
		    !reuse_swap_page(page, NULL)) {
			/*
			 * Page is in the swap cache and cannot be re-used.
			 * It cannot be collapsed into a THP.
			 */
			unlock_page(page);
			result = SCAN_SWAP_CACHE_PAGE;
			goto out;
		}

		/*
		 * Isolate the page to avoid collapsing a hugepage
		 * currently in use by the VM.
		 */
		if (isolate_lru_page(page)) {
			unlock_page(page);
			result = SCAN_DEL_PAGE_LRU;
			goto out;
		}
		mod_node_page_state(page_pgdat(page),
				NR_ISOLATED_ANON + page_is_file_lru(page),
				compound_nr(page));
		VM_BUG_ON_PAGE(!PageLocked(page), page);
		VM_BUG_ON_PAGE(PageLRU(page), page);

		if (PageCompound(page))
			list_add_tail(&page->lru, compound_pagelist);
next:
		/* There should be enough young pte to collapse the page */
		if (pte_young(pteval) ||
		    page_is_young(page) || PageReferenced(page) ||
		    mmu_notifier_test_young(vma->vm_mm, address))
			referenced++;

		if (pte_write(pteval))
			writable = true;
	}

	if (unlikely(!writable)) {
		result = SCAN_PAGE_RO;
	} else if (unlikely(!referenced)) {
		result = SCAN_LACK_REFERENCED_PAGE;
	} else {
		result = SCAN_SUCCEED;
		trace_mm_collapse_huge_page_isolate(page, none_or_zero,
						    referenced, writable, result);
		return 1;
	}
out:
	release_pte_pages(pte, _pte, compound_pagelist);
	trace_mm_collapse_huge_page_isolate(page, none_or_zero,
					    referenced, writable, result);
	return 0;
}

static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
				      struct vm_area_struct *vma,
				      unsigned long address,
				      spinlock_t *ptl,
				      struct list_head *compound_pagelist)
{
	struct page *src_page, *tmp;
	pte_t *_pte;
	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
	     _pte++, page++, address += PAGE_SIZE) {
		pte_t pteval = *_pte;

		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
			clear_user_highpage(page, address);
			add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
			if (is_zero_pfn(pte_pfn(pteval))) {
				/*
				 * ptl mostly unnecessary.
				 */
				spin_lock(ptl);
				/*
				 * paravirt calls inside pte_clear here are
				 * superfluous.
				 */
				pte_clear(vma->vm_mm, address, _pte);
				spin_unlock(ptl);
			}
		} else {
			src_page = pte_page(pteval);
			copy_user_highpage(page, src_page, address, vma);
			if (!PageCompound(src_page))
				release_pte_page(src_page);
			/*
			 * ptl mostly unnecessary, but preempt has to
			 * be disabled to update the per-cpu stats
			 * inside page_remove_rmap().
			 */
			spin_lock(ptl);
			/*
			 * paravirt calls inside pte_clear here are
			 * superfluous.
			 */
			pte_clear(vma->vm_mm, address, _pte);
			page_remove_rmap(src_page, false);
			spin_unlock(ptl);
			free_page_and_swap_cache(src_page);
		}
	}

	list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
		list_del(&src_page->lru);
		release_pte_page(src_page);
	}
}

static void khugepaged_alloc_sleep(void)
{
	DEFINE_WAIT(wait);

	add_wait_queue(&khugepaged_wait, &wait);
	freezable_schedule_timeout_interruptible(
		msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
	remove_wait_queue(&khugepaged_wait, &wait);
}

static int khugepaged_node_load[MAX_NUMNODES];

static bool khugepaged_scan_abort(int nid)
{
	int i;

	/*
	 * If node_reclaim_mode is disabled, then no extra effort is made to
	 * allocate memory locally.
	 */
	if (!node_reclaim_mode)
		return false;

	/* If there is a count for this node already, it must be acceptable */
	if (khugepaged_node_load[nid])
		return false;

	for (i = 0; i < MAX_NUMNODES; i++) {
		if (!khugepaged_node_load[i])
			continue;
		if (node_distance(nid, i) > node_reclaim_distance)
			return true;
	}
	return false;
}

/* Defrag for khugepaged will enter direct reclaim/compaction if necessary */
static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
{
	return khugepaged_defrag() ? GFP_TRANSHUGE : GFP_TRANSHUGE_LIGHT;
}

#ifdef CONFIG_NUMA
static int khugepaged_find_target_node(void)
{
	static int last_khugepaged_target_node = NUMA_NO_NODE;
	int nid, target_node = 0, max_value = 0;

	/* find first node with max normal pages hit */
	for (nid = 0; nid < MAX_NUMNODES; nid++)
		if (khugepaged_node_load[nid] > max_value) {
			max_value = khugepaged_node_load[nid];
			target_node = nid;
		}

	/* do some balance if several nodes have the same hit record */
	if (target_node <= last_khugepaged_target_node)
		for (nid = last_khugepaged_target_node + 1; nid < MAX_NUMNODES;
		     nid++)
			if (max_value == khugepaged_node_load[nid]) {
				target_node = nid;
				break;
			}

	last_khugepaged_target_node = target_node;
	return target_node;
}

static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
{
	if (IS_ERR(*hpage)) {
		if (!*wait)
			return false;

		*wait = false;
		*hpage = NULL;
		khugepaged_alloc_sleep();
	} else if (*hpage) {
		put_page(*hpage);
		*hpage = NULL;
	}

	return true;
}

static struct page *
khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
{
	VM_BUG_ON_PAGE(*hpage, *hpage);

	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
	if (unlikely(!*hpage)) {
		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
		*hpage = ERR_PTR(-ENOMEM);
		return NULL;
	}

	prep_transhuge_page(*hpage);
	count_vm_event(THP_COLLAPSE_ALLOC);
	return *hpage;
}

#else
static int khugepaged_find_target_node(void)
{
	return 0;
}

static inline struct page *alloc_khugepaged_hugepage(void)
{
	struct page *page;

	page = alloc_pages(alloc_hugepage_khugepaged_gfpmask(),
			   HPAGE_PMD_ORDER);
	if (page)
		prep_transhuge_page(page);
	return page;
}

static struct page *khugepaged_alloc_hugepage(bool *wait)
{
	struct page *hpage;

	do {
		hpage = alloc_khugepaged_hugepage();
		if (!hpage) {
			count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
			if (!*wait)
				return NULL;

			*wait = false;
			khugepaged_alloc_sleep();
		} else
			count_vm_event(THP_COLLAPSE_ALLOC);
	} while (unlikely(!hpage) && likely(khugepaged_enabled()));

	return hpage;
}

static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
{
	/*
	 * If the hpage allocated earlier was briefly exposed in page cache
	 * before collapse_file() failed, it is possible that racing lookups
	 * have not yet completed, and would then be unpleasantly surprised by
	 * finding the hpage reused for the same mapping at a different offset.
	 * Just release the previous allocation if there is any danger of that.
	 */
	if (*hpage && page_count(*hpage) > 1) {
		put_page(*hpage);
		*hpage = NULL;
	}

	if (!*hpage)
		*hpage = khugepaged_alloc_hugepage(wait);

	if (unlikely(!*hpage))
		return false;

	return true;
}

static struct page *
khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
{
	VM_BUG_ON(!*hpage);

	return *hpage;
}
#endif

/*
 * If mmap_lock temporarily dropped, revalidate vma
 * before taking mmap_lock.
 * Return 0 if succeeds, otherwise return non-zero
 * value (scan code).
 */

static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
		struct vm_area_struct **vmap)
{
	struct vm_area_struct *vma;
	unsigned long hstart, hend;

	if (unlikely(khugepaged_test_exit(mm)))
		return SCAN_ANY_PROCESS;

	*vmap = vma = find_vma(mm, address);
	if (!vma)
		return SCAN_VMA_NULL;

	hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
	hend = vma->vm_end & HPAGE_PMD_MASK;
	if (address < hstart || address + HPAGE_PMD_SIZE > hend)
		return SCAN_ADDRESS_RANGE;
	if (!hugepage_vma_check(vma, vma->vm_flags))
		return SCAN_VMA_CHECK;
	/* Anon VMA expected */
	if (!vma->anon_vma || vma->vm_ops)
		return SCAN_VMA_CHECK;
	return 0;
}

/*
 * Bring missing pages in from swap, to complete THP collapse.
 * Only done if khugepaged_scan_pmd believes it is worthwhile.
 *
 * Called and returns without pte mapped or spinlocks held,
 * but with mmap_lock held to protect against vma changes.
 */

static bool __collapse_huge_page_swapin(struct mm_struct *mm,
					struct vm_area_struct *vma,
					unsigned long haddr, pmd_t *pmd,
					int referenced)
{
	int swapped_in = 0;
	vm_fault_t ret = 0;
	unsigned long address, end = haddr + (HPAGE_PMD_NR * PAGE_SIZE);

	for (address = haddr; address < end; address += PAGE_SIZE) {
		struct vm_fault vmf = {
			.vma = vma,
			.address = address,
			.pgoff = linear_page_index(vma, haddr),
			.flags = FAULT_FLAG_ALLOW_RETRY,
			.pmd = pmd,
			.vma_flags = vma->vm_flags,
			.vma_page_prot = vma->vm_page_prot,
		};

		vmf.pte = pte_offset_map(pmd, address);
		vmf.orig_pte = *vmf.pte;
		if (!is_swap_pte(vmf.orig_pte)) {
			pte_unmap(vmf.pte);
			continue;
		}
		swapped_in++;
		ret = do_swap_page(&vmf);

		/* do_swap_page returns VM_FAULT_RETRY with released mmap_lock */
		if (ret & VM_FAULT_RETRY) {
			mmap_read_lock(mm);
			if (hugepage_vma_revalidate(mm, haddr, &vma)) {
				/* vma is no longer available, don't continue to swapin */
				trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
				return false;
			}
			/* check if the pmd is still valid */
			if (mm_find_pmd(mm, haddr) != pmd) {
				trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
				return false;
			}
		}
		if (ret & VM_FAULT_ERROR) {
			trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
			return false;
		}
	}

	/* Drain LRU add pagevec to remove extra pin on the swapped in pages */
	if (swapped_in)
		lru_add_drain();

	trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 1);
	return true;
}

static void collapse_huge_page(struct mm_struct *mm,
				   unsigned long address,
				   struct page **hpage,
				   int node, int referenced, int unmapped)
{
	LIST_HEAD(compound_pagelist);
	pmd_t *pmd, _pmd;
	pte_t *pte;
	pgtable_t pgtable;
	struct page *new_page;
	spinlock_t *pmd_ptl, *pte_ptl;
	int isolated = 0, result = 0;
	struct vm_area_struct *vma;
	struct mmu_notifier_range range;
	gfp_t gfp;

	VM_BUG_ON(address & ~HPAGE_PMD_MASK);

	/* Only allocate from the target node */
	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;

	/*
	 * Before allocating the hugepage, release the mmap_lock read lock.
	 * The allocation can take potentially a long time if it involves
	 * sync compaction, and we do not need to hold the mmap_lock during
	 * that. We will recheck the vma after taking it again in write mode.
	 */
	mmap_read_unlock(mm);
	new_page = khugepaged_alloc_page(hpage, gfp, node);
	if (!new_page) {
		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
		goto out_nolock;
	}

	if (unlikely(mem_cgroup_charge(new_page, mm, gfp))) {
		result = SCAN_CGROUP_CHARGE_FAIL;
		goto out_nolock;
	}
	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);

	mmap_read_lock(mm);
	result = hugepage_vma_revalidate(mm, address, &vma);
	if (result) {
		mmap_read_unlock(mm);
		goto out_nolock;
	}

	pmd = mm_find_pmd(mm, address);
	if (!pmd) {
		result = SCAN_PMD_NULL;
		mmap_read_unlock(mm);
		goto out_nolock;
	}

	/*
	 * __collapse_huge_page_swapin always returns with mmap_lock locked.
	 * If it fails, we release mmap_lock and jump out_nolock.
	 * Continuing to collapse causes inconsistency.
	 */
	if (unmapped && !__collapse_huge_page_swapin(mm, vma, address,
						     pmd, referenced)) {
		mmap_read_unlock(mm);
		goto out_nolock;
	}

	mmap_read_unlock(mm);
	/*
	 * Prevent all access to pagetables with the exception of
	 * gup_fast later handled by the ptep_clear_flush and the VM
	 * handled by the anon_vma lock + PG_lock.
	 */
	mmap_write_lock(mm);
	result = hugepage_vma_revalidate(mm, address, &vma);
	if (result)
		goto out;
	/* check if the pmd is still valid */
	if (mm_find_pmd(mm, address) != pmd)
		goto out;

	vm_write_begin(vma);
	anon_vma_lock_write(vma->anon_vma);

	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
|
				address, address + HPAGE_PMD_SIZE);
	mmu_notifier_invalidate_range_start(&range);

	pte = pte_offset_map(pmd, address);
	pte_ptl = pte_lockptr(mm, pmd);

	pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
	/*
	 * This removes any huge TLB entry from the CPU so we won't allow
	 * huge and small TLB entries for the same virtual address to
	 * avoid the risk of CPU bugs in that area.
	 *
	 * Parallel fast GUP is fine since fast GUP will back off when
	 * it detects PMD is changed.
	 */
	_pmd = pmdp_collapse_flush(vma, address, pmd);
	spin_unlock(pmd_ptl);
	mmu_notifier_invalidate_range_end(&range);
	tlb_remove_table_sync_one();

	spin_lock(pte_ptl);
	isolated = __collapse_huge_page_isolate(vma, address, pte,
			&compound_pagelist);
	spin_unlock(pte_ptl);

	if (unlikely(!isolated)) {
		pte_unmap(pte);
		spin_lock(pmd_ptl);
		BUG_ON(!pmd_none(*pmd));
		/*
		 * We can only use set_pmd_at when establishing
		 * hugepmds and never for establishing regular pmds that
		 * point to regular pagetables. Use pmd_populate for that
		 */
		pmd_populate(mm, pmd, pmd_pgtable(_pmd));
		spin_unlock(pmd_ptl);
		anon_vma_unlock_write(vma->anon_vma);
		vm_write_end(vma);
		result = SCAN_FAIL;
		goto out;
	}

	/*
	 * All pages are isolated and locked so anon_vma rmap
	 * can't run anymore.
	 */
	anon_vma_unlock_write(vma->anon_vma);

	__collapse_huge_page_copy(pte, new_page, vma, address, pte_ptl,
			&compound_pagelist);
	pte_unmap(pte);
	__SetPageUptodate(new_page);
	pgtable = pmd_pgtable(_pmd);

	_pmd = mk_huge_pmd(new_page, vma->vm_page_prot);
	_pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);

	/*
	 * spin_lock() below is not the equivalent of smp_wmb(), so
	 * this is needed to avoid the copy_huge_page writes to become
	 * visible after the set_pmd_at() write.
	 */
	smp_wmb();

	spin_lock(pmd_ptl);
	BUG_ON(!pmd_none(*pmd));
	page_add_new_anon_rmap(new_page, vma, address, true);
	lru_cache_add_inactive_or_unevictable(new_page, vma);
	pgtable_trans_huge_deposit(mm, pmd, pgtable);
	set_pmd_at(mm, address, pmd, _pmd);
	update_mmu_cache_pmd(vma, address, pmd);
	spin_unlock(pmd_ptl);
	vm_write_end(vma);

	*hpage = NULL;

	khugepaged_pages_collapsed++;
	result = SCAN_SUCCEED;
out_up_write:
	mmap_write_unlock(mm);
out_nolock:
	if (!IS_ERR_OR_NULL(*hpage))
		mem_cgroup_uncharge(*hpage);
	trace_mm_collapse_huge_page(mm, isolated, result);
	return;
out:
	goto out_up_write;
}

static int khugepaged_scan_pmd(struct mm_struct *mm,
			       struct vm_area_struct *vma,
			       unsigned long address,
			       struct page **hpage)
{
	pmd_t *pmd;
	pte_t *pte, *_pte;
	int ret = 0, result = 0, referenced = 0;
	int none_or_zero = 0, shared = 0;
	struct page *page = NULL;
	unsigned long _address;
	spinlock_t *ptl;
	int node = NUMA_NO_NODE, unmapped = 0;
	bool writable = false;

	VM_BUG_ON(address & ~HPAGE_PMD_MASK);

	pmd = mm_find_pmd(mm, address);
	if (!pmd) {
		result = SCAN_PMD_NULL;
		goto out;
	}

	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
	for (_address = address, _pte = pte; _pte < pte+HPAGE_PMD_NR;
	     _pte++, _address += PAGE_SIZE) {
		pte_t pteval = *_pte;
		if (is_swap_pte(pteval)) {
			if (++unmapped <= khugepaged_max_ptes_swap) {
				/*
				 * Always be strict with uffd-wp
				 * enabled swap entries. Please see
				 * comment below for pte_uffd_wp().
				 */
				if (pte_swp_uffd_wp(pteval)) {
					result = SCAN_PTE_UFFD_WP;
					goto out_unmap;
				}
				continue;
			} else {
				result = SCAN_EXCEED_SWAP_PTE;
				goto out_unmap;
			}
		}
		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
			if (!userfaultfd_armed(vma) &&
			    ++none_or_zero <= khugepaged_max_ptes_none) {
				continue;
			} else {
				result = SCAN_EXCEED_NONE_PTE;
				goto out_unmap;
			}
		}
		if (!pte_present(pteval)) {
			result = SCAN_PTE_NON_PRESENT;
			goto out_unmap;
		}
		if (pte_uffd_wp(pteval)) {
			/*
			 * Don't collapse the page if any of the small
			 * PTEs are armed with uffd write protection.
			 * Here we could also mark the new huge pmd as
			 * write protected if any of the small ones is
			 * marked, but that could bring unknown
			 * userfault messages that fall outside of
			 * the registered range. So, just be simple.
			 */
			result = SCAN_PTE_UFFD_WP;
			goto out_unmap;
		}
		if (pte_write(pteval))
			writable = true;

		page = vm_normal_page(vma, _address, pteval);
		if (unlikely(!page)) {
			result = SCAN_PAGE_NULL;
			goto out_unmap;
		}

		if (page_mapcount(page) > 1 &&
				++shared > khugepaged_max_ptes_shared) {
			result = SCAN_EXCEED_SHARED_PTE;
			goto out_unmap;
		}

		page = compound_head(page);

		/*
		 * Record which node the original page is from and save this
		 * information to khugepaged_node_load[].
		 * Khugepaged will allocate a hugepage from the node with the
		 * max hit record.
		 */
		node = page_to_nid(page);
		if (khugepaged_scan_abort(node)) {
			result = SCAN_SCAN_ABORT;
			goto out_unmap;
		}
		khugepaged_node_load[node]++;
		if (!PageLRU(page)) {
			result = SCAN_PAGE_LRU;
			goto out_unmap;
		}
		if (PageLocked(page)) {
			result = SCAN_PAGE_LOCK;
			goto out_unmap;
		}
		if (!PageAnon(page)) {
			result = SCAN_PAGE_ANON;
			goto out_unmap;
		}

		/*
		 * Check if the page has any GUP (or other external) pins.
		 *
		 * Here the check is racy: it may see total_mapcount >
		 * refcount in some cases.
		 * For example, one process with one forked child process.
		 * The parent has the PMD split due to MADV_DONTNEED, then
		 * the child is trying to unmap the whole PMD, but khugepaged
		 * may be scanning the parent between the child clearing the
		 * PageDoubleMap flag and decrementing the mapcount. So
		 * khugepaged may see total_mapcount > refcount.
		 *
		 * But such a case is ephemeral; we can always retry the
		 * collapse later. However it may report a false positive if
		 * the page has excessive GUP pins (i.e. 512). Anyway the
		 * same check will be done again later, so the risk seems low.
		 */
		if (!is_refcount_suitable(page)) {
			result = SCAN_PAGE_COUNT;
			goto out_unmap;
		}
		if (pte_young(pteval) ||
		    page_is_young(page) || PageReferenced(page) ||
		    mmu_notifier_test_young(vma->vm_mm, address))
			referenced++;
	}
	if (!writable) {
		result = SCAN_PAGE_RO;
	} else if (!referenced || (unmapped && referenced < HPAGE_PMD_NR/2)) {
		result = SCAN_LACK_REFERENCED_PAGE;
	} else {
		result = SCAN_SUCCEED;
		ret = 1;
	}
out_unmap:
	pte_unmap_unlock(pte, ptl);
	if (ret) {
		node = khugepaged_find_target_node();
		/* collapse_huge_page will return with the mmap_lock released */
		collapse_huge_page(mm, address, hpage, node,
				referenced, unmapped);
	}
out:
	trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced,
				     none_or_zero, result, unmapped);
	return ret;
}

static void collect_mm_slot(struct mm_slot *mm_slot)
{
	struct mm_struct *mm = mm_slot->mm;

	lockdep_assert_held(&khugepaged_mm_lock);

	if (khugepaged_test_exit(mm)) {
		/* free mm_slot */
		hash_del(&mm_slot->hash);
		list_del(&mm_slot->mm_node);

		/*
		 * Not strictly needed because the mm exited already.
		 *
		 * clear_bit(MMF_VM_HUGEPAGE, &mm->flags);
		 */

		/* khugepaged_mm_lock actually not necessary for the below */
		free_mm_slot(mm_slot);
		mmdrop(mm);
	}
}

#ifdef CONFIG_SHMEM
/*
 * Notify khugepaged that given addr of the mm is pte-mapped THP. Then
 * khugepaged should try to collapse the page table.
 */
static int khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
					 unsigned long addr)
{
	struct mm_slot *mm_slot;

	VM_BUG_ON(addr & ~HPAGE_PMD_MASK);

	spin_lock(&khugepaged_mm_lock);
	mm_slot = get_mm_slot(mm);
	if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP))
		mm_slot->pte_mapped_thp[mm_slot->nr_pte_mapped_thp++] = addr;
	spin_unlock(&khugepaged_mm_lock);
	return 0;
}

/**
 * Try to collapse a pte-mapped THP for mm at address haddr.
 *
 * This function checks whether all the PTEs in the PMD are pointing to the
 * right THP. If so, retract the page table so the THP can refault in
 * as pmd-mapped.
 */
void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
{
	unsigned long haddr = addr & HPAGE_PMD_MASK;
	struct vm_area_struct *vma = find_vma(mm, haddr);
	struct page *hpage;
	pte_t *start_pte, *pte;
	pmd_t *pmd, _pmd;
	spinlock_t *ptl;
	int count = 0;
	int i;
	struct mmu_notifier_range range;

	if (!vma || !vma->vm_file ||
	    vma->vm_start > haddr || vma->vm_end < haddr + HPAGE_PMD_SIZE)
		return;

	/*
	 * This vm_flags may not have VM_HUGEPAGE if the page was not
	 * collapsed by this mm. But we can still collapse if the page is
	 * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
	 * will not fail the vma for missing VM_HUGEPAGE
	 */
	if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
		return;

	hpage = find_lock_page(vma->vm_file->f_mapping,
			       linear_page_index(vma, haddr));
	if (!hpage)
		return;

	if (!PageHead(hpage))
		goto drop_hpage;

	pmd = mm_find_pmd(mm, haddr);
	if (!pmd)
		goto drop_hpage;

	vm_write_begin(vma);
Merge tag 'android12-5.10.160_r00' into android12-5.10
This is the merge of the upstream LTS release of 5.10.160 into the
android12-5.10 branch.
It contains the following commits:
003c389455eb Merge 5.10.160 into android12-5.10-lts
a2428a8dcb4f Linux 5.10.160
54c15f67cb72 ASoC: ops: Correct bounds check for second channel on SX controls
74b139c63f07 nvme-pci: clear the prp2 field when not used
77ebf88e0031 ASoC: cs42l51: Correct PGA Volume minimum value
4db1d19b74e0 can: mcba_usb: Fix termination command argument
683837f2f69d can: sja1000: fix size of OCR_MODE_MASK define
434b5236710f pinctrl: meditatek: Startup with the IRQs disabled
5cb4abb0caa5 libbpf: Use page size as max_entries when probing ring buffer map
50b5f6d4d9d2 ASoC: ops: Check bounds for second channel in snd_soc_put_volsw_sx()
344739dc56f1 ASoC: fsl_micfil: explicitly clear CHnF flags
a49c1a730775 ASoC: fsl_micfil: explicitly clear software reset bit
75454b4bbfc7 io_uring: add missing item types for splice request
17f386e6b769 fuse: always revalidate if exclusive create
eb6313c12955 nfp: fix use-after-free in area_cache_get()
965d93fb39b9 vfs: fix copy_file_range() averts filesystem freeze protection
ed9673394979 vfs: fix copy_file_range() regression in cross-fs copies
970862a96c0d x86/smpboot: Move rcu_cpu_starting() earlier
32e45c58a05f Merge "Merge 5.10.159 into android12-5.10-lts" into android12-5.10-lts
d31626cbea61 ANDROID: usb: gadget: uvc: remove duplicate code in unbind
01ef2d0b53f3 Merge 5.10.159 into android12-5.10-lts
931578be6987 Linux 5.10.159
4fd6f84e0a0c can: esd_usb: Allow REC and TEC to return to zero
cf0e42310648 macsec: add missing attribute validation for offload
6b03e41767c7 net: mvneta: Fix an out of bounds check
8208d7e56b1e ipv6: avoid use-after-free in ip6_fragment()
3d59adad126d net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq()
a00444e25bbc xen/netback: fix build warning
87277bdf2c37 ethernet: aeroflex: fix potential skb leak in greth_init_rings()
cc668fddde42 tipc: call tipc_lxc_xmit without holding node_read_lock
4be43e46c3f9 net: dsa: sja1105: fix memory leak in sja1105_setup_devlink_regions()
8e3f9ac00956 ipv4: Fix incorrect route flushing when table ID 0 is used
5211e5ff9ddc ipv4: Fix incorrect route flushing when source address is deleted
36e248269a16 tipc: Fix potential OOB in tipc_link_proto_rcv()
93aaa4bb72e3 net: hisilicon: Fix potential use-after-free in hix5hd2_rx()
296a50aa8b29 net: hisilicon: Fix potential use-after-free in hisi_femac_rx()
8d1aed7a117a net: thunderx: Fix missing destroy_workqueue of nicvf_rx_mode_wq
a5cfbc199536 ip_gre: do not report erspan version on GRE interface
696e34d54ca1 net: stmmac: fix "snps,axi-config" node property parsing
ca26f45083d6 nvme initialize core quirks before calling nvme_init_subsystem
27eb2d7a1b99 NFC: nci: Bounds check struct nfc_target arrays
a2506b19d7a3 i40e: Disallow ip4 and ip6 l4_4_bytes
8329b65e34ef i40e: Fix for VF MAC address 0
215f3ac53b18 i40e: Fix not setting default xps_cpus after reset
146ebee8fcdb net: mvneta: Prevent out of bounds read in mvneta_config_rss()
e6860c889f4a xen-netfront: Fix NULL sring after live migration
3d3b30718ae3 net: encx24j600: Fix invalid logic in reading of MISTAT register
51ba1820e736 net: encx24j600: Add parentheses to fix precedence
42c319635c0c mac802154: fix missing INIT_LIST_HEAD in ieee802154_if_add()
4c693330cec2 selftests: rtnetlink: correct xfrm policy rule in kci_test_ipsec_offload
bccda3ad0748 net: dsa: ksz: Check return value
e7b950458156 Bluetooth: Fix not cleanup led when bt_init fails
1717354d77f8 Bluetooth: 6LoWPAN: add missing hci_dev_put() in get_l2cap_conn()
80c69b31aa5b vmxnet3: correctly report encapsulated LRO packet
575a6266f63d af_unix: Get user_ns from in_skb in unix_diag_get_exact().
6c788c0a2506 drm: bridge: dw_hdmi: fix preference of RGB modes over YUV420
de918d9738c7 igb: Allocate MSI-X vector when testing
6595c9208d97 e1000e: Fix TX dispatch condition
5ee6413d3dd9 gpio: amd8111: Fix PCI device reference count leak
b9aca69a6c82 drm/bridge: ti-sn65dsi86: Fix output polarity setting bug
b46e8c50c386 netfilter: ctnetlink: fix compilation warning after data race fixes in ct mark
0a8e66e37573 ca8210: Fix crash by zero initializing data
27c71825ffc4 ieee802154: cc2520: Fix error return code in cc2520_hw_init()
a0418d0a6b2d netfilter: nft_set_pipapo: Actually validate intervals in fields after the first one
cb283cca1ddc rtc: mc146818-lib: fix signedness bug in mc146818_get_time()
5c432383b687 rtc: mc146818-lib: fix locking in mc146818_set_time
5e26531d8113 rtc: cmos: Disable irq around direct invocation of cmos_interrupt()
fccee93eb20d mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
c42221efb115 can: af_can: fix NULL pointer dereference in can_rcv_filter
bc03f809da78 HID: core: fix shift-out-of-bounds in hid_report_raw_event
959a23a4d111 HID: hid-lg4ff: Add check for empty lbuf
4dde75945a9c HID: usbhid: Add ALWAYS_POLL quirk for some mice
11e95d85c3c9 drm/shmem-helper: Avoid vm_open error paths
6a4da05acd06 drm/shmem-helper: Remove errant put in error path
007f561f599f drm/vmwgfx: Don't use screen objects when SEV is active
3cb78c39252e KVM: s390: vsie: Fix the initialization of the epoch extension (epdx) field
549b46f8130e Bluetooth: Fix crash when replugging CSR fake controllers
380d183e998b Bluetooth: btusb: Add debug message for CSR controllers
f1cf856123ce mm/gup: fix gup_pud_range() for dax
f1f7f36cf682 memcg: fix possible use-after-free in memcg_write_event_control()
32f01f0306a9 media: v4l2-dv-timings.c: fix too strict blanking sanity checks
043b2bc96ca2 Revert "ARM: dts: imx7: Fix NAND controller size-cells"
abfb8ae69bdc media: videobuf2-core: take mmap_lock in vb2_get_unmapped_area()
83632fc41449 xen/netback: don't call kfree_skb() with interrupts disabled
3eecd2bc10e0 xen/netback: do some code cleanup
49e07c0768db xen/netback: Ensure protocol headers don't fall in the non-linear area
db44a9443e58 rtc: mc146818: Reduce spinlock section in mc146818_set_time()
17293d630f5f rtc: cmos: Replace spin_lock_irqsave with spin_lock in hard IRQ
acfd8ef683fb rtc: cmos: avoid UIP when reading alarm time
949bae02827e rtc: cmos: avoid UIP when writing alarm time
33ac73a41af6 rtc: mc146818-lib: extract mc146818_avoid_UIP
8bb5fe58305f rtc: mc146818-lib: fix RTC presence check
775d4661f145 rtc: Check return value from mc146818_get_time()
b9a5c470e075 rtc: mc146818-lib: change return values of mc146818_get_time()
94eaf9966e04 rtc: cmos: remove stale REVISIT comments
f5b51f855036 rtc: mc146818: Dont test for bit 0-5 in Register D
3736972360fa rtc: mc146818: Detect and handle broken RTCs
7c7075c88da4 rtc: mc146818: Prevent reading garbage
7f445ca2e0e5 mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
4a1cdb49d0f2 mm/khugepaged: fix GUP-fast interaction by sending IPI
cdfd3739b212 mm/khugepaged: take the right locks for page table retraction
1c0eec6a1d17 net: usb: qmi_wwan: add u-blox 0x1342 composition
a8c5ffb4dffd 9p/xen: check logical size for buffer size
ec36ebae3667 usb: dwc3: gadget: Disable GUSB2PHYCFG.SUSPHY for End Transfer
d9b53caf0191 fbcon: Use kzalloc() in fbcon_prepare_logo()
8b130c770d00 regulator: twl6030: fix get status of twl6032 regulators
f6f45e538328 ASoC: soc-pcm: Add NULL check in BE reparenting
688a45aff2b2 btrfs: send: avoid unaligned encoded writes when attempting to clone range
15c42ab8d43a ALSA: seq: Fix function prototype mismatch in snd_seq_expand_var_event
d38e021416b2 regulator: slg51000: Wait after asserting CS pin
1331bcfcac18 9p/fd: Use P9_HDRSZ for header size
96b43f36a593 ARM: dts: rockchip: disable arm_global_timer on rk3066 and rk3188
ddf58f59393b ASoC: wm8962: Wait for updated value of WM8962_CLOCKING1 register
dbd78abd696d ARM: 9266/1: mm: fix no-MMU ZERO_PAGE() implementation
bb1866cf1ee9 ARM: 9251/1: perf: Fix stacktraces for tracepoint events in THUMB2 kernels
b1f40a0cdf00 ARM: dts: rockchip: rk3188: fix lcdc1-rgb24 node name
5f9474d07b60 arm64: dts: rockchip: fix ir-receiver node names
060d58924af6 ARM: dts: rockchip: fix ir-receiver node names
3e0c4667713a arm: dts: rockchip: fix node name for hym8563 rtc
3ada63a87654 arm64: dts: rockchip: keep I2S1 disabled for GPIO function on ROCK Pi 4 series
202ee063496e Revert "mmc: sdhci: Fix voltage switch delay"
0b0939466f8c ANDROID: gki_defconfig: add CONFIG_FUNCTION_ERROR_INJECTION
5ab4c6b8436b Merge 5.10.158 into android12-5.10-lts
592346d5dc9b Linux 5.10.158
cc1b4718cc42 ipc/sem: Fix dangling sem_array access in semtimedop race
d072a10c81d3 v4l2: don't fall back to follow_pfn() if pin_user_pages_fast() fails
9ba389863ac6 proc: proc_skip_spaces() shouldn't think it is working on C strings
4aa32aaef6c1 proc: avoid integer type confusion in get_proc_long
5f2f77560591 block: unhash blkdev part inode when the part is deleted
a82869ac52f3 Input: raydium_ts_i2c - fix memory leak in raydium_i2c_send()
4e0d6c687c92 char: tpm: Protect tpm_pm_suspend with locks
5a6f935ef34e Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"
f075cf139f55 ACPI: HMAT: Fix initiator registration for single-initiator systems
f3b76b4d38fd ACPI: HMAT: remove unnecessary variable initialization
63e72417a1ad i2c: imx: Only DMA messages with I2C_M_DMA_SAFE flag set
df7613659872 i2c: npcm7xx: Fix error handling in npcm_i2c_init()
7462cd2443bc x86/pm: Add enumeration check before spec MSRs save/restore setup
5e3d4a68e2e1 x86/tsx: Add a feature bit for TSX control MSR support
b7f7a0402eb7 Revert "tty: n_gsm: avoid call of sleeping functions from atomic context"
481f9ed8ebdc ipv4: Fix route deletion when nexthop info is not specified
0b5394229eba ipv4: Handle attempt to delete multipath route when fib_info contains an nh reference
4919503426c9 selftests: net: fix nexthop warning cleanup double ip typo
7ca14c5f24db selftests: net: add delete nexthop route warning test
f09ac62f0e3f Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled
19d91d3798e7 parisc: Increase FRAME_WARN to 2048 bytes on parisc
fcf20da09974 xtensa: increase size of gcc stack frame check
a1877001ed6d parisc: Increase size of gcc stack frame check
a5c65cd56aed iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init()
10ed7655a17f iommu/vt-d: Fix PCI device refcount leak in has_external_pci()
302edce1dd42 pinctrl: single: Fix potential division by zero
b50c96418972 ASoC: ops: Fix bounds check for _sx controls
a2efc465245e io_uring: don't hold uring_lock when calling io_run_task_work*
be111ebd8868 tracing: Free buffers when a used dynamic event is removed
648b92e57607 drm/i915: Never return 0 if not all requests retired
8649c023c427 drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame
940b774069f1 mmc: sdhci: Fix voltage switch delay
ed1966245307 mmc: sdhci-sprd: Fix no reset data and command after voltage switch
ef767907e77d mmc: sdhci-esdhc-imx: correct CQHCI exit halt state check
46ee041cd655 mmc: core: Fix ambiguous TRIM and DISCARD arg
b79be962b567 mmc: mmc_test: Fix removal of debugfs file
d4fc344c0d9c net: stmmac: Set MAC's flow control register to reflect current settings
549e24409ac5 pinctrl: intel: Save and restore pins in "direct IRQ" mode
471fb7b735bf x86/bugs: Make sure MSR_SPEC_CTRL is updated properly upon resume from S3
e858917ab785 nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
6ddf788400dd tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"
c099d12c5502 error-injection: Add prompt for function error injection
26b6f927bb86 riscv: vdso: fix section overlapping under some conditions
2b1d8f27e205 net/mlx5: DR, Fix uninitialized var warning
c40db1e5f316 hwmon: (coretemp) fix pci device refcount leak in nv1a_ram_new()
f06e0cd01eab hwmon: (coretemp) Check for null before removing sysfs attrs
d93522d04f84 net: ethernet: renesas: ravb: Fix promiscuous mode after system resumed
176ee6c673cc sctp: fix memory leak in sctp_stream_outq_migrate()
1c38c88acc96 packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
5f442e1d403e net: tun: Fix use-after-free in tun_detach()
5fa0fc5876b5 afs: Fix fileserver probe RTT handling
7ca81a161e40 net: hsr: Fix potential use-after-free
a1ba595e35aa tipc: re-fetch skb cb after tipc_msg_validate
4621bdfff5f8 dsa: lan9303: Correct stat name
45752af02475 net: ethernet: nixge: fix NULL dereference
e01c1542379f net/9p: Fix a potential socket leak in p9_socket_open
b080d4668f3f net: ntb_netdev: Fix error handling in ntb_netdev_init_module()
fe6bc99c27c2 net: phy: fix null-ptr-deref while probe() failed
0184ede0ec61 wifi: mac80211: fix possible oob access in ieee80211_get_rate_duration
e2ed90fd3ae0 wifi: cfg80211: don't allow multi-BSSID in S1G
9e6b79a3cd17 wifi: cfg80211: fix buffer overflow in elem comparison
6922948c2ec1 aquantia: Do not purge addresses when setting the number of rings
fa59d49a49b0 qlcnic: fix sleep-in-atomic-context bugs caused by msleep
d753f554f25d can: cc770: cc770_isa_probe(): add missing free_cc770dev()
e74746bf0453 can: sja1000_isa: sja1000_isa_probe(): add missing free_sja1000dev()
0d2f9d95d9fb net/mlx5e: Fix use-after-free when reverting termination table
2cb84ff34938 net/mlx5: Fix uninitialized variable bug in outlen_write()
b775f37d9439 e100: Fix possible use after free in e100_xmit_prepare
086f656e447b e100: switch from 'pci_' to 'dma_' API
971c55f0763b iavf: Fix error handling in iavf_init_module()
d389a4c69877 iavf: remove redundant ret variable
fd4960ea5362 fm10k: Fix error handling in fm10k_init_module()
dd425cec79ba i40e: Fix error handling in i40e_init_module()
f166c62cad79 ixgbevf: Fix resource leak in ixgbevf_init_module()
8f7047f41810 of: property: decrement node refcount in of_fwnode_get_reference_args()
be006212bd53 bpf: Do not copy spin lock field from user in bpf_selem_alloc
90907cd4d113 hwmon: (ibmpex) Fix possible UAF when ibmpex_register_bmc() fails
7649bba2633d hwmon: (i5500_temp) fix missing pci_disable_device()
dddfc03f044b hwmon: (ina3221) Fix shunt sum critical calculation
984fcd3ec1aa hwmon: (ltc2947) fix temperature scaling
8a549ab67245 libbpf: Handle size overflow for ringbuf mmap
cc140c729c68 ARM: at91: rm9200: fix usb device clock id
592724b14da7 scripts/faddr2line: Fix regression in name resolution on ppc64le
353c3aaaf3c4 bpf, perf: Use subprog name when reporting subprog ksymbol
d48f6a578405 iio: light: rpr0521: add missing Kconfig dependencies
5eb114f55b37 iio: health: afe4404: Fix oob read in afe4404_[read|write]_raw
b1756af172fb iio: health: afe4403: Fix oob read in afe4403_read_raw
01d7c41eac91 btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit()
d3f5be824669 drm/amdgpu: Partially revert "drm/amdgpu: update drm_display_info correctly when the edid is read"
00570fafc2bc drm/amdgpu: update drm_display_info correctly when the edid is read
44b204730bf3 drm/display/dp_mst: Fix drm_dp_mst_add_affected_dsc_crtcs() return code
1faf21bdd111 btrfs: move QUOTA_ENABLED check to rescan_should_stop from btrfs_qgroup_rescan_worker
6050872f9f31 spi: spi-imx: Fix spi_bus_clk if requested clock is higher than input clock
7b020665d482 btrfs: free btrfs_path before copying inodes to userspace
d5b7a34379fa btrfs: sink iterator parameter to btrfs_ioctl_logical_to_ino
f3226d86f8ce Revert "xfrm: fix "disable_policy" on ipv4 early demux"
982d7f3eb8aa Merge 5.10.157 into android12-5.10-lts
37d3df60cb6a ANDROID: CRC ABI fixups in ip.h and ipv6.h
f4245f05389c Linux 5.10.157
4801672fb076 fuse: lock inode unconditionally in fuse_fallocate()
86f0082fb947 drm/i915: fix TLB invalidation for Gen12 video and compute engines
feb97cf45e77 drm/amdgpu: always register an MMU notifier for userptr
596b7d55d7c6 drm/amd/dc/dce120: Fix audio register mapping, stop triggering KASAN
c86c1a7037cd btrfs: sysfs: normalize the error handling branch in btrfs_init_sysfs()
1581830c0eca btrfs: free btrfs_path before copying subvol info to userspace
0bdb8f7ef87d btrfs: free btrfs_path before copying fspath to userspace
24a37ba2cb66 btrfs: free btrfs_path before copying root refs to userspace
b56d6e55857b genirq: Take the proposed affinity at face value if force==true
9d90a2b98e6e irqchip/gic-v3: Always trust the managed affinity provided by the core code
e0d2c59ee995 genirq: Always limit the affinity to online CPUs
f8f80d532f78 genirq/msi: Shutdown managed interrupts with unsatisfiable affinities
3eb6b89a4e9f wifi: wilc1000: validate number of channels
5a068535c007 wifi: wilc1000: validate length of IEEE80211_P2P_ATTR_CHANNEL_LIST attribute
905f886eae4b wifi: wilc1000: validate length of IEEE80211_P2P_ATTR_OPER_CHANNEL attribute
7c6535fb4d67 wifi: wilc1000: validate pairwise and authentication suite offsets
64b7f9a7ddfb dm integrity: clear the journal on suspend
d306f73079f3 dm integrity: flush the journal on suspend
79d9a1167978 gpu: host1x: Avoid trying to use GART on Tegra20
a7f30b5b8d7c net: usb: qmi_wwan: add Telit 0x103a composition
7e8eaa939eea tcp: configurable source port perturb table size
0acc008cf98e platform/x86: hp-wmi: Ignore Smart Experience App event
0964b77bab54 zonefs: fix zone report size in __zonefs_io_error()
a5937dae662b platform/x86: acer-wmi: Enable SW_TABLET_MODE on Switch V 10 (SW5-017)
52fb7bcea0c6 platform/x86: asus-wmi: add missing pci_dev_put() in asus_wmi_set_xusb2pr()
4fa717ba2d25 xen/platform-pci: add missing free_irq() in error path
f45a5a6c9f6d xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too
9bbb58747243 Input: soc_button_array - add Acer Switch V 10 to dmi_use_low_level_irq[]
4ea4316dffda Input: soc_button_array - add use_low_level_irq module parameter
c1620e996d0a Input: goodix - try resetting the controller when no config is set
f4db0509587a serial: 8250: 8250_omap: Avoid RS485 RTS glitch on ->set_termios()
7c3e39ccf5bd ASoC: Intel: bytcht_es8316: Add quirk for the Nanote UMPC-01
36e0b976196c Input: synaptics - switch touchpad on HP Laptop 15-da3001TU to RMI mode
ae9e0cc973fb binder: Gracefully handle BINDER_TYPE_FDA objects with num_fds=0
017de842533f binder: Address corner cases in deferred copy and fixup
2e3c27f24173 binder: fix pointer cast warning
c9d3f25a7f4e binder: defer copies of pre-patched txn data
5204296fc766 binder: read pre-translated fds from sender buffer
23e9d815fad8 binder: avoid potential data leakage when copying txn
22870431cd25 x86/ioremap: Fix page aligned size calculation in __ioremap_caller()
3fdeacf087ff KVM: x86: remove exit_int_info warning in svm_handle_exit
7e5cb13091e6 KVM: x86: nSVM: leave nested mode on vCPU free
d925dd3e444c mm: vmscan: fix extreme overreclaim and swap floods
a4a62a23fadc gcov: clang: fix the buffer overflow issue
e7f21d10e93e nilfs2: fix nilfs_sufile_mark_dirty() not set segment usage as dirty
f06b7e6a77c1 usb: dwc3: gadget: Clear ep descriptor last
cff7523ab8b8 usb: dwc3: gadget: Return -ESHUTDOWN on ep disable
a32635528d65 usb: dwc3: gadget: conditionally remove requests
ca3a08e9d9eb ceph: fix NULL pointer dereference for req->r_session
00c004c070f2 ceph: Use kcalloc for allocating multiple elements
69263bf781be ceph: fix possible NULL pointer dereference for req->r_session
8e137ace5333 ceph: put the requests/sessions when it fails to alloc memory
38993788f40c ceph: fix off by one bugs in unsafe_request_wait()
8a31ae7f7794 ceph: flush the mdlog before waiting on unsafe reqs
78b2f546f789 ceph: flush mdlog before umounting
d94ba7b3b7e7 ceph: make iterate_sessions a global symbol
9ac038d3c2f2 ceph: make ceph_create_session_msg a global symbol
8382cdf0ab5d usb: cdns3: Add support for DRD CDNSP
57112da86b1b mmc: sdhci-brcmstb: Fix SDHCI_RESET_ALL for CQHCI
b5d770977b18 mmc: sdhci-brcmstb: Enable Clock Gating to save power
049194538cb8 mmc: sdhci-brcmstb: Re-organize flags
fbe955be268b nios2: add FORCE for vmlinuz.gz
c0a9c9973d24 init/Kconfig: fix CC_HAS_ASM_GOTO_TIED_OUTPUT test with dash
456e895fd0b8 iio: core: Fix entry not deleted when iio_register_sw_trigger_type() fails
fa9efcbfbf77 iio: light: apds9960: fix wrong register for gesture gain
bd1b8041c2f6 arm64: dts: rockchip: lower rk3399-puma-haikou SD controller clock frequency
86ba9c859577 ext4: fix use-after-free in ext4_ext_shift_extents
350e98a08af1 usb: dwc3: exynos: Fix remove() function
d21d26e65b5f lib/vdso: use "grep -E" instead of "egrep"
c0cf8bc259e0 net: enetc: preserve TX ring priority across reconfiguration
de4dd4f9b3f6 net: enetc: cache accesses to &priv->si->hw
1f080b8caae9 net: enetc: manage ENETC_F_QBV in priv->active_offloads only when enabled
1d840c5d673d s390/crashdump: fix TOD programmable field size
11052f118879 net: thunderx: Fix the ACPI memory leak
b034fe2a0800 nfc: st-nci: fix memory leaks in EVT_TRANSACTION
e14583073fc0 nfc: st-nci: fix incorrect validating logic in EVT_TRANSACTION
9cc863d52399 arcnet: fix potential memory leak in com20020_probe()
4d2be0cf27d9 net: arcnet: Fix RESET flag handling
e61b00374a6e s390/dasd: fix no record found for raw_track_access
aeebb0749972 ipv4: Fix error return code in fib_table_insert()
c0af4d005a26 dccp/tcp: Reset saddr on failure after inet6?_hash_connect().
b8e494240e69 netfilter: flowtable_offload: add missing locking
af9de5cdcb10 dma-buf: fix racing conflict of dma_heap_add()
c40b76dfa7e4 bnx2x: fix pci device refcount leak in bnx2x_vf_is_pcie_pending()
f81e9c0510b0 regulator: twl6030: re-add TWL6032_SUBCLASS
32b944b9c4b2 NFC: nci: fix memory leak in nci_rx_data_packet()
68a7aec3f4b5 net: sched: allow act_ct to be built without NF_NAT
8e2664e12bc6 sfc: fix potential memleak in __ef100_hard_start_xmit()
6b638a16ead1 xfrm: Fix ignored return value in xfrm6_init()
c7788361a645 tipc: check skb_linearize() return value in tipc_disc_rcv()
4058e3b74ab3 tipc: add an extra conn_get in tipc_conn_alloc
e87a077d09c0 tipc: set con sock in tipc_conn_alloc
891daa95b0bb net/mlx5: Fix handling of entry refcount when command is not issued to FW
e06ff9f8fedf net/mlx5: Fix FW tracer timestamp calculation
5689eba90a20 netfilter: ipset: regression in ip_set_hash_ip.c
e62e62ea912a netfilter: ipset: Limit the maximal range of consecutive elements to add/delete
8dca384970ac Drivers: hv: vmbus: fix possible memory leak in vmbus_device_register()
909186cf34de Drivers: hv: vmbus: fix double free in the error path of vmbus_add_channel_work()
f42802e14a87 macsec: Fix invalid error code set
72be055615e0 nfp: add port from netdev validation for EEPROM access
ce41e03cacaa nfp: fill splittable of devlink_port_attrs correctly
0b553ded3450 net: pch_gbe: fix pci device refcount leak while module exiting
2c59ef9ab63d net/qla3xxx: fix potential memleak in ql3xxx_send()
a24d5f6c8b7b net/mlx4: Check retval of mlx4_bitmap_init
da86a63479e5 net: ethernet: mtk_eth_soc: fix error handling in mtk_open()
756534f7cf53 ARM: dts: imx6q-prti6q: Fix ref/tcxo-clock-frequency properties
290a71ff721b ARM: mxs: fix memory leak in mxs_machine_init()
5c97af75f53c netfilter: conntrack: Fix data-races around ct mark
459332f8dbfb 9p/fd: fix issue of list_del corruption in p9_fd_cancel()
26bb8f6aaae3 net: pch_gbe: fix potential memleak in pch_gbe_tx_queue()
398a860a4429 nfc/nci: fix race with opening and closing
3535c632e6d1 rxrpc: Fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975]
23c03ee0eec4 rxrpc: Use refcount_t rather than atomic_t
bddde342c62e rxrpc: Allow list of in-use local UDP endpoints to be viewed in /proc
a2d5dba2fc69 net: liquidio: simplify if expression
8124a02e1717 ARM: dts: at91: sam9g20ek: enable udc vbus gpio pinctrl
b547bf71fa7e tee: optee: fix possible memory leak in optee_register_device()
b76c5a99f44a bus: sunxi-rsb: Support atomic transfers
0c059b7d2a6b regulator: core: fix UAF in destroy_regulator()
fcb2d286362b spi: dw-dma: decrease reference count in dw_spi_dma_init_mfld()
0b6441abfa5d regulator: core: fix kobject release warning and memory leak in regulator_register()
26d3d3ffa82b scsi: storvsc: Fix handling of srb_status and capacity change events
c34db0d6b88b ASoC: soc-pcm: Don't zero TDM masks in __soc_pcm_open()
4f6c7344ab26 ASoC: sgtl5000: Reset the CHIP_CLK_CTRL reg on remove
164a5b50d104 ASoC: hdac_hda: fix hda pcm buffer overflow issue
7cfb4b8579d3 ARM: dts: am335x-pcm-953: Define fixed regulators in root node
b7000254c125 af_key: Fix send_acquire race with pfkey_register
51969d679ba4 xfrm: replay: Fix ESN wrap around for GSO
497653f6d239 xfrm: fix "disable_policy" on ipv4 early demux
836bbdfcf8ef MIPS: pic32: treat port as signed integer
c0bb600f0768 RISC-V: vdso: Do not add missing symbols to version section in linker script
81cc6d8400ac arm64/syscall: Include asm/ptrace.h in syscall_wrapper header.
fa5f2c72d39f block, bfq: fix null pointer dereference in bfq_bio_bfqg()
d29bde868945 drm: panel-orientation-quirks: Add quirk for Acer Switch V 10 (SW5-017)
f7ce6fb04e04 scsi: scsi_debug: Make the READ CAPACITY response compliant with ZBC
2574903ee260 scsi: ibmvfc: Avoid path failures during live migration
7fc62181c1d4 platform/x86: touchscreen_dmi: Add info for the RCA Cambio W101 v2 2-in-1
f54a11b6bf82 Revert "net: macsec: report real_dev features when HW offloading is enabled"
f4b8c0710ab6 selftests/bpf: Add verifier test for release_reference()
361a16509898 spi: stm32: fix stm32_spi_prepare_mbr() that halves spi clk for every run
2c1ca23555ed wifi: mac80211: Fix ack frame idr leak when mesh has no route
8d39913158ad wifi: airo: do not assign -1 to unsigned char
8552e6048ec9 audit: fix undefined behavior in bit shift for AUDIT_BIT
1c9eb641d13e riscv: dts: sifive unleashed: Add PWM controlled LEDs
92ae6facd129 wifi: mac80211_hwsim: fix debugfs attribute ps with rc table support
2fcc593b5047 wifi: mac80211: fix memory free error when registering wiphy fail
044bc6d3c2c0 ceph: avoid putting the realm twice when decoding snaps fails
d43219bb33d5 ceph: do not update snapshot context when there is no new snapshot
49c71b68141e iio: pressure: ms5611: fixed value compensation bug
879139bc7afb iio: ms5611: Simplify IO callback parameters
80c825e1e33b nvme-pci: add NVME_QUIRK_BOGUS_NID for Micron Nitro
f4066fb91021 nvme: add a bogus subsystem NQN quirk for Micron MTFDKBA2T0TFH
4f0cea018e03 drm/display: Don't assume dual mode adaptors support i2c sub-addressing
347f1793b573 bridge: switchdev: Fix memory leaks when changing VLAN protocol
89a7f155e6b2 bridge: switchdev: Notify about VLAN protocol changes
f5cbd86ebf28 ata: libata-core: do not issue non-internal commands once EH is pending
4034d06a4dbe ata: libata-scsi: simplify __ata_scsi_queuecmd()
03aabcb88aee scsi: scsi_transport_sas: Fix error handling in sas_phy_add()
d9b90a99f34d Merge 5.10.156 into android12-5.10-lts
25af5a11f1da Merge 5.10.155 into android12-5.10-lts
e5d2cd6ad886 ANDROID: abi preservation for fscrypt change in 5.10.154
5bc3ece38082 Revert "serial: 8250: Let drivers request full 16550A feature probing"
f466ca1247d7 Merge 5.10.154 into android12-5.10-lts
6d46ef50b123 Linux 5.10.156
7be134eb691f Revert "net: broadcom: Fix BCMGENET Kconfig"
957732a09c38 ntfs: check overflow when iterating ATTR_RECORDs
6322dda48334 ntfs: fix out-of-bounds read in ntfs_attr_find()
b825bfbbaafb ntfs: fix use-after-free in ntfs_attr_find()
294ef12dccc6 mm: fs: initialize fsdata passed to write_begin/write_end interface
a8e2fc8f7b41 9p/trans_fd: always use O_NONBLOCK read/write
a5da76df467a gfs2: Switch from strlcpy to strscpy
5fa30be7ba81 gfs2: Check sb_bsize_shift after reading superblock
f14858bc77c5 9p: trans_fd/p9_conn_cancel: drop client lock earlier
4154b6afa2bd kcm: close race conditions on sk_receive_queue
7deb7a9d33e4 kcm: avoid potential race in kcm_tx_work
35309be06b6f tcp: cdg: allow tcp_cdg_release() to be called multiple times
e929ec98c0c3 macvlan: enforce a consistent minimal mtu
95ebea5a15e4 uapi/linux/stddef.h: Add include guards
3f25add5ecf8 Input: i8042 - fix leaking of platform device on module removal
7d606ae1abcc kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case
89ece5ff7dbe scsi: scsi_debug: Fix possible UAF in sdebug_add_host_helper()
75205f1b47a8 scsi: target: tcm_loop: Fix possible name leak in tcm_loop_setup_hba_bus()
6e9334436d78 net: use struct_group to copy ip/ipv6 header addresses
9fd7bdaffe0e stddef: Introduce struct_group() helper macro
47c3bdd95505 usbnet: smsc95xx: Fix deadlock on runtime resume
8208c266fe27 ring-buffer: Include dropped pages in counting dirty patches
36b5095b07ac net: fix a concurrency bug in l2tp_tunnel_register()
023435a095d2 nvme: ensure subsystem reset is single threaded
b9a5ecf24180 nvme: restrict management ioctls to admin
5e2f14d77223 perf/x86/intel/pt: Fix sampling using single range output
62634b43d3c4 misc/vmw_vmci: fix an infoleak in vmci_host_do_receive_datagram()
c1eb46a65b09 docs: update mediator contact information in CoC doc
4423866d31a0 mmc: sdhci-pci: Fix possible memory leak caused by missing pci_dev_put()
440653a180f5 mmc: sdhci-pci-o2micro: fix card detect fail issue caused by CD# debounce timeout
8e70b1413178 mmc: core: properly select voltage range without power cycle
05b0f6624dda firmware: coreboot: Register bus in module init
deda86a0d84d iommu/vt-d: Set SRE bit only when hardware has SRS cap
d2c7d8f58e9c scsi: zfcp: Fix double free of FSF request when qdio send fails
db744288af73 maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault()
24cc679abbf3 Input: iforce - invert valid length check when fetching device IDs
5f4611fe012f serial: 8250_lpss: Configure DMA also w/o DMA filter
8679087e9357 serial: 8250: Flush DMA Rx on RLSI
a5eaad87bfca serial: 8250: Fall back to non-DMA Rx if IIR_RDI occurs
f59f5a269ca5 dm ioctl: fix misbehavior if list_versions races with module loading
67a75a9480fc iio: pressure: ms5611: changed hardcoded SPI speed to value limited
d95b85c5084a iio: adc: mp2629: fix potential array out of bound access
46b8bc62c5ea iio: adc: mp2629: fix wrong comparison of channel
8dddf2699da2 iio: trigger: sysfs: fix possible memory leak in iio_sysfs_trig_init()
85d2a8b287a8 iio: adc: at91_adc: fix possible memory leak in at91_adc_allocate_trigger()
85cc1a2fd8bf usb: typec: mux: Enter safe mode only when pins need to be reconfigured
efaab055201b usb: chipidea: fix deadlock in ci_otg_del_timer
143ba5c2d2a7 usb: add NO_LPM quirk for Realforce 87U Keyboard
249cef723fee USB: serial: option: add Fibocom FM160 0x0111 composition
5c44c60358da USB: serial: option: add u-blox LARA-L6 modem
0e88a3cfa6ed USB: serial: option: add u-blox LARA-R6 00B modem
de707957d9d4 USB: serial: option: remove old LARA-R6 PID
878227a3ddb2 USB: serial: option: add Sierra Wireless EM9191
25c652811ddd USB: bcma: Make GPIO explicitly optional
eb3af3ea5bca speakup: fix a segfault caused by switching consoles
8cbaf4ed530e slimbus: stream: correct presence rate frequencies
15155f7c0e30 Revert "usb: dwc3: disable USB core PHY management"
100d1e53bb3b ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book Pro 360
c7dcc8948279 ALSA: hda/realtek: fix speakers for Samsung Galaxy Book Pro
a80369c8ca50 ALSA: usb-audio: Drop snd_BUG_ON() from snd_usbmidi_output_open()
28a54854a959 tracing: kprobe: Fix potential null-ptr-deref on trace_array in kprobe_event_gen_test_exit()
bb70fcae4115 tracing: kprobe: Fix potential null-ptr-deref on trace_event_file in kprobe_event_gen_test_exit()
315b149f0822 tracing: Fix wild-memory-access in register_synth_event()
65ba7e7c2411 tracing: Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event()
5d4cc7bc1a8d tracing/ring-buffer: Have polling block on watermark
5fdebbeca5db ring_buffer: Do not deactivate non-existent pages
6a14828cadda ftrace: Fix null pointer dereference in ftrace_add_mod()
6ed60c60ec90 ftrace: Optimize the allocation for mcount entries
9569eed79bc0 ftrace: Fix the possible incorrect kernel message
5fc19c831320 cifs: add check for returning value of SMB2_set_info_init
0aeb0de528ec net: thunderbolt: Fix error handling in tbnet_init()
e13ef43813eb cifs: Fix wrong return value checking when GETFLAGS
9f00da9c866d net/x25: Fix skb leak in x25_lapb_receive_frame()
94822d23310a net: ag71xx: call phylink_disconnect_phy if ag71xx_hw_enable() fail in ag71xx_open()
3aeb13bc3db2 cifs: add check for returning value of SMB2_close_init
c24013273ed4 platform/x86/intel: pmc: Don't unconditionally attach Intel PMC when virtualized
9ed51414aef6 drbd: use after free in drbd_create_device()
6b23a4b25204 net: ena: Fix error handling in ena_init()
2d5a49550135 net: ionic: Fix error handling in ionic_init_module()
bb9924a6edd9 xen/pcpu: fix possible memory leak in register_pcpu()
d6a561bd4c53 bnxt_en: Remove debugfs when pci_register_driver failed
389738f5dbc5 net: caif: fix double disconnect client in chnl_net_open()
fb5ee1560bab net: macvlan: Use built-in RCU list checking
709aa1f73d3e mISDN: fix misuse of put_device() in mISDN_register_device()
417f2d2edf30 net: liquidio: release resources when liquidio driver open failed
4cba73f2d6fc net: hinic: Fix error handling in hinic_module_init()
083a2c9ef82e mISDN: fix possible memory leak in mISDN_dsp_element_register()
6b23993d5bef net: bgmac: Drop free_netdev() from bgmac_enet_remove()
1f6a73b25dab bpf: Initialize same number of free nodes for each pcpu_freelist
ef2ac07ab831 ata: libata-transport: fix error handling in ata_tdev_add()
7377a14598f6 ata: libata-transport: fix error handling in ata_tlink_add()
b5362dc1634d ata: libata-transport: fix error handling in ata_tport_add()
ac471468f7c1 ata: libata-transport: fix double ata_host_put() in ata_tport_add()
ac4f404c250b arm64: dts: imx8mn: Fix NAND controller size-cells
30ece7dbeeca arm64: dts: imx8mm: Fix NAND controller size-cells
f68a9efd7895 ARM: dts: imx7: Fix NAND controller size-cells
1d160dfb3fdf drm: Fix potential null-ptr-deref in drm_vblank_destroy_worker()
c47a823ea186 drm/drv: Fix potential memory leak in drm_dev_init()
c776a49d099c drm/panel: simple: set bpc field for logic technologies displays
777430aa4ddc pinctrl: devicetree: fix null pointer dereferencing in pinctrl_dt_to_map
bce3e6fe8ba7 parport_pc: Avoid FIFO port location truncation
a4b5423f88a1 siox: fix possible memory leak in siox_device_add()
0679f571d3de arm64: Fix bit-shifting UB in the MIDR_CPU_MODEL() macro
58636b5ff3f6 block: sed-opal: kmalloc the cmd/resp buffers
e27458b18b35 sctp: clear out_curr if all frag chunks of current msg are pruned
0b4c259b63ea sctp: remove the unnecessary sinfo_stream check in sctp_prsctp_prune_unsent
7360e7c29d27 ASoC: soc-utils: Remove __exit for snd_soc_util_exit()
e60f37a1d379 bpf, test_run: Fix alignment problem in bpf_prog_test_run_skb()
b8fe1a5aa733 tty: n_gsm: fix sleep-in-atomic-context bug in gsm_control_send
0a3160f4ffc7 serial: imx: Add missing .thaw_noirq hook
7e1f908e65c5 serial: 8250: omap: Flush PM QOS work on remove
d833cba201ad serial: 8250: omap: Fix unpaired pm_runtime_put_sync() in omap8250_remove()
b0b6ea651ecf serial: 8250_omap: remove wait loop from Errata i202 workaround
f14c312c2189 serial: 8250: omap: Fix missing PM runtime calls for omap8250_set_mctrl()
85cdbf04b435 serial: 8250: Remove serial_rs485 sanitization from em485
f5dedad4059b ASoC: tas2764: Fix set_tdm_slot in case of single slot
9e82d78fbe54 ASoC: tas2770: Fix set_tdm_slot in case of single slot
8d21554ec768 ASoC: core: Fix use-after-free in snd_soc_exit()
38ca9bd336c8 spi: stm32: Print summary 'callbacks suppressed' message
a180da5564b5 drm/amdgpu: disable BACO on special BEIGE_GOBY card
f3adf0adf306 drm/amd/pm: disable BACO entry/exit completely on several sienna cichlid cards
b0faeff69a0a drm/amd/pm: Read BIF STRAP also for BACO check
6958556285ec drm/amd/pm: support power source switch on Sienna Cichlid
7daab001a6f6 mmc: sdhci-esdhc-imx: use the correct host caps for MMC_CAP_8_BIT_DATA
65ac4d1807d2 spi: intel: Use correct mask for flash and protected regions
23793518a752 mtd: spi-nor: intel-spi: Disable write protection only if asked
a326fffdc78b ALSA: hda/realtek: fix speakers and micmute on HP 855 G8
24839d027c83 ASoC: codecs: jz4725b: Fix spelling mistake "Sourc" -> "Source", "Routee" -> "Route"
bd487932408d Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm
ce75e9085988 btrfs: remove pointless and double ulist frees in error paths of qgroup tests
16743c4bf3ef drm/imx: imx-tve: Fix return type of imx_tve_connector_mode_valid
df2747f295ac i2c: i801: add lis3lv02d's I2C address for Vostro 5568
959cb0fd6951 i2c: tegra: Allocate DMA memory for DMA engine
6cb657722e37 NFSv4: Retry LOCK on OLD_STATEID during delegation return
f0187227e2b8 drm/amd/display: Remove wrong pipe control lock
bb3edbd09287 ASoC: rt1308-sdw: add the default value of some registers
b1619f030776 selftests/intel_pstate: fix build for ARCH=x86_64
fdf680760629 selftests/futex: fix build for clang
c1f0defecbdc ASoC: codecs: jz4725b: fix capture selector naming
aeb7e8bc0d3e ASoC: codecs: jz4725b: use right control for Capture Volume
c87945c17385 ASoC: codecs: jz4725b: fix reported volume for Master ctl
9aae00961ab3 ASoC: codecs: jz4725b: add missed Line In power control bit
0b4d650f905c spi: intel: Fix the offset to get the 64K erase opcode
6910e7279f5d ASoC: wm8962: Add an event handler for TEMP_HP and TEMP_SPK
c7432616f6aa ASoC: mt6660: Keep the pm_runtime enables before component stuff in mt6660_i2c_probe
a47606064cc0 ASoC: wm8997: Revert "ASoC: wm8997: Fix PM disable depth imbalance in wm8997_probe"
f8f254c8b506 ASoC: wm5110: Revert "ASoC: wm5110: Fix PM disable depth imbalance in wm5110_probe"
c73aa2cc4156 ASoC: wm5102: Revert "ASoC: wm5102: Fix PM disable depth imbalance in wm5102_probe"
673a7341bdab Merge 5.10.153 into android12-5.10-lts
27b36ba7c21c Merge 5.10.152 into android12-5.10-lts
bf759deb0f59 Merge 5.10.151 into android12-5.10-lts
6b31c548a114 ANDROID: fix up struct sk_buf ABI breakage
bd66e91ad254 ANDROID: fix up CRC issue with struct tcp_sock
3905cfd1d672 Revert "serial: 8250: Toggle IER bits on only after irq has been set up"
41217963b1d9 Linux 5.10.155
0f544353fec8 io_uring: kill goto error handling in io_sqpoll_wait_sq()
154d744fbefc x86/cpu: Restore AMD's DE_CFG MSR after resume
e7294b01de40 mmc: sdhci-esdhc-imx: Convert the driver to DT-only
534762e261c8 net: tun: call napi_schedule_prep() to ensure we own a napi
367bc0fa988f dmaengine: at_hdmac: Check return code of dma_async_device_register
85f97c97efc5 dmaengine: at_hdmac: Fix impossible condition
f53a233eaad6 dmaengine: at_hdmac: Don't allow CPU to reorder channel enable
f4512855223c dmaengine: at_hdmac: Fix completion of unissued descriptor in case of errors
6be4ab08c863 dmaengine: at_hdmac: Fix descriptor handling when issuing it to hardware
a35dd5dd98b6 dmaengine: at_hdmac: Fix concurrency over the active list
0f603bf553a7 dmaengine: at_hdmac: Free the memset buf without holding the chan lock
7f07cecc7411 dmaengine: at_hdmac: Fix concurrency over descriptor
1582cc3b4805 dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all()
9b69060a725d dmaengine: at_hdmac: Protect atchan->status with the channel lock
ee356822618e dmaengine: at_hdmac: Do not call the complete callback on device_terminate_all
7078e935b410 dmaengine: at_hdmac: Fix premature completion of desc in issue_pending
ad4cbe8e9c3a dmaengine: at_hdmac: Start transfer for cyclic channels in issue_pending
24f9e93e506a dmaengine: at_hdmac: Don't start transactions at tx_submit level
4b51cce72ab7 dmaengine: at_hdmac: Fix at_lli struct definition
d37dfb9357e9 cert host tools: Stop complaining about deprecated OpenSSL functions
f8e0edeaa0f2 can: j1939: j1939_send_one(): fix missing CAN header initialization
0b692d41ee5c mm/memremap.c: map FS_DAX device memory as decrypted
03f9582a6a2e udf: Fix a slab-out-of-bounds write bug in udf_find_entry()
4ea3aa3b983b mmc: sdhci-esdhc-imx: Fix SDHCI_RESET_ALL for CQHCI
9c0accfa5a35 btrfs: selftests: fix wrong error check in btrfs_free_dummy_root()
8fa0c22ef824 platform/x86: hp_wmi: Fix rfkill causing soft blocked wifi
b5ee579fcb14 drm/i915/dmabuf: fix sg_table handling in map_dma_buf
4feedde5486c nilfs2: fix use-after-free bug of ns_writer on remount
1d4ff7306209 nilfs2: fix deadlock in nilfs_count_free_blocks()
344ddbd688d8 ata: libata-scsi: fix SYNCHRONIZE CACHE (16) command failure
516f9f23008b vmlinux.lds.h: Fix placement of '.data..decrypted' section
f6896fb69d50 ALSA: usb-audio: Add DSD support for Accuphase DAC-60
2032c2d32b2a ALSA: usb-audio: Add quirk entry for M-Audio Micro
a414a6d6ef3c ALSA: hda/realtek: Add Positivo C6300 model quirk
3a79f9568de0 ALSA: hda: fix potential memleak in 'add_widget_node'
380d64168da4 ALSA: hda/ca0132: add quirk for EVGA Z390 DARK
181cfff57bdc ALSA: hda/hdmi - enable runtime pm for more AMD display audio
ea6787e482ad mmc: sdhci-tegra: Fix SDHCI_RESET_ALL for CQHCI
0a8d4531a0d5 mmc: sdhci_am654: Fix SDHCI_RESET_ALL for CQHCI
3f558930add7 mmc: sdhci-of-arasan: Fix SDHCI_RESET_ALL for CQHCI
b55e64d0a3a3 mmc: cqhci: Provide helper for resetting both SDHCI and CQHCI
4631cb040645 MIPS: jump_label: Fix compat branch range check
475fd3991a0d arm64: efi: Fix handling of misaligned runtime regions and drop warning
94ab8f88feb7 riscv: fix reserved memory setup
0cf9cb061493 riscv: Separate memory init from paging init
d7716240bca5 riscv: Enable CMA support
ecf78af5141f riscv: vdso: fix build with llvm
e56d18a976dd riscv: process: fix kernel info leakage
956e0216a199 net: macvlan: fix memory leaks of macvlan_common_newlink
59ec132386a0 ethernet: tundra: free irq when alloc ring failed in tsi108_open()
dd7beaec8b48 net: mv643xx_eth: disable napi when init rxq or txq failed in mv643xx_eth_open()
56d3b5531bf6 ethernet: s2io: disable napi when start nic failed in s2io_card_up()
05b222843457 net: atlantic: macsec: clear encryption keys from the stack
1a4e495edfe2 net: phy: mscc: macsec: clear encryption keys when freeing a flow
4ad684ba028c cxgb4vf: shut down the adapter when t4vf_update_port_info() failed in cxgb4vf_open()
38aa7ed8c2c3 net: cxgb3_main: disable napi when bind qsets failed in cxgb_up()
fd52dd2d6e2f net: cpsw: disable napi in cpsw_ndo_open()
3b27e20601ab net/mlx5e: E-Switch, Fix comparing termination table instance
eb6fa0ac2a9c net/mlx5: Allow async trigger completion execution on single CPU systems
bdd282bba72d net: nixge: disable napi when enable interrupts failed in nixge_open()
5333cf1b7f68 net: marvell: prestera: fix memory leak in prestera_rxtx_switch_init()
cf4853880e24 perf stat: Fix printing os->prefix in CSV metrics output
3a4a3c3b1fe6 drivers: net: xgene: disable napi when register irq failed in xgene_enet_open()
0b7ee3d50f32 dmaengine: mv_xor_v2: Fix a resource leak in mv_xor_v2_remove()
6e2ffae69d17 dmaengine: pxa_dma: use platform_get_irq_optional
f31dd1585809 tipc: fix the msg->req tlv len check in tipc_nl_compat_name_table_dump_header
fbb4e8e6dc7b net: broadcom: Fix BCMGENET Kconfig
cb6d639bb1ef net: stmmac: dwmac-meson8b: fix meson8b_devm_clk_prepare_enable()
d68fa77ee3d0 can: af_can: fix NULL pointer dereference in can_rx_register()
a033b86c7f76 ipv6: addrlabel: fix infoleak when sending struct ifaddrlblmsg to network
02f8dfee7580 tcp: prohibit TCP_REPAIR_OPTIONS if data was already sent
f3aa8a7d9550 drm/vc4: Fix missing platform_unregister_drivers() call in vc4_drm_register()
bcb3bb10695f hamradio: fix issue of dev reference count leakage in bpq_device_event()
bc4591a86b8f net: lapbether: fix issue of dev reference count leakage in lapbeth_device_event()
2bf8b1c111ff KVM: s390: pv: don't allow userspace to set the clock under PV
a60cc64db72f KVM: s390x: fix SCK locking
fcbd2b336834 capabilities: fix undefined behavior in bit shift for CAP_TO_MASK
8aae24b0ed76 net: fman: Unregister ethernet device on removal
e2c5ee3b628f bnxt_en: fix potentially incorrect return value for ndo_rx_flow_steer
38147073c96d bnxt_en: Fix possible crash in bnxt_hwrm_set_coal()
3401f964028a net: tun: Fix memory leaks of napi_get_frags
adaa0f180de5 macsec: clear encryption keys from the stack after setting up offload
9dc7503bae33 macsec: fix detection of RXSCs when toggling offloading
7f4456f0119b macsec: fix secy->n_rx_sc accounting
3b05d9073ae2 macsec: delete new rxsc when offload fails
50868de7dc4e net: gso: fix panic on frag_list with mixed head alloc types
cedd4f01f67b bpf: Fix wrong reg type conversion in release_reference()
9069db2579e9 bpf: Add helper macro bpf_for_each_reg_in_vstate
95b6ec733752 bpf: Support for pointers beyond pkt_end.
8597b59e3d22 HID: hyperv: fix possible memory leak in mousevsc_probe()
8c80b2fca411 bpftool: Fix NULL pointer dereference when pin {PROG, MAP, LINK} without FILE
cc21dc48a78c bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues
e1e12180321f wifi: cfg80211: fix memory leak in query_regdb_file()
914cb94e738b wifi: cfg80211: silence a sparse RCU warning
72ea2fc29962 phy: stm32: fix an error code in probe
925bf1ba7604 hwspinlock: qcom: correct MMIO max register for newer SoCs
76eba54f0ddf fuse: fix readdir cache race
7bcea6c5c90a ANDROID: gki_defconfig: remove CONFIG_INIT_STACK_ALL_ZERO=y
d2bc3376cd31 Revert "serial: 8250: Fix restoring termios speed after suspend"
0b500f5b168c Merge 5.10.150 into android12-5.10-lts
f5b40c0eb9ea Linux 5.10.154
bf506e366da4 ipc: remove memcg accounting for sops objects in do_semtimedop()
c6678c8f4f3f wifi: brcmfmac: Fix potential buffer overflow in brcmf_fweh_event_worker()
a6c57adec567 drm/i915/sdvo: Setup DDC fully before output init
b86830cc95af drm/i915/sdvo: Filter out invalid outputs more sensibly
9f3b8678080a drm/rockchip: dsi: Force synchronous probe
23f1fc7ce55f ext4,f2fs: fix readahead of verity data
e5cef906cb40 KVM: x86: emulator: update the emulation mode after CR0 write
ce9261accccd KVM: x86: emulator: introduce emulator_recalc_and_set_mode
c8a2fd7a715d KVM: x86: emulator: em_sysexit should update ctxt->mode
e0c7410378cd KVM: x86: Mask off reserved bits in CPUID.80000001H
9302ebc1c21d KVM: x86: Mask off reserved bits in CPUID.80000008H
cc40c5f3e921 KVM: x86: Mask off reserved bits in CPUID.8000001AH
bd64a88f364c KVM: x86: Mask off reserved bits in CPUID.80000006H
156451a67b93 ext4: fix BUG_ON() when directory entry has invalid rec_len
5370b965b7a9 ext4: fix warning in 'ext4_da_release_space'
c9598cf62953 parisc: Avoid printing the hardware path twice
98f836e80d21 parisc: Export iosapic_serial_irq() symbol for serial port driver
814af9a32b03 parisc: Make 8250_gsc driver dependend on CONFIG_PARISC
29d106d086d2 perf/x86/intel: Add Cooper Lake stepping to isolation_ucodes[]
98f6e7c33703 perf/x86/intel: Fix pebs event constraints for ICL
3be2d66822a0 efi: random: Use 'ACPI reclaim' memory for random seed
83294f7c7759 efi: random: reduce seed size to 32 bytes
f8e8cda869fd fuse: add file_modified() to fallocate
cdf01c807e97 capabilities: fix potential memleak on error path from vfs_getxattr_alloc()
ff32d8a099dc tracing/histogram: Update document for KEYS_MAX size
533bfacbacb8 tools/nolibc/string: Fix memcmp() implementation
f100a0274861 kprobe: reverse kp->flags when arm_kprobe failed
bef08acbe560 tracing: kprobe: Fix memory leak in test_gen_kprobe/kretprobe_cmd()
2bf33b5ea46d tcp/udp: Make early_demux back namespacified.
ea5f2fd4640e ftrace: Fix use-after-free for dynamic ftrace_ops
06de93a47cec btrfs: fix type of parameter generation in btrfs_get_dentry
e33ce54cef5d coresight: cti: Fix hang in cti_disable_hw()
015ac18be7de binder: fix UAF of alloc->vma in race with munmap()
836686e1a01d memcg: enable accounting of ipc resources
e4e4b24b42e7 mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on program/erase times
818c36b988b8 tcp/udp: Fix memory leak in ipv6_renew_options().
29997a6fa60d fscrypt: fix keyring memory leak on mount failure
391cceee6d43 fscrypt: stop using keyrings subsystem for fscrypt_master_key
092401142b95 fscrypt: simplify master key locking
54c13d3520ef ALSA: usb-audio: Add quirks for MacroSilicon MS2100/MS2106 devices
a0e2577cf3cc block, bfq: protect 'bfqd->queued' by 'bfqd->lock'
26ca2ac091b4 Bluetooth: L2CAP: Fix attempting to access uninitialized memory
6b6f94fb9a74 Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM
bfd5e62f9a7e i2c: piix4: Fix adapter not be removed in piix4_remove()
fc3e2fa0a5fb arm64: dts: juno: Add thermal critical trip points
b743ecf29ca7 firmware: arm_scmi: Make Rx chan_setup fail on memory errors
29e8e9bfc2f2 firmware: arm_scmi: Suppress the driver's bind attributes
d7b1e2cbe0a4 ARM: dts: imx6qdl-gw59{10,13}: fix user pushbutton GPIO offset
160d8904b2b5 efi/tpm: Pass correct address to memblock_reserve
c40b4d604b3e i2c: xiic: Add platform module alias
5bf8c7798b1c drm/amdgpu: set vm_update_mode=0 as default for Sienna Cichlid in SRIOV case
496eb203d046 HID: saitek: add madcatz variant of MMO7 mouse device ID
ff06067b7086 scsi: core: Restrict legal sdev_state transitions via sysfs
9edf20e5a1d8 ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
be6e22f54623 media: meson: vdec: fix possible refcount leak in vdec_probe()
c5fd54a65c35 media: dvb-frontends/drxk: initialize err to 0
7fdc58d8c213 media: cros-ec-cec: limit msg.len to CEC_MAX_MSG_SIZE
1609231f8676 media: s5p_cec: limit msg.len to CEC_MAX_MSG_SIZE
c46759e3703b media: rkisp1: Zero v4l2_subdev_format fields in when validating links
3144ce557440 media: rkisp1: Initialize color space on resizer sink and source pads
6b24d9c2acda s390/boot: add secure boot trailer
efc6420d65ae xhci-pci: Set runtime PM as default policy on all xHC 1.2 or later devices
37bb57908dd3 mtd: parsers: bcm47xxpart: Fix halfblock reads
85e458369c0f mtd: parsers: bcm47xxpart: print correct offset on read error
ec54104febdc fbdev: stifb: Fall back to cfb_fillrect() on 32-bit HCRX cards
f8c86d782952 video/fbdev/stifb: Implement the stifb_fillrect() function
e975d7aecad7 mmc: sdhci-pci-core: Disable ES for ASUS BIOS on Jasper Lake
afeae13b8a3c mmc: sdhci-pci: Avoid comma separated statements
a06721767cfc mmc: sdhci-esdhc-imx: Propagate ESDHC_FLAG_HS400* only on 8bit bus
59400c9b0d07 drm/msm/hdmi: fix IRQ lifetime
8225bdaec5b0 drm/msm/hdmi: Remove spurious IRQF_ONESHOT flag
5dbb47ee8976 ipv6: fix WARNING in ip6_route_net_exit_late()
1c89642e7f2b net, neigh: Fix null-ptr-deref in neigh_table_clear()
634f066d02bd net: mdio: fix undefined behavior in bit shift for __mdiobus_register
d9ec6e2fbd4a Bluetooth: L2CAP: fix use-after-free in l2cap_conn_del()
cb1c012099ef Bluetooth: L2CAP: Fix use-after-free caused by l2cap_reassemble_sdu
0a0dead4ad1a btrfs: fix ulist leaks in error paths of qgroup self tests
61e061281137 btrfs: fix inode list leak during backref walking at find_parent_nodes()
a52e24c7fcc3 btrfs: fix inode list leak during backref walking at resolve_indirect_refs()
81204283ea13 isdn: mISDN: netjet: fix wrong check of device registration
e77d213843e6 mISDN: fix possible memory leak in mISDN_register_device()
f06186e5271b rose: Fix NULL pointer dereference in rose_send_frame()
2c8d81bdb268 ipvs: fix WARNING in ip_vs_app_net_cleanup()
931f56d59c85 ipvs: fix WARNING in __ip_vs_cleanup_batch()
d69328cdb92f ipvs: use explicitly signed chars
b2d7a92aff0f netfilter: nf_tables: release flow rule object from commit path
3583826b443a net: tun: fix bugs for oversize packet when napi frags enabled
5960b9081bac net: sched: Fix use after free in red_enqueue()
24f9c41435a8 ata: pata_legacy: fix pdc20230_set_piomode()
c85ee1c3cbc6 net: fec: fix improper use of NETDEV_TX_BUSY
52438e734c15 nfc: nfcmrvl: Fix potential memory leak in nfcmrvl_i2c_nci_send()
0acfcd2aed4f nfc: s3fwrn5: Fix potential memory leak in s3fwrn5_nci_send()
9ae2c9a91ff0 nfc: nxp-nci: Fix potential memory leak in nxp_nci_send()
eecea068bf11 NFC: nxp-nci: remove unnecessary labels
e8c11ee2d07f nfc: fdp: Fix potential memory leak in fdp_nci_send()
31b83d6990c8 nfc: fdp: drop ftrace-like debugging messages
4e1e4485b252 RDMA/qedr: clean up work queue on failure in qedr_alloc_resources()
d360e875c011 RDMA/core: Fix null-ptr-deref in ib_core_cleanup()
37a098fc9b42 net: dsa: Fix possible memory leaks in dsa_loop_init()
45aea4fbf61e nfs4: Fix kmemleak when allocate slot failed
f0f1c74fa670 NFSv4.1: We must always send RECLAIM_COMPLETE after a reboot
10c554d72275 NFSv4.1: Handle RECLAIM_COMPLETE trunking errors
4813dd737dc4 NFSv4: Fix a potential state reclaim deadlock
7c4260f8f188 IB/hfi1: Correctly move list in sc_disable()
87ac93c8dd6d RDMA/cma: Use output interface for net_dev check
4dbb739eb29c KVM: x86: Add compat handler for KVM_X86_SET_MSR_FILTER
bb584caee895 KVM: x86: Copy filter arg outside kvm_vm_ioctl_set_msr_filter()
9faacf442d11 KVM: x86: Protect the unused bits in MSR exiting flags
5bdbccc79c86 x86/topology: Fix duplicated core ID within a package
6c31fc028a65 x86/topology: Fix multiple packages shown on a single-package system
f5ad52da145a x86/topology: Set cpu_die_id only if DIE_TYPE found
570fa3bcd2f9 KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)
e5d7c6786bef KVM: x86: Trace re-injected exceptions
8364786152d5 KVM: nVMX: Don't propagate vmcs12's PERF_GLOBAL_CTRL settings to vmcs02
523e1dd9f8d4 KVM: nVMX: Pull KVM L0's desired controls directly from vmcs01
028fcabd8a67 serial: ar933x: Deassert Transmit Enable on ->rs485_config()
e6da7808c955 serial: 8250: Let drivers request full 16550A feature probing
95aa34f72132 Linux 5.10.153
26a2b9c468de serial: Deassert Transmit Enable on probe in driver-specific way
4a230f65d6a8 serial: core: move RS485 configuration tasks from drivers into core
eb69c07eca22 can: rcar_canfd: rcar_canfd_handle_global_receive(): fix IRQ storm on global FIFO receive
d5924531dd8a arm64/kexec: Test page size support with new TGRAN range values
c911f03f8d44 arm64/mm: Fix __enable_mmu() for new TGRAN range values
d523384766fd scsi: sd: Revert "scsi: sd: Remove a local variable"
52a43b82006d arm64: Add AMPERE1 to the Spectre-BHB affected list
9889ca7efa12 net: enetc: survive memory pressure without crashing
fdba224ab028 net/mlx5: Fix crash during sync firmware reset
bbcc06933f35 net/mlx5: Fix possible use-after-free in async command interface
16376ba5cfd7 net/mlx5e: Do not increment ESN when updating IPsec ESN state
0d88359092dd nh: fix scope used to find saddr when adding non gw nh
3519b5ddac21 net: ehea: fix possible memory leak in ehea_register_port()
79631daa5a51 openvswitch: switch from WARN to pr_warn
00d6f33f6782 ALSA: aoa: Fix I2S device accounting
ce6fd1c382a3 ALSA: aoa: i2sbus: fix possible memory leak in i2sbus_add_dev()
97262705c0cb net: fec: limit register access on i.MX6UL
df67a8e625fc PM: domains: Fix handling of unavailable/disabled idle states
1f262d80882a net: ksz884x: fix missing pci_disable_device() on error in pcidev_init()
6170b4579f36 i40e: Fix flow-type by setting GL_HASH_INSET registers
9abae363af5c i40e: Fix VF hang when reset is triggered on another VF
23d5599058a0 i40e: Fix ethtool rx-flow-hash setting for X722
44affe7ede59 ipv6: ensure sane device mtu in tunnels
905f05c0ab19 media: vivid: set num_in/outputs to 0 if not supported
b6c7446d0a38 media: videodev2.h: V4L2_DV_BT_BLANKING_HEIGHT should check 'interlaced'
683015ae1634 media: v4l2-dv-timings: add sanity checks for blanking values
147b8f1892aa media: vivid: dev->bitmap_cap wasn't freed in all cases
1cf51d51581c media: vivid: s_fbuf: add more sanity checks
3221c2701d19 PM: hibernate: Allow hybrid sleep to work with s2idle
0eb19ecbd0a9 can: mcp251x: mcp251x_can_probe(): add missing unregister_candev() in error path
6b2d07fc0b0a can: mscan: mpc5xxx: mpc5xxx_can_probe(): add missing put_clock() in error path
1634d5d39cfd tcp: fix indefinite deferral of RTO with SACK reneging
4f23cb2be530 tcp: fix a signed-integer-overflow bug in tcp_add_backlog()
49713d7c3858 tcp: minor optimization in tcp_add_backlog()
aab883bd60bc net: lantiq_etop: don't free skb when returning NETDEV_TX_BUSY
c3edc6e80820 net: fix UAF issue in nfqnl_nf_hook_drop() when ops_init() failed
e2a28807b1ce kcm: annotate data-races around kcm->rx_wait
c325f92d8d9b kcm: annotate data-races around kcm->rx_psock
af7879529e5a atlantic: fix deadlock at aq_nic_stop
d7ccd49c4dd9 amd-xgbe: add the bit rate quirk for Molex cables
17350734fdca amd-xgbe: fix the SFP compliance codes check for DAC cables
b55d6ea965ba x86/unwind/orc: Fix unreliable stack dump with gcov
0ce1ef335300 net: hinic: fix the issue of double release MBOX callback of VF
6603843c80b1 net: hinic: fix the issue of CMDQ memory leaks
bb01910763f9 net: hinic: fix memory leak when reading function table
ce605b68db53 net: hinic: fix incorrect assignment issue in hinic_set_interrupt_cfg()
62f0a08e82a6 net: netsec: fix error handling in netsec_register_mdio()
32a3d4660b34 tipc: fix a null-ptr-deref in tipc_topsrv_accept
fb94152aae88 perf/x86/intel/lbr: Use setup_clear_cpu_cap() instead of clear_cpu_cap()
bfce73088682 ALSA: ac97: fix possible memory leak in snd_ac97_dev_register()
2663b16c76d0 ASoC: qcom: lpass-cpu: Mark HDMI TX parity register as volatile
a52755729956 arc: iounmap() arg is volatile
648ac633e764 ASoC: qcom: lpass-cpu: mark HDMI TX registers as volatile
6571f6ca8a21 drm/msm: Fix return type of mdp4_lvds_connector_mode_valid
4953a989b72d media: v4l2: Fix v4l2_i2c_subdev_set_name function documentation
9d00384270b1 net: ieee802154: fix error return code in dgram_bind()
568e3812b177 mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages
935a8b620210 mm/memory: add non-anonymous page check in the copy_present_page()
49db6cb81400 xen/gntdev: Prevent leaking grants
a3f2cc11d6b6 Xen/gntdev: don't ignore kernel unmapping error
467230b9ef40 s390/pci: add missing EX_TABLE entries to __pcistg_mio_inuser()/__pcilg_mio_inuser()
fe187c801a44 s390/futex: add missing EX_TABLE entry to __futex_atomic_op()
449070996ce6 perf auxtrace: Fix address filter symbol name match for modules
6f72a3977ba9 kernfs: fix use-after-free in __kernfs_remove
0bcd1ab3e8b3 counter: microchip-tcb-capture: Handle Signal1 read and Synapse
8bf037279b58 mmc: core: Fix kernel panic when remove non-standard SDIO card
5684808b269b mmc: sdhci_am654: 'select', not 'depends' REGMAP_MMIO
b686ffc0acb8 drm/msm/dp: fix IRQ lifetime
08c7375fa27a drm/msm/hdmi: fix memory corruption with too many bridges
21c4679af01f drm/msm/dsi: fix memory corruption with too many bridges
44a86d96fac8 scsi: qla2xxx: Use transport-defined speed mask for supported_speeds
c368f751da8e mac802154: Fix LQI recording
9ba2990f4e80 exec: Copy oldsighand->action under spin-lock
706215300411 fs/binfmt_elf: Fix memory leak in load_elf_binary()
d9ddfeb01fb9 fbdev: smscufx: Fix several use-after-free bugs
f19f1a75d378 iio: temperature: ltc2983: allocate iio channels once
af236da8552e iio: light: tsl2583: Fix module unloading
90ff5bef2bc7 tools: iio: iio_utils: fix digit calculation
678d2cc2041c xhci: Remove device endpoints from bandwidth list when freeing the device
3b250824b6d3 xhci: Add quirk to reset host back to default state at shutdown
63c7df3c818e mtd: rawnand: marvell: Use correct logic for nand-keep-config
228101fc832f usb: xhci: add XHCI_SPURIOUS_SUCCESS to ASM1042 despite being a V0.96 controller
2bc4f99ee243 usb: bdc: change state when port disconnected
e440957f9c8b usb: dwc3: gadget: Don't set IMI for no_interrupt
fb074d622ccc usb: dwc3: gadget: Stop processing more requests on IMI
c29fcef5791d USB: add RESET_RESUME quirk for NVIDIA Jetson devices in RCM
4cc7a360ec3b ALSA: rme9652: use explicitly signed char
895909230008 ALSA: au88x0: use explicitly signed char
2bf5b1631569 ALSA: Use del_timer_sync() before freeing timer
ca1034bff85a can: kvaser_usb: Fix possible completions during init_completion
370be31cde50 can: j1939: transport: j1939_session_skb_drop_old(): spin_unlock_irqrestore() before kfree_skb()
7d51b4c67cfb Linux 5.10.152
43d5109296fa udp: Update reuse->has_conns under reuseport_lock.
a50ed2d28727 mm: /proc/pid/smaps_rollup: fix no vma's null-deref
31b1570677e8 blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init()
e2f9b62ead9a mmc: core: Add SD card quirk for broken discard
3a260e9844c9 Makefile.debug: re-enable debug info for .S files
6ab2287b26f1 x86/Kconfig: Drop check for -mabi=ms for CONFIG_EFI_STUB
67dafece56b6 ACPI: video: Force backlight native for more TongFang devices
dcaf6313202a hv_netvsc: Fix race between VF offering and VF association message from host
da54c5f4b5b5 perf/x86/intel/pt: Relax address filter validation
79c3482fbe20 riscv: topology: fix default topology reporting
a6e770733dc4 arm64: topology: move store_cpu_topology() to shared code
cb1024d8a4d0 arm64: dts: qcom: sc7180-trogdor: Fixup modem memory region
f687e2111b6f fcntl: fix potential deadlocks for &fown_struct.lock
b1efc196446a fcntl: make F_GETOWN(EX) return 0 on dead owner task
ca4c49838278 perf: Skip and warn on unknown format 'configN' attrs
dea47fefa6aa perf pmu: Validate raw event with sysfs exported format bits
86e995f964f6 riscv: always honor the CONFIG_CMDLINE_FORCE when parsing dtb
0e4c06ae7c54 riscv: Add machine name to kernel boot log and stack dump output
7fba4a389d07 mmc: sdhci-tegra: Use actual clock rate for SW tuning correction
3c6a888e3522 xen/gntdev: Accommodate VMA splitting
5232411f37d7 xen: assume XENFEAT_gnttab_map_avail_bits being set for pv guests
ea82edad0aee tracing: Do not free snapshot if tracer is on cmdline
bd6af07e7993 tracing: Simplify conditional compilation code in tracing_set_tracer()
4e3a15ca24b3 dmaengine: mxs: use platform_driver_register
1da5d2497046 dmaengine: mxs-dma: Remove the unused .id_table
1414e9bf3c30 drm/virtio: Use appropriate atomic state in virtio_gpu_plane_cleanup_fb()
d74196bb278b iommu/vt-d: Clean up si_domain in the init_dmars() error path
ef11e8ec00b9 iommu/vt-d: Allow NVS regions in arch_rmrr_sanity_check()
35c92435be76 net: phy: dp83822: disable MDI crossover status change interrupt
7aa3d623c11b net: sched: fix race condition in qdisc_graft()
2974f3b330ef net: hns: fix possible memory leak in hnae_ae_register()
3032e316e0a9 sfc: include vport_id in filter spec hash and equal()
ded86c4191a3 net: sched: sfb: fix null pointer access issue when sfb_init() fails
305aa36b628e net: sched: delete duplicate cleanup of backlog and qlen
ae48bee2830b net: sched: cake: fix null pointer access issue when cake_init() fails
2008ad08a2ae nvme-hwmon: kmalloc the NVME SMART log buffer
770b7e3a2c1f nvme-hwmon: consistently ignore errors from nvme_hwmon_init
67106ac27243 nvme-hwmon: Return error code when registration fails
bc17f727b005 nvme-hwmon: rework to avoid devm allocation
191d71c6357e ionic: catch NULL pointer issue on reconfig
ff7ba7667583 net: hsr: avoid possible NULL deref in skb_clone()
7286f8755104 cifs: Fix xid leak in cifs_ses_add_channel()
2d08311aa305 cifs: Fix xid leak in cifs_flock()
bf49d4fe4ab7 cifs: Fix xid leak in cifs_copy_file_range()
05cc22c0085e net: phy: dp83867: Extend RX strap quirk for SGMII mode
118f412bedc5 net/atm: fix proc_mpc_write incorrect return value
c8310a99e7e4 sfc: Change VF mac via PF as first preference if available.
39d10f0dfb72 HID: magicmouse: Do not set BTN_MOUSE on double report
ed5baf3d0a33 i40e: Fix DMA mappings leak
e558e1489384 tipc: fix an information leak in tipc_topsrv_kern_subscr
1f4ed95ce617 tipc: Fix recognition of trial period
fc8c6b8bb294 ACPI: extlog: Handle multiple records
57e157749ad9 btrfs: fix processing of delayed tree block refs during backref walking
590929ef6972 btrfs: fix processing of delayed data refs during backref walking
cc841a8a704c r8152: add PID for the Lenovo OneLink+ Dock
51b96ecaedc0 arm64: errata: Remove AES hwcap for COMPAT tasks
910ba49b3345 blk-wbt: call rq_qos_add() after wb_normal is initialized
392536023da1 block: wbt: Remove unnecessary invoking of wbt_update_limits in wbt_init
ab6aaa821024 media: venus: dec: Handle the case where find_format fails
bce5808fc95d media: mceusb: set timeout to at least timeout provided
6d725672ce85 KVM: arm64: vgic: Fix exit condition in scan_its_table()
34db701dc65f kvm: Add support for arch compat vm ioctls
e55feb31df3f cpufreq: qcom: fix memory leak in error path
303d0f761431 ata: ahci: Match EM_MAX_SLOTS with SATA_PMP_MAX_PORTS
6a2aadcb0186 ata: ahci-imx: Fix MODULE_ALIAS
d9f0159da05d hwmon/coretemp: Handle large core ID value
0fb04676c4fd x86/microcode/AMD: Apply the patch early on every logical thread
6dcf1f0802cc i2c: qcom-cci: Fix ordering of pm_runtime_xx and i2c_add_adapter
794ded0bc461 cpufreq: qcom: fix writes in read-only memory region
2723875e9d67 selinux: enable use of both GFP_KERNEL and GFP_ATOMIC in convert_context()
0d65f040fdbb ocfs2: fix BUG when iput after ocfs2_mknod fails
b838dcfda164 ocfs2: clear dinode links count in case of error
c34d1b22fef3 Linux 5.10.151
ecad33121117 kbuild: Add skip_encoding_btf_enum64 option to pahole
c5006abb80e2 kbuild: Unify options for BTF generation for vmlinux and modules
f5f413cb3e8a kbuild: skip per-CPU BTF generation for pahole v1.18-v1.21
06481cd9f7f6 kbuild: Quote OBJCOPY var to avoid a pahole call break the build
bbaea0f1cd33 bpf: Generate BTF_KIND_FLOAT when linking vmlinux
a10a57a224f3 Linux 5.10.150
243c8f42ba10 Revert "drm/amdgpu: make sure to init common IP before gmc"
8026d58b495a gcov: support GCC 12.1 and newer compilers
cbf2c43b36e0 f2fs: fix wrong condition to trigger background checkpoint correctly
7b19858803d7 thermal: intel_powerclamp: Use first online CPU as control_cpu
f039b43cbaea inet: fully convert sk->sk_rx_dst to RCU rules
67de22cb0b6c ext4: continue to expand file system when the target size doesn't reach
357db159e965 Revert "drm/amdgpu: use dirty framebuffer helper"
98ab15bfdcda Revert "drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega"
791489a5c563 net/ieee802154: don't warn zero-sized raw_sendmsg()
a96336a5f28b Revert "net/ieee802154: reject zero-sized raw_sendmsg()"
dc54ff9fc4a4 net: ieee802154: return -EINVAL for unknown addr type
45c33966759e mm: hugetlb: fix UAF in hugetlb_handle_userfault
c378c479c517 io_uring/af_unix: defer registered files gc to io_uring release
67cbc8865a66 io_uring: correct pinned_vm accounting
904f881b5736 arm64: topology: fix possible overflow in amu_fie_setup()
b5dc2f25789d perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc
9b4e849777a9 clk: bcm2835: Make peripheral PLLC critical
b8bbae3236ab usb: idmouse: fix an uninit-value in idmouse_open
d5bb45f47b37 nvmet-tcp: add bounds check on Transfer Tag
b79da0080d81 nvme: copy firmware_rev on each init
e6cc39db24a6 staging: rtl8723bs: fix a potential memory leak in rtw_init_cmd_priv()
3a5a34ed9d68 Revert "usb: storage: Add quirk for Samsung Fit flash"
acf0006f2b2b usb: musb: Fix musb_gadget.c rxstate overflow bug
91271a3e772e usb: host: xhci: Fix potential memory leak in xhci_alloc_stream_info()
782b3e71c957 md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d
dbcca76435a6 HID: roccat: Fix use-after-free in roccat_read()
f00c049ede46 soundwire: intel: fix error handling on dai registration issues
f04a673d4a27 soundwire: cadence: Don't overwrite msg->buf during write commands
c263516c2c20 bcache: fix set_at_max_writeback_rate() for multiple attached devices
fcad2ac86399 ata: libahci_platform: Sanity check the DT child nodes number
19c010ae44f0 blk-throttle: prevent overflow while calculating wait time
1b3cebeca99e staging: vt6655: fix potential memory leak
89f305a71418 power: supply: adp5061: fix out-of-bounds read in adp5061_get_chg_type()
b2700f98b3f4 nbd: Fix hung when signal interrupts nbd_start_device_ioctl()
5942e5c63dc9 scsi: 3w-9xxx: Avoid disabling device if failing to enable it
48727117bd62 usb: host: xhci-plat: suspend/resume clks for brcm
c13d0d2f5a48 usb: host: xhci-plat: suspend and resume clocks
12d31182de8d clk: zynqmp: pll: rectify rate rounding in zynqmp_pll_round_rate
c2257c8a5015 media: cx88: Fix a null-ptr-deref bug in buffer_prepare()
d9e2585c3bce clk: zynqmp: Fix stack-out-of-bounds in strncpy`
70f8b48d0b61 btrfs: scrub: try to fix super block errors
8f554dd23c18 arm64: dts: imx8mq-librem5: Add bq25895 as max17055's power supply
451ce2521c21 kselftest/arm64: Fix validatation termination record after EXTRA_CONTEXT
017cabfb3f86 ARM: dts: imx6sx: add missing properties for sram
9d3ca48722d3 ARM: dts: imx6sll: add missing properties for sram
9735f2b62be9 ARM: dts: imx6sl: add missing properties for sram
2829b6ad30c2 ARM: dts: imx6qp: add missing properties for sram
0c3a0b3d5e9c ARM: dts: imx6dl: add missing properties for sram
2763a3b43ac3 ARM: dts: imx6q: add missing properties for sram
82e0d91484f7 ARM: dts: imx7d-sdb: config the max pressure for tsc2046
166feb964fc8 drm/amd/display: Remove interface for periodic interrupt 1
1bb6f4a8db5a drm/dp: Don't rewrite link config when setting phy test pattern
bb91c06b0be4 mmc: sdhci-msm: add compatible string check for sdm670
8a427a22839d drm/meson: explicitly remove aggregate driver at module unload time
1c7d957c5d81 drm/amdgpu: fix initial connector audio value
69130888b226 ASoC: SOF: pci: Change DMI match info to support all Chrome platforms
54f2585e2de0 platform/x86: msi-laptop: Change DMI match / alias strings to fix module autoloading
a9d6a7c9b685 platform/chrome: cros_ec: Notify the PM of wake events during resume
e29d20deaf9a drm: panel-orientation-quirks: Add quirk for Anbernic Win600
bfdb391d57df drm/vc4: vec: Fix timings for VEC modes
b70f8abc1a44 drm: bridge: dw_hdmi: only trigger hotplug event on link change
bbe2f6f90310 udmabuf: Set ubuf->sg = NULL if the creation of sg table fails
0a4fddc95c63 drm/amd/display: fix overflow on MIN_I64 definition
3959e8faf8bf gpu: lontium-lt9611: Fix NULL pointer dereference in lt9611_connector_init()
c28a8082b25c drm: Prevent drm_copy_field() to attempt copying a NULL pointer
e7d701800365 drm: Use size_t type for len variable in drm_copy_field()
3339a51bcd89 drm/nouveau/nouveau_bo: fix potential memory leak in nouveau_bo_alloc()
484400d433ca r8152: Rate limit overflow messages
0c108cf3ad38 Bluetooth: L2CAP: Fix user-after-free
65029aaedd15 net: If sock is dead don't access sock's sk_wq in sk_stream_wait_memory
4851303c8539 wifi: rt2x00: correctly set BBP register 86 for MT7620
a01614447954 wifi: rt2x00: set SoC wmac clock register
5aa0461d1180 wifi: rt2x00: set VGC gain for both chains of MT7620
8d9c00979a7e wifi: rt2x00: set correct TX_SW_CFG1 MAC register for MT7620
27ed98e8a9b0 wifi: rt2x00: don't run Rt5592 IQ calibration on MT7620
3d67986e7208 can: bcm: check the result of can_send() in bcm_can_tx()
7b674dce4162 Bluetooth: hci_sysfs: Fix attempting to call device_add multiple times
e25ca9af8a13 Bluetooth: L2CAP: initialize delayed works at l2cap_chan_create()
b051d9bf98bd regulator: core: Prevent integer underflow
e01d96494a9d wifi: brcmfmac: fix use-after-free bug in brcmf_netdev_start_xmit()
be81c44242b2 xfrm: Update ipcomp_scratches with NULL when freed
9661724f6206 wifi: ath9k: avoid uninit memory read in ath9k_htc_rx_msg()
0958e487e81b tcp: annotate data-race around tcp_md5sig_pool_populated
129ca0db956e openvswitch: Fix overreporting of drops in dropwatch
4398e8a7fd6a openvswitch: Fix double reporting of drops in dropwatch
e3c9b9473453 bpftool: Clear errno after libcap's checks
50e45034c580 wifi: brcmfmac: fix invalid address access when enabling SCAN log level
bbacfcde5fff NFSD: fix use-after-free on source server when doing inter-server copy
3de402a5248a NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
1f730d4ae6f9 x86/entry: Work around Clang __bdos() bug
513943bf879d thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash
708b9abe1b4a powercap: intel_rapl: fix UBSAN shift-out-of-bounds issue
b434edb0e9d1 MIPS: BCM47XX: Cast memcmp() of function to (void *)
6c61a37ea70e ACPI: video: Add Toshiba Satellite/Portege Z830 quirk
0dd025483f15 rcu-tasks: Convert RCU_LOCKDEP_WARN() to WARN_ONCE()
36d4ffbedff7 rcu: Back off upon fill_page_cache_func() allocation failure
278d8ba2b288 selftest: tpm2: Add Client.__del__() to close /dev/tpm* handle
b60aa21e2f3a f2fs: fix to account FS_CP_DATA_IO correctly
0b8230d44ce7 f2fs: fix to avoid REQ_TIME and CP_TIME collision
ecbd95958c48 f2fs: fix race condition on setting FI_NO_EXTENT flag
110146ce8f84 ACPI: APEI: do not add task_work to kernel thread to avoid memory leak
dce07e87ee1e thermal/drivers/qcom/tsens-v0_1: Fix MSM8939 fourth sensor hw_id
3a720eb89026 crypto: cavium - prevent integer overflow loading firmware
7bfa7d677353 crypto: marvell/octeontx - prevent integer overflows
cdd42eb4689b kbuild: rpm-pkg: fix breakage when V=1 is used
6d1aef17e7f2 kbuild: remove the target in signal traps when interrupted
8d76dd508093 tracing: kprobe: Make gen test module work in arm and riscv
c6512a6f0cb1 tracing: kprobe: Fix kprobe event gen test module on exit
9e6ba62d418d iommu/iova: Fix module config properly
426d5bc089e7 crypto: qat - fix DMA transfer direction
a43babc059a7 crypto: qat - use pre-allocated buffers in datapath
a91af5085027 crypto: qat - fix use of 'dma_map_single'
8a4ed09ed816 crypto: inside-secure - Change swab to swab32
d33935e66604 crypto: ccp - Release dma channels before dmaengine unrgister
a1354bdd191d crypto: akcipher - default implementation for setting a private key
2fee0dbfaeaa iommu/omap: Fix buffer overflow in debugfs
cfde58a8e41f cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset
ab2485eb5dfa hwrng: imx-rngc - Moving IRQ handler registering after imx_rngc_irq_mask_clear()
d88b88514ef2 crypto: hisilicon/zip - fix mismatch in get/set sgl_sge_nr
25f134247372 crypto: sahara - don't sleep when in softirq
2d285164fbe4 powerpc: Fix SPE Power ISA properties for e500v1 platforms
2bde4e1e4f01 powerpc/64s: Fix GENERIC_CPU build flags for PPC970 / G5
7ae8bed9087a x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition
6315998170b4 powerpc/powernv: add missing of_node_put() in opal_export_attrs()
434db6d17b6b powerpc/pci_dn: Add missing of_node_put()
718e2d802388 powerpc/sysdev/fsl_msi: Add missing of_node_put()
592d283a656d powerpc/math_emu/efp: Include module.h
44c26ceffaa3 mailbox: bcm-ferxrm-mailbox: Fix error check for dma_map_sg
b1616599c99a clk: ast2600: BCLK comes from EPLL
6d01017247ee clk: ti: dra7-atl: Fix reference leak in of_dra7_atl_clk_probe
9b65fd651334 clk: bcm2835: fix bcm2835_clock_rate_from_divisor declaration
9a6087a438ef clk: baikal-t1: Add SATA internal ref clock buffer
5f143f3bc2e0 clk: baikal-t1: Add shared xGMAC ref/ptp clocks internal parent
823fd523912f clk: baikal-t1: Fix invalid xGMAC PTP clock divider
2f19a1050e1b clk: vc5: Fix 5P49V6901 outputs disabling when enabling FOD
92f52770a7af spmi: pmic-arb: correct duplicate APID to PPID mapping logic
a01c0c160049 dmaengine: ioat: stop mod_timer from resurrecting deleted timer in __cleanup()
1dd5148445eb clk: mediatek: mt8183: mfgcfg: Propagate rate changes to parent
6e58f2469ec5 mfd: sm501: Add check for platform_driver_register()
3469dd8e22ff mfd: fsl-imx25: Fix check for platform_get_irq() errors
b425e03c9639 mfd: lp8788: Fix an error handling path in lp8788_irq_init() and lp8788_irq_init()
f7b438863622 mfd: lp8788: Fix an error handling path in lp8788_probe()
08d40518033d mfd: fsl-imx25: Fix an error handling path in mx25_tsadc_setup_irq()
28868b940b53 mfd: intel_soc_pmic: Fix an error handling path in intel_soc_pmic_i2c_probe()
382a5fc49e6e fsi: core: Check error number after calling ida_simple_get
ed8e6011b953 clk: qcom: apss-ipq6018: mark apcs_alias0_core_clk as critical
884a788f0655 scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()
a9e5176ead6d scsi: libsas: Fix use-after-free bug in smp_execute_task_sg()
8f740c11d891 serial: 8250: Fix restoring termios speed after suspend
ab5a3e714437 firmware: google: Test spinlock on panic path to avoid lockups
95ac62e8545b staging: vt6655: fix some erroneous memory clean-up loops
878f9871668f phy: qualcomm: call clk_disable_unprepare in the error handling
9a56ade124d4 tty: serial: fsl_lpuart: disable dma rx/tx use flags in lpuart_dma_shutdown
572fb97fce35 serial: 8250: Toggle IER bits on only after irq has been set up
3fbfa5e3cc0d serial: 8250: Add an empty line and remove some useless {}
71ffe5111f0f drivers: serial: jsm: fix some leaks in probe
7efdd91d54cb usb: gadget: function: fix dangling pnp_string in f_printer.c
cc952e3bf61c xhci: Don't show warning for reinit on known broken suspend
dac769dd7dc8 IB: Set IOVA/LENGTH on IB_MR in core/uverbs layers
360386e11c8d RDMA/cm: Use SLID in the work completion as the DLID in responder side
a1263294b55c md/raid5: Ensure stripe_fill happens on non-read IO with journal
76694e9ce0b2 md: Replace snprintf with scnprintf
7bd5f3b4a805 mtd: rawnand: meson: fix bit map use in meson_nfc_ecc_correct()
f5325f3202b8 ata: fix ata_id_has_dipm()
f5a6fa1877f4 ata: fix ata_id_has_ncq_autosense()
3c34a91c8aa7 ata: fix ata_id_has_devslp()
fc61a0c8200a ata: fix ata_id_sense_reporting_enabled() and ata_id_has_sense_reporting()
e3917c85f41e RDMA/siw: Always consume all skbuf data in sk_data_ready() upcall.
3a9d7d8dcf98 mtd: rawnand: fsl_elbc: Fix none ECC mode
f87f72081132 mtd: devices: docg3: check the return value of devm_ioremap() in the probe
d06cc0e11d5b dyndbg: drop EXPORTed dynamic_debug_exec_queries
1d6598558914 dyndbg: let query-modname override actual module name
c0e206da44e5 dyndbg: fix module.dyndbg handling
5047bd3bd739 dyndbg: fix static_branch manipulation
af12e209a9d5 dmaengine: hisilicon: Add multi-thread support for a DMA channel
d3fd838536df dmaengine: hisilicon: Fix CQ head update
d5065ca461a4 dmaengine: hisilicon: Disable channels when unregister hisi_dma
f59861946fa5 fpga: prevent integer overflow in dfl_feature_ioctl_set_irq()
7ba19a60c74f misc: ocxl: fix possible refcount leak in afu_ioctl()
cf3bb86edd8f RDMA/rxe: Fix the error caused by qp->sk
cdce36a88def RDMA/rxe: Fix "kernel NULL pointer dereference" error
2630cc88327a media: xilinx: vipp: Fix refcount leak in xvip_graph_dma_init
40aa0999a3e4 media: meson: vdec: add missing clk_disable_unprepare on error in vdec_hevc_start()
551b87976a0c tty: xilinx_uartps: Fix the ignore_status
28cdf6c6fb7a media: exynos4-is: fimc-is: Add of_node_put() when breaking out of loop
1f683bff1a9c HSI: omap_ssi_port: Fix dma_map_sg error check
962f22e7f769 HSI: omap_ssi: Fix refcount leak in ssi_probe
70f0a0a27d79 clk: tegra20: Fix refcount leak in tegra20_clock_init
c01bfd23cc13 clk: tegra: Fix refcount leak in tegra114_clock_init
f487137a53b1 clk: tegra: Fix refcount leak in tegra210_clock_init
59e90c4d9861 clk: sprd: Hold reference returned by of_get_parent()
57141b1dd689 clk: berlin: Add of_node_put() for of_get_parent()
dc190b46c63f clk: qoriq: Hold reference returned by of_get_parent()
baadc6f58fa8 clk: oxnas: Hold reference returned by of_get_parent()
b95f4f905461 clk: meson: Hold reference returned by of_get_parent()
beec2f02555c usb: common: debug: Check non-standard control requests
9d965a22f657 usb: common: move function's kerneldoc next to its definition
20b63631a38a usb: common: add function to get interval expressed in us unit
c1ef8c66a362 usb: common: Parse for USB SSP genXxY
ffffb159e1e5 usb: ch9: Add USB 3.2 SSP attributes
aa7aada4b7b8 iio: ABI: Fix wrong format of differential capacitance channel ABI.
b9a0526cd02b iio: inkern: only release the device node when done with it
44ec4b04fc99 iio: adc: at91-sama5d2_adc: disable/prepare buffer on suspend/resume
513c72d76df6 iio: adc: at91-sama5d2_adc: lock around oversampling and sample freq
d259b90f0c3d iio: adc: at91-sama5d2_adc: check return status for pressure and touch
bc2b97e177a9 iio: adc: at91-sama5d2_adc: fix AT91_SAMA5D2_MR_TRACKTIM_MAX
5b9bb0cbd9e7 ARM: dts: exynos: fix polarity of VBUS GPIO of Origen
657de36c72f5 arm64: ftrace: fix module PLTs with mcount
40e966a404c7 ARM: Drop CMDLINE_* dependency on ATAGS
477dbf9d1bd5 ARM: dts: exynos: correct s5k6a3 reset polarity on Midas family
5bbd3dd7f923 soc/tegra: fuse: Drop Kconfig dependency on TEGRA20_APB_DMA
09c35f1520e7 ia64: export memory_add_physaddr_to_nid to fix cxl build error
e31c0e14cfad ARM: dts: kirkwood: lsxl: remove first ethernet port
df4f05b35634 ARM: dts: kirkwood: lsxl: fix serial line
43faaedf3a7f ARM: dts: turris-omnia: Fix mpp26 pin name and comment
d5c2051898fd soc: qcom: smem_state: Add refcounting for the 'state->of_node'
39781c98ad46 soc: qcom: smsm: Fix refcount leak bugs in qcom_smsm_probe()
1d312c12c91f memory: of: Fix refcount leak bug in of_lpddr3_get_ddr_timings()
daaec4b3fe22 memory: of: Fix refcount leak bug in of_get_ddr_timings()
fde46754d548 memory: pl353-smc: Fix refcount leak bug in pl353_smc_probe()
2c442b0c0624 ALSA: hda/hdmi: Don't skip notification handling during PM operation
f182de42d786 ASoC: mt6660: Fix PM disable depth imbalance in mt6660_i2c_probe
37e3e01c9a78 ASoC: wm5102: Fix PM disable depth imbalance in wm5102_probe
fb2356969935 ASoC: wm5110: Fix PM disable depth imbalance in wm5110_probe
c1b269dda1e7 ASoC: wm8997: Fix PM disable depth imbalance in wm8997_probe
71704c2e1b2c mmc: wmt-sdmmc: Fix an error handling path in wmt_mci_probe()
c940636d9c74 ALSA: dmaengine: increment buffer pointer atomically
4993c1511d66 ASoC: da7219: Fix an error handling path in da7219_register_dai_clks()
ef59819976da drm/msm/dp: correct 1.62G link rate at dp_catalog_ctrl_config_msa()
598d8f7d86f1 drm/msm/dpu: index dpu_kms->hw_vbif using vbif_idx
a9a60d640572 ASoC: eureka-tlv320: Hold reference returned from of_find_xxx API
ad0b8ed172a1 mmc: au1xmmc: Fix an error handling path in au1xmmc_probe()
1f340e1c1c74 drm/omap: dss: Fix refcount leak bugs
cbe37857dda1 ALSA: hda: beep: Simplify keep-power-at-enable behavior
f0fb0817ebce ASoC: rsnd: Add check for rsnd_mod_power_on
877e92e9b1bd drm/bridge: megachips: Fix a null pointer dereference bug
c577b4e97227 drm: fix drm_mipi_dbi build errors
804d8e59f34f platform/x86: msi-laptop: Fix resource cleanup
c21c08fab716 platform/x86: msi-laptop: Fix old-ec check for backlight registering
b77755f58ede ASoC: tas2764: Fix mute/unmute
2e6b64df54cd ASoC: tas2764: Drop conflicting set_bias_level power setting
c2c6022e1004 ASoC: tas2764: Allow mono streams
868fc93b615b platform/chrome: fix memory corruption in ioctl
84da5cdf43d2 platform/chrome: fix double-free in chromeos_laptop_prepare()
5e25bfcd12d8 drm:pl111: Add of_node_put() when breaking out of for_each_available_child_of_node()
ad06d6bed5f2 drm/dp_mst: fix drm_dp_dpcd_read return value checks
3f5889fd6500 drm/bridge: parade-ps8640: Fix regulator supply order
45120fa5e522 drm/mipi-dsi: Detach devices when removing the host
050b65050741 drm/bridge: Avoid uninitialized variable warning
7839f2b3495b drm: bridge: adv7511: fix CEC power down control register offset
29f50bcf0f8b net: mvpp2: fix mvpp2 debugfs leak
6cb54f21623d once: add DO_ONCE_SLOW() for sleepable contexts
67cb80a9d2c8 net/ieee802154: reject zero-sized raw_sendmsg()
6cc0e2afc6a1 bnx2x: fix potential memory leak in bnx2x_tpa_stop()
da349221c4d2 net: rds: don't hold sock lock when cancelling work from rds_tcp_reset_callbacks()
d9e25dc053f6 spi: Ensure that sg_table won't be used after being freed
96a3ddb87031 tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited
f65955340e00 sctp: handle the error returned from sctp_auth_asoc_init_active_key
2a1d03632085 mISDN: fix use-after-free bugs in l1oip timer handlers
b4a5905fd2ef vhost/vsock: Use kvmalloc/kvfree for larger packets.
d2b5dc3a5394 wifi: rtl8xxxu: Fix AIFS written to REG_EDCA_*_PARAM
17196f2f98ab spi: s3c64xx: Fix large transfers with DMA
b284e1fe15c4 netfilter: nft_fib: Fix for rpath check with VRF devices
b384e8fb1606 Bluetooth: hci_core: Fix not handling link timeouts propertly
129f01116b8c i2c: mlxbf: support lock mechanism
534909fe3c92 spi/omap100k:Fix PM disable depth imbalance in omap1_spi100k_probe
9da61e7b5993 spi: dw: Fix PM disable depth imbalance in dw_spi_bt1_probe
1ef5798638bd x86/cpu: Include the header of init_ia32_feat_ctl()'s prototype
6ed7b05a3592 x86/microcode/AMD: Track patch allocation size explicitly
07299e52e5b9 wifi: ath11k: fix number of VHT beamformee spatial streams
d7cc0d51ffcb Bluetooth: hci_{ldisc,serdev}: check percpu_init_rwsem() failure
ed403bcd979d bpf: Ensure correct locking around vulnerable function find_vpid()
2a1c29dc9b7e net: fs_enet: Fix wrong check in do_pd_setup
795954d75197 wifi: rtl8xxxu: Remove copy-paste leftover in gen2_update_rate_mask
226e6f241258 wifi: rtl8xxxu: gen2: Fix mistake in path B IQ calibration
0a60ac7a0dad bpf: btf: fix truncated last_member_type_id in btf_struct_resolve
8398a45d3d72 spi: meson-spicc: do not rely on busy flag in pow2 clk ops
351cf55595d3 wifi: rtl8xxxu: Fix skb misuse in TX queue selection
1e911790576f spi: qup: add missing clk_disable_unprepare on error in spi_qup_pm_resume_runtime()
7b83d11d48ff spi: qup: add missing clk_disable_unprepare on error in spi_qup_resume()
557600830515 selftests/xsk: Avoid use-after-free on ctx
c823df067941 wifi: rtl8xxxu: tighten bounds checking in rtl8xxxu_read_efuse()
ea1b6b54098c Bluetooth: btusb: mediatek: fix WMT failure during runtime suspend
07194ccbb14c Bluetooth: btusb: fix excessive stack usage
cdadf95435ff Bluetooth: btusb: Fine-tune mt7663 mechanism.
294395caacf1 x86/resctrl: Fix to restore to original value when re-enabling hardware prefetch register
029a1de92ce2 spi: mt7621: Fix an error message in mt7621_spi_probe()
2afb93e4e416 bpftool: Fix a wrong type cast in btf_dumper_int
61905bbb6116 wifi: mac80211: allow bw change during channel switch in mesh
75652070667f leds: lm3601x: Don't use mutex after it was destroyed
08faf07717be wifi: ath10k: add peer map clean up for peer delete in ath10k_sta_state()
e060c4b9f33c nfsd: Fix a memory leak in an error handling path
730191a098d8 objtool: Preserve special st_shndx indexes in elf_update_symbol
84837738d406 ARM: 9247/1: mm: set readonly for MT_MEMORY_RO with ARM_LPAE
f1d6edeaa8d0 ARM: 9244/1: dump: Fix wrong pg_level in walk_pmd()
da2aecef866b MIPS: SGI-IP27: Fix platform-device leak in bridge_platform_create()
0c667858c026 MIPS: SGI-IP27: Free some unused memory
35984456983b sh: machvec: Use char[] for section boundaries
6e4be747f15f userfaultfd: open userfaultfds with O_RDONLY
28d9b3973307 selinux: use "grep -E" instead of "egrep"
d11e09953cc0 smb3: must initialize two ACL struct fields to zero
abd13b21004d drm/i915: Fix watermark calculations for gen12+ MC CCS modifier
fd37286f392a drm/i915: Fix watermark calculations for gen12+ RC CCS modifier
5d6093c49c09 drm/nouveau: fix a use-after-free in nouveau_gem_prime_import_sg_table()
57f1a89a8e4e drm/nouveau/kms/nv140-: Disable interlacing
d0febad83e29 staging: greybus: audio_helper: remove unused and wrong debugfs usage
ceeb8d4a43ac KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS
83fe0b009bd0 KVM: nVMX: Unconditionally purge queued/injected events on nested "exit"
085ca1d33b19 KVM: x86/emulator: Fix handing of POP SS to correctly set interruptibility
bda8120e5b10 media: cedrus: Set the platform driver data earlier
dbdd3b1448e5 efi: libstub: drop pointless get_memory_map() call
68158654b583 thunderbolt: Explicitly enable lane adapter hotplug events at startup
fc08f8438172 tracing: Disable interrupt or preemption before acquiring arch_spinlock_t
0cf6c09dafee ring-buffer: Fix race between reset page and reading page
588f02f8b9d9 ring-buffer: Add ring_buffer_wake_waiters()
586f02c500b2 ring-buffer: Check pending waiters when doing wake ups as well
6617e5132c44 ring-buffer: Have the shortest_full queue be the shortest not longest
4a3bbd40e452 ring-buffer: Allow splice to read previous partially read pages
f2ca4609d0c3 ftrace: Properly unset FTRACE_HASH_FL_MOD
846f041203b9 livepatch: fix race between fork and KLP transition
2189756eabbb ext4: update 'state->fc_regions_size' after successful memory allocation
2cfb769d60a2 ext4: fix potential memory leak in ext4_fc_record_regions()
c9ce7766dc4e ext4: fix potential memory leak in ext4_fc_record_modified_inode()
d575fb52c466 ext4: fix miss release buffer head in ext4_fc_write_inode
74d2a398d2d8 ext4: place buffer head allocation before handle start
fbb0e601bd51 ext4: ext4_read_bh_lock() should submit IO if the buffer isn't uptodate
0e1764ad71ab ext4: don't increase iversion counter for ea_inodes
483831ad0440 ext4: fix check for block being out of directory size
ac66db1a4365 ext4: make ext4_lazyinit_thread freezable
f34ab9516276 ext4: fix null-ptr-deref in ext4_write_info
fb98cb61efff ext4: avoid crash when inline data creation follows DIO write
e65506ff181f jbd2: add miss release buffer head in fc_do_one_pass()
1d4d16daec2a jbd2: fix potential use-after-free in jbd2_fc_wait_bufs
7a33dde572fc jbd2: fix potential buffer head reference count leak
eea3e455a3ae jbd2: wake up journal waiters in FIFO order, not LIFO
ba52e685d29b hardening: Remove Clang's enable flag for -ftrivial-auto-var-init=zero
bdcb1d7cf285 hardening: Avoid harmless Clang option under CONFIG_INIT_STACK_ALL_ZERO
d621a87064fa hardening: Clarify Kconfig text for auto-var-init
4a8e8bf28070 f2fs: fix to do sanity check on summary info
73fb4bd2c055 f2fs: fix to do sanity check on destination blkaddr during recovery
12014eaf1b3f f2fs: increase the limit for reserve_root
47b5ffe86332 btrfs: fix race between quota enable and quota rescan ioctl
e50472949604 fbdev: smscufx: Fix use-after-free in ufx_ops_open()
9931bd05bb8d scsi: qedf: Populate sysfs attributes for vport
102c4b6e8c4b powerpc/boot: Explicitly disable usage of SPE instructions
7db60fd46e0c powercap: intel_rapl: Use standard Energy Unit for SPR Dram RAPL domain
9119a92ad93e PCI: Sanitise firmware BAR assignments behind a PCI-PCI bridge
a3c08c021778 mm/mmap: undo ->mmap() when arch_validate_flags() fails
7d551b7d6114 block: fix inflight statistics of part0
0a129790893b drm/udl: Restore display mode on resume
f134f261d76a drm/virtio: Check whether transferred 2D BO is shmem
303436e301ba nvme-pci: set min_align_mask before calculating max_hw_sectors
6a73e6edcbf3 UM: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
1a053f597f42 riscv: Pass -mno-relax only on lld < 15.0.0
d15dca1d46ab riscv: Make VM_WRITE imply VM_READ
d8c6f9b2e194 riscv: Allow PROT_WRITE-only mmap()
a6dcc6cfa293 parisc: fbdev/stifb: Align graphics memory size to 4MB
2ce9fab94b8d RISC-V: Make port I/O string accessors actually work
ffb571e1232f regulator: qcom_rpm: Fix circular deferral regression
85909424a1f5 hwmon: (gsc-hwmon) Call of_node_get() before of_find_xxx API
8ef0e1c0ae50 ASoC: wcd934x: fix order of Slimbus unprepare/disable
9b2c82af65f7 ASoC: wcd9335: fix order of Slimbus unprepare/disable
1c20d672e3a5 platform/chrome: cros_ec_proto: Update version on GET_NEXT_EVENT failure
6b7ae4a904a4 quota: Check next/prev free block number after reading from quota file
5b1a56beb6b8 HID: multitouch: Add memory barriers
bfe60d7641b0 fs: dlm: handle -EBUSY first in lock arg validation
0b2d8e4db40c fs: dlm: fix race between test_bit() and queue_work()
057d5838c795 mmc: sdhci-sprd: Fix minimum clock limit
448fffc1aea6 can: kvaser_usb_leaf: Fix CAN state after restart
a3776e09b361 can: kvaser_usb_leaf: Fix TX queue out of sync after restart
0f8c88978da4 can: kvaser_usb_leaf: Fix overread with an invalid command
5d1cb7bfad21 can: kvaser_usb: Fix use of uninitialized completion
b239a0993aa2 usb: add quirks for Lenovo OneLink+ Dock
afbbf305dbac iio: pressure: dps310: Reset chip after timeout
9daadd1d1015 iio: pressure: dps310: Refactor startup procedure
ae49d80400e6 iio: adc: ad7923: fix channel readings for some variants
ea4dcd3d6acc iio: ltc2497: Fix reading conversion results
30e1bd0d3e66 iio: dac: ad5593r: Fix i2c read protocol requirements
9312e04b6c6b cifs: Fix the error length of VALIDATE_NEGOTIATE_INFO message
64f23e5430d3 cifs: destage dirty pages before re-reading them for cache=none
50d3d895375c mtd: rawnand: atmel: Unmap streaming DMA mappings
e8eb44eeee59 ALSA: hda/realtek: Add Intel Reference SSID to support headset keys
4491fbd0a79c ALSA: hda/realtek: Add quirk for ASUS GV601R laptop
4285d06d1296 ALSA: hda/realtek: Correct pin configs for ASUS G533Z
768cd2cd1ae6 ALSA: hda/realtek: remove ALC289_FIXUP_DUAL_SPK for Dell 5530
3e29645fbaa6 ALSA: usb-audio: Fix NULL dererence at error path
bc1d16d282bc ALSA: usb-audio: Fix potential memory leaks
ef1658bc482c ALSA: rawmidi: Drop register_mutex in snd_rawmidi_free()
026fcb6336d6 ALSA: oss: Fix potential deadlock at unregistration
Also update the .xml file to handle the few ABI changes in this merge
that required an update due to private pointers changing types and ABI
padding structures being used to preserve the ABI:
Leaf changes summary: 4 artifacts changed (1 filtered out)
Changed leaf types summary: 4 (1 filtered out) leaf types changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable
'struct fscrypt_info at fscrypt_private.h:195:1' changed:
type size hasn't changed
there are data member changes:
type 'key*' of 'fscrypt_info::ci_master_key' changed:
pointer type changed from: 'key*' to: 'fscrypt_master_key*'
5197 impacted interfaces
'struct sk_buff at skbuff.h:717:1' changed:
type size hasn't changed
there are data member changes:
data member u64 android_kabi_reserved1 at offset 1472 (in bits) became anonymous data member 'union {struct {__u8 scm_io_uring; __u8 android_kabi_reserved1_padding1; __u16 android_kabi_reserved1_padding2; __u32 android_kabi_reserved1_padding3;}; struct {u64 android_kabi_reserved1;}; union {};}'
5197 impacted interfaces
'struct super_block at fs.h:1450:1' changed:
type size hasn't changed
there are data member changes:
type 'key*' of 'super_block::s_master_keys' changed:
pointer type changed from: 'key*' to: 'fscrypt_keyring*'
5197 impacted interfaces
'struct tcp_sock at tcp.h:146:1' changed:
type size hasn't changed
one impacted interface
Change-Id: I6f2a7b91e1df96bede8aafa944a04b3e08ed33a1
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-01-21 20:10:21 +09:00
mm/khugepaged: take the right locks for page table retraction
commit 8d3c106e19e8d251da31ff4cc7462e4565d65084 upstream.
pagetable walks on address ranges mapped by VMAs can be done under the
mmap lock, the lock of an anon_vma attached to the VMA, or the lock of the
VMA's address_space. Only one of these needs to be held, and it does not
need to be held in exclusive mode.
Under those circumstances, the rules for concurrent access to page table
entries are:
- Terminal page table entries (entries that don't point to another page
table) can be arbitrarily changed under the page table lock, with the
exception that they always need to be consistent for
hardware page table walks and lockless_pages_from_mm().
This includes that they can be changed into non-terminal entries.
- Non-terminal page table entries (which point to another page table)
can not be modified; readers are allowed to READ_ONCE() an entry, verify
that it is non-terminal, and then assume that its value will stay as-is.
Retracting a page table involves modifying a non-terminal entry, so
page-table-level locks are insufficient to protect against concurrent page
table traversal; it requires taking all the higher-level locks under which
it is possible to start a page walk in the relevant range in exclusive
mode.
The collapse_huge_page() path for anonymous THP already follows this rule,
but the shmem/file THP path was getting it wrong, making it possible for
concurrent rmap-based operations to cause corruption.
Link: https://lkml.kernel.org/r/20221129154730.2274278-1-jannh@google.com
Link: https://lkml.kernel.org/r/20221128180252.1684965-1-jannh@google.com
Link: https://lkml.kernel.org/r/20221125213714.4115729-1-jannh@google.com
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[manual backport: this code was refactored from two copies into a common
helper between 5.15 and 6.0]
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-07 02:16:06 +09:00
|
|
|
/*
|
|
|
|
* We need to lock the mapping so that from here on, only GUP-fast and
|
|
|
|
* hardware page walks can access the parts of the page tables that
|
|
|
|
* we're operating on.
|
|
|
|
*/
|
|
|
|
i_mmap_lock_write(vma->vm_file->f_mapping);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This spinlock should be unnecessary: Nobody else should be accessing
|
|
|
|
* the page tables under spinlock protection here, only
|
|
|
|
* lockless_pages_from_mm() and the hardware page walker can access page
|
|
|
|
* tables while all the high-level locks are held in write mode.
|
|
|
|
*/
|
2019-09-24 07:38:30 +09:00
|
|
|
start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
|
|
|
|
|
|
|
|
/* step 1: check all mapped PTEs are to the right huge page */
|
|
|
|
for (i = 0, addr = haddr, pte = start_pte;
|
|
|
|
i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, pte++) {
|
|
|
|
struct page *page;
|
|
|
|
|
|
|
|
/* empty pte, skip */
|
|
|
|
if (pte_none(*pte))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/* page swapped out, abort */
|
|
|
|
if (!pte_present(*pte))
|
|
|
|
goto abort;
|
|
|
|
|
|
|
|
page = vm_normal_page(vma, addr, *pte);
|
|
|
|
|
|
|
|
/*
|
khugepaged: collapse_pte_mapped_thp() protect the pmd lock
When retract_page_tables() removes a page table to make way for a huge
pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
pmd lock; but when collapse_pte_mapped_thp() does the same (to handle the
case when the original mmap_write_trylock had failed), only
mmap_write_trylock and pmd lock are held.
That's not enough. One machine has twice crashed under load, with "BUG:
spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b. Examining the second
crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl (serving
page_referenced() on a file THP, that had found a page table at *pmd)
discovers that the page table page and its lock have already been freed by
the time it comes to unlock.
Follow the example of retract_page_tables(), but we only need one of huge
page lock or i_mmap_lock_write to secure against this: because it's the
narrower lock, and because it simplifies collapse_pte_mapped_thp() to know
the hpage earlier, choose to rely on huge page lock here.
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org> [5.4+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021213070.27773@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 15:26:18 +09:00
|
|
|
* Note that uprobe, debugger, or MAP_PRIVATE may change the
|
|
|
|
* page table, but the new page will not be a subpage of hpage.
|
2019-09-24 07:38:30 +09:00
|
|
|
*/
|
khugepaged: collapse_pte_mapped_thp() protect the pmd lock
When retract_page_tables() removes a page table to make way for a huge
pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
pmd lock; but when collapse_pte_mapped_thp() does the same (to handle the
case when the original mmap_write_trylock had failed), only
mmap_write_trylock and pmd lock are held.
That's not enough. One machine has twice crashed under load, with "BUG:
spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b. Examining the second
crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl (serving
page_referenced() on a file THP, that had found a page table at *pmd)
discovers that the page table page and its lock have already been freed by
the time it comes to unlock.
Follow the example of retract_page_tables(), but we only need one of huge
page lock or i_mmap_lock_write to secure against this: because it's the
narrower lock, and because it simplifies collapse_pte_mapped_thp() to know
the hpage earlier, choose to rely on huge page lock here.
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org> [5.4+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021213070.27773@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 15:26:18 +09:00
|
|
|
if (hpage + i != page)
|
2019-09-24 07:38:30 +09:00
|
|
|
goto abort;
|
|
|
|
count++;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* step 2: adjust rmap */
|
|
|
|
for (i = 0, addr = haddr, pte = start_pte;
|
|
|
|
i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, pte++) {
|
|
|
|
struct page *page;
|
|
|
|
|
|
|
|
if (pte_none(*pte))
|
|
|
|
continue;
|
|
|
|
page = vm_normal_page(vma, addr, *pte);
|
|
|
|
page_remove_rmap(page, false);
|
|
|
|
}
|
|
|
|
|
|
|
|
pte_unmap_unlock(start_pte, ptl);
|
|
|
|
|
|
|
|
/* step 3: set proper refcount and mm_counters. */
|
khugepaged: collapse_pte_mapped_thp() protect the pmd lock
When retract_page_tables() removes a page table to make way for a huge
pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
pmd lock; but when collapse_pte_mapped_thp() does the same (to handle the
case when the original mmap_write_trylock had failed), only
mmap_write_trylock and pmd lock are held.
That's not enough. One machine has twice crashed under load, with "BUG:
spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b. Examining the second
crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl (serving
page_referenced() on a file THP, that had found a page table at *pmd)
discovers that the page table page and its lock have already been freed by
the time it comes to unlock.
Follow the example of retract_page_tables(), but we only need one of huge
page lock or i_mmap_lock_write to secure against this: because it's the
narrower lock, and because it simplifies collapse_pte_mapped_thp() to know
the hpage earlier, choose to rely on huge page lock here.
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org> [5.4+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021213070.27773@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 15:26:18 +09:00
|
|
|
if (count) {
|
2019-09-24 07:38:30 +09:00
|
|
|
page_ref_sub(hpage, count);
|
|
|
|
add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* step 4: collapse pmd */
|
2022-12-23 05:41:50 +09:00
|
|
|
/* we make no change to anon, but protect concurrent anon page lookup */
|
|
|
|
if (vma->anon_vma)
|
|
|
|
anon_vma_lock_write(vma->anon_vma);
|
|
|
|
|
2022-12-07 02:16:05 +09:00
|
|
|
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm, haddr,
|
|
|
|
haddr + HPAGE_PMD_SIZE);
|
|
|
|
mmu_notifier_invalidate_range_start(&range);
|
2020-08-07 15:26:15 +09:00
|
|
|
_pmd = pmdp_collapse_flush(vma, haddr, pmd);
|
2022-11-22 05:15:43 +09:00
|
|
|
vm_write_end(vma);
|
2019-09-24 07:38:30 +09:00
|
|
|
mm_dec_nr_ptes(mm);
|
2022-12-07 02:16:04 +09:00
|
|
|
tlb_remove_table_sync_one();
|
2022-12-07 02:16:05 +09:00
|
|
|
mmu_notifier_invalidate_range_end(&range);
|
2019-09-24 07:38:30 +09:00
|
|
|
pte_free(mm, pmd_pgtable(_pmd));
|
khugepaged: collapse_pte_mapped_thp() protect the pmd lock
When retract_page_tables() removes a page table to make way for a huge
pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
pmd lock; but when collapse_pte_mapped_thp() does the same (to handle the
case when the original mmap_write_trylock had failed), only
mmap_write_trylock and pmd lock are held.
That's not enough. One machine has twice crashed under load, with "BUG:
spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b. Examining the second
crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl (serving
page_referenced() on a file THP, that had found a page table at *pmd)
discovers that the page table page and its lock have already been freed by
the time it comes to unlock.
Follow the example of retract_page_tables(), but we only need one of huge
page lock or i_mmap_lock_write to secure against this: because it's the
narrower lock, and because it simplifies collapse_pte_mapped_thp() to know
the hpage earlier, choose to rely on huge page lock here.
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org> [5.4+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021213070.27773@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 15:26:18 +09:00
|
|
|
|
2022-12-23 05:41:50 +09:00
|
|
|
if (vma->anon_vma)
|
|
|
|
anon_vma_unlock_write(vma->anon_vma);
|
mm/khugepaged: take the right locks for page table retraction
commit 8d3c106e19e8d251da31ff4cc7462e4565d65084 upstream.
pagetable walks on address ranges mapped by VMAs can be done under the
mmap lock, the lock of an anon_vma attached to the VMA, or the lock of the
VMA's address_space. Only one of these needs to be held, and it does not
need to be held in exclusive mode.
Under those circumstances, the rules for concurrent access to page table
entries are:
- Terminal page table entries (entries that don't point to another page
table) can be arbitrarily changed under the page table lock, with the
exception that they always need to be consistent for
hardware page table walks and lockless_pages_from_mm().
This includes that they can be changed into non-terminal entries.
- Non-terminal page table entries (which point to another page table)
can not be modified; readers are allowed to READ_ONCE() an entry, verify
that it is non-terminal, and then assume that its value will stay as-is.
Retracting a page table involves modifying a non-terminal entry, so
page-table-level locks are insufficient to protect against concurrent page
table traversal; it requires taking all the higher-level locks under which
it is possible to start a page walk in the relevant range in exclusive
mode.
The collapse_huge_page() path for anonymous THP already follows this rule,
but the shmem/file THP path was getting it wrong, making it possible for
concurrent rmap-based operations to cause corruption.
Link: https://lkml.kernel.org/r/20221129154730.2274278-1-jannh@google.com
Link: https://lkml.kernel.org/r/20221128180252.1684965-1-jannh@google.com
Link: https://lkml.kernel.org/r/20221125213714.4115729-1-jannh@google.com
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[manual backport: this code was refactored from two copies into a common
helper between 5.15 and 6.0]
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-07 02:16:06 +09:00
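The terminal/non-terminal rules in the commit message above can be modeled in a hedged userspace sketch. The type `entry_t`, the helper names, and the terminal-bit layout are all invented for illustration (real page table entry formats are architecture-specific); the point shown is only the READ_ONCE() discipline: a lockless reader takes a single volatile load and then reasons about that snapshot.

```c
#include <stdint.h>

/*
 * Illustrative userspace model of the locking rule -- not kernel code.
 * Bit 0 marks a "terminal" entry; non-terminal entries point to a
 * lower-level table and may not be modified while readers can walk them.
 */
typedef uint64_t entry_t;
#define ENTRY_TERMINAL 0x1ULL

/* READ_ONCE() analogue: one volatile load, never re-read the slot. */
static inline entry_t read_entry_once(const entry_t *p)
{
	return *(const volatile entry_t *)p;
}

static inline int entry_is_terminal(entry_t e)
{
	return (int)(e & ENTRY_TERMINAL);
}

/*
 * A lockless reader snapshots the entry exactly once; if the snapshot
 * is non-terminal, the rule above lets it assume the value stays as-is
 * for the rest of the walk.
 */
static entry_t walk_snapshot(const entry_t *slot)
{
	return read_entry_once(slot);
}
```

Retracting a page table breaks exactly this assumption for the non-terminal entry, which is why it needs all the higher-level locks in exclusive mode rather than just the page table lock.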
	i_mmap_unlock_write(vma->vm_file->f_mapping);
khugepaged: collapse_pte_mapped_thp() protect the pmd lock
When retract_page_tables() removes a page table to make way for a huge
pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
pmd lock; but when collapse_pte_mapped_thp() does the same (to handle the
case when the original mmap_write_trylock had failed), only
mmap_write_trylock and pmd lock are held.
That's not enough. One machine has twice crashed under load, with "BUG:
spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b. Examining the second
crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl (serving
page_referenced() on a file THP, that had found a page table at *pmd)
discovers that the page table page and its lock have already been freed by
the time it comes to unlock.
Follow the example of retract_page_tables(), but we only need one of huge
page lock or i_mmap_lock_write to secure against this: because it's the
narrower lock, and because it simplifies collapse_pte_mapped_thp() to know
the hpage earlier, choose to rely on huge page lock here.
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org> [5.4+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021213070.27773@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 15:26:18 +09:00
drop_hpage:
	unlock_page(hpage);
	put_page(hpage);
	return;

abort:
	pte_unmap_unlock(start_pte, ptl);
	vm_write_end(vma);
	i_mmap_unlock_write(vma->vm_file->f_mapping);
	goto drop_hpage;
}
static int khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
{
	struct mm_struct *mm = mm_slot->mm;
	int i;

	if (likely(mm_slot->nr_pte_mapped_thp == 0))
		return 0;

	if (!mmap_write_trylock(mm))
		return -EBUSY;

	if (unlikely(khugepaged_test_exit(mm)))
		goto out;

	for (i = 0; i < mm_slot->nr_pte_mapped_thp; i++)
		collapse_pte_mapped_thp(mm, mm_slot->pte_mapped_thp[i]);

out:
	mm_slot->nr_pte_mapped_thp = 0;
	mmap_write_unlock(mm);
	return 0;
}
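The trylock-then-recheck shape used by khugepaged_collapse_pte_mapped_thps() above (take the lock opportunistically, re-test the exit condition only after the lock is held, report -EBUSY when contended) can be modeled in a hedged userspace sketch. The names `work_ctx` and `ctx_do_work()` are invented for illustration; this is a plain-pthreads analogue, not kernel code:

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an mm with an exit flag -- illustrative only. */
struct work_ctx {
	pthread_mutex_t lock;
	bool exiting;	/* analogue of khugepaged_test_exit() */
	int done;	/* count of completed work items */
};

/*
 * Mirror of the kernel pattern: trylock, re-check the exit condition
 * *after* the lock is held, do the work, and always unlock on the way
 * out.  Returns -EBUSY when the lock is contended.
 */
static int ctx_do_work(struct work_ctx *ctx, int nr_items)
{
	if (pthread_mutex_trylock(&ctx->lock))
		return -EBUSY;		/* try again later */

	if (!ctx->exiting)
		ctx->done += nr_items;

	pthread_mutex_unlock(&ctx->lock);
	return 0;
}
```

The essential point, as in the kernel function, is that the exit check is only meaningful once the lock is held; checking it before the trylock would reintroduce the race.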
static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
{
	struct vm_area_struct *vma;
khugepaged: retract_page_tables() remember to test exit
Only once have I seen this scenario (and forgot even to notice what forced
the eventual crash): a sequence of "BUG: Bad page map" alerts from
vm_normal_page(), from zap_pte_range() servicing exit_mmap();
pmd:00000000, pte values corresponding to data in physical page 0.
The pte mappings being zapped in this case were supposed to be from a huge
page of ext4 text (but could as well have been shmem): my belief is that
it was racing with collapse_file()'s retract_page_tables(), found *pmd
pointing to a page table, locked it, but *pmd had become 0 by the time
start_pte was decided.
In most cases, that possibility is excluded by holding mmap lock; but
exit_mmap() proceeds without mmap lock. Most of what's run by khugepaged
checks khugepaged_test_exit() after acquiring mmap lock:
khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate() do so,
for example. But retract_page_tables() did not: fix that.
The fix is for retract_page_tables() to check khugepaged_test_exit(),
after acquiring mmap lock, before doing anything to the page table.
Getting the mmap lock serializes with __mmput(), which briefly takes and
drops it in __khugepaged_exit(); then the khugepaged_test_exit() check on
mm_users makes sure we don't touch the page table once exit_mmap() might
reach it, since exit_mmap() will be proceeding without mmap lock, not
expecting anyone to be racing with it.
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org> [4.8+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021215400.27773@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 15:26:22 +09:00
	struct mm_struct *mm;
	unsigned long addr;
	pmd_t *pmd, _pmd;

	i_mmap_lock_write(mapping);
	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
		/*
		 * Check vma->anon_vma to exclude MAP_PRIVATE mappings that
		 * got written to. These VMAs are likely not worth investing
		 * mmap_write_lock(mm) as PMD-mapping is likely to be split
		 * later.
		 *
		 * Note that vma->anon_vma check is racy: it can be set up after
		 * the check but before we took mmap_lock by the fault path.
		 * But page lock would prevent establishing any new ptes of the
		 * page, so we are safe.
		 *
		 * An alternative would be drop the check, but check that page
		 * table is clear before calling pmdp_collapse_flush() under
		 * ptl. It has higher chance to recover THP for the VMA, but
		 * has higher cost too. It would also probably require locking
		 * the anon_vma.
		 */
		if (vma->anon_vma)
			continue;
		addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
		if (addr & ~HPAGE_PMD_MASK)
			continue;
		if (vma->vm_end < addr + HPAGE_PMD_SIZE)
			continue;
		mm = vma->vm_mm;
		pmd = mm_find_pmd(mm, addr);
		if (!pmd)
			continue;
		/*
		 * We need exclusive mmap_lock to retract page table.
		 *
		 * We use trylock due to lock inversion: we need to acquire
		 * mmap_lock while holding page lock. Fault path does it in
		 * reverse order. Trylock is a way to avoid deadlock.
		 */
		if (mmap_write_trylock(mm)) {
			if (!khugepaged_test_exit(mm)) {
				struct mmu_notifier_range range;

				vm_write_begin(vma);
				mmu_notifier_range_init(&range,
							MMU_NOTIFY_CLEAR, 0,
							NULL, mm, addr,
							addr + HPAGE_PMD_SIZE);
				mmu_notifier_invalidate_range_start(&range);
				/* assume page table is clear */
				_pmd = pmdp_collapse_flush(vma, addr, pmd);
				vm_write_end(vma);
				mm_dec_nr_ptes(mm);
				tlb_remove_table_sync_one();
				pte_free(mm, pmd_pgtable(_pmd));
				mmu_notifier_invalidate_range_end(&range);
			}
			mmap_write_unlock(mm);
		} else {
			/* Try again later */
			khugepaged_add_pte_mapped_thp(mm, addr);
		}
	}
	i_mmap_unlock_write(mapping);
}
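The lock-inversion argument in retract_page_tables() above (the fault path takes the page lock and then the mmap lock, while this path already holds the page lock, so the second acquisition must be a trylock) can be sketched with two pthread mutexes. This is a hedged userspace model only; `lock_a`, `lock_b`, and `acquire_b_then_a_trylock()` are illustrative names, not kernel API:

```c
#include <errno.h>
#include <pthread.h>

/*
 * Illustrative deadlock-avoidance model: one path holds A and wants B,
 * the opposite path holds B and wants A.  Using trylock for the second
 * acquisition lets the loser back off instead of deadlocking.
 */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* "page lock" */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* "mmap lock" */

/* Returns 0 on success, -EBUSY if the second lock must be retried later. */
static int acquire_b_then_a_trylock(void)
{
	pthread_mutex_lock(&lock_b);
	if (pthread_mutex_trylock(&lock_a)) {
		pthread_mutex_unlock(&lock_b);
		return -EBUSY;	/* back off; try again later */
	}
	/* both locks held: safe to do the combined-lock work */
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return 0;
}
```

As in the kernel, backing off with -EBUSY converts a potential AB-BA deadlock into a retry, which is why the failed case above is queued via khugepaged_add_pte_mapped_thp() rather than waited on.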
/**
 * collapse_file - collapse filemap/tmpfs/shmem pages into huge one.
 *
 * Basic scheme is simple, details are more complex:
mm/khugepaged: collapse_shmem() without freezing new_page
khugepaged's collapse_shmem() does almost all of its work, to assemble
the huge new_page from 512 scattered old pages, with the new_page's
refcount frozen to 0 (and refcounts of all old pages so far also frozen
to 0). Including shmem_getpage() to read in any which were out on swap,
memory reclaim if necessary to allocate their intermediate pages, and
copying over all the data from old to new.
Imagine the frozen refcount as a spinlock held, but without any lock
debugging to highlight the abuse: it's not good, and under serious load
heads into lockups - speculative getters of the page are not expecting
to spin while khugepaged is rescheduled.
One can get a little further under load by hacking around elsewhere; but
fortunately, freezing the new_page turns out to have been entirely
unnecessary, with no hacks needed elsewhere.
The huge new_page lock is already held throughout, and guards all its
subpages as they are brought one by one into the page cache tree; and
anything reading the data in that page, without the lock, before it has
been marked PageUptodate, would already be in the wrong. So simply
eliminate the freezing of the new_page.
Each of the old pages remains frozen with refcount 0 after it has been
replaced by a new_page subpage in the page cache tree, until they are
all unfrozen on success or failure: just as before. They could be
unfrozen sooner, but cause no problem once no longer visible to
find_get_entry(), filemap_map_pages() and other speculative lookups.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261527570.2275@eggly.anvils
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-01 07:10:43 +09:00
 * - allocate and lock a new huge page;
 * - scan page cache replacing old pages with the new one
 *   + swap/gup in pages if necessary;
 *   + fill in gaps;
 *   + keep old pages around in case rollback is required;
 * - if replacing succeeds:
 *   + copy data over;
 *   + free old pages;
 *   + unlock huge page;
 * - if replacing failed:
 *   + put all pages back and unfreeze them;
 *   + restore gaps in the page cache;
 *   + unlock and free huge page;
 */
static void collapse_file(struct mm_struct *mm,
		struct file *file, pgoff_t start,
		struct page **hpage, int node)
{
	struct address_space *mapping = file->f_mapping;
	gfp_t gfp;
	struct page *new_page;
	pgoff_t index, end = start + HPAGE_PMD_NR;
	LIST_HEAD(pagelist);
	XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
	int nr_none = 0, result = SCAN_SUCCEED;
	bool is_shmem = shmem_file(file);

	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));

	/* Only allocate from the target node */
	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;

	new_page = khugepaged_alloc_page(hpage, gfp, node);
	if (!new_page) {
		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
		goto out;
	}

	if (unlikely(mem_cgroup_charge(new_page, mm, gfp))) {
		result = SCAN_CGROUP_CHARGE_FAIL;
		goto out;
	}
	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);

	/* This will be less messy when we use multi-index entries */
	do {
		xas_lock_irq(&xas);
		xas_create_range(&xas);
		if (!xas_error(&xas))
			break;
		xas_unlock_irq(&xas);
		if (!xas_nomem(&xas, GFP_KERNEL)) {
			result = SCAN_FAIL;
			goto out;
		}
	} while (1);

	__SetPageLocked(new_page);
	if (is_shmem)
		__SetPageSwapBacked(new_page);
	new_page->index = start;
	new_page->mapping = mapping;

	/*
	 * At this point the new_page is locked and not up-to-date.
	 * It's safe to insert it into the page cache, because nobody would
	 * be able to map it or use it in another way until we unlock it.
	 */

	xas_set(&xas, start);
	for (index = start; index < end; index++) {
		struct page *page = xas_next(&xas);

		VM_BUG_ON(index != xas.xa_index);
		if (is_shmem) {
			if (!page) {
				/*
				 * Stop if extent has been truncated or
				 * hole-punched, and is now completely
				 * empty.
				 */
				if (index == start) {
					if (!xas_next_entry(&xas, end - 1)) {
						result = SCAN_TRUNCATED;
						goto xa_locked;
					}
					xas_set(&xas, index);
				}
				if (!shmem_charge(mapping->host, 1)) {
					result = SCAN_FAIL;
					goto xa_locked;
				}
				xas_store(&xas, new_page);
				nr_none++;
				continue;
			}

			if (xa_is_value(page) || !PageUptodate(page)) {
				xas_unlock_irq(&xas);
				/* swap in or instantiate fallocated page */
				if (shmem_getpage(mapping->host, index, &page,
						  SGP_NOHUGE)) {
					result = SCAN_FAIL;
					goto xa_unlocked;
				}
			} else if (trylock_page(page)) {
				get_page(page);
				xas_unlock_irq(&xas);
			} else {
				result = SCAN_PAGE_LOCK;
				goto xa_locked;
			}
		} else {	/* !is_shmem */
			if (!page || xa_is_value(page)) {
				xas_unlock_irq(&xas);
				page_cache_sync_readahead(mapping, &file->f_ra,
							  file, index,
							  end - index);
				/* drain pagevecs to help isolate_lru_page() */
				lru_add_drain();
				page = find_lock_page(mapping, index);
				if (unlikely(page == NULL)) {
					result = SCAN_FAIL;
					goto xa_unlocked;
				}
			} else if (PageDirty(page)) {
				/*
				 * khugepaged only works on read-only fd,
				 * so this page is dirty because it hasn't
				 * been flushed since first write. There
				 * won't be new dirty pages.
				 *
				 * Trigger async flush here and hope the
				 * writeback is done when khugepaged
				 * revisits this page.
				 *
				 * This is a one-off situation. We are not
				 * forcing writeback in loop.
				 */
				xas_unlock_irq(&xas);
				filemap_flush(mapping);
				result = SCAN_FAIL;
				goto xa_unlocked;
			} else if (PageWriteback(page)) {
				xas_unlock_irq(&xas);
				result = SCAN_FAIL;
				goto xa_unlocked;
			} else if (trylock_page(page)) {
				get_page(page);
				xas_unlock_irq(&xas);
			} else {
				result = SCAN_PAGE_LOCK;
				goto xa_locked;
			}
		}

		/*
		 * The page must be locked, so we can drop the i_pages lock
		 * without racing with truncate.
		 */
		VM_BUG_ON_PAGE(!PageLocked(page), page);
mm,thp: recheck each page before collapsing file THP
In collapse_file(), for !is_shmem case, current check cannot guarantee
the locked page is up-to-date. Specifically, xas_unlock_irq() should
not be called before lock_page() and get_page(); and it is necessary to
recheck PageUptodate() after locking the page.
With this bug and CONFIG_READ_ONLY_THP_FOR_FS=y, madvise(HUGE)'ed .text
may contain corrupted data. This is because khugepaged mistakenly
collapses some not up-to-date sub pages into a huge page, and assumes
the huge page is up-to-date. This will NOT corrupt data in the disk,
because the page is read-only and never written back. Fix this by
properly checking PageUptodate() after locking the page. This check
replaces "VM_BUG_ON_PAGE(!PageUptodate(page), page);".
Also, move PageDirty() check after locking the page. Current khugepaged
should not try to collapse dirty file THP, because it is limited to
read-only .text. The only case we hit a dirty page here is when the
page hasn't been written since write. Bail out and retry when this
happens.
syzbot reported bug on previous version of this patch.
Link: http://lkml.kernel.org/r/20191106060930.2571389-2-songliubraving@fb.com
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Song Liu <songliubraving@fb.com>
Reported-by: syzbot+efb9e48b9fbdc49bb34a@syzkaller.appspotmail.com
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-16 10:34:53 +09:00

		/* make sure the page is up to date */
		if (unlikely(!PageUptodate(page))) {
			result = SCAN_FAIL;
			goto out_unlock;
		}

		/*
		 * If file was truncated then extended, or hole-punched, before
		 * we locked the first page, then a THP might be there already.
		 */
		if (PageTransCompound(page)) {
			result = SCAN_PAGE_COMPOUND;
			goto out_unlock;
		}

		if (page_mapping(page) != mapping) {
			result = SCAN_TRUNCATED;
			goto out_unlock;
		}

		if (!is_shmem && (PageDirty(page) ||
				  PageWriteback(page))) {
			/*
			 * khugepaged only works on read-only fd, so this
			 * page is dirty because it hasn't been flushed
			 * since first write.
			 */
			result = SCAN_FAIL;
			goto out_unlock;
		}

		if (isolate_lru_page(page)) {
			result = SCAN_DEL_PAGE_LRU;
			goto out_unlock;
		}

		if (page_has_private(page) &&
		    !try_to_release_page(page, GFP_KERNEL)) {
			result = SCAN_PAGE_HAS_PRIVATE;
			putback_lru_page(page);
			goto out_unlock;
		}

		if (page_mapped(page))
			unmap_mapping_pages(mapping, index, 1, false);

		xas_lock_irq(&xas);
		xas_set(&xas, index);

		VM_BUG_ON_PAGE(page != xas_load(&xas), page);
		VM_BUG_ON_PAGE(page_mapped(page), page);

		/*
		 * The page is expected to have page_count() == 3:
		 *  - we hold a pin on it;
		 *  - one reference from page cache;
		 *  - one from isolate_lru_page;
		 */
		if (!page_ref_freeze(page, 3)) {
			result = SCAN_PAGE_COUNT;
			xas_unlock_irq(&xas);
			putback_lru_page(page);
			goto out_unlock;
		}

		/*
		 * Add the page to the list to be able to undo the collapse if
		 * something goes wrong.
		 */
		list_add_tail(&page->lru, &pagelist);

		/* Finally, replace with the new page. */
		xas_store(&xas, new_page);
		continue;
out_unlock:
		unlock_page(page);
		put_page(page);
		goto xa_unlocked;
	}

	if (is_shmem)
		__inc_node_page_state(new_page, NR_SHMEM_THPS);
	else {
		__inc_node_page_state(new_page, NR_FILE_THPS);
		filemap_nr_thps_inc(mapping);
FROMLIST: mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs
Transparent huge pages are supported for read-only non-shmem files,
but are only used for vmas with VM_DENYWRITE. This condition ensures that
file THPs are protected from writes while an application is running
(ETXTBSY). Any existing file THPs are then dropped from the page cache
when a file is opened for write in do_dentry_open(). Since sys_mmap
ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
produced by execve().
Systems that make heavy use of shared libraries (e.g. Android) are unable
to apply VM_DENYWRITE through the dynamic linker, preventing them from
benefiting from the resultant reduced contention on the TLB.
This patch reduces the constraint on file THPs allowing use with any
executable mapping from a file not opened for write (see
inode_is_open_for_write()). It also introduces additional conditions to
ensure that files opened for write will never be backed by file THPs.
Restricting the use of THPs to executable mappings eliminates the risk that
a read-only file later opened for write would encounter significant
latencies due to page cache truncation.
The ld linker flag '-z max-page-size=(hugepage size)' can be used to
produce executables with the necessary layout. The dynamic linker must
map these file's segments at a hugepage size aligned vma for the mapping to
be backed with THPs.
Comparison of the performance characteristics of 4KB and 2MB-backed
libraries follows; the Android dex2oat tool was used to AOT compile an
example application on a single ARM core.
4KB Pages:
==========
count event_name # count / runtime
598,995,035,942 cpu-cycles # 1.800861 GHz
81,195,620,851 raw-stall-frontend # 244.112 M/sec
347,754,466,597 iTLB-loads # 1.046 G/sec
2,970,248,900 iTLB-load-misses # 0.854122% miss rate
Total test time: 332.854998 seconds.
2MB Pages:
==========
count event_name # count / runtime
592,872,663,047 cpu-cycles # 1.800358 GHz
76,485,624,143 raw-stall-frontend # 232.261 M/sec
350,478,413,710 iTLB-loads # 1.064 G/sec
803,233,322 iTLB-load-misses # 0.229182% miss rate
Total test time: 329.826087 seconds
A check of /proc/$(pidof dex2oat64)/smaps shows THPs in use:
/apex/com.android.art/lib64/libart.so
FilePmdMapped: 4096 kB
/apex/com.android.art/lib64/libart-compiler.so
FilePmdMapped: 2048 kB
Bug: 158135888
Link: https://lore.kernel.org/patchwork/patch/1408266/
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Collin Fijalkovich <cfijalkovich@google.com>
Change-Id: I75c693a4b4e7526d374ef2c010bde3094233eef2
2021-03-24 08:29:26 +09:00
		/*
		 * Paired with smp_mb() in do_dentry_open() to ensure
		 * i_writecount is up to date and the update to nr_thps is
		 * visible. Ensures the page cache will be truncated if the
		 * file is opened writable.
		 */
		smp_mb();
		if (inode_is_open_for_write(mapping->host)) {
			result = SCAN_FAIL;
			__dec_node_page_state(new_page, NR_FILE_THPS);
			filemap_nr_thps_dec(mapping);
			goto xa_locked;
		}
	}

	if (nr_none) {
		__mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none);
		if (is_shmem)
			__mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
	}

xa_locked:
	xas_unlock_irq(&xas);
xa_unlocked:

	if (result == SCAN_SUCCEED) {
		struct page *page, *tmp;

		/*
		 * Replacing old pages with the new one has succeeded, now we
		 * need to copy the content and free the old pages.
		 */
		index = start;
		list_for_each_entry_safe(page, tmp, &pagelist, lru) {
			while (index < page->index) {
				clear_highpage(new_page + (index % HPAGE_PMD_NR));
				index++;
			}
			copy_highpage(new_page + (page->index % HPAGE_PMD_NR),
					page);
			list_del(&page->lru);
			page->mapping = NULL;
			page_ref_unfreeze(page, 1);
			ClearPageActive(page);
			ClearPageUnevictable(page);
			unlock_page(page);
			put_page(page);
			index++;
		}
		while (index < end) {
			clear_highpage(new_page + (index % HPAGE_PMD_NR));
			index++;
		}

		SetPageUptodate(new_page);
		page_ref_add(new_page, HPAGE_PMD_NR - 1);
		if (is_shmem)
			set_page_dirty(new_page);
		lru_cache_add(new_page);

		/*
		 * Remove pte page tables, so we can re-fault the page as huge.
		 */
		retract_page_tables(mapping, start);
		*hpage = NULL;

		khugepaged_pages_collapsed++;
	} else {
		struct page *page;

		/* Something went wrong: roll back page cache changes */
		xas_lock_irq(&xas);
		mapping->nrpages -= nr_none;

		if (is_shmem)
			shmem_uncharge(mapping->host, nr_none);

		xas_set(&xas, start);
		xas_for_each(&xas, page, end - 1) {
			page = list_first_entry_or_null(&pagelist,
					struct page, lru);
			if (!page || xas.xa_index < page->index) {
				if (!nr_none)
					break;
				nr_none--;
				/* Put holes back where they were */
				xas_store(&xas, NULL);
				continue;
			}

			VM_BUG_ON_PAGE(page->index != xas.xa_index, page);

			/* Unfreeze the page. */
			list_del(&page->lru);
			page_ref_unfreeze(page, 2);
			xas_store(&xas, page);
			xas_pause(&xas);
			xas_unlock_irq(&xas);
			unlock_page(page);
			putback_lru_page(page);
			xas_lock_irq(&xas);
		}
		VM_BUG_ON(nr_none);
		xas_unlock_irq(&xas);

		new_page->mapping = NULL;
	}

	unlock_page(new_page);
out:
	VM_BUG_ON(!list_empty(&pagelist));
	if (!IS_ERR_OR_NULL(*hpage))
		mem_cgroup_uncharge(*hpage);
	/* TODO: tracepoints */
}

static void khugepaged_scan_file(struct mm_struct *mm,
		struct file *file, pgoff_t start, struct page **hpage)
{
	struct page *page = NULL;
	struct address_space *mapping = file->f_mapping;
	XA_STATE(xas, &mapping->i_pages, start);
	int present, swap;
	int node = NUMA_NO_NODE;
	int result = SCAN_SUCCEED;

	present = 0;
	swap = 0;
	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
	rcu_read_lock();
	xas_for_each(&xas, page, start + HPAGE_PMD_NR - 1) {
		if (xas_retry(&xas, page))
			continue;

		if (xa_is_value(page)) {
			if (++swap > khugepaged_max_ptes_swap) {
				result = SCAN_EXCEED_SWAP_PTE;
				break;
			}
			continue;
		}

		if (PageTransCompound(page)) {
			result = SCAN_PAGE_COMPOUND;
			break;
		}

		node = page_to_nid(page);
		if (khugepaged_scan_abort(node)) {
			result = SCAN_SCAN_ABORT;
			break;
		}
		khugepaged_node_load[node]++;

		if (!PageLRU(page)) {
			result = SCAN_PAGE_LRU;
			break;
		}

		if (page_count(page) !=
		    1 + page_mapcount(page) + page_has_private(page)) {
			result = SCAN_PAGE_COUNT;
			break;
		}

		/*
		 * We probably should check if the page is referenced here, but
		 * nobody would transfer pte_young() to PageReferenced() for us.
		 * And rmap walk here is just too costly...
		 */

		present++;

		if (need_resched()) {
			xas_pause(&xas);
			cond_resched_rcu();
		}
	}
	rcu_read_unlock();

	if (result == SCAN_SUCCEED) {
		if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
			result = SCAN_EXCEED_NONE_PTE;
		} else {
			node = khugepaged_find_target_node();
			collapse_file(mm, file, start, hpage, node);
		}
	}

	/* TODO: tracepoints */
}

#else
static void khugepaged_scan_file(struct mm_struct *mm,
		struct file *file, pgoff_t start, struct page **hpage)
{
	BUILD_BUG();
}

static int khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
{
	return 0;
}
#endif

static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
					    struct page **hpage)
	__releases(&khugepaged_mm_lock)
	__acquires(&khugepaged_mm_lock)
{
	struct mm_slot *mm_slot;
	struct mm_struct *mm;
	struct vm_area_struct *vma;
	int progress = 0;

	VM_BUG_ON(!pages);
	lockdep_assert_held(&khugepaged_mm_lock);

	if (khugepaged_scan.mm_slot)
		mm_slot = khugepaged_scan.mm_slot;
	else {
		mm_slot = list_entry(khugepaged_scan.mm_head.next,
				     struct mm_slot, mm_node);
		khugepaged_scan.address = 0;
		khugepaged_scan.mm_slot = mm_slot;
	}
	spin_unlock(&khugepaged_mm_lock);
	khugepaged_collapse_pte_mapped_thps(mm_slot);

	mm = mm_slot->mm;
	/*
	 * Don't wait for semaphore (to avoid long wait times). Just move to
	 * the next mm on the list.
	 */
	vma = NULL;
	if (unlikely(!mmap_read_trylock(mm)))
		goto breakouterloop_mmap_lock;
	if (likely(!khugepaged_test_exit(mm)))
		vma = find_vma(mm, khugepaged_scan.address);

	progress++;
	for (; vma; vma = vma->vm_next) {
		unsigned long hstart, hend;

		cond_resched();
		if (unlikely(khugepaged_test_exit(mm))) {
			progress++;
			break;
		}
		if (!hugepage_vma_check(vma, vma->vm_flags)) {
skip:
			progress++;
			continue;
		}
		hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
		hend = vma->vm_end & HPAGE_PMD_MASK;
		if (hstart >= hend)
			goto skip;
		if (khugepaged_scan.address > hend)
			goto skip;
		if (khugepaged_scan.address < hstart)
			khugepaged_scan.address = hstart;
		VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);
		if (shmem_file(vma->vm_file) && !shmem_huge_enabled(vma))
			goto skip;

		while (khugepaged_scan.address < hend) {
			int ret;
			cond_resched();
			if (unlikely(khugepaged_test_exit(mm)))
				goto breakouterloop;

			VM_BUG_ON(khugepaged_scan.address < hstart ||
				  khugepaged_scan.address + HPAGE_PMD_SIZE >
				  hend);
			if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
				struct file *file = get_file(vma->vm_file);
				pgoff_t pgoff = linear_page_index(vma,
						khugepaged_scan.address);

				mmap_read_unlock(mm);
				ret = 1;
				khugepaged_scan_file(mm, file, pgoff, hpage);
				fput(file);
			} else {
				ret = khugepaged_scan_pmd(mm, vma,
						khugepaged_scan.address,
						hpage);
			}
			/* move to next address */
			khugepaged_scan.address += HPAGE_PMD_SIZE;
			progress += HPAGE_PMD_NR;
			if (ret)
				/* we released mmap_lock so break loop */
				goto breakouterloop_mmap_lock;
			if (progress >= pages)
				goto breakouterloop;
		}
	}
breakouterloop:
	mmap_read_unlock(mm); /* exit_mmap will destroy ptes after this */
breakouterloop_mmap_lock:

	spin_lock(&khugepaged_mm_lock);
	VM_BUG_ON(khugepaged_scan.mm_slot != mm_slot);
	/*
	 * Release the current mm_slot if this mm is about to die, or
	 * if we scanned all vmas of this mm.
	 */
	if (khugepaged_test_exit(mm) || !vma) {
		/*
		 * Make sure that if mm_users is reaching zero while
		 * khugepaged runs here, khugepaged_exit will find
		 * mm_slot not pointing to the exiting mm.
		 */
		if (mm_slot->mm_node.next != &khugepaged_scan.mm_head) {
			khugepaged_scan.mm_slot = list_entry(
				mm_slot->mm_node.next,
				struct mm_slot, mm_node);
			khugepaged_scan.address = 0;
		} else {
			khugepaged_scan.mm_slot = NULL;
			khugepaged_full_scans++;
		}

		collect_mm_slot(mm_slot);
	}

	return progress;
}

static int khugepaged_has_work(void)
{
	return !list_empty(&khugepaged_scan.mm_head) &&
		khugepaged_enabled();
}

static int khugepaged_wait_event(void)
{
	return !list_empty(&khugepaged_scan.mm_head) ||
		kthread_should_stop();
}

static void khugepaged_do_scan(void)
{
	struct page *hpage = NULL;
	unsigned int progress = 0, pass_through_head = 0;
	unsigned int pages = khugepaged_pages_to_scan;
	bool wait = true;

	barrier(); /* write khugepaged_pages_to_scan to local stack */

	lru_add_drain_all();

	while (progress < pages) {
		if (!khugepaged_prealloc_page(&hpage, &wait))
			break;

		cond_resched();

		if (unlikely(kthread_should_stop() || try_to_freeze()))
			break;

		spin_lock(&khugepaged_mm_lock);
		if (!khugepaged_scan.mm_slot)
			pass_through_head++;
		if (khugepaged_has_work() &&
		    pass_through_head < 2)
			progress += khugepaged_scan_mm_slot(pages - progress,
							    &hpage);
		else
			progress = pages;
		spin_unlock(&khugepaged_mm_lock);
	}

	if (!IS_ERR_OR_NULL(hpage))
		put_page(hpage);
}

static bool khugepaged_should_wakeup(void)
{
	return kthread_should_stop() ||
	       time_after_eq(jiffies, khugepaged_sleep_expire);
}

static void khugepaged_wait_work(void)
{
	if (khugepaged_has_work()) {
		const unsigned long scan_sleep_jiffies =
			msecs_to_jiffies(khugepaged_scan_sleep_millisecs);

		if (!scan_sleep_jiffies)
			return;

		khugepaged_sleep_expire = jiffies + scan_sleep_jiffies;
		wait_event_freezable_timeout(khugepaged_wait,
					     khugepaged_should_wakeup(),
					     scan_sleep_jiffies);
		return;
	}

	if (khugepaged_enabled())
		wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
}

static int khugepaged(void *none)
{
	struct mm_slot *mm_slot;

	set_freezable();
	set_user_nice(current, MAX_NICE);

	while (!kthread_should_stop()) {
		khugepaged_do_scan();
		khugepaged_wait_work();
	}

	spin_lock(&khugepaged_mm_lock);
	mm_slot = khugepaged_scan.mm_slot;
	khugepaged_scan.mm_slot = NULL;
	if (mm_slot)
		collect_mm_slot(mm_slot);
	spin_unlock(&khugepaged_mm_lock);
	return 0;
}

static void set_recommended_min_free_kbytes(void)
{
	struct zone *zone;
	int nr_zones = 0;
	unsigned long recommended_min;

	for_each_populated_zone(zone) {
		/*
		 * We don't need to worry about fragmentation of
		 * ZONE_MOVABLE since it only has movable pages.
		 */
		if (zone_idx(zone) > gfp_zone(GFP_USER))
			continue;

		nr_zones++;
	}

	/* Ensure 2 pageblocks are free to assist fragmentation avoidance */
	recommended_min = pageblock_nr_pages * nr_zones * 2;

	/*
	 * Make sure that on average at least two pageblocks are almost free
	 * of another type, one for a migratetype to fall back to and a
	 * second to avoid subsequent fallbacks of other types There are 3
	 * MIGRATE_TYPES we care about.
	 */
	recommended_min += pageblock_nr_pages * nr_zones *
			   MIGRATE_PCPTYPES * MIGRATE_PCPTYPES;

	/* don't ever allow to reserve more than 5% of the lowmem */
	recommended_min = min(recommended_min,
			      (unsigned long) nr_free_buffer_pages() / 20);
	recommended_min <<= (PAGE_SHIFT-10);

	if (recommended_min > min_free_kbytes) {
		if (user_min_free_kbytes >= 0)
			pr_info("raising min_free_kbytes from %d to %lu to help transparent hugepage allocations\n",
				min_free_kbytes, recommended_min);

		min_free_kbytes = recommended_min;
	}
	setup_per_zone_wmarks();
}

int start_stop_khugepaged(void)
{
	int err = 0;

	mutex_lock(&khugepaged_mutex);
	if (khugepaged_enabled()) {
		if (!khugepaged_thread)
			khugepaged_thread = kthread_run(khugepaged, NULL,
							"khugepaged");
		if (IS_ERR(khugepaged_thread)) {
			pr_err("khugepaged: kthread_run(khugepaged) failed\n");
			err = PTR_ERR(khugepaged_thread);
			khugepaged_thread = NULL;
			goto fail;
		}

		if (!list_empty(&khugepaged_scan.mm_head))
			wake_up_interruptible(&khugepaged_wait);

		set_recommended_min_free_kbytes();
	} else if (khugepaged_thread) {
		kthread_stop(khugepaged_thread);
		khugepaged_thread = NULL;
	}
fail:
	mutex_unlock(&khugepaged_mutex);
	return err;
}

void khugepaged_min_free_kbytes_update(void)
{
	mutex_lock(&khugepaged_mutex);
	if (khugepaged_enabled() && khugepaged_thread)
		set_recommended_min_free_kbytes();
	mutex_unlock(&khugepaged_mutex);
}