Go to file
Aaron Thompson 60806adc9b mm: Always release pages to the buddy allocator in memblock_free_late().
[ Upstream commit 115d9d77bb0f9152c60b6e8646369fa7f6167593 ]

If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
only releases pages to the buddy allocator if they are not in the
deferred range. This is correct for free pages (as defined by
for_each_free_mem_pfn_range_in_zone()) because free pages in the
deferred range will be initialized and released as part of the deferred
init process. memblock_free_pages() is called by memblock_free_late(),
which is used to free reserved ranges after memblock_free_all() has
run. All pages in reserved ranges have been initialized at that point,
and accordingly, those pages are not touched by the deferred init
process. This means that currently, if the pages that
memblock_free_late() intends to release are in the deferred range, they
will never be released to the buddy allocator. They will forever be
reserved.

In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
which is also correct for free pages but is not correct for reserved
pages. KMSAN metadata for reserved pages is initialized by
kmsan_init_shadow(), which runs shortly before memblock_free_all().

For both of these reasons, memblock_free_pages() should only be called
for free pages, and memblock_free_late() should call __free_pages_core()
directly instead.

One case where this issue can occur in the wild is EFI boot on
x86_64. The x86 EFI code reserves all EFI boot services memory ranges
via memblock_reserve() and frees them later via memblock_free_late()
(efi_reserve_boot_services() and efi_free_boot_services(),
respectively). If any of those ranges happens to fall within the
deferred init range, the pages will not be released and that memory will
be unavailable.

For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:

v6.2-rc2:
  # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
  Node 0, zone      DMA
          spanned  4095
          present  3999
          managed  3840
  Node 0, zone    DMA32
          spanned  246652
          present  245868
          managed  178867

v6.2-rc2 + patch:
  # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
  Node 0, zone      DMA
          spanned  4095
          present  3999
          managed  3840
  Node 0, zone    DMA32
          spanned  246652
          present  245868
          managed  222816   # +43,949 pages

Fixes: 3a80a7fa79 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://lore.kernel.org/r/01010185892de53e-e379acfb-7044-4b24-b30a-e2657c1ba989-000000@us-west-2.amazonses.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18 11:44:59 +01:00
arch x86/boot: Avoid using Intel mnemonics in AT&T syntax asm 2023-01-18 11:44:58 +01:00
block blk-mq: fix possible memleak when register 'hctx' failed 2023-01-14 10:16:19 +01:00
certs certs/blacklist_hashes.c: fix const confusion in certs blacklist 2022-06-22 14:13:17 +02:00
crypto crypto: tcrypt - Fix multibuffer skcipher speed test mem leak 2023-01-14 10:15:50 +01:00
Documentation iommu/amd: Fix ill-formed ivrs_ioapic, ivrs_hpet and ivrs_acpihid options 2023-01-18 11:44:55 +01:00
drivers net/mlx5e: Don't support encap rules with gbp option 2023-01-18 11:44:59 +01:00
fs ext4: fix uninititialized value in 'ext4_evict_inode' 2023-01-18 11:44:57 +01:00
include dt-bindings: clocks: imx8mp: Add ID for usb suspend clock 2023-01-18 11:44:56 +01:00
init init/Kconfig: fix CC_HAS_ASM_GOTO_TIED_OUTPUT test with dash 2022-12-02 17:40:03 +01:00
io_uring io_uring: Fix unsigned 'res' comparison with zero in io_fixup_rw_res() 2023-01-14 10:16:50 +01:00
ipc ipc/sem: Fix dangling sem_array access in semtimedop race 2022-12-08 11:24:00 +01:00
kernel tracing: Fix infinite loop in tracing_read_pipe on overflowed print_trace_line 2023-01-14 10:16:33 +01:00
lib mm/highmem: Lift memcpy_[to|from]_page to core 2023-01-14 10:16:42 +01:00
LICENSES LICENSES/deprecated: add Zlib license text 2020-09-16 14:33:49 +02:00
mm mm: Always release pages to the buddy allocator in memblock_free_late(). 2023-01-18 11:44:59 +01:00
net net/sched: act_mpls: Fix warning during failed attribute validation 2023-01-18 11:44:59 +01:00
samples samples: vfio-mdev: Fix missing pci_disable_device() in mdpy_fb_probe() 2023-01-14 10:16:01 +01:00
scripts scripts/faddr2line: Fix regression in name resolution on ppc64le 2022-12-08 11:23:54 +01:00
security device_cgroup: Roll back to original exceptions after copy failure 2023-01-14 10:16:36 +01:00
sound ASoC: wm8904: fix wrong outputs volume after power reactivation 2023-01-18 11:44:58 +01:00
tools perf auxtrace: Fix address filter duplicate symbol selection 2023-01-18 11:44:53 +01:00
usr usr/include/Makefile: add linux/nfc.h to the compile-test coverage 2022-02-01 17:25:48 +01:00
virt kvm: Add support for arch compat vm ioctls 2022-10-30 09:41:15 +01:00
.clang-format RDMA 5.10 pull request 2020-10-17 11:18:18 -07:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore kbuild: generate Module.symvers only when vmlinux exists 2021-05-19 10:12:59 +02:00
.mailmap mailmap: add two more addresses of Uwe Kleine-König 2020-12-06 10:19:07 -08:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Jason Cooper to CREDITS 2020-11-30 10:20:34 +01:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS futex: Move to kernel/futex/ 2023-01-14 10:15:20 +01:00
Makefile Linux 5.10.163 2023-01-14 10:16:53 +01:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.