On architectures that support the preservation of memblock metadata
after __init, allow drivers to call memblock_free() to free a
reservation made by early arch code. This is a hack to support the
freeing of bootsplash reservations passed to Linux by the bootloader.
(This should be reworked in future versions of Android; do not
cherry-pick this patch forward.)
Bug: 139653858
Bug: 174620135
Bug: 249340121
Change-Id: I32c0ee70c33c94deff70aa548896caa9978396fb
Signed-off-by: Alistair Delva <adelva@google.com>
(cherry picked from commit 2eeee9f41c0cddbc300d9b89bba6d498998b2498)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmP2A8MACgkQONu9yGCS
aT7GrhAAky2nTRG9J0oPxh5Eu7wNKmjqDWNj9c6it3iGHpb+tfOY+LfPXHmWz0kX
NoaNYGZGD8SDbkmwrSOmFB1Q/0OZ4/aIwM7Kwcw72UJVvrlsKx1HwiJjXKk809ZL
bVlLUQzFTwyVIYcvjXQ8CuBHwBinLc3qkcyYGgbS8bseR4pDuxwoToDwAxk1d/0j
ozWuzUKhSdYHYIUrk3papUro2UpF+Kb7KFpNiVo2wMaZM7en2XK3khCt8TuojH6c
DXL+KZ/HbB8Ig1PWLaw2/6o4ispNy6bz7CJx6oDiOILR+le8xZA5WTdkXT3ovjyr
LxutmPTTw6PxextIyVRblJWzXNcjdlV552U4gnnngcWn6wg4D4otqYnYvTaAUc+u
sQnwrlQFxB2KfFKLNepGAy7klQJsYP3eadjDgGXP9TSmuUvUYRaNr6h0XukbyYkc
kx2+Tw51NMKEqhgnaiKDN8AZEDTuLu5F4+NrUertxlb3PWeRRMRYVGJ1uw0KJg6t
d5eniCB00SaaqN6M68u/hRYRi3gnwIsU7DitEpqejqwzskMpgegMFvebmCwORiq3
D+FD4EHOlztIToXhmEOXp0cz8fs27MuWmq4GkSwXvJuq+id5cQFdDN5GeLgNdAvH
Kiu/Y+DY6ObW31tAQ1Jjp20L2RaWWvubrCBGeIqiDzUWmCohsks=
=TXvc
-----END PGP SIGNATURE-----
Merge 6.1.13 into android14-6.1
Changes in 6.1.13
mptcp: sockopt: make 'tcp_fastopen_connect' generic
mptcp: fix locking for setsockopt corner-case
mptcp: deduplicate error paths on endpoint creation
mptcp: fix locking for in-kernel listener creation
btrfs: move the auto defrag code to defrag.c
btrfs: lock the inode in shared mode before starting fiemap
ASoC: amd: yc: Add DMI support for new acer/emdoor platforms
ASoC: SOF: sof-audio: start with the right widget type
ALSA: usb-audio: Add FIXED_RATE quirk for JBL Quantum610 Wireless
ASoC: Intel: sof_rt5682: always set dpcm_capture for amplifiers
ASoC: Intel: sof_cs42l42: always set dpcm_capture for amplifiers
ASoC: Intel: sof_nau8825: always set dpcm_capture for amplifiers
ASoC: Intel: sof_ssp_amp: always set dpcm_capture for amplifiers
selftests/bpf: Verify copy_register_state() preserves parent/live fields
ALSA: hda: Do not unset preset when cleaning up codec
ASoC: amd: yc: Add Xiaomi Redmi Book Pro 15 2022 into DMI table
bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself
ASoC: cs42l56: fix DT probe
tools/virtio: fix the vringh test for virtio ring changes
vdpa: ifcvf: Do proper cleanup if IFCVF init fails
net/rose: Fix to not accept on connected socket
selftest: net: Improve IPV6_TCLASS/IPV6_HOPLIMIT tests apparmor compatibility
net: stmmac: do not stop RX_CLK in Rx LPI state for qcs404 SoC
powerpc/64: Fix perf profiling asynchronous interrupt handlers
fscache: Use clear_and_wake_up_bit() in fscache_create_volume_work()
drm/nouveau/devinit/tu102-: wait for GFW_BOOT_PROGRESS == COMPLETED
net: ethernet: mtk_eth_soc: Avoid truncating allocation
net: sched: sch: Bounds check priority
s390/decompressor: specify __decompress() buf len to avoid overflow
nvme-fc: fix a missing queue put in nvmet_fc_ls_create_association
nvme: clear the request_queue pointers on failure in nvme_alloc_admin_tag_set
nvme: clear the request_queue pointers on failure in nvme_alloc_io_tag_set
drm/amd/display: Add missing brackets in calculation
drm/amd/display: Adjust downscaling limits for dcn314
drm/amd/display: Unassign does_plane_fit_in_mall function from dcn3.2
drm/amd/display: Reset DMUB mailbox SW state after HW reset
drm/amdgpu: enable HDP SD for gfx 11.0.3
drm/amdgpu: Enable vclk dclk node for gc11.0.3
drm/amd/display: Properly handle additional cases where DCN is not supported
platform/x86: touchscreen_dmi: Add Chuwi Vi8 (CWI501) DMI match
ceph: move mount state enum to super.h
ceph: blocklist the kclient when receiving corrupted snap trace
selftests: mptcp: userspace: fix v4-v6 test in v6.1
of: reserved_mem: Have kmemleak ignore dynamically allocated reserved mem
kasan: fix Oops due to missing calls to kasan_arch_is_ready()
mm: shrinkers: fix deadlock in shrinker debugfs
aio: fix mremap after fork null-deref
vmxnet3: move rss code block under eop descriptor
fbdev: Fix invalid page access after closing deferred I/O devices
drm: Disable dynamic debug as broken
drm/amd/amdgpu: fix warning during suspend
drm/amd/display: Fail atomic_check early on normalize_zpos error
drm/vmwgfx: Stop accessing buffer objects which failed init
drm/vmwgfx: Do not drop the reference to the handle too soon
mmc: jz4740: Work around bug on JZ4760(B)
mmc: meson-gx: fix SDIO mode if cap_sdio_irq isn't set
mmc: sdio: fix possible resource leaks in some error paths
mmc: mmc_spi: fix error handling in mmc_spi_probe()
ALSA: hda: Fix codec device field initializan
ALSA: hda/conexant: add a new hda codec SN6180
ALSA: hda/realtek - fixed wrong gpio assigned
ALSA: hda/realtek: fix mute/micmute LEDs don't work for a HP platform.
ALSA: hda/realtek: Enable mute/micmute LEDs and speaker support for HP Laptops
ata: ahci: Add Tiger Lake UP{3,4} AHCI controller
ata: libata-core: Disable READ LOG DMA EXT for Samsung MZ7LH
sched/psi: Fix use-after-free in ep_remove_wait_queue()
hugetlb: check for undefined shift on 32 bit architectures
nilfs2: fix underflow in second superblock position calculations
mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
mm/filemap: fix page end in filemap_get_read_batch
mm/migrate: fix wrongly apply write bit after mkdirty on sparc64
gpio: sim: fix a memory leak
freezer,umh: Fix call_usermode_helper_exec() vs SIGKILL
coredump: Move dump_emit_page() to kill unused warning
Revert "mm: Always release pages to the buddy allocator in memblock_free_late()."
net: Fix unwanted sign extension in netdev_stats_to_stats64()
revert "squashfs: harden sanity check in squashfs_read_xattr_id_table"
drm/vc4: crtc: Increase setup cost in core clock calculation to handle extreme reduced blanking
drm/vc4: Fix YUV plane handling when planes are in different buffers
drm/i915/gen11: Wa_1408615072/Wa_1407596294 should be on GT list
ice: fix lost multicast packets in promisc mode
ixgbe: allow to increase MTU to 3K with XDP enabled
i40e: add double of VLAN header when computing the max MTU
net: bgmac: fix BCM5358 support by setting correct flags
net: ethernet: ti: am65-cpsw: Add RX DMA Channel Teardown Quirk
sctp: sctp_sock_filter(): avoid list_entry() on possibly empty list
net/sched: tcindex: update imperfect hash filters respecting rcu
ice: xsk: Fix cleaning of XDP_TX frames
dccp/tcp: Avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions.
net/usb: kalmia: Don't pass act_len in usb_bulk_msg error path
net/sched: act_ctinfo: use percpu stats
net: openvswitch: fix possible memory leak in ovs_meter_cmd_set()
net: stmmac: fix order of dwmac5 FlexPPS parametrization sequence
bnxt_en: Fix mqprio and XDP ring checking logic
tracing: Make trace_define_field_ext() static
net: stmmac: Restrict warning on disabling DMA store and fwd mode
net: use a bounce buffer for copying skb->mark
tipc: fix kernel warning when sending SYN message
net: mpls: fix stale pointer if allocation fails during device rename
igb: conditionalize I2C bit banging on external thermal sensor support
igb: Fix PPS input and output using 3rd and 4th SDP
ixgbe: add double of VLAN header when computing the max MTU
ipv6: Fix datagram socket connection with DSCP.
ipv6: Fix tcp socket connection with DSCP.
mm/gup: add folio to list when folio_isolate_lru() succeed
mm: extend max struct page size for kmsan
i40e: Add checking for null for nlmsg_find_attr()
net/sched: tcindex: search key must be 16 bits
nvme-tcp: stop auth work after tearing down queues in error recovery
nvme-rdma: stop auth work after tearing down queues in error recovery
KVM: x86/pmu: Disable vPMU support on hybrid CPUs (host PMUs)
kvm: initialize all of the kvm_debugregs structure before sending it to userspace
perf/x86: Refuse to export capabilities for hybrid PMUs
alarmtimer: Prevent starvation by small intervals and SIG_IGN
nvme-pci: refresh visible attrs for cmb attributes
ASoC: SOF: Intel: hda-dai: fix possible stream_tag leak
net: sched: sch: Fix off by one in htb_activate_prios()
Linux 6.1.13
Change-Id: I8a1e4175939c14f726c545001061b95462566386
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 647037adcad00f2bab8828d3d41cd0553d41f3bd upstream.
This reverts commit 115d9d77bb0f9152c60b6e8646369fa7f6167593.
The pages being freed by memblock_free_late() have already been
initialized, but if they are in the deferred init range,
__free_one_page() might access nearby uninitialized pages when trying to
coalesce buddies. This can, for example, trigger this BUG:
BUG: unable to handle page fault for address: ffffe964c02580c8
RIP: 0010:__list_del_entry_valid+0x3f/0x70
<TASK>
__free_one_page+0x139/0x410
__free_pages_ok+0x21d/0x450
memblock_free_late+0x8c/0xb9
efi_free_boot_services+0x16b/0x25c
efi_enter_virtual_mode+0x403/0x446
start_kernel+0x678/0x714
secondary_startup_64_no_verify+0xd2/0xdb
</TASK>
A proper fix will be more involved so revert this change for the time
being.
Fixes: 115d9d77bb0f ("mm: Always release pages to the buddy allocator in memblock_free_late().")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://lore.kernel.org/r/20230207082151.1303-1-dev@aaront.org
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmPH0OcACgkQONu9yGCS
aT4CvA//aNaXbZpRlZZI9uR40B9X3ROao6UGFKAbJYTcaTNQZIAybuLY4/VKYhS3
3YNXPJghsge7rUKz2AsmksHhwRmU8pOUJOtAqL6HB0UVSpYNNo6KVbwgxFZgAFhw
XGkXO+h3y8BJjwiVUpmTj+HlmXVDcaVMEItMu4C9X2Z7wV2QLgKUbbq2kUlEX9DX
BfxWgd0tORsZFMVLy2JXahrRtH72TgfD8g3K+jHFfEsk9ySOaN58Mf736hSSNY0A
9jHyTWiwFRqC2nYtSvE/BQmEae8gEQp/wDZR8Qwu2Q51MtIh1Z1xCqMGN+1Fmrow
8q38lPJIoXeQbKKCmBcTJrXz5dqjjnDANl2oucQvKuhvfAMfvC9+w42kGBulTKaE
Ul9aqSKsyaFP6BzHJ8BIjFMhE5pXxsAKCRNsGSfxvEf1MrFuqaC+5yUdzVJyAPEQ
AHgPRKnpu5jIZKNqPYDbcj3WF2SRZqHboPVZV3pROtkh8KKMAYS4dGwi+CpCBdLD
GWCNqtLGDOJW196WThLWkrxT1xU4/x2zmiBb6ua138W9WpQnytLv7HicBXf7XnWt
LmNqs2ADs7/hGd521wA9mT2mHkS+KZpxrV9d3IYhEdf4xExbZdfhOQCvU9RcYZDO
ln01DCEY6mzrQA56s72hojj80sjauB2+ytDfuWIGJ6lnkfB2g+Q=
=hRZq
-----END PGP SIGNATURE-----
Merge 6.1.7 into android14-6.1
Changes in 6.1.7
netfilter: nft_payload: incorrect arithmetics when fetching VLAN header bits
Revert "ALSA: usb-audio: Drop superfluous interface setup at parsing"
ALSA: control-led: use strscpy in set_led_id()
ALSA: usb-audio: Always initialize fixed_rate in snd_usb_find_implicit_fb_sync_format()
ALSA: hda/realtek - Turn on power early
ALSA: hda/realtek: Enable mute/micmute LEDs on HP Spectre x360 13-aw0xxx
KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
KVM: arm64: Fix S1PTW handling on RO memslots
efi: fix userspace infinite retry read efivars after EFI runtime services page fault
efi: tpm: Avoid READ_ONCE() for accessing the event log
docs: Fix the docs build with Sphinx 6.0
io_uring/poll: add hash if ready poll request can't complete inline
arm64: mte: Fix double-freeing of the temporary tag storage during coredump
arm64: mte: Avoid the racy walk of the vma list during core dump
arm64: cmpxchg_double*: hazard against entire exchange variable
ACPI: Fix selecting wrong ACPI fwnode for the iGPU on some Dell laptops
net: stmmac: add aux timestamps fifo clearance wait
perf auxtrace: Fix address filter duplicate symbol selection
s390/kexec: fix ipl report address for kdump
brcmfmac: Prefer DT board type over DMI board type
ASoC: qcom: lpass-cpu: Fix fallback SD line index handling
elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size}
cpufreq: amd-pstate: fix kernel hang issue while amd-pstate unregistering
s390/cpum_sf: add READ_ONCE() semantics to compare and swap loops
s390/percpu: add READ_ONCE() to arch_this_cpu_to_op_simple()
drm/virtio: Fix GEM handle creation UAF
drm/amd/pm/smu13: BACO is supported when it's in BACO state
drm: Optimize drm buddy top-down allocation method
drm/i915/gt: Reset twice
drm/i915: Reserve enough fence slot for i915_vma_unbind_async
drm/i915: Fix potential context UAFs
drm/amd: Delay removal of the firmware framebuffer
drm/amdgpu: Fixed bug on error when unloading amdgpu
drm/amd/pm: correct the reference clock for fan speed(rpm) calculation
drm/amd/pm: add the missing mapping for PPT feature on SMU13.0.0 and 13.0.7
drm/amd/display: move remaining FPU code to dml folder
Revert "drm/amdgpu: Revert "drm/amdgpu: getting fan speed pwm for vega10 properly""
cifs: Fix uninitialized memory read for smb311 posix symlink create
cifs: fix file info setting in cifs_query_path_info()
cifs: fix file info setting in cifs_open_file()
cifs: do not query ifaces on smb1 mounts
cifs: fix double free on failed kerberos auth
io_uring/fdinfo: include locked hash table in fdinfo output
ASoC: rt9120: Make dev PM runtime bind AsoC component PM
ACPI: video: Allow selecting NVidia-WMI-EC or Apple GMUX backlight from the cmdline
platform/x86: dell-privacy: Only register SW_CAMERA_LENS_COVER if present
platform/surface: aggregator: Ignore command messages not intended for us
platform/x86: int3472/discrete: Ensure the clk/power enable pins are in output mode
platform/x86: thinkpad_acpi: Fix profile mode display in AMT mode
platform/x86: asus-wmi: Don't load fan curves without fan
platform/x86: dell-privacy: Fix SW_CAMERA_LENS_COVER reporting
dt-bindings: msm: dsi-controller-main: Fix operating-points-v2 constraint
drm/msm: another fix for the headless Adreno GPU
firmware/psci: Fix MEM_PROTECT_RANGE function numbers
firmware/psci: Don't register with debugfs if PSCI isn't available
drm/msm/adreno: Make adreno quirks not overwrite each other
arm64/signal: Always allocate SVE signal frames on SME only systems
dt-bindings: msm: dsi-controller-main: Fix power-domain constraint
dt-bindings: msm: dsi-controller-main: Fix description of core clock
arm64/signal: Always accept SVE signal frames on SME only systems
arm64/mm: add pud_user_exec() check in pud_user_accessible_page()
dt-bindings: msm: dsi-phy-28nm: Add missing qcom, dsi-phy-regulator-ldo-mode
arm64: ptrace: Use ARM64_SME to guard the SME register enumerations
arm64/mm: fix incorrect file_map_count for invalid pmd
platform/x86: ideapad-laptop: Add Legion 5 15ARH05 DMI id to set_fn_lock_led_list[]
drm/msm/dp: do not complete dp_aux_cmd_fifo_tx() if irq is not for aux transfer
dt-bindings: msm/dsi: Don't require vdds-supply on 10nm PHY
dt-bindings: msm/dsi: Don't require vcca-supply on 14nm PHY
platform/x86: sony-laptop: Don't turn off 0x153 keyboard backlight during probe
ixgbe: fix pci device refcount leak
ipv6: raw: Deduct extension header length in rawv6_push_pending_frames
iavf/iavf_main: actually log ->src mask when talking about it
drm/i915/gt: Cleanup partial engine discovery failures
usb: ulpi: defer ulpi_register on ulpi_read_id timeout
drm/amd/pm: enable mode1 reset on smu_v13_0_10
drm/amd/pm: Enable bad memory page/channel recording support for smu v13_0_0
drm/amd/pm: enable GPO dynamic control support for SMU13.0.0
drm/amd/pm: enable GPO dynamic control support for SMU13.0.7
drm/amdgpu: add soc21 common ip block support for GC 11.0.4
drm/amdgpu: Enable pg/cg flags on GC11_0_4 for VCN
drm/amdgpu: enable VCN DPG for GC IP v11.0.4
mm: Always release pages to the buddy allocator in memblock_free_late().
iommu/iova: Fix alloc iova overflows issue
iommu/arm-smmu-v3: Don't unregister on shutdown
iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe()
iommu/arm-smmu: Don't unregister on shutdown
iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY even betterer
sched/core: Fix use-after-free bug in dup_user_cpus_ptr()
netfilter: ipset: Fix overflow before widen in the bitmap_ip_create() function.
selftests: netfilter: fix transaction test script timeout handling
powerpc/imc-pmu: Fix use of mutex in IRQs disabled section
x86/boot: Avoid using Intel mnemonics in AT&T syntax asm
EDAC/device: Fix period calculation in edac_device_reset_delay_period()
x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
x86/resctrl: Fix task CLOSID/RMID update race
x86/resctrl: Fix event counts regression in reused RMIDs
regulator: da9211: Use irq handler when ready
scsi: storvsc: Fix swiotlb bounce buffer leak in confidential VM
scsi: mpi3mr: Refer CONFIG_SCSI_MPI3MR in Makefile
scsi: ufs: core: WLUN suspend SSU/enter hibern8 fail recovery
ASoC: Intel: fix sof-nau8825 link failure
ASoC: Intel: sof_nau8825: support rt1015p speaker amplifier
ASoC: Intel: sof-nau8825: fix module alias overflow
drm/msm/dpu: Fix some kernel-doc comments
drm/msm/dpu: Fix memory leak in msm_mdss_parse_data_bus_icc_path
ASoC: wm8904: fix wrong outputs volume after power reactivation
mtd: parsers: scpart: fix __udivdi3 undefined on mips
mtd: cfi: allow building spi-intel standalone
ALSA: usb-audio: Make sure to stop endpoints before closing EPs
ALSA: usb-audio: Relax hw constraints for implicit fb sync
stmmac: dwmac-mediatek: remove the dwmac_fix_mac_speed
tipc: fix unexpected link reset due to discovery messages
NFSD: Pass the target nfsd_file to nfsd_commit()
NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately"
NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection
nfsd: remove the pages_flushed statistic from filecache
nfsd: reorganize filecache.c
NFSD: Add an nfsd_file_fsync tracepoint
nfsd: rework refcounting in filecache
nfsd: fix handling of cached open files in nfsd4_open codepath
octeontx2-af: Fix LMAC config in cgx_lmac_rx_tx_enable
sched/core: Fix arch_scale_freq_tick() on tickless systems
hvc/xen: lock console list traversal
nfc: pn533: Wait for out_urb's completion in pn533_usb_send_frame()
gro: avoid checking for a failed search
gro: take care of DODGY packets
af_unix: selftest: Fix the size of the parameter to connect()
ASoC: qcom: Fix building APQ8016 machine driver without SOUNDWIRE
tools/nolibc: restore mips branch ordering in the _start block
tools/nolibc: fix the O_* fcntl/open macro definitions for riscv
drm/amdgpu: Fix potential NULL dereference
ice: Fix potential memory leak in ice_gnss_tty_write()
ice: Add check for kzalloc
drm/vmwgfx: Write the driver id registers
drm/vmwgfx: Refactor resource manager's hashtable to use linux/hashtable implementation.
drm/vmwgfx: Remove ttm object hashtable
drm/vmwgfx: Refactor resource validation hashtable to use linux/hashtable implementation.
drm/vmwgfx: Refactor ttm reference object hashtable to use linux/hashtable.
drm/vmwgfx: Remove vmwgfx_hashtab
drm/vmwgfx: Remove rcu locks from user resources
net/sched: act_mpls: Fix warning during failed attribute validation
Revert "r8169: disable detection of chip version 36"
net/mlx5: check attr pointer validity before dereferencing it
net/mlx5e: TC, Keep mod hdr actions after mod hdr alloc
net/mlx5: Fix command stats access after free
net/mlx5e: Verify dev is present for fix features ndo
net/mlx5e: IPoIB, Block queue count configuration when sub interfaces are present
net/mlx5e: IPoIB, Block PKEY interfaces with less rx queues than parent
net/mlx5e: IPoIB, Fix child PKEY interface stats on rx path
net/mlx5: Fix ptp max frequency adjustment range
net/mlx5e: Don't support encap rules with gbp option
net/mlx5e: Fix macsec ssci attribute handling in offload path
net/mlx5e: Fix macsec possible null dereference when updating MAC security entity (SecY)
selftests/net: l2_tos_ttl_inherit.sh: Set IPv6 addresses with "nodad".
selftests/net: l2_tos_ttl_inherit.sh: Run tests in their own netns.
selftests/net: l2_tos_ttl_inherit.sh: Ensure environment cleanup on failure.
octeontx2-pf: Fix resource leakage in VF driver unbind
perf build: Properly guard libbpf includes
perf kmem: Support legacy tracepoints
perf kmem: Support field "node" in evsel__process_alloc_event() coping with recent tracepoint restructuring
igc: Fix PPS delta between two synchronized end-points
net: lan966x: check for ptp to be enabled in lan966x_ptp_deinit()
net: hns3: fix wrong use of rss size during VF rss config
bnxt: make sure we return pages to the pool
platform/surface: aggregator: Add missing call to ssam_request_sync_free()
platform/x86/amd: Fix refcount leak in amd_pmc_probe
ALSA: usb-audio: Fix possible NULL pointer dereference in snd_usb_pcm_has_fixed_rate()
efi: fix NULL-deref in init error path
io_uring: lock overflowing for IOPOLL
io_uring/poll: attempt request issue after racy poll wakeup
drm/i915: Fix CFI violations in gt_sysfs
io_uring/io-wq: free worker if task_work creation is canceled
io_uring/io-wq: only free worker if it was allocated for creation
block: handle bio_split_to_limits() NULL return
Revert "usb: ulpi: defer ulpi_register on ulpi_read_id timeout"
pinctrl: amd: Add dynamic debugging for active GPIOs
Linux 6.1.7
Change-Id: Ib9cf1bfac2998d354fca458302541e706284d07a
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 115d9d77bb0f9152c60b6e8646369fa7f6167593 upstream.
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
only releases pages to the buddy allocator if they are not in the
deferred range. This is correct for free pages (as defined by
for_each_free_mem_pfn_range_in_zone()) because free pages in the
deferred range will be initialized and released as part of the deferred
init process. memblock_free_pages() is called by memblock_free_late(),
which is used to free reserved ranges after memblock_free_all() has
run. All pages in reserved ranges have been initialized at that point,
and accordingly, those pages are not touched by the deferred init
process. This means that currently, if the pages that
memblock_free_late() intends to release are in the deferred range, they
will never be released to the buddy allocator. They will forever be
reserved.
In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
which is also correct for free pages but is not correct for reserved
pages. KMSAN metadata for reserved pages is initialized by
kmsan_init_shadow(), which runs shortly before memblock_free_all().
For both of these reasons, memblock_free_pages() should only be called
for free pages, and memblock_free_late() should call __free_pages_core()
directly instead.
One case where this issue can occur in the wild is EFI boot on
x86_64. The x86 EFI code reserves all EFI boot services memory ranges
via memblock_reserve() and frees them later via memblock_free_late()
(efi_reserve_boot_services() and efi_free_boot_services(),
respectively). If any of those ranges happens to fall within the
deferred init range, the pages will not be released and that memory will
be unavailable.
For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:
v6.2-rc2:
# grep -E 'Node|spanned|present|managed' /proc/zoneinfo
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 178867
v6.2-rc2 + patch:
# grep -E 'Node|spanned|present|managed' /proc/zoneinfo
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 222816 # +43,949 pages
Fixes: 3a80a7fa79 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://lore.kernel.org/r/01010185892de53e-e379acfb-7044-4b24-b30a-e2657c1ba989-000000@us-west-2.amazonses.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add pageblock_align() macro and use it to simplify code.
Link: https://lkml.kernel.org/r/20220907060844.126891-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Move pageblock_start_pfn/pageblock_end_pfn() into pageblock-flags.h, then
they could be used somewhere else, not only in compaction, also use
ALIGN_DOWN() instead of round_down() to be pair with ALIGN(), which should
be same for pageblock usage.
Link: https://lkml.kernel.org/r/20220907060844.126891-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* An optimization in memblock_add_range() to reduce array traversals
* Improvements to the memblock test suite
-----BEGIN PGP SIGNATURE-----
iQFMBAABCAA2FiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmLyCbgYHG1pa2UucmFw
b3BvcnRAZ21haWwuY29tAAoJEDkDhibLDv2RBxMH/1uIcfERl3Cbw25zluWSVn4O
mrnr+JPqUkyeVLQDEGzk/VWIM1WT11s7fFpoTpIwu3dq/fVoD3HZlZQkWS0ANFDL
V3xf6Xz17R5ZNoZmacczhNaBqkJSi+dcvoAevjyBHPpKEaCLC/rNrISpDdCD0Lz0
5fgv2F4sISBUVc6FVIFB+9zKC/neI9ewemCABSFTIa5mmQvZZwX1Tj5BrxIsESwN
DwX5u1Q65SoFBbAk6F5+aoClJ7wMGz8OlZoFw106HTvxq8sNne27KXW9mKugBzJr
yAZ/TWrjXigNAr8dcXQEZuxagFSB1PQ4aNgU8phiAwE7/5z3j1KLa65hDRzc9t4=
=JMiG
-----END PGP SIGNATURE-----
Merge tag 'memblock-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
- An optimization in memblock_add_range() to reduce array traversals
- Improvements to the memblock test suite
* tag 'memblock-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock test: Modify the obsolete description in README
memblock tests: fix compilation errors
memblock tests: change build options to run-time options
memblock tests: remove completed TODO items
memblock tests: set memblock_debug to enable memblock_dbg() messages
memblock tests: add verbose output to memblock tests
memblock tests: Makefile: add arguments to control verbosity
memblock: avoid some repeat when add new range
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve latency
and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA
jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/
SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE=
=w/UH
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Most of the MM queue. A few things are still pending.
Liam's maple tree rework didn't make it. This has resulted in a few
other minor patch series being held over for next time.
Multi-gen LRU still isn't merged as we were waiting for mapletree to
stabilize. The current plan is to merge MGLRU into -mm soon and to
later reintroduce mapletree, with a view to hopefully getting both
into 6.1-rc1.
Summary:
- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve
latency and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place"
[ XFS merge from hell as per Darrick Wong in
https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]
* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
tools/testing/selftests/vm/hmm-tests.c: fix build
mm: Kconfig: fix typo
mm: memory-failure: convert to pr_fmt()
mm: use is_zone_movable_page() helper
hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
hugetlbfs: cleanup some comments in inode.c
hugetlbfs: remove unneeded header file
hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
hugetlbfs: use helper macro SZ_1{K,M}
mm: cleanup is_highmem()
mm/hmm: add a test for cross device private faults
selftests: add soft-dirty into run_vmtests.sh
selftests: soft-dirty: add test for mprotect
mm/mprotect: fix soft-dirty check in can_change_pte_writable()
mm: memcontrol: fix potential oom_lock recursion deadlock
mm/gup.c: fix formatting in check_and_migrate_movable_page()
xfs: fail dax mount if reflink is enabled on a partition
mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
userfaultfd: don't fail on unrecognized features
hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
...
In a system(Huawei Ascend ARM64 SoC) using HBM, a multi-bit ECC error
occurs, and the BIOS will mark the corresponding area (for example, 2 MB)
as unusable. When the system restarts next time, these areas are not
reported or reported as EFI_UNUSABLE_MEMORY. Both cases lead to an
increase in the number of memblocks, whereas EFI_UNUSABLE_MEMORY leads to
a larger number of memblocks.
For example, if the EFI_UNUSABLE_MEMORY type is reported:
...
memory[0x92] [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x93] [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x94] [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0
memory[0x95] [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x96] [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x97] [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x98] [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0
memory[0x99] [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0
memory[0x9a] [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9b] [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0
memory[0x9c] [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9d] [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0
memory[0x9e] [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9f] [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0
...
The EFI memory map is parsed to construct the memblock arrays before the
memblock arrays can be resized. As the result, memory regions beyond
INIT_MEMBLOCK_REGIONS are lost.
Add a new macro INIT_MEMBLOCK_MEMORY_REGIONS to replace
INIT_MEMBLOCK_REGTIONS to define the size of the static memblock.memory
array.
Allow overriding memblock.memory array size with architecture defined
INIT_MEMBLOCK_MEMORY_REGIONS and make arm64 to set
INIT_MEMBLOCK_MEMORY_REGIONS to 1024 when CONFIG_EFI is enabled.
Link: https://lkml.kernel.org/r/20220615102742.96450-1-zhouguanghui1@huawei.com
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Darren Hart <darren@os.amperecomputing.com>
Acked-by: Will Deacon <will@kernel.org> [arm64]
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Xu Qiang <xuqiang36@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The worst case is that the new memory range overlaps all existing
regions, which requires type->cnt + 1 empty struct memblock_region slots in
the type->regions array.
So if type->cnt + 1 + type->cnt is less than type->max, we can insert
regions directly rather than calculate the needed amount before the
insertion.
And becase of merge operation in the end of function, tpye->cnt will
increase slowly for many cases.
This change allows to avoid unnecessary repeat of memblock ranges traversal
for many cases when adding new memory range.
Signed-off-by: Jinyu Tang <tjytimi@163.com>
[rppt: massaged comment and changelog text]
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
If system have some mirrored memory and mirrored feature is not specified
in boot parameter, the basic mirrored feature will be enabled and this will
lead to the following situations:
- memblock memory allocation prefers mirrored region. This may have some
unexpected influence on numa affinity.
- contiguous memory will be split into several parts if parts of them
is mirrored memory via memblock_mark_mirror().
To fix this, variable mirrored_kernelcore will be checked in
memblock_mark_mirror(). Mark mirrored memory with flag MEMBLOCK_MIRROR iff
kernelcore=mirror is added in the kernel parameters.
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220614092156.1972846-6-mawupeng1@huawei.com
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
If system has mirrored memory, memblock will try to allocate mirrored
memory firstly and fallback to non-mirrored memory when fails, but if with
limited mirrored memory or some numa node without mirrored memory, lots of
warning message about memblock allocation will occur.
This patch ratelimit the warning message to avoid a very long print during
bootup.
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20220614092156.1972846-3-mawupeng1@huawei.com
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
* A small cleanup of unused variable in __next_mem_pfn_range_in_zone
* Initial test suite to simulate memblock behaviour in userspace
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmI9bD4THHJwcHRAbGlu
dXguaWJtLmNvbQAKCRA5A4Ymyw79kXwhB/wNXR1wUb/eD3eKD+aNa2KMY5+8csjD
ghJph8wQmM9U9hsLViv3/M/H5+bY/s0riZNulKYrcmzW2BgIzF2ebcoqgfQ89YGV
bLx7lMJGxG/lCglur9m6KnOF89//Owq6Vfk7Jd6jR/F+43JO/3+5siCbTo6NrbVw
3DjT/WzvaICA646foyFTh8WotnIRbB2iYX1k/vIA3gwJ2C6n7WwoKzxU3ulKMUzg
hVlhcuTVnaV4mjFBbl23wC7i4l9dgPO9M4ZrTtlEsNHeV6uoFYRObwy6/q/CsBqI
avwgV0bQDch+QuCteUXcqIcnBpcUAfGxgiqp2PYX4lXA4gYTbo7plTna
=IemP
-----END PGP SIGNATURE-----
Merge tag 'memblock-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
"Test suite and a small cleanup:
- A small cleanup of unused variable in __next_mem_pfn_range_in_zone
- Initial test suite to simulate memblock behaviour in userspace"
* tag 'memblock-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: (27 commits)
memblock tests: Add TODO and README files
memblock tests: Add memblock_alloc_try_nid tests for bottom up
memblock tests: Add memblock_alloc_try_nid tests for top down
memblock tests: Add memblock_alloc_from tests for bottom up
memblock tests: Add memblock_alloc_from tests for top down
memblock tests: Add memblock_alloc tests for bottom up
memblock tests: Add memblock_alloc tests for top down
memblock tests: Add simulation of physical memory
memblock tests: Split up reset_memblock function
memblock tests: Fix testing with 32-bit physical addresses
memblock: __next_mem_pfn_range_in_zone: remove unneeded local variable nid
memblock tests: Add memblock_free tests
memblock tests: Add memblock_add_node test
memblock tests: Add memblock_remove tests
memblock tests: Add memblock_reserve tests
memblock tests: Add memblock_add tests
memblock tests: Add memblock reset function
memblock tests: Add skeleton of the memblock simulator
tools/include: Add debugfs.h stub
tools/include: Add pfn.h stub
...
The nid is only used to act as output parameter of __next_mem_range.
Since NULL can be passed to __next_mem_range as out_nid, we can thus
remove nid by passing NULL here.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
[rppt: updated the commit message]
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
memblock.{reserved,memory}.regions may be allocated using kmalloc() in
memblock_double_array(). Use kfree() to release these kmalloced regions
indicated by memblock_{reserved,memory}_in_slab.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Fixes: 3010f87650 ("mm: discard memblock data later")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
- Fix double-evaluation of 'pte' macro argument when using 52-bit PAs
- Fix signedness of some MTE prctl PR_* constants
- Fix kmemleak memory usage by skipping early pgtable allocations
- Fix printing of CPU feature register strings
- Remove redundant -nostdlib linker flag for vDSO binaries
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmGL8ukQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNB0XCADIFxVeLM3YMPsnDC6ttmGp/w1TXh5RoNBm
IY5utdu8ybdrdCYO//Z6i7rQiYwPBRbR8Xts1LAfZDVuD/IPKqxiY+wjMe9FYu0U
lb14dnT5iI7L3l0yo/5s1EC8FC/pfCMwi50+trAS5H5jo2wg/+6B/bgbymzTd1ZB
vzVVzyGv1L/3xKHT4uEDjY1jMPRBH4Ugx2bJ1tzRzsC3Gz0D7ZeVd1Dg3ftPMsqI
MA4Q5QeE9MsafcV8sKDRLnzNSYEr50goA0xtCre81VmJDNcm/y6UybUCF75KU72M
kAOImXs0+wrDqNIocYgYlSWRu9xjw8qjoxjnyqikFx47HBesjY8T
=pHQk
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
- Fix double-evaluation of 'pte' macro argument when using 52-bit PAs
- Fix signedness of some MTE prctl PR_* constants
- Fix kmemleak memory usage by skipping early pgtable allocations
- Fix printing of CPU feature register strings
- Remove redundant -nostdlib linker flag for vDSO binaries
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: pgtable: make __pte_to_phys/__phys_to_pte_val inline functions
arm64: Track no early_pgtable_alloc() for kmemleak
arm64: mte: change PR_MTE_TCF_NONE back into an unsigned long
arm64: vdso: remove -nostdlib compiler flag
arm64: arm64_ftr_reg->name may not be a human-readable string
After switched page size from 64KB to 4KB on several arm64 servers here,
kmemleak starts to run out of early memory pool due to a huge number of
those early_pgtable_alloc() calls:
kmemleak_alloc_phys()
memblock_alloc_range_nid()
memblock_phys_alloc_range()
early_pgtable_alloc()
init_pmd()
alloc_init_pud()
__create_pgd_mapping()
__map_memblock()
paging_init()
setup_arch()
start_kernel()
Increased the default value of DEBUG_KMEMLEAK_MEM_POOL_SIZE by 4 times
won't be enough for a server with 200GB+ memory. There isn't much
interesting to check memory leaks for those early page tables and those
early memory mappings should not reference to other memory. Hence, no
kmemleak false positives, and we can safely skip tracking those early
allocations from kmemleak like we did in the commit fed84c7852
("mm/memblock.c: skip kmemleak for kasan_init()") without needing to
introduce complications to automatically scale the value depends on the
runtime memory size etc. After the patch, the default value of
DEBUG_KMEMLEAK_MEM_POOL_SIZE becomes sufficient again.
Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20211105150509.7826-1-quic_qiancai@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
Let's add a flag that corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED,
indicating that we're dealing with a memory region that is never
indicated in the firmware-provided memory map, but always detected and
added by a driver.
Similar to MEMBLOCK_HOTPLUG, most infrastructure has to treat such
memory regions like ordinary MEMBLOCK_NONE memory regions -- for
example, when selecting memory regions to add to the vmcore for dumping
in the crashkernel via for_each_mem_range().
However, especially kexec_file is not supposed to select such memblocks
via for_each_free_mem_range() / for_each_free_mem_range_reverse() to
place kexec images, similar to how we handle
IORESOURCE_SYSRAM_DRIVER_MANAGED without CONFIG_ARCH_KEEP_MEMBLOCK.
We'll make sure that memory hotplug code sets the flag where applicable
(IORESOURCE_SYSRAM_DRIVER_MANAGED) next. This prepares architectures
that need CONFIG_ARCH_KEEP_MEMBLOCK, such as arm64, for virtio-mem
support.
Note that kexec *must not* indicate this memory to the second kernel and
*must not* place kexec-images on this memory. Let's add a comment to
kexec_walk_memblock(), documenting how we handle MEMBLOCK_DRIVER_MANAGED
now just like using IORESOURCE_SYSRAM_DRIVER_MANAGED in
locate_mem_hole_callback() for kexec_walk_resources().
Also note that MEMBLOCK_HOTPLUG cannot be reused due to different
semantics:
MEMBLOCK_HOTPLUG: memory is indicated as "System RAM" in the
firmware-provided memory map and added to the system early during
boot; kexec *has to* indicate this memory to the second kernel and
can place kexec-images on this memory. After memory hotunplug,
kexec has to be re-armed. We mostly ignore this flag when
"movable_node" is not set on the kernel command line, because
then we're told to not care about hotunpluggability of such
memory regions.
MEMBLOCK_DRIVER_MANAGED: memory is not indicated as "System RAM" in
the firmware-provided memory map; this memory is always detected
and added to the system by a driver; memory might not actually be
physically hotunpluggable. kexec *must not* indicate this memory to
the second kernel and *must not* place kexec-images on this memory.
Link: https://lkml.kernel.org/r/20211004093605.5830-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jianyong Wu <Jianyong.Wu@arm.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Shahab Vahedi <shahab@synopsys.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We want to specify flags when hotplugging memory. Let's prepare to pass
flags to memblock_add_node() by adjusting all existing users.
Note that when hotplugging memory the system is already up and running
and we might have concurrent memblock users: for example, while we're
hotplugging memory, kexec_file code might search for suitable memory
regions to place kexec images. It's important to add the memory
directly to memblock via a single call with the right flags, instead of
adding the memory first and apply flags later: otherwise, concurrent
memblock users might temporarily stumble over memblocks with wrong
flags, which will be important in a follow-up patch that introduces a
new flag to properly handle add_memory_driver_managed().
Link: https://lkml.kernel.org/r/20211004093605.5830-4-david@redhat.com
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Shahab Vahedi <shahab@synopsys.com> [arch/arc]
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jianyong Wu <Jianyong.Wu@arm.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename memblock_free_ptr() to memblock_free() and use memblock_free()
when freeing a virtual pointer so that memblock_free() will be a
counterpart of memblock_alloc()
The callers are updated with the below semantic patch and manual
addition of (void *) casting to pointers that are represented by
unsigned long variables.
@@
identifier vaddr;
expression size;
@@
(
- memblock_phys_free(__pa(vaddr), size);
+ memblock_free(vaddr, size);
|
- memblock_free_ptr(vaddr, size);
+ memblock_free(vaddr, size);
)
[sfr@canb.auug.org.au: fixup]
Link: https://lkml.kernel.org/r/20211018192940.3d1d532f@canb.auug.org.au
Link: https://lkml.kernel.org/r/20210930185031.18648-7-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since memblock_free() operates on a physical range, make its name
reflect it and rename it to memblock_phys_free(), so it will be a
logical counterpart to memblock_phys_alloc().
The callers are updated with the below semantic patch:
@@
expression addr;
expression size;
@@
- memblock_free(addr, size);
+ memblock_phys_free(addr, size);
Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memblock_free_late() is a NOP wrapper for __memblock_free_late(), there
is no point to keep this indirection.
Drop the wrapper and rename __memblock_free_late() to
memblock_free_late().
Link: https://lkml.kernel.org/r/20210930185031.18648-5-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Zapolskiy reports:
Commit a7259df767 ("memblock: make memblock_find_in_range method
private") invokes a kernel panic while running kmemleak on OF platforms
with nomaped regions:
Unable to handle kernel paging request at virtual address fff000021e00000
[...]
scan_block+0x64/0x170
scan_gray_list+0xe8/0x17c
kmemleak_scan+0x270/0x514
kmemleak_write+0x34c/0x4ac
The memory allocated from memblock is registered with kmemleak, but if
it is marked MEMBLOCK_NOMAP it won't have linear map entries so an
attempt to scan such areas will fault.
Ideally, memblock_mark_nomap() would inform kmemleak to ignore
MEMBLOCK_NOMAP memory, but it can be called before kmemleak interfaces
operating on physical addresses can use __va() conversion.
Make sure that functions that mark allocated memory as MEMBLOCK_NOMAP
take care of informing kmemleak to ignore such memory.
Link: https://lore.kernel.org/all/8ade5174-b143-d621-8c8e-dc6a1898c6fb@linaro.org
Link: https://lore.kernel.org/all/c30ff0a2-d196-c50d-22f0-bd50696b1205@quicinc.com
Fixes: a7259df767 ("memblock: make memblock_find_in_range method private")
Reported-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mem=[X][G|M] is broken on ARM64 platform, there are cases that even
type.cnt is 1, but total_size is not 0 because regions are merged into
1. So only check 'cnt' is not enough, total_size should be used,
othersize bootargs 'mem=[X][G|B]' not work anymore.
Link: https://lkml.kernel.org/r/20210930024437.32598-1-peng.fan@oss.nxp.com
Fixes: e888fa7bb8 ("memblock: Check memory add/cap ordering")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Zapolskiy reports:
commit a7259df767 ("memblock: make memblock_find_in_range method private")
invokes a kernel panic while running kmemleak on OF platforms with nomaped
regions:
Unable to handle kernel paging request at virtual address fff000021e00000
[...]
scan_block+0x64/0x170
scan_gray_list+0xe8/0x17c
kmemleak_scan+0x270/0x514
kmemleak_write+0x34c/0x4ac
Indeed, NOMAP regions don't have linear map entries so an attempt to scan
these areas would fault.
Prevent such faults by excluding NOMAP regions from kmemleak.
Link: https://lore.kernel.org/all/8ade5174-b143-d621-8c8e-dc6a1898c6fb@linaro.org
Fixes: a7259df767 ("memblock: make memblock_find_in_range method private")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
The boot-time allocation interface for memblock is a mess, with
'memblock_alloc()' returning a virtual pointer, but then you are
supposed to free it with 'memblock_free()' that takes a _physical_
address.
Not only is that all kinds of strange and illogical, but it actually
causes bugs, when people then use it like a normal allocation function,
and it fails spectacularly on a NULL pointer:
https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/
or just random memory corruption if the debug checks don't catch it:
https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/
I really don't want to apply patches that treat the symptoms, when the
fundamental cause is this horribly confusing interface.
I started out looking at just automating a sane replacement sequence,
but because of this mix or virtual and physical addresses, and because
people have used the "__pa()" macro that can take either a regular
kernel pointer, or just the raw "unsigned long" address, it's all quite
messy.
So this just introduces a new saner interface for freeing a virtual
address that was allocated using 'memblock_alloc()', and that was kept
as a regular kernel pointer. And then it converts a couple of users
that are obvious and easy to test, including the 'xbc_nodes' case in
lib/bootconfig.c that caused problems.
Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: 40caa127f3 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge misc updates from Andrew Morton:
"173 patches.
Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure,
hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock,
oom-kill, migration, ksm, percpu, vmstat, and madvise)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits)
mm/madvise: add MADV_WILLNEED to process_madvise()
mm/vmstat: remove unneeded return value
mm/vmstat: simplify the array size calculation
mm/vmstat: correct some wrong comments
mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
selftests: vm: add COW time test for KSM pages
selftests: vm: add KSM merging time test
mm: KSM: fix data type
selftests: vm: add KSM merging across nodes test
selftests: vm: add KSM zero page merging test
selftests: vm: add KSM unmerge test
selftests: vm: add KSM merge test
mm/migrate: correct kernel-doc notation
mm: wire up syscall process_mrelease
mm: introduce process_mrelease system call
memblock: make memblock_find_in_range method private
mm/mempolicy.c: use in_task() in mempolicy_slab_node()
mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
mm/mempolicy: advertise new MPOL_PREFERRED_MANY
mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
...
There are a lot of uses of memblock_find_in_range() along with
memblock_reserve() from the times memblock allocation APIs did not exist.
memblock_find_in_range() is the very core of memblock allocations, so any
future changes to its internal behaviour would mandate updates of all the
users outside memblock.
Replace the calls to memblock_find_in_range() with an equivalent calls to
memblock_phys_alloc() and memblock_phys_alloc_range() and make
memblock_find_in_range() private method of memblock.
This simplifies the callers, ensures that (unlikely) errors in
memblock_reserve() are handled and improves maintainability of
memblock_find_in_range().
Link: https://lkml.kernel.org/r/20210816122622.30279-1-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Kirill A. Shutemov <kirill.shtuemov@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [ACPI]
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Nick Kossifidis <mick@ics.forth.gr> [riscv]
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Functions memblock_alloc_exact_nid_raw() and memblock_alloc_try_nid_raw()
are intended for early memory allocation without overhead of zeroing the
allocated memory. Since these functions were used to allocate the memory
map, they have ended up with addition of a call to page_init_poison() that
poisoned the allocated memory when CONFIG_PAGE_POISON was set.
Since the memory map is allocated using a dedicated memmep_alloc()
function that takes care of the poisoning, remove page poisoning from the
memblock_alloc_*_raw() functions.
Link: https://lkml.kernel.org/r/20210714123739.16493-5-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Michal Simek <monstr@monstr.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For memblock_cap_memory_range() to work properly, it should be called
after memory is detected and added to memblock with memblock_add() or
memblock_add_node(). If memblock_cap_memory_range() would be called
before memory is registered, we may silently corrupt memory later
because the crash kernel will see all memory as available.
Print a warning and bail out if ordering is not satisfied.
Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/aabc5bad008d49f07d542815c6c8d28ec90bb09e.1628672091.git.geert+renesas@glider.be
Commit b10d6bca87 ("arch, drivers: replace for_each_membock() with
for_each_mem_range()") didn't take into account that when there is
movable_node parameter in the kernel command line, for_each_mem_range()
would skip ranges marked with MEMBLOCK_HOTPLUG.
The page table setup code in POWER uses for_each_mem_range() to create
the linear mapping of the physical memory and since the regions marked
as MEMORY_HOTPLUG are skipped, they never make it to the linear map.
A later access to the memory in those ranges will fail:
BUG: Unable to handle kernel data access on write at 0xc000000400000000
Faulting instruction address: 0xc00000000008a3c0
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 53 Comm: kworker/u2:0 Not tainted 5.13.0 #7
NIP: c00000000008a3c0 LR: c0000000003c1ed8 CTR: 0000000000000040
REGS: c000000008a57770 TRAP: 0300 Not tainted (5.13.0)
MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 84222202 XER: 20040000
CFAR: c0000000003c1ed4 DAR: c000000400000000 DSISR: 42000000 IRQMASK: 0
GPR00: c0000000003c1ed8 c000000008a57a10 c0000000019da700 c000000400000000
GPR04: 0000000000000280 0000000000000180 0000000000000400 0000000000000200
GPR08: 0000000000000100 0000000000000080 0000000000000040 0000000000000300
GPR12: 0000000000000380 c000000001bc0000 c0000000001660c8 c000000006337e00
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000040000000 0000000020000000 c000000001a81990 c000000008c30000
GPR24: c000000008c20000 c000000001a81998 000fffffffff0000 c000000001a819a0
GPR28: c000000001a81908 c00c000001000000 c000000008c40000 c000000008a64680
NIP clear_user_page+0x50/0x80
LR __handle_mm_fault+0xc88/0x1910
Call Trace:
__handle_mm_fault+0xc44/0x1910 (unreliable)
handle_mm_fault+0x130/0x2a0
__get_user_pages+0x248/0x610
__get_user_pages_remote+0x12c/0x3e0
get_arg_page+0x54/0xf0
copy_string_kernel+0x11c/0x210
kernel_execve+0x16c/0x220
call_usermodehelper_exec_async+0x1b0/0x2f0
ret_from_kernel_thread+0x5c/0x70
Instruction dump:
79280fa4 79271764 79261f24 794ae8e2 7ca94214 7d683a14 7c893a14 7d893050
7d4903a6 60000000 60000000 60000000 <7c001fec> 7c091fec 7c081fec 7c051fec
---[ end trace 490b8c67e6075e09 ]---
Making for_each_mem_range() include MEMBLOCK_HOTPLUG regions in the
traversal fixes this issue.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1976100
Link: https://lkml.kernel.org/r/20210712071132.20902-1-rppt@kernel.org
Fixes: b10d6bca87 ("arch, drivers: replace for_each_membock() with for_each_mem_range()")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org> [5.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>