488e19cdcf
624 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Greg Kroah-Hartman
|
3ca4271578 |
Reapply "Merge tag 'android14-6.1.75_r00' into android14-6.1"
This reverts commit
|
||
Zi Yan
|
176b8fe524 |
FROMLIST: mm/migrate: set swap entry values of THP tail pages properly.
The tail pages in a THP can have swap entry information stored in their private field. When migrating to a new page, all tail pages of the new page need to update ->private to avoid future data corruption. This fix is stable-only, since after commit 07e09c483cbe ("mm/huge_memory: work on folio->swap instead of page->private when splitting folio"), subpages of a swapcached THP no longer requires the maintenance. Adding THPs to the swapcache was introduced in commit |
||
Todd Kjos
|
6bad1052c2 |
Revert "Merge tag 'android14-6.1.75_r00' into android14-6.1"
This reverts commit
|
||
Greg Kroah-Hartman
|
e1b12db2de |
Merge 6.1.72 into android14-6.1-lts
Changes in 6.1.72 keys, dns: Fix missing size check of V1 server-list header block: Don't invalidate pagecache for invalid falloc modes ALSA: hda/realtek: enable SND_PCI_QUIRK for hp pavilion 14-ec1xxx series ALSA: hda/realtek: fix mute/micmute LEDs for a HP ZBook ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP ProBook 440 G6 mptcp: prevent tcp diag from closing listener subflows Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()" drm/mgag200: Fix gamma lut not initialized for G200ER, G200EV, G200SE cifs: cifs_chan_is_iface_active should be called with chan_lock held cifs: do not depend on release_iface for maintaining iface_list KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRL wifi: iwlwifi: pcie: don't synchronize IRQs from IRQ drm/bridge: ti-sn65dsi86: Never store more than msg->size bytes in AUX xfer netfilter: use skb_ip_totlen and iph_totlen netfilter: nf_tables: set transport offset from mac header for netdev/egress nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local octeontx2-af: Fix marking couple of structure as __packed drm/i915/dp: Fix passing the correct DPCD_REV for drm_dp_set_phy_test_pattern ice: Fix link_down_on_close message ice: Shut down VSI with "link-down-on-close" enabled i40e: Fix filter input checks to prevent config with invalid values igc: Report VLAN EtherType matching back to user igc: Check VLAN TCI mask igc: Check VLAN EtherType mask ASoC: fsl_rpmsg: Fix error handler with pm_runtime_enable ASoC: mediatek: mt8186: fix AUD_PAD_TOP register and offset mlxbf_gige: fix receive packet race condition net: sched: em_text: fix possible memory leak in em_text_destroy() r8169: Fix PCI error on system resume can: raw: add support for SO_MARK net-timestamp: extend SOF_TIMESTAMPING_OPT_ID to HW timestamps net: annotate data-races around sk->sk_tsflags net: annotate data-races around sk->sk_bind_phc net: Implement missing getsockopt(SO_TIMESTAMPING_NEW) selftests: bonding: do not set port down when adding to bond ARM: sun9i: smp: Fix array-index-out-of-bounds read in sunxi_mc_smp_init sfc: fix a double-free bug in efx_probe_filters net: bcmgenet: Fix FCS generation for fragmented skbuffs netfilter: nft_immediate: drop chain reference counter on error net: Save and restore msg_namelen in sock_sendmsg i40e: fix use-after-free in i40e_aqc_add_filters() ASoC: meson: g12a-toacodec: Validate written enum values ASoC: meson: g12a-tohdmitx: Validate written enum values ASoC: meson: g12a-toacodec: Fix event generation ASoC: meson: g12a-tohdmitx: Fix event generation for S/PDIF mux i40e: Restore VF MSI-X state during PCI reset igc: Fix hicredit calculation net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues net/smc: fix invalid link access in dumping SMC-R connections octeontx2-af: Always configure NIX TX link credits based on max frame size octeontx2-af: Re-enable MAC TX in otx2_stop processing asix: Add check for usbnet_get_endpoints net: ravb: Wait for operating mode to be applied bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters() net: Implement missing SO_TIMESTAMPING_NEW cmsg support selftests: secretmem: floor the memory size to the multiple of page_size cpu/SMT: Create topology_smt_thread_allowed() cpu/SMT: Make SMT control more robust against enumeration failures srcu: Fix callbacks acceleration mishandling bpf, x64: Fix tailcall infinite loop bpf, x86: Simplify the parsing logic of structure parameters bpf, x86: save/restore regs with BPF_DW size net: Declare MSG_SPLICE_PAGES internal sendmsg() flag udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES splice, net: Add a splice_eof op to file-ops and socket-ops ipv4, ipv6: Use splice_eof() to flush udp: introduce udp->udp_flags udp: move udp->no_check6_tx to udp->udp_flags udp: move udp->no_check6_rx to udp->udp_flags udp: move udp->gro_enabled to udp->udp_flags udp: move udp->accept_udp_{l4|fraglist} to udp->udp_flags udp: lockless UDP_ENCAP_L2TPINUDP / UDP_GRO udp: annotate data-races around udp->encap_type wifi: iwlwifi: yoyo: swap cdb and jacket bits values arm64: dts: qcom: sdm845: align RPMh regulator nodes with bindings arm64: dts: qcom: sdm845: Fix PSCI power domain names fbdev: imsttfb: Release framebuffer and dealloc cmap on error path fbdev: imsttfb: fix double free in probe() bpf: decouple prune and jump points bpf: remove unnecessary prune and jump points bpf: Remove unused insn_cnt argument from visit_[func_call_]insn() bpf: clean up visit_insn()'s instruction processing bpf: Support new 32bit offset jmp instruction bpf: handle ldimm64 properly in check_cfg() bpf: fix precision backtracking instruction iteration blk-mq: make sure active queue usage is held for bio_integrity_prep() net/mlx5: Increase size of irq name buffer s390/mm: add missing arch_set_page_dat() call to vmem_crst_alloc() s390/cpumf: support user space events for counting f2fs: clean up i_compress_flag and i_compress_level usage f2fs: convert to use bitmap API f2fs: assign default compression level f2fs: set the default compress_level on ioctl selftests: mptcp: fix fastclose with csum failure selftests: mptcp: set FAILING_LINKS in run_tests media: camss: sm8250: Virtual channels for CSID media: qcom: camss: Fix set CSI2_RX_CFG1_VC_MODE when VC is greater than 3 ext4: convert move_extent_per_page() to use folios khugepage: replace try_to_release_page() with filemap_release_folio() memory-failure: convert truncate_error_page() to use folio mm: merge folio_has_private()/filemap_release_folio() call pairs mm, netfs, fscache: stop read optimisation when folio removed from pagecache filemap: add a per-mapping stable writes flag block: update the stable_writes flag in bdev_add smb: client: fix missing mode bits for SMB symlinks net: dpaa2-eth: rearrange variable in dpaa2_eth_get_ethtool_stats dpaa2-eth: recycle the RX buffer only after all processing done ethtool: don't propagate EOPNOTSUPP from dumps bpf, sockmap: af_unix stream sockets need to hold ref for pair sock firmware: arm_scmi: Fix frequency truncation by promoting multiplier type ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 genirq/affinity: Remove the 'firstvec' parameter from irq_build_affinity_masks genirq/affinity: Pass affinity managed mask array to irq_build_affinity_masks genirq/affinity: Don't pass irq_affinity_desc array to irq_build_affinity_masks genirq/affinity: Rename irq_build_affinity_masks as group_cpus_evenly genirq/affinity: Move group_cpus_evenly() into lib/ lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly mm/memory_hotplug: add missing mem_hotplug_lock mm/memory_hotplug: fix error handling in add_memory_resource() net: sched: call tcf_ct_params_free to free params in tcf_ct_init netfilter: flowtable: allow unidirectional rules netfilter: flowtable: cache info of last offload net/sched: act_ct: offload UDP NEW connections net/sched: act_ct: Fix promotion of offloaded unreplied tuple netfilter: flowtable: GC pushes back packets to classic path net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table octeontx2-af: Fix pause frame configuration octeontx2-af: Support variable number of lmacs btrfs: fix qgroup_free_reserved_data int overflow btrfs: mark the len field in struct btrfs_ordered_sum as unsigned ring-buffer: Fix 32-bit rb_time_read() race with rb_time_cmpxchg() firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards x86/kprobes: fix incorrect return address calculation in kprobe_emulate_call_indirect i2c: core: Fix atomic xfer check for non-preempt config mm: fix unmap_mapping_range high bits shift bug drm/amdgpu: skip gpu_info fw loading on navi12 drm/amd/display: add nv12 bounding box mmc: meson-mx-sdhc: Fix initialization frozen issue mmc: rpmb: fixes pause retune on all RPMB partitions. mmc: core: Cancel delayed work before releasing host mmc: sdhci-sprd: Fix eMMC init failure after hw reset genirq/affinity: Only build SMP-only helper functions on SMP kernels f2fs: compress: fix to assign compress_level for lz4 correctly net/sched: act_ct: additional checks for outdated flows net/sched: act_ct: Always fill offloading tuple iifidx bpf: Fix a verifier bug due to incorrect branch offset comparison with cpu=v4 bpf: syzkaller found null ptr deref in unix_bpf proto add media: qcom: camss: Comment CSID dt_id field smb3: Replace smb2pdu 1-element arrays with flex-arrays Revert "interconnect: qcom: sm8250: Enable sync_state" Linux 6.1.72 Change-Id: Id00eb2ae1159d4d5fa0ef914e672c5669cbf5b0a Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
Greg Kroah-Hartman
|
bb47960a9d |
Merge branch 'android14-6.1' into branch 'android14-6.1-lts'
This merges all of the latest changes in 'android14-6.1' into 'android14-6.1-lts' to get it to pass TH again due to new symbols being added. Included in here are the following commits: * |
||
David Howells
|
bceff380f3 |
mm: merge folio_has_private()/filemap_release_folio() call pairs
[ Upstream commit 0201ebf274a306a6ebb95e5dc2d6a0a27c737cac ] Patch series "mm, netfs, fscache: Stop read optimisation when folio removed from pagecache", v7. This fixes an optimisation in fscache whereby we don't read from the cache for a particular file until we know that there's data there that we don't have in the pagecache. The problem is that I'm no longer using PG_fscache (aka PG_private_2) to indicate that the page is cached and so I don't get a notification when a cached page is dropped from the pagecache. The first patch merges some folio_has_private() and filemap_release_folio() pairs and introduces a helper, folio_needs_release(), to indicate if a release is required. The second patch is the actual fix. Following Willy's suggestions[1], it adds an AS_RELEASE_ALWAYS flag to an address_space that will make filemap_release_folio() always call ->release_folio(), even if PG_private/PG_private_2 aren't set. folio_needs_release() is altered to add a check for this. This patch (of 2): Make filemap_release_folio() check folio_has_private(). Then, in most cases, where a call to folio_has_private() is immediately followed by a call to filemap_release_folio(), we can get rid of the test in the pair. There are a couple of sites in mm/vscan.c that this can't so easily be done. In shrink_folio_list(), there are actually three cases (something different is done for incompletely invalidated buffers), but filemap_release_folio() elides two of them. In shrink_active_list(), we don't have have the folio lock yet, so the check allows us to avoid locking the page unnecessarily. A wrapper function to check if a folio needs release is provided for those places that still need to do it in the mm/ directory. This will acquire additional parts to the condition in a future patch. After this, the only remaining caller of folio_has_private() outside of mm/ is a check in fuse. Link: https://lkml.kernel.org/r/20230628104852.3391651-1-dhowells@redhat.com Link: https://lkml.kernel.org/r/20230628104852.3391651-2-dhowells@redhat.com Reported-by: Rohith Surabattula <rohiths.msft@gmail.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steve French <sfrench@samba.org> Cc: Shyam Prasad N <nspmangalore@gmail.com> Cc: Rohith Surabattula <rohiths.msft@gmail.com> Cc: Dave Wysochanski <dwysocha@redhat.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Xiubo Li <xiubli@redhat.com> Cc: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 1898efcdbed3 ("block: update the stable_writes flag in bdev_add") Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Charan Teja Kalla
|
be72d197b2 |
mm: migrate high-order folios in swap cache correctly
commit fc346d0a70a13d52fe1c4bc49516d83a42cd7c4c upstream.
Large folios occupy N consecutive entries in the swap cache instead of
using multi-index entries like the page cache. However, if a large folio
is re-added to the LRU list, it can be migrated. The migration code was
not aware of the difference between the swap cache and the page cache and
assumed that a single xas_store() would be sufficient.
This leaves potentially many stale pointers to the now-migrated folio in
the swap cache, which can lead to almost arbitrary data corruption in the
future. This can also manifest as infinite loops with the RCU read lock
held.
[willy@infradead.org: modifications to the changelog & tweaked the fix]
Fixes:
|
||
Charan Teja Kalla
|
44702d8fa1 |
FROMLIST: mm: migrate high-order folios in swap cache correctly
Large folios occupy N consecutive entries in the swap cache instead of
using multi-index entries like the page cache. However, if a large folio
is re-added to the LRU list, it can be migrated. The migration code was
not aware of the difference between the swap cache and the page cache and
assumed that a single xas_store() would be sufficient.
This leaves potentially many stale pointers to the now-migrated folio in
the swap cache, which can lead to almost arbitrary data corruption in the
future. This can also manifest as infinite loops with the RCU read lock
held.
Bug: 315281107
Change-Id: I455f964a9f21c13089890073777388236b6669d7
[willy@infradead.org: modifications to the changelog & tweaked the fix]
Fixes:
|
||
Greg Kroah-Hartman
|
2cd386b08b |
Merge 6.1.61 into android14-6.1-lts
Changes in 6.1.61
KVM: x86/pmu: Truncate counter value to allowed width on write
mmc: core: Align to common busy polling behaviour for mmc ioctls
mmc: block: ioctl: do write error check for spi
mmc: core: Fix error propagation for some ioctl commands
ASoC: codecs: wcd938x: Convert to platform remove callback returning void
ASoC: codecs: wcd938x: Simplify with dev_err_probe
ASoC: codecs: wcd938x: fix regulator leaks on probe errors
ASoC: codecs: wcd938x: fix runtime PM imbalance on remove
pinctrl: qcom: lpass-lpi: fix concurrent register updates
mcb: Return actual parsed size when reading chameleon table
mcb-lpc: Reallocate memory region to avoid memory overlapping
virtio_balloon: Fix endless deflation and inflation on arm64
virtio-mmio: fix memory leak of vm_dev
virtio-crypto: handle config changed by work queue
virtio_pci: fix the common cfg map size
vsock/virtio: initialize the_virtio_vsock before using VQs
vhost: Allow null msg.size on VHOST_IOTLB_INVALIDATE
arm64: dts: rockchip: Add i2s0-2ch-bus-bclk-off pins to RK3399
arm64: dts: rockchip: Fix i2s0 pin conflict on ROCK Pi 4 boards
mm: fix vm_brk_flags() to not bail out while holding lock
hugetlbfs: clear resv_map pointer if mmap fails
mm/page_alloc: correct start page when guard page debug is enabled
mm/migrate: fix do_pages_move for compat pointers
hugetlbfs: extend hugetlb_vma_lock to private VMAs
maple_tree: add GFP_KERNEL to allocations in mas_expected_entries()
nfsd: lock_rename() needs both directories to live on the same fs
drm/i915/pmu: Check if pmu is closed before stopping event
drm/amd: Disable ASPM for VI w/ all Intel systems
drm/dp_mst: Fix NULL deref in get_mst_branch_device_by_guid_helper()
ARM: OMAP: timer32K: fix all kernel-doc warnings
firmware/imx-dsp: Fix use_after_free in imx_dsp_setup_channels()
clk: ti: Fix missing omap4 mcbsp functional clock and aliases
clk: ti: Fix missing omap5 mcbsp functional clock and aliases
r8169: fix the KCSAN reported data-race in rtl_tx() while reading tp->cur_tx
r8169: fix the KCSAN reported data-race in rtl_tx while reading TxDescArray[entry].opts1
r8169: fix the KCSAN reported data race in rtl_rx while reading desc->opts1
iavf: initialize waitqueues before starting watchdog_task
i40e: Fix I40E_FLAG_VF_VLAN_PRUNING value
treewide: Spelling fix in comment
igb: Fix potential memory leak in igb_add_ethtool_nfc_entry
neighbour: fix various data-races
igc: Fix ambiguity in the ethtool advertising
net: ethernet: adi: adin1110: Fix uninitialized variable
net: ieee802154: adf7242: Fix some potential buffer overflow in adf7242_stats_show()
net: usb: smsc95xx: Fix uninit-value access in smsc95xx_read_reg
r8152: Increase USB control msg timeout to 5000ms as per spec
r8152: Run the unload routine if we have errors during probe
r8152: Cancel hw_phy_work if we have an error in probe
r8152: Release firmware if we have an error in probe
tcp: fix wrong RTO timeout when received SACK reneging
gtp: uapi: fix GTPA_MAX
gtp: fix fragmentation needed check with gso
i40e: Fix wrong check for I40E_TXR_FLAGS_WB_ON_ITR
drm/logicvc: Kconfig: select REGMAP and REGMAP_MMIO
iavf: in iavf_down, disable queues when removing the driver
scsi: sd: Introduce manage_shutdown device flag
blk-throttle: check for overflow in calculate_bytes_allowed
kasan: print the original fault addr when access invalid shadow
io_uring/fdinfo: lock SQ thread while retrieving thread cpu/pid
iio: afe: rescale: Accept only offset channels
iio: exynos-adc: request second interupt only when touchscreen mode is used
iio: adc: xilinx-xadc: Don't clobber preset voltage/temperature thresholds
iio: adc: xilinx-xadc: Correct temperature offset/scale for UltraScale
i2c: muxes: i2c-mux-pinctrl: Use of_get_i2c_adapter_by_node()
i2c: muxes: i2c-mux-gpmux: Use of_get_i2c_adapter_by_node()
i2c: muxes: i2c-demux-pinctrl: Use of_get_i2c_adapter_by_node()
i2c: stm32f7: Fix PEC handling in case of SMBUS transfers
i2c: aspeed: Fix i2c bus hang in slave read
tracing/kprobes: Fix the description of variable length arguments
misc: fastrpc: Reset metadata buffer to avoid incorrect free
misc: fastrpc: Free DMA handles for RPC calls with no arguments
misc: fastrpc: Clean buffers on remote invocation failures
misc: fastrpc: Unmap only if buffer is unmapped from DSP
nvmem: imx: correct nregs for i.MX6ULL
nvmem: imx: correct nregs for i.MX6SLL
nvmem: imx: correct nregs for i.MX6UL
x86/i8259: Skip probing when ACPI/MADT advertises PCAT compatibility
x86/cpu: Add model number for Intel Arrow Lake mobile processor
perf/core: Fix potential NULL deref
sparc32: fix a braino in fault handling in csum_and_copy_..._user()
clk: Sanitize possible_parent_show to Handle Return Value of of_clk_get_parent_name
platform/x86: Add s2idle quirk for more Lenovo laptops
ext4: add two helper functions extent_logical_end() and pa_logical_end()
ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow
ext4: avoid overlapping preallocations due to overflow
objtool/x86: add missing embedded_insn check
Linux 6.1.61
Note, this merge point also reverts commit
|
||
Gregory Price
|
c9b066f692 |
mm/migrate: fix do_pages_move for compat pointers
commit 229e2253766c7cdfe024f1fe280020cc4711087c upstream. do_pages_move does not handle compat pointers for the page list. correctly. Add in_compat_syscall check and appropriate get_user fetch when iterating the page list. It makes the syscall in compat mode (32-bit userspace, 64-bit kernel) work the same way as the native 32-bit syscall again, restoring the behavior before my broken commit |
||
Peifeng Li
|
c3d26e2b5a |
ANDROID: vendor_hooks: Add hooks for lookaround
Add hooks for support lookaround in memory reclamation. - android_vh_test_clear_look_around_ref - android_vh_check_folio_look_around_ref - android_vh_look_around_migrate_folio - android_vh_look_around Bug: 292051411 Signed-off-by: Peifeng Li <lipeifeng@oppo.com> Change-Id: I9a606ae71d2f1303df3b02403b30bc8fdc9d06dd (cherry picked from commit f50f24e781738c8e5aa9f285d8726202f33107d6) [huzhanyuan: changed page to folio where appropriate] |
||
Greg Kroah-Hartman
|
dafc2fae4d |
This is the 6.1.13 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmP2A8MACgkQONu9yGCS aT7GrhAAky2nTRG9J0oPxh5Eu7wNKmjqDWNj9c6it3iGHpb+tfOY+LfPXHmWz0kX NoaNYGZGD8SDbkmwrSOmFB1Q/0OZ4/aIwM7Kwcw72UJVvrlsKx1HwiJjXKk809ZL bVlLUQzFTwyVIYcvjXQ8CuBHwBinLc3qkcyYGgbS8bseR4pDuxwoToDwAxk1d/0j ozWuzUKhSdYHYIUrk3papUro2UpF+Kb7KFpNiVo2wMaZM7en2XK3khCt8TuojH6c DXL+KZ/HbB8Ig1PWLaw2/6o4ispNy6bz7CJx6oDiOILR+le8xZA5WTdkXT3ovjyr LxutmPTTw6PxextIyVRblJWzXNcjdlV552U4gnnngcWn6wg4D4otqYnYvTaAUc+u sQnwrlQFxB2KfFKLNepGAy7klQJsYP3eadjDgGXP9TSmuUvUYRaNr6h0XukbyYkc kx2+Tw51NMKEqhgnaiKDN8AZEDTuLu5F4+NrUertxlb3PWeRRMRYVGJ1uw0KJg6t d5eniCB00SaaqN6M68u/hRYRi3gnwIsU7DitEpqejqwzskMpgegMFvebmCwORiq3 D+FD4EHOlztIToXhmEOXp0cz8fs27MuWmq4GkSwXvJuq+id5cQFdDN5GeLgNdAvH Kiu/Y+DY6ObW31tAQ1Jjp20L2RaWWvubrCBGeIqiDzUWmCohsks= =TXvc -----END PGP SIGNATURE----- Merge 6.1.13 into android14-6.1 Changes in 6.1.13 mptcp: sockopt: make 'tcp_fastopen_connect' generic mptcp: fix locking for setsockopt corner-case mptcp: deduplicate error paths on endpoint creation mptcp: fix locking for in-kernel listener creation btrfs: move the auto defrag code to defrag.c btrfs: lock the inode in shared mode before starting fiemap ASoC: amd: yc: Add DMI support for new acer/emdoor platforms ASoC: SOF: sof-audio: start with the right widget type ALSA: usb-audio: Add FIXED_RATE quirk for JBL Quantum610 Wireless ASoC: Intel: sof_rt5682: always set dpcm_capture for amplifiers ASoC: Intel: sof_cs42l42: always set dpcm_capture for amplifiers ASoC: Intel: sof_nau8825: always set dpcm_capture for amplifiers ASoC: Intel: sof_ssp_amp: always set dpcm_capture for amplifiers selftests/bpf: Verify copy_register_state() preserves parent/live fields ALSA: hda: Do not unset preset when cleaning up codec ASoC: amd: yc: Add Xiaomi Redmi Book Pro 15 2022 into DMI table bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself ASoC: cs42l56: fix DT probe tools/virtio: fix the vringh test for virtio ring changes vdpa: ifcvf: Do proper cleanup if IFCVF init fails net/rose: Fix to not accept on connected socket selftest: net: Improve IPV6_TCLASS/IPV6_HOPLIMIT tests apparmor compatibility net: stmmac: do not stop RX_CLK in Rx LPI state for qcs404 SoC powerpc/64: Fix perf profiling asynchronous interrupt handlers fscache: Use clear_and_wake_up_bit() in fscache_create_volume_work() drm/nouveau/devinit/tu102-: wait for GFW_BOOT_PROGRESS == COMPLETED net: ethernet: mtk_eth_soc: Avoid truncating allocation net: sched: sch: Bounds check priority s390/decompressor: specify __decompress() buf len to avoid overflow nvme-fc: fix a missing queue put in nvmet_fc_ls_create_association nvme: clear the request_queue pointers on failure in nvme_alloc_admin_tag_set nvme: clear the request_queue pointers on failure in nvme_alloc_io_tag_set drm/amd/display: Add missing brackets in calculation drm/amd/display: Adjust downscaling limits for dcn314 drm/amd/display: Unassign does_plane_fit_in_mall function from dcn3.2 drm/amd/display: Reset DMUB mailbox SW state after HW reset drm/amdgpu: enable HDP SD for gfx 11.0.3 drm/amdgpu: Enable vclk dclk node for gc11.0.3 drm/amd/display: Properly handle additional cases where DCN is not supported platform/x86: touchscreen_dmi: Add Chuwi Vi8 (CWI501) DMI match ceph: move mount state enum to super.h ceph: blocklist the kclient when receiving corrupted snap trace selftests: mptcp: userspace: fix v4-v6 test in v6.1 of: reserved_mem: Have kmemleak ignore dynamically allocated reserved mem kasan: fix Oops due to missing calls to kasan_arch_is_ready() mm: shrinkers: fix deadlock in shrinker debugfs aio: fix mremap after fork null-deref vmxnet3: move rss code block under eop descriptor fbdev: Fix invalid page access after closing deferred I/O devices drm: Disable dynamic debug as broken drm/amd/amdgpu: fix warning during suspend drm/amd/display: Fail atomic_check early on normalize_zpos error drm/vmwgfx: Stop accessing buffer objects which failed init drm/vmwgfx: Do not drop the reference to the handle too soon mmc: jz4740: Work around bug on JZ4760(B) mmc: meson-gx: fix SDIO mode if cap_sdio_irq isn't set mmc: sdio: fix possible resource leaks in some error paths mmc: mmc_spi: fix error handling in mmc_spi_probe() ALSA: hda: Fix codec device field initializan ALSA: hda/conexant: add a new hda codec SN6180 ALSA: hda/realtek - fixed wrong gpio assigned ALSA: hda/realtek: fix mute/micmute LEDs don't work for a HP platform. ALSA: hda/realtek: Enable mute/micmute LEDs and speaker support for HP Laptops ata: ahci: Add Tiger Lake UP{3,4} AHCI controller ata: libata-core: Disable READ LOG DMA EXT for Samsung MZ7LH sched/psi: Fix use-after-free in ep_remove_wait_queue() hugetlb: check for undefined shift on 32 bit architectures nilfs2: fix underflow in second superblock position calculations mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount mm/filemap: fix page end in filemap_get_read_batch mm/migrate: fix wrongly apply write bit after mkdirty on sparc64 gpio: sim: fix a memory leak freezer,umh: Fix call_usermode_helper_exec() vs SIGKILL coredump: Move dump_emit_page() to kill unused warning Revert "mm: Always release pages to the buddy allocator in memblock_free_late()." net: Fix unwanted sign extension in netdev_stats_to_stats64() revert "squashfs: harden sanity check in squashfs_read_xattr_id_table" drm/vc4: crtc: Increase setup cost in core clock calculation to handle extreme reduced blanking drm/vc4: Fix YUV plane handling when planes are in different buffers drm/i915/gen11: Wa_1408615072/Wa_1407596294 should be on GT list ice: fix lost multicast packets in promisc mode ixgbe: allow to increase MTU to 3K with XDP enabled i40e: add double of VLAN header when computing the max MTU net: bgmac: fix BCM5358 support by setting correct flags net: ethernet: ti: am65-cpsw: Add RX DMA Channel Teardown Quirk sctp: sctp_sock_filter(): avoid list_entry() on possibly empty list net/sched: tcindex: update imperfect hash filters respecting rcu ice: xsk: Fix cleaning of XDP_TX frames dccp/tcp: Avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions. net/usb: kalmia: Don't pass act_len in usb_bulk_msg error path net/sched: act_ctinfo: use percpu stats net: openvswitch: fix possible memory leak in ovs_meter_cmd_set() net: stmmac: fix order of dwmac5 FlexPPS parametrization sequence bnxt_en: Fix mqprio and XDP ring checking logic tracing: Make trace_define_field_ext() static net: stmmac: Restrict warning on disabling DMA store and fwd mode net: use a bounce buffer for copying skb->mark tipc: fix kernel warning when sending SYN message net: mpls: fix stale pointer if allocation fails during device rename igb: conditionalize I2C bit banging on external thermal sensor support igb: Fix PPS input and output using 3rd and 4th SDP ixgbe: add double of VLAN header when computing the max MTU ipv6: Fix datagram socket connection with DSCP. ipv6: Fix tcp socket connection with DSCP. mm/gup: add folio to list when folio_isolate_lru() succeed mm: extend max struct page size for kmsan i40e: Add checking for null for nlmsg_find_attr() net/sched: tcindex: search key must be 16 bits nvme-tcp: stop auth work after tearing down queues in error recovery nvme-rdma: stop auth work after tearing down queues in error recovery KVM: x86/pmu: Disable vPMU support on hybrid CPUs (host PMUs) kvm: initialize all of the kvm_debugregs structure before sending it to userspace perf/x86: Refuse to export capabilities for hybrid PMUs alarmtimer: Prevent starvation by small intervals and SIG_IGN nvme-pci: refresh visible attrs for cmb attributes ASoC: SOF: Intel: hda-dai: fix possible stream_tag leak net: sched: sch: Fix off by one in htb_activate_prios() Linux 6.1.13 Change-Id: I8a1e4175939c14f726c545001061b95462566386 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
Peter Xu
|
54806cb751 |
mm/migrate: fix wrongly apply write bit after mkdirty on sparc64
commit 96a9c287e25d690fd9623b5133703b8e310fbed1 upstream.
Nick Bowler reported another sparc64 breakage after the young/dirty
persistent work for page migration (per "Link:" below). That's after a
similar report [2].
It turns out page migration was overlooked, and it wasn't failing before
because page migration was not enabled in the initial report test
environment.
David proposed another way [2] to fix this from sparc64 side, but that
patch didn't land somehow. Neither did I check whether there's any other
arch that has similar issues.
Let's fix it for now as simple as moving the write bit handling to be
after dirty, like what we did before.
Note: this is based on mm-unstable, because the breakage was since 6.1 and
we're at a very late stage of 6.2 (-rc8), so I assume for this specific
case we should target this at 6.3.
[1] https://lore.kernel.org/all/20221021160603.GA23307@u164.east.ru/
[2] https://lore.kernel.org/all/20221212130213.136267-1-david@redhat.com/
Link: https://lkml.kernel.org/r/20230216153059.256739-1-peterx@redhat.com
Fixes:
|
||
Charan Teja Reddy
|
731835eae3 |
ANDROID: implement wrapper for reverse migration
Reverse migration is used to do the balancing the occupancy of memory zones in a node in the system whose imabalance may be caused by migration of pages to other zones by an operation, eg: hotremove and then hotadding the same memory. In this case there is a lot of free memory in newly hotadd memory which can be filled up by the previous migrated pages(as part of offline/hotremove) thus may free up some pressure in other zones of the node. Upstream discussion: https://lore.kernel.org/all/ee78c83d-da9b-f6d1-4f66-934b7782acfb@codeaurora.org/ Bug: 201263307 Change-Id: Ib3137dab0db66ecf6858c4077dcadb9dfd0c6b1c Signed-off-by: Charan Teja Reddy <quic_charante@quicinc.com> Signed-off-by: Sukadev Bhattiprolu <quic_sukadev@quicinc.com> |
||
Baolin Wang
|
03e5f82ea6 |
mm: migrate: fix return value if all subpages of THPs are migrated successfully
During THP migration, if THPs are not migrated but they are split and all
subpages are migrated successfully, migrate_pages() will still return the
number of THP pages that were not migrated. This will confuse the callers
of migrate_pages(). For example, the longterm pinning will failed though
all pages are migrated successfully.
Thus we should return 0 to indicate that all pages are migrated in this
case
Link: https://lkml.kernel.org/r/de386aa864be9158d2f3b344091419ea7c38b2f7.1666599848.git.baolin.wang@linux.alibaba.com
Fixes:
|
||
Alistair Popple
|
16ce101db8 |
mm/memory.c: fix race when faulting a device private page
Patch series "Fix several device private page reference counting issues", v2 This series aims to fix a number of page reference counting issues in drivers dealing with device private ZONE_DEVICE pages. These result in use-after-free type bugs, either from accessing a struct page which no longer exists because it has been removed or accessing fields within the struct page which are no longer valid because the page has been freed. During normal usage it is unlikely these will cause any problems. However without these fixes it is possible to crash the kernel from userspace. These crashes can be triggered either by unloading the kernel module or unbinding the device from the driver prior to a userspace task exiting. In modules such as Nouveau it is also possible to trigger some of these issues by explicitly closing the device file-descriptor prior to the task exiting and then accessing device private memory. This involves some minor changes to both PowerPC and AMD GPU code. Unfortunately I lack hardware to test either of those so any help there would be appreciated. The changes mimic what is done in for both Nouveau and hmm-tests though so I doubt they will cause problems. This patch (of 8): When the CPU tries to access a device private page the migrate_to_ram() callback associated with the pgmap for the page is called. However no reference is taken on the faulting page. Therefore a concurrent migration of the device private page can free the page and possibly the underlying pgmap. This results in a race which can crash the kernel due to the migrate_to_ram() function pointer becoming invalid. It also means drivers can't reliably read the zone_device_data field because the page may have been freed with memunmap_pages(). Close the race by getting a reference on the page while holding the ptl to ensure it has not been freed. Unfortunately the elevated reference count will cause the migration required to handle the fault to fail. To avoid this failure pass the faulting page into the migrate_vma functions so that if an elevated reference count is found it can be checked to see if it's expected or not. [mpe@ellerman.id.au: fix build] Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Lyude Paul <lyude@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alex Sierra <alex.sierra@amd.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Christian König <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Matthew Wilcox (Oracle)
|
29eea9b5a9 |
mm: convert page_get_anon_vma() to folio_get_anon_vma()
With all callers now passing in a folio, rename the function and convert all callers. Removes a couple of calls to compound_head() and a reference to page->mapping. Link: https://lkml.kernel.org/r/20220902194653.1739778-55-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Matthew Wilcox (Oracle)
|
c33db29231 |
migrate: convert unmap_and_move_huge_page() to use folios
Saves several calls to compound_head() and removes a couple of uses of page->lru. Link: https://lkml.kernel.org/r/20220902194653.1739778-52-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Matthew Wilcox (Oracle)
|
682a71a1b6 |
migrate: convert __unmap_and_move() to use folios
Removes a lot of calls to compound_head(). Also remove a VM_BUG_ON that can never trigger as the PageAnon bit is the bottom bit of page->mapping. Link: https://lkml.kernel.org/r/20220902194653.1739778-51-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Haiyue Wang
|
f7091ed64e |
mm: fix the handling Non-LRU pages returned by follow_page
The handling Non-LRU pages returned by follow_page() jumps directly, it
doesn't call put_page() to handle the reference count, since 'FOLL_GET'
flag for follow_page() has get_page() called. Fix the zone device page
check by handling the page reference count correctly before returning.
And as David reviewed, "device pages are never PageKsm pages". Drop this
zone device page check for break_ksm().
Since the zone device page can't be a transparent huge page, so drop the
redundant zone device page check for split_huge_pages_pid(). (by Miaohe)
Link: https://lkml.kernel.org/r/20220823135841.934465-3-haiyue.wang@intel.com
Fixes:
|
||
Aneesh Kumar K.V
|
467b171af8 |
mm/demotion: update node_is_toptier to work with memory tiers
With memory tier support we can have memory only NUMA nodes in the top tier from which we want to avoid promotion tracking NUMA faults. Update node_is_toptier to work with memory tiers. All NUMA nodes are by default top tier nodes. With lower(slower) memory tiers added we consider all memory tiers above a memory tier having CPU NUMA nodes as a top memory tier [sj@kernel.org: include missed header file, memory-tiers.h] Link: https://lkml.kernel.org/r/20220820190720.248704-1-sj@kernel.org [akpm@linux-foundation.org: mm/memory.c needs linux/memory-tiers.h] [aneesh.kumar@linux.ibm.com: make toptier_distance inclusive upper bound of toptiers] Link: https://lkml.kernel.org/r/20220830081457.118960-1-aneesh.kumar@linux.ibm.com Link: https://lkml.kernel.org/r/20220818131042.113280-10-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Wei Xu <weixugc@google.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Bharata B Rao <bharata@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hesham Almatary <hesham.almatary@huawei.com> Cc: Jagdish Gediya <jvgediya.oss@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Yang Shi <shy828301@gmail.com> Cc: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Aneesh Kumar K.V
|
6c542ab757 |
mm/demotion: build demotion targets based on explicit memory tiers
This patch switch the demotion target building logic to use memory tiers instead of NUMA distance. All N_MEMORY NUMA nodes will be placed in the default memory tier and additional memory tiers will be added by drivers like dax kmem. This patch builds the demotion target for a NUMA node by looking at all memory tiers below the tier to which the NUMA node belongs. The closest node in the immediately following memory tier is used as a demotion target. Since we are now only building demotion target for N_MEMORY NUMA nodes the CPU hotplug calls are removed in this patch. Link: https://lkml.kernel.org/r/20220818131042.113280-6-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Wei Xu <weixugc@google.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Bharata B Rao <bharata@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hesham Almatary <hesham.almatary@huawei.com> Cc: Jagdish Gediya <jvgediya.oss@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Yang Shi <shy828301@gmail.com> Cc: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Aneesh Kumar K.V
|
9195244022 |
mm/demotion: move memory demotion related code
This moves memory demotion related code to mm/memory-tiers.c. No functional change in this patch. Link: https://lkml.kernel.org/r/20220818131042.113280-3-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Wei Xu <weixugc@google.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Bharata B Rao <bharata@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hesham Almatary <hesham.almatary@huawei.com> Cc: Jagdish Gediya <jvgediya.oss@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Yang Shi <shy828301@gmail.com> Cc: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Baolin Wang
|
7047b5a40b |
mm: migrate: do not retry 10 times for the subpages of fail-to-migrate THP
If THP is failed to migrate due to -ENOSYS or -ENOMEM case, the THP will be split, and the subpages of fail-to-migrate THP will be tried to migrate again, so we should not account the retry counter in the second loop, since we already accounted 'nr_thp_failed' in the first loop. Moreover we also do not need retry 10 times for -EAGAIN case for the subpages of fail-to-migrate THP in the second loop, since we already regarded the THP as migration failure, and save some migration time (for the worst case, will try 512 * 10 times) according to previous discussion [1]. [1] https://lore.kernel.org/linux-mm/87r13a7n04.fsf@yhuang6-desk2.ccr.corp.intel.com/ Link: https://lkml.kernel.org/r/20220817081408.513338-9-ying.huang@intel.com Tested-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Zi Yan <ziy@nvidia.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Huang Ying
|
077309bc1e |
migrate_pages(): fix failure counting for retry
After 10 retries, we will give up and the remaining pages will be counted
as failure in nr_failed and nr_thp_failed. We should count the failure in
nr_failed_pages too. This is done in this patch.
Link: https://lkml.kernel.org/r/20220817081408.513338-8-ying.huang@intel.com
Fixes:
|
||
Huang Ying
|
e6fa8a79fe |
migrate_pages(): fix failure counting for THP splitting
If THP is failed to be migrated, it may be split and retry. But after
splitting, the head page will be left in "from" list, although THP
migration failure has been counted already. If the head page is failed to
be migrated too, the failure will be counted twice incorrectly. So this
is fixed in this patch via moving the head page of THP after splitting to
"thp_split_pages" too.
Link: https://lkml.kernel.org/r/20220817081408.513338-7-ying.huang@intel.com
Fixes:
|
||
Huang Ying
|
577be05c89 |
migrate_pages(): fix failure counting for THP on -ENOSYS
If THP or hugetlbfs page migration isn't supported, unmap_and_move() or
unmap_and_move_huge_page() will return -ENOSYS. For THP, splitting will
be tried, but if splitting doesn't succeed, the THP will be left in "from"
list wrongly. If some other pages are retried, the THP migration failure
will counted again. This is fixed via moving the failure THP from "from"
to "ret_pages".
Another issue of the original code is that the unsupported failure
processing isn't consistent between THP and hugetlbfs page. Make them
consistent in this patch to make the code easier to be understood too.
Link: https://lkml.kernel.org/r/20220817081408.513338-6-ying.huang@intel.com
Fixes:
|
||
Huang Ying
|
5fc30916b5 |
migrate_pages(): fix failure counting for THP subpages retrying
If THP is failed to be migrated for -ENOSYS and -ENOMEM, the THP will be
split into thp_split_pages, and after other pages are migrated, pages in
thp_split_pages will be migrated with no_subpage_counting == true, because
its failure have been counted already. If some pages in thp_split_pages
are retried during migration, we should not count their failure if
no_subpage_counting == true too. This is done this patch to fix the
failure counting for THP subpages retrying.
Link: https://lkml.kernel.org/r/20220817081408.513338-5-ying.huang@intel.com
Fixes:
|
||
Huang Ying
|
fbed53b477 |
migrate_pages(): fix THP failure counting for -ENOMEM
In unmap_and_move(), if the new THP cannot be allocated, -ENOMEM will be
returned, and migrate_pages() will try to split the THP unless "reason" is
MR_NUMA_MISPLACED (that is, nosplit == true). But when nosplit == true,
the THP migration failure will not be counted.
This is incorrect, so in this patch, the THP migration failure will be
counted for -ENOMEM regardless of nosplit is true or false. The nr_failed
counting isn't fixed because it's not used. Added some comments for it
per Baolin's suggestion.
Link: https://lkml.kernel.org/r/20220817081408.513338-4-ying.huang@intel.com
Fixes:
|
||
Huang Ying
|
9c62ff005f |
migrate_pages(): remove unnecessary list_safe_reset_next()
Before commit
|
||
Huang Ying
|
a7504ed14f |
migrate: fix syscall move_pages() return value for failure
Patch series "migrate_pages(): fix several bugs in error path", v3.
During review the code of migrate_pages() and build a test program for
it. Several bugs in error path are identified and fixed in this
series.
Most patches are tested via
- Apply error-inject.patch in Linux kernel
- Compile test-migrate.c (with -lnuma)
- Test with test-migrate.sh
error-inject.patch, test-migrate.c, and test-migrate.sh are as below.
It turns out that error injection is an important tool to fix bugs in
error path.
This patch (of 8):
The return value of move_pages() syscall is incorrect when counting
the remaining pages to be migrated. For example, for the following
test program,
"
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
#include <numaif.h>
#include <numa.h>
#ifndef MADV_FREE
#define MADV_FREE 8 /* free pages only if memory pressure */
#endif
#define ONE_MB (1024 * 1024)
#define MAP_SIZE (16 * ONE_MB)
#define THP_SIZE (2 * ONE_MB)
#define THP_MASK (THP_SIZE - 1)
#define ERR_EXIT_ON(cond, msg) \
do { \
int __cond_in_macro = (cond); \
if (__cond_in_macro) \
error_exit(__cond_in_macro, (msg)); \
} while (0)
void error_msg(int ret, int nr, int *status, const char *msg)
{
int i;
fprintf(stderr, "Error: %s, ret : %d, error: %s\n",
msg, ret, strerror(errno));
if (!nr)
return;
fprintf(stderr, "status: ");
for (i = 0; i < nr; i++)
fprintf(stderr, "%d ", status[i]);
fprintf(stderr, "\n");
}
void error_exit(int ret, const char *msg)
{
error_msg(ret, 0, NULL, msg);
exit(1);
}
int page_size;
bool do_vmsplice;
bool do_thp;
static int pipe_fds[2];
void *addr;
char *pn;
char *pn1;
void *pages[2];
int status[2];
void prepare()
{
int ret;
struct iovec iov;
if (addr) {
munmap(addr, MAP_SIZE);
close(pipe_fds[0]);
close(pipe_fds[1]);
}
ret = pipe(pipe_fds);
ERR_EXIT_ON(ret, "pipe");
addr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
ERR_EXIT_ON(addr == MAP_FAILED, "mmap");
if (do_thp) {
ret = madvise(addr, MAP_SIZE, MADV_HUGEPAGE);
ERR_EXIT_ON(ret, "advise hugepage");
}
pn = (char *)(((unsigned long)addr + THP_SIZE) & ~THP_MASK);
pn1 = pn + THP_SIZE;
pages[0] = pn;
pages[1] = pn1;
*pn = 1;
if (do_vmsplice) {
iov.iov_base = pn;
iov.iov_len = page_size;
ret = vmsplice(pipe_fds[1], &iov, 1, 0);
ERR_EXIT_ON(ret < 0, "vmsplice");
}
status[0] = status[1] = 1024;
}
void test_migrate()
{
int ret;
int nodes[2] = { 1, 1 };
pid_t pid = getpid();
prepare();
ret = move_pages(pid, 1, pages, nodes, status, MPOL_MF_MOVE_ALL);
error_msg(ret, 1, status, "move 1 page");
prepare();
ret = move_pages(pid, 2, pages, nodes, status, MPOL_MF_MOVE_ALL);
error_msg(ret, 2, status, "move 2 pages, page 1 not mapped");
prepare();
*pn1 = 1;
ret = move_pages(pid, 2, pages, nodes, status, MPOL_MF_MOVE_ALL);
error_msg(ret, 2, status, "move 2 pages");
prepare();
*pn1 = 1;
nodes[1] = 0;
ret = move_pages(pid, 2, pages, nodes, status, MPOL_MF_MOVE_ALL);
error_msg(ret, 2, status, "move 2 pages, page 1 to node 0");
}
int main(int argc, char *argv[])
{
numa_run_on_node(0);
page_size = getpagesize();
test_migrate();
fprintf(stderr, "\nMake page 0 cannot be migrated:\n");
do_vmsplice = true;
test_migrate();
fprintf(stderr, "\nTest THP:\n");
do_thp = true;
do_vmsplice = false;
test_migrate();
fprintf(stderr, "\nTHP: make page 0 cannot be migrated:\n");
do_vmsplice = true;
test_migrate();
return 0;
}
"
The output of the current kernel is,
"
Error: move 1 page, ret : 0, error: Success
status: 1
Error: move 2 pages, page 1 not mapped, ret : 0, error: Success
status: 1 -14
Error: move 2 pages, ret : 0, error: Success
status: 1 1
Error: move 2 pages, page 1 to node 0, ret : 0, error: Success
status: 1 0
Make page 0 cannot be migrated:
Error: move 1 page, ret : 0, error: Success
status: 1024
Error: move 2 pages, page 1 not mapped, ret : 1, error: Success
status: 1024 -14
Error: move 2 pages, ret : 0, error: Success
status: 1024 1024
Error: move 2 pages, page 1 to node 0, ret : 1, error: Success
status: 1024 1024
"
While the expected output is,
"
Error: move 1 page, ret : 0, error: Success
status: 1
Error: move 2 pages, page 1 not mapped, ret : 0, error: Success
status: 1 -14
Error: move 2 pages, ret : 0, error: Success
status: 1 1
Error: move 2 pages, page 1 to node 0, ret : 0, error: Success
status: 1 0
Make page 0 cannot be migrated:
Error: move 1 page, ret : 1, error: Success
status: 1024
Error: move 2 pages, page 1 not mapped, ret : 1, error: Success
status: 1024 -14
Error: move 2 pages, ret : 1, error: Success
status: 1024 1024
Error: move 2 pages, page 1 to node 0, ret : 2, error: Success
status: 1024 1024
"
Fix this via correcting the remaining pages counting. With the fix,
the output for the test program as above is expected.
Link: https://lkml.kernel.org/r/20220817081408.513338-1-ying.huang@intel.com
Link: https://lkml.kernel.org/r/20220817081408.513338-2-ying.huang@intel.com
Fixes:
|
||
Peter Xu
|
2e3468778d |
mm: remember young/dirty bit for page migrations
When page migration happens, we always ignore the young/dirty bit settings in the old pgtable, and marking the page as old in the new page table using either pte_mkold() or pmd_mkold(), and keeping the pte clean. That's fine from functional-wise, but that's not friendly to page reclaim because the moving page can be actively accessed within the procedure. Not to mention hardware setting the young bit can bring quite some overhead on some systems, e.g. x86_64 needs a few hundreds nanoseconds to set the bit. The same slowdown problem to dirty bits when the memory is first written after page migration happened. Actually we can easily remember the A/D bit configuration and recover the information after the page is migrated. To achieve it, define a new set of bits in the migration swap offset field to cache the A/D bits for old pte. Then when removing/recovering the migration entry, we can recover the A/D bits even if the page changed. One thing to mention is that here we used max_swapfile_size() to detect how many swp offset bits we have, and we'll only enable this feature if we know the swp offset is big enough to store both the PFN value and the A/D bits. Otherwise the A/D bits are dropped like before. Link: https://lkml.kernel.org/r/20220811161331.37055-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Minchan Kim <minchan@kernel.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Huang Ying
|
33024536ba |
memory tiering: hot page selection with hint page fault latency
Patch series "memory tiering: hot page selection", v4. To optimize page placement in a memory tiering system with NUMA balancing, the hot pages in the slow memory nodes need to be identified. Essentially, the original NUMA balancing implementation selects the mostly recently accessed (MRU) pages to promote. But this isn't a perfect algorithm to identify the hot pages. Because the pages with quite low access frequency may be accessed eventually given the NUMA balancing page table scanning period could be quite long (e.g. 60 seconds). So in this patchset, we implement a new hot page identification algorithm based on the latency between NUMA balancing page table scanning and hint page fault. Which is a kind of mostly frequently accessed (MFU) algorithm. In NUMA balancing memory tiering mode, if there are hot pages in slow memory node and cold pages in fast memory node, we need to promote/demote hot/cold pages between the fast and cold memory nodes. A choice is to promote/demote as fast as possible. But the CPU cycles and memory bandwidth consumed by the high promoting/demoting throughput will hurt the latency of some workload because of accessing inflating and slow memory bandwidth contention. A way to resolve this issue is to restrict the max promoting/demoting throughput. It will take longer to finish the promoting/demoting. But the workload latency will be better. This is implemented in this patchset as the page promotion rate limit mechanism. The promotion hot threshold is workload and system configuration dependent. So in this patchset, a method to adjust the hot threshold automatically is implemented. The basic idea is to control the number of the candidate promotion pages to match the promotion rate limit. We used the pmbench memory accessing benchmark tested the patchset on a 2-socket server system with DRAM and PMEM installed. The test results are as follows, pmbench score promote rate (accesses/s) MB/s ------------- ------------ base 146887704.1 725.6 hot selection 165695601.2 544.0 rate limit 162814569.8 165.2 auto adjustment 170495294.0 136.9 From the results above, With hot page selection patch [1/3], the pmbench score increases about 12.8%, and promote rate (overhead) decreases about 25.0%, compared with base kernel. With rate limit patch [2/3], pmbench score decreases about 1.7%, and promote rate decreases about 69.6%, compared with hot page selection patch. With threshold auto adjustment patch [3/3], pmbench score increases about 4.7%, and promote rate decrease about 17.1%, compared with rate limit patch. Baolin helped to test the patchset with MySQL on a machine which contains 1 DRAM node (30G) and 1 PMEM node (126G). sysbench /usr/share/sysbench/oltp_read_write.lua \ ...... --tables=200 \ --table-size=1000000 \ --report-interval=10 \ --threads=16 \ --time=120 The tps can be improved about 5%. This patch (of 3): To optimize page placement in a memory tiering system with NUMA balancing, the hot pages in the slow memory node need to be identified. Essentially, the original NUMA balancing implementation selects the mostly recently accessed (MRU) pages to promote. But this isn't a perfect algorithm to identify the hot pages. Because the pages with quite low access frequency may be accessed eventually given the NUMA balancing page table scanning period could be quite long (e.g. 60 seconds). The most frequently accessed (MFU) algorithm is better. So, in this patch we implemented a better hot page selection algorithm. Which is based on NUMA balancing page table scanning and hint page fault as follows, - When the page tables of the processes are scanned to change PTE/PMD to be PROT_NONE, the current time is recorded in struct page as scan time. - When the page is accessed, hint page fault will occur. The scan time is gotten from the struct page. And The hint page fault latency is defined as hint page fault time - scan time The shorter the hint page fault latency of a page is, the higher the probability of their access frequency to be higher. So the hint page fault latency is a better estimation of the page hot/cold. It's hard to find some extra space in struct page to hold the scan time. Fortunately, we can reuse some bits used by the original NUMA balancing. NUMA balancing uses some bits in struct page to store the page accessing CPU and PID (referring to page_cpupid_xchg_last()). Which is used by the multi-stage node selection algorithm to avoid to migrate pages shared accessed by the NUMA nodes back and forth. But for pages in the slow memory node, even if they are shared accessed by multiple NUMA nodes, as long as the pages are hot, they need to be promoted to the fast memory node. So the accessing CPU and PID information are unnecessary for the slow memory pages. We can reuse these bits in struct page to record the scan time. For the fast memory pages, these bits are used as before. For the hot threshold, the default value is 1 second, which works well in our performance test. All pages with hint page fault latency < hot threshold will be considered hot. It's hard for users to determine the hot threshold. So we don't provide a kernel ABI to set it, just provide a debugfs interface for advanced users to experiment. We will continue to work on a hot threshold automatic adjustment mechanism. The downside of the above method is that the response time to the workload hot spot changing may be much longer. For example, - A previous cold memory area becomes hot - The hint page fault will be triggered. But the hint page fault latency isn't shorter than the hot threshold. So the pages will not be promoted. - When the memory area is scanned again, maybe after a scan period, the hint page fault latency measured will be shorter than the hot threshold and the pages will be promoted. To mitigate this, if there are enough free space in the fast memory node, the hot threshold will not be used, all pages will be promoted upon the hint page fault for fast response. Thanks Zhong Jiang reported and tested the fix for a bug when disabling memory tiering mode dynamically. Link: https://lkml.kernel.org/r/20220713083954.34196-1-ying.huang@intel.com Link: https://lkml.kernel.org/r/20220713083954.34196-2-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Rik van Riel <riel@surriel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Wei Xu <weixugc@google.com> Cc: osalvador <osalvador@suse.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Zhong Jiang <zhongjiang-ali@linux.alibaba.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Haiyue Wang
|
8315682148 |
mm: migration: fix the FOLL_GET failure on following huge page
Not all huge page APIs support FOLL_GET option, so move_pages() syscall
will fail to get the page node information for some huge pages.
Like x86 on linux 5.19 with 1GB huge page API follow_huge_pud(), it will
return NULL page for FOLL_GET when calling move_pages() syscall with the
NULL 'nodes' parameter, the 'status' parameter has '-2' error in array.
Note: follow_huge_pud() now supports FOLL_GET in linux 6.0.
Link: https://lore.kernel.org/all/20220714042420.1847125-3-naoya.horiguchi@linux.dev
But these huge page APIs don't support FOLL_GET:
1. follow_huge_pud() in arch/s390/mm/hugetlbpage.c
2. follow_huge_addr() in arch/ia64/mm/hugetlbpage.c
It will cause WARN_ON_ONCE for FOLL_GET.
3. follow_huge_pgd() in mm/hugetlb.c
This is an temporary solution to mitigate the side effect of the race
condition fix by calling follow_page() with FOLL_GET set for huge pages.
After supporting follow huge page by FOLL_GET is done, this fix can be
reverted safely.
Link: https://lkml.kernel.org/r/20220823135841.934465-2-haiyue.wang@intel.com
Link: https://lkml.kernel.org/r/20220812084921.409142-1-haiyue.wang@intel.com
Fixes:
|
||
Linus Torvalds
|
6614a3c316 |
- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/ SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE= =w/UH -----END PGP SIGNATURE----- Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Most of the MM queue. A few things are still pending. Liam's maple tree rework didn't make it. This has resulted in a few other minor patch series being held over for next time. Multi-gen LRU still isn't merged as we were waiting for mapletree to stabilize. The current plan is to merge MGLRU into -mm soon and to later reintroduce mapletree, with a view to hopefully getting both into 6.1-rc1. Summary: - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place" [ XFS merge from hell as per Darrick Wong in https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ] * tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits) tools/testing/selftests/vm/hmm-tests.c: fix build mm: Kconfig: fix typo mm: memory-failure: convert to pr_fmt() mm: use is_zone_movable_page() helper hugetlbfs: fix inaccurate comment in hugetlbfs_statfs() hugetlbfs: cleanup some comments in inode.c hugetlbfs: remove unneeded header file hugetlbfs: remove unneeded hugetlbfs_ops forward declaration hugetlbfs: use helper macro SZ_1{K,M} mm: cleanup is_highmem() mm/hmm: add a test for cross device private faults selftests: add soft-dirty into run_vmtests.sh selftests: soft-dirty: add test for mprotect mm/mprotect: fix soft-dirty check in can_change_pte_writable() mm: memcontrol: fix potential oom_lock recursion deadlock mm/gup.c: fix formatting in check_and_migrate_movable_page() xfs: fail dax mount if reflink is enabled on a partition mm/memcontrol.c: remove the redundant updating of stats_flush_threshold userfaultfd: don't fail on unrecognized features hugetlb_cgroup: fix wrong hugetlb cgroup numa stat ... |
||
Matthew Wilcox (Oracle)
|
9d0ddc0cb5 |
fs: Remove aops->migratepage()
With all users converted to migrate_folio(), remove this operation. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
Matthew Wilcox (Oracle)
|
b890ec2a2c |
hugetlb: Convert to migrate_folio
This involves converting migrate_huge_page_move_mapping(). We also need a folio variant of hugetlb_set_page_subpool(), but that's for a later patch. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> |
||
Matthew Wilcox (Oracle)
|
2ec810d596 |
mm/migrate: Add filemap_migrate_folio()
There is nothing iomap-specific about iomap_migratepage(), and it fits a pattern used by several other filesystems, so move it to mm/migrate.c, convert it to be filemap_migrate_folio() and convert the iomap filesystems to use it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> |
||
Matthew Wilcox (Oracle)
|
541846502f |
mm/migrate: Convert migrate_page() to migrate_folio()
Convert all callers to pass a folio. Most have the folio already available. Switch all users from aops->migratepage to aops->migrate_folio. Also turn the documentation into kerneldoc. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David Sterba <dsterba@suse.com> |
||
Matthew Wilcox (Oracle)
|
108ca83581 |
mm/migrate: Convert expected_page_refs() to folio_expected_refs()
Now that both callers have a folio, convert this function to take a folio & rename it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
Matthew Wilcox (Oracle)
|
67235182a4 |
mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio()
Use a folio throughout __buffer_migrate_folio(), add kernel-doc for buffer_migrate_folio() and buffer_migrate_folio_norefs(), move their declarations to buffer.h and switch all filesystems that have wired them up. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
Matthew Wilcox (Oracle)
|
2be7fa10c0 |
mm/migrate: Convert writeout() to take a folio
Use a folio throughout this function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
Matthew Wilcox (Oracle)
|
8faa8ef5dd |
mm/migrate: Convert fallback_migrate_page() to fallback_migrate_folio()
Use a folio throughout. migrate_page() will be converted to migrate_folio() later. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
Matthew Wilcox (Oracle)
|
5490da4f06 |
fs: Add aops->migrate_folio
Provide a folio-based replacement for aops->migratepage. Update the documentation to document migrate_folio instead of migratepage. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
Matthew Wilcox (Oracle)
|
68f2736a85 |
mm: Convert all PageMovable users to movable_operations
These drivers are rather uncomfortably hammered into the address_space_operations hole. They aren't filesystems and don't behave like filesystems. They just need their own movable_operations structure, which we can point to directly from page->mapping. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> |
||
Alex Sierra
|
3218f8712d |
mm: handling Non-LRU pages returned by vm_normal_pages
With DEVICE_COHERENT, we'll soon have vm_normal_pages() return device-managed anonymous pages that are not LRU pages. Although they behave like normal pages for purposes of mapping in CPU page, and for COW. They do not support LRU lists, NUMA migration or THP. Callers to follow_page() currently don't expect ZONE_DEVICE pages, however, with DEVICE_COHERENT we might now return ZONE_DEVICE. Check for ZONE_DEVICE pages in applicable users of follow_page() as well. Link: https://lkml.kernel.org/r/20220715150521.18165-5-alex.sierra@amd.com Signed-off-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> [v2] Reviewed-by: Alistair Popple <apopple@nvidia.com> [v6] Cc: Christoph Hellwig <hch@lst.de> Cc: David Hildenbrand <david@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Miaohe Lin
|
ad1ac596e8 |
mm/migration: fix potential pte_unmap on an not mapped pte
__migration_entry_wait and migration_entry_wait_on_locked assume pte is
always mapped from caller. But this is not the case when it's called from
migration_entry_wait_huge and follow_huge_pmd. Add a hugetlbfs variant
that calls hugetlb_migration_entry_wait(ptep == NULL) to fix this issue.
Link: https://lkml.kernel.org/r/20220530113016.16663-5-linmiaohe@huawei.com
Fixes:
|
||
Miaohe Lin
|
7ce82f4c3f |
mm/migration: return errno when isolate_huge_page failed
We might fail to isolate huge page due to e.g. the page is under
migration which cleared HPageMigratable. We should return errno in this
case rather than always return 1 which could confuse the user, i.e. the
caller might think all of the memory is migrated while the hugetlb page is
left behind. We make the prototype of isolate_huge_page consistent with
isolate_lru_page as suggested by Huang Ying and rename isolate_huge_page
to isolate_hugetlb as suggested by Muchun to improve the readability.
Link: https://lkml.kernel.org/r/20220530113016.16663-4-linmiaohe@huawei.com
Fixes:
|
||
Miaohe Lin
|
160088b3b6 |
mm/migration: remove unneeded lock page and PageMovable check
When non-lru movable page was freed from under us, __ClearPageMovable must have been done. So we can remove unneeded lock page and PageMovable check here. Also free_pages_prepare() will clear PG_isolated for us, so we can further remove ClearPageIsolated as suggested by David. Link: https://lkml.kernel.org/r/20220530113016.16663-3-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Howells <dhowells@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: kernel test robot <lkp@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Matthew Wilcox (Oracle)
|
b653db7735 |
mm: Clear page->private when splitting or migrating a page
In our efforts to remove uses of PG_private, we have found folios with
the private flag clear and folio->private not-NULL. That is the root
cause behind
|