This reverts commit 6bad1052c2, it is the
LTS merge that had to previously get reverted due to being merged too
early.
Cc: Todd Kjos <tkjos@google.com>
Change-Id: I31b7d660bd833cf022ac4870f6d01e723fda5182
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 1dbafe61e3.
Reason for revert: Too early. Needs to wait until 2024-03-27
Change-Id: I769b944bd089aa2278659ec87f7ba4ac4e74ee4a
Signed-off-by: Todd Kjos <tkjos@google.com>
Changes in 6.1.72
keys, dns: Fix missing size check of V1 server-list header
block: Don't invalidate pagecache for invalid falloc modes
ALSA: hda/realtek: enable SND_PCI_QUIRK for hp pavilion 14-ec1xxx series
ALSA: hda/realtek: fix mute/micmute LEDs for a HP ZBook
ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP ProBook 440 G6
mptcp: prevent tcp diag from closing listener subflows
Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"
drm/mgag200: Fix gamma lut not initialized for G200ER, G200EV, G200SE
cifs: cifs_chan_is_iface_active should be called with chan_lock held
cifs: do not depend on release_iface for maintaining iface_list
KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRL
wifi: iwlwifi: pcie: don't synchronize IRQs from IRQ
drm/bridge: ti-sn65dsi86: Never store more than msg->size bytes in AUX xfer
netfilter: use skb_ip_totlen and iph_totlen
netfilter: nf_tables: set transport offset from mac header for netdev/egress
nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local
octeontx2-af: Fix marking couple of structure as __packed
drm/i915/dp: Fix passing the correct DPCD_REV for drm_dp_set_phy_test_pattern
ice: Fix link_down_on_close message
ice: Shut down VSI with "link-down-on-close" enabled
i40e: Fix filter input checks to prevent config with invalid values
igc: Report VLAN EtherType matching back to user
igc: Check VLAN TCI mask
igc: Check VLAN EtherType mask
ASoC: fsl_rpmsg: Fix error handler with pm_runtime_enable
ASoC: mediatek: mt8186: fix AUD_PAD_TOP register and offset
mlxbf_gige: fix receive packet race condition
net: sched: em_text: fix possible memory leak in em_text_destroy()
r8169: Fix PCI error on system resume
can: raw: add support for SO_MARK
net-timestamp: extend SOF_TIMESTAMPING_OPT_ID to HW timestamps
net: annotate data-races around sk->sk_tsflags
net: annotate data-races around sk->sk_bind_phc
net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)
selftests: bonding: do not set port down when adding to bond
ARM: sun9i: smp: Fix array-index-out-of-bounds read in sunxi_mc_smp_init
sfc: fix a double-free bug in efx_probe_filters
net: bcmgenet: Fix FCS generation for fragmented skbuffs
netfilter: nft_immediate: drop chain reference counter on error
net: Save and restore msg_namelen in sock_sendmsg
i40e: fix use-after-free in i40e_aqc_add_filters()
ASoC: meson: g12a-toacodec: Validate written enum values
ASoC: meson: g12a-tohdmitx: Validate written enum values
ASoC: meson: g12a-toacodec: Fix event generation
ASoC: meson: g12a-tohdmitx: Fix event generation for S/PDIF mux
i40e: Restore VF MSI-X state during PCI reset
igc: Fix hicredit calculation
net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues
net/smc: fix invalid link access in dumping SMC-R connections
octeontx2-af: Always configure NIX TX link credits based on max frame size
octeontx2-af: Re-enable MAC TX in otx2_stop processing
asix: Add check for usbnet_get_endpoints
net: ravb: Wait for operating mode to be applied
bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()
net: Implement missing SO_TIMESTAMPING_NEW cmsg support
selftests: secretmem: floor the memory size to the multiple of page_size
cpu/SMT: Create topology_smt_thread_allowed()
cpu/SMT: Make SMT control more robust against enumeration failures
srcu: Fix callbacks acceleration mishandling
bpf, x64: Fix tailcall infinite loop
bpf, x86: Simplify the parsing logic of structure parameters
bpf, x86: save/restore regs with BPF_DW size
net: Declare MSG_SPLICE_PAGES internal sendmsg() flag
udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES
splice, net: Add a splice_eof op to file-ops and socket-ops
ipv4, ipv6: Use splice_eof() to flush
udp: introduce udp->udp_flags
udp: move udp->no_check6_tx to udp->udp_flags
udp: move udp->no_check6_rx to udp->udp_flags
udp: move udp->gro_enabled to udp->udp_flags
udp: move udp->accept_udp_{l4|fraglist} to udp->udp_flags
udp: lockless UDP_ENCAP_L2TPINUDP / UDP_GRO
udp: annotate data-races around udp->encap_type
wifi: iwlwifi: yoyo: swap cdb and jacket bits values
arm64: dts: qcom: sdm845: align RPMh regulator nodes with bindings
arm64: dts: qcom: sdm845: Fix PSCI power domain names
fbdev: imsttfb: Release framebuffer and dealloc cmap on error path
fbdev: imsttfb: fix double free in probe()
bpf: decouple prune and jump points
bpf: remove unnecessary prune and jump points
bpf: Remove unused insn_cnt argument from visit_[func_call_]insn()
bpf: clean up visit_insn()'s instruction processing
bpf: Support new 32bit offset jmp instruction
bpf: handle ldimm64 properly in check_cfg()
bpf: fix precision backtracking instruction iteration
blk-mq: make sure active queue usage is held for bio_integrity_prep()
net/mlx5: Increase size of irq name buffer
s390/mm: add missing arch_set_page_dat() call to vmem_crst_alloc()
s390/cpumf: support user space events for counting
f2fs: clean up i_compress_flag and i_compress_level usage
f2fs: convert to use bitmap API
f2fs: assign default compression level
f2fs: set the default compress_level on ioctl
selftests: mptcp: fix fastclose with csum failure
selftests: mptcp: set FAILING_LINKS in run_tests
media: camss: sm8250: Virtual channels for CSID
media: qcom: camss: Fix set CSI2_RX_CFG1_VC_MODE when VC is greater than 3
ext4: convert move_extent_per_page() to use folios
khugepage: replace try_to_release_page() with filemap_release_folio()
memory-failure: convert truncate_error_page() to use folio
mm: merge folio_has_private()/filemap_release_folio() call pairs
mm, netfs, fscache: stop read optimisation when folio removed from pagecache
filemap: add a per-mapping stable writes flag
block: update the stable_writes flag in bdev_add
smb: client: fix missing mode bits for SMB symlinks
net: dpaa2-eth: rearrange variable in dpaa2_eth_get_ethtool_stats
dpaa2-eth: recycle the RX buffer only after all processing done
ethtool: don't propagate EOPNOTSUPP from dumps
bpf, sockmap: af_unix stream sockets need to hold ref for pair sock
firmware: arm_scmi: Fix frequency truncation by promoting multiplier type
ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7
genirq/affinity: Remove the 'firstvec' parameter from irq_build_affinity_masks
genirq/affinity: Pass affinity managed mask array to irq_build_affinity_masks
genirq/affinity: Don't pass irq_affinity_desc array to irq_build_affinity_masks
genirq/affinity: Rename irq_build_affinity_masks as group_cpus_evenly
genirq/affinity: Move group_cpus_evenly() into lib/
lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly
mm/memory_hotplug: add missing mem_hotplug_lock
mm/memory_hotplug: fix error handling in add_memory_resource()
net: sched: call tcf_ct_params_free to free params in tcf_ct_init
netfilter: flowtable: allow unidirectional rules
netfilter: flowtable: cache info of last offload
net/sched: act_ct: offload UDP NEW connections
net/sched: act_ct: Fix promotion of offloaded unreplied tuple
netfilter: flowtable: GC pushes back packets to classic path
net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table
octeontx2-af: Fix pause frame configuration
octeontx2-af: Support variable number of lmacs
btrfs: fix qgroup_free_reserved_data int overflow
btrfs: mark the len field in struct btrfs_ordered_sum as unsigned
ring-buffer: Fix 32-bit rb_time_read() race with rb_time_cmpxchg()
firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards
x86/kprobes: fix incorrect return address calculation in kprobe_emulate_call_indirect
i2c: core: Fix atomic xfer check for non-preempt config
mm: fix unmap_mapping_range high bits shift bug
drm/amdgpu: skip gpu_info fw loading on navi12
drm/amd/display: add nv12 bounding box
mmc: meson-mx-sdhc: Fix initialization frozen issue
mmc: rpmb: fixes pause retune on all RPMB partitions.
mmc: core: Cancel delayed work before releasing host
mmc: sdhci-sprd: Fix eMMC init failure after hw reset
genirq/affinity: Only build SMP-only helper functions on SMP kernels
f2fs: compress: fix to assign compress_level for lz4 correctly
net/sched: act_ct: additional checks for outdated flows
net/sched: act_ct: Always fill offloading tuple iifidx
bpf: Fix a verifier bug due to incorrect branch offset comparison with cpu=v4
bpf: syzkaller found null ptr deref in unix_bpf proto add
media: qcom: camss: Comment CSID dt_id field
smb3: Replace smb2pdu 1-element arrays with flex-arrays
Revert "interconnect: qcom: sm8250: Enable sync_state"
Linux 6.1.72
Change-Id: Id00eb2ae1159d4d5fa0ef914e672c5669cbf5b0a
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmWYD8QACgkQONu9yGCS
aT5cEA//UKwVnselP3QHU6yEm2j8Vuq5IOEIqIeYTDTyS7TGP83SsyM4n2KRlTwC
/vaY3HWNsZHLqsNICPOPSdQn9STa7MYTnf/ackBbPglDnDz/A6mSB3zkXtCKFm6+
UBmk6Y8pZwpdvk3aa6Z62Kr5bGGHdzvXdiJitERLlD2PFUOZT9/IHSncGnts3TQv
PjFXy1KVIGsThKbtjtYPpa100RAti5HeLv/NbsaVbuKYMME/QCFmqyNRAp9k2iHx
3nkze70aoREShEDjaLkcsirzwRKJu7qqNriYLt+wd7HmcD328R2UlTR8L3ZM0xOq
qxBHnzbFtQyGR7NAudi2pStqwctPhFP6vRz1aJvt+w9tmbeKAWQWMd2pNvG8GhJm
nxYFGyPLzTgPifK5SELCNIW4WXf8rnrRNgZ+Ph/JIGuhp+603//ATHRlVEwHcnl+
M0GRbL06nWFVvfdKCYuu0autb9sW5T/vq02cbE5vRVVaziazry8S8EmxYQyOg9X/
CBAd1XTybVZki9VkIP5zbdvWJL3LhFfsabBFy7TPZor/YCJQDvxzw1iwtY/BPVDT
MryHjrYwH/n5RvibANRcTbCamMQY4IrJ4X3afJGgh7BK5N5C5ug4HYJ7oG5QB++x
xC4A5x3L6D9SE/St8hFWghjYcd6lFcjlz1wJ5MyLImwYqfr8DnY=
=Vt0s
-----END PGP SIGNATURE-----
Merge 6.1.71 into android14-6.1-lts
Changes in 6.1.71
ksmbd: replace one-element arrays with flexible-array members
ksmbd: set SMB2_SESSION_FLAG_ENCRYPT_DATA when enforcing data encryption for this share
ksmbd: use F_SETLK when unlocking a file
ksmbd: Fix resource leak in smb2_lock()
ksmbd: Convert to use sysfs_emit()/sysfs_emit_at() APIs
ksmbd: Implements sess->rpc_handle_list as xarray
ksmbd: fix typo, syncronous->synchronous
ksmbd: Remove duplicated codes
ksmbd: update Kconfig to note Kerberos support and fix indentation
ksmbd: Fix spelling mistake "excceed" -> "exceeded"
ksmbd: Fix parameter name and comment mismatch
ksmbd: remove unused is_char_allowed function
ksmbd: delete asynchronous work from list
ksmbd: set NegotiateContextCount once instead of every inc
ksmbd: avoid duplicate negotiate ctx offset increments
ksmbd: remove unused compression negotiate ctx packing
fs: introduce lock_rename_child() helper
ksmbd: fix racy issue from using ->d_parent and ->d_name
ksmbd: fix uninitialized pointer read in ksmbd_vfs_rename()
ksmbd: fix uninitialized pointer read in smb2_create_link()
ksmbd: call putname after using the last component
ksmbd: fix posix_acls and acls dereferencing possible ERR_PTR()
ksmbd: add mnt_want_write to ksmbd vfs functions
ksmbd: remove unused ksmbd_tree_conn_share function
ksmbd: use kzalloc() instead of __GFP_ZERO
ksmbd: return a literal instead of 'err' in ksmbd_vfs_kern_path_locked()
ksmbd: Change the return value of ksmbd_vfs_query_maximal_access to void
ksmbd: use kvzalloc instead of kvmalloc
ksmbd: Replace the ternary conditional operator with min()
ksmbd: Use struct_size() helper in ksmbd_negotiate_smb_dialect()
ksmbd: Replace one-element array with flexible-array member
ksmbd: Fix unsigned expression compared with zero
ksmbd: check if a mount point is crossed during path lookup
ksmbd: switch to use kmemdup_nul() helper
ksmbd: add support for read compound
ksmbd: fix wrong interim response on compound
ksmbd: fix `force create mode' and `force directory mode'
ksmbd: Fix one kernel-doc comment
ksmbd: add missing calling smb2_set_err_rsp() on error
ksmbd: remove experimental warning
ksmbd: remove unneeded mark_inode_dirty in set_info_sec()
ksmbd: fix passing freed memory 'aux_payload_buf'
ksmbd: return invalid parameter error response if smb2 request is invalid
ksmbd: check iov vector index in ksmbd_conn_write()
ksmbd: fix race condition with fp
ksmbd: fix race condition from parallel smb2 logoff requests
ksmbd: fix race condition from parallel smb2 lock requests
ksmbd: fix race condition between tree conn lookup and disconnect
ksmbd: fix wrong error response status by using set_smb2_rsp_status()
ksmbd: fix Null pointer dereferences in ksmbd_update_fstate()
ksmbd: fix potential double free on smb2_read_pipe() error path
ksmbd: Remove unused field in ksmbd_user struct
ksmbd: reorganize ksmbd_iov_pin_rsp()
ksmbd: fix kernel-doc comment of ksmbd_vfs_setxattr()
ksmbd: fix recursive locking in vfs helpers
ksmbd: fix missing RDMA-capable flag for IPoIB device in ksmbd_rdma_capable_netdev()
ksmbd: add support for surrogate pair conversion
ksmbd: no need to wait for binded connection termination at logoff
ksmbd: fix kernel-doc comment of ksmbd_vfs_kern_path_locked()
ksmbd: prevent memory leak on error return
ksmbd: fix possible deadlock in smb2_open
ksmbd: separately allocate ci per dentry
ksmbd: move oplock handling after unlock parent dir
ksmbd: release interim response after sending status pending response
ksmbd: move setting SMB2_FLAGS_ASYNC_COMMAND and AsyncId
ksmbd: don't update ->op_state as OPLOCK_STATE_NONE on error
ksmbd: set epoch in create context v2 lease
ksmbd: set v2 lease capability
ksmbd: downgrade RWH lease caching state to RH for directory
ksmbd: send v2 lease break notification for directory
ksmbd: lazy v2 lease break on smb2_write()
ksmbd: avoid duplicate opinfo_put() call on error of smb21_lease_break_ack()
ksmbd: fix wrong allocation size update in smb2_open()
ARM: dts: Fix occasional boot hang for am3 usb
usb: fotg210-hcd: delete an incorrect bounds test
spi: Introduce spi_get_device_match_data() helper
iio: imu: adis16475: add spi_device_id table
nfsd: separate nfsd_last_thread() from nfsd_put()
nfsd: call nfsd_last_thread() before final nfsd_put()
linux/export: Ensure natural alignment of kcrctab array
spi: Reintroduce spi_set_cs_timing()
spi: Add APIs in spi core to set/get spi->chip_select and spi->cs_gpiod
spi: atmel: Fix clock issue when using devices with different polarities
block: renumber QUEUE_FLAG_HW_WC
ksmbd: fix slab-out-of-bounds in smb_strndup_from_utf16()
platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe
mm/filemap: avoid buffered read/write race to read inconsistent data
mm: migrate high-order folios in swap cache correctly
mm/memory-failure: cast index to loff_t before shifting it
mm/memory-failure: check the mapcount of the precise page
ring-buffer: Fix wake ups when buffer_percent is set to 100
tracing: Fix blocked reader of snapshot buffer
ring-buffer: Remove useless update to write_stamp in rb_try_to_discard()
netfilter: nf_tables: skip set commit for deleted/destroyed sets
ring-buffer: Fix slowpath of interrupted event
NFSD: fix possible oops when nfsd/pool_stats is closed.
spi: Constify spi parameters of chip select APIs
device property: Allow const parameter to dev_fwnode()
kallsyms: Make module_kallsyms_on_each_symbol generally available
tracing/kprobes: Fix symbol counting logic by looking at modules as well
Revert "platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe"
Linux 6.1.71
Change-Id: I7bc16d981b90e8e0b633628438f79fce898ad15a
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 0201ebf274a306a6ebb95e5dc2d6a0a27c737cac ]
Patch series "mm, netfs, fscache: Stop read optimisation when folio
removed from pagecache", v7.
This fixes an optimisation in fscache whereby we don't read from the cache
for a particular file until we know that there's data there that we don't
have in the pagecache. The problem is that I'm no longer using PG_fscache
(aka PG_private_2) to indicate that the page is cached and so I don't get
a notification when a cached page is dropped from the pagecache.
The first patch merges some folio_has_private() and
filemap_release_folio() pairs and introduces a helper,
folio_needs_release(), to indicate if a release is required.
The second patch is the actual fix. Following Willy's suggestions[1], it
adds an AS_RELEASE_ALWAYS flag to an address_space that will make
filemap_release_folio() always call ->release_folio(), even if
PG_private/PG_private_2 aren't set. folio_needs_release() is altered to
add a check for this.
This patch (of 2):
Make filemap_release_folio() check folio_has_private(). Then, in most
cases, where a call to folio_has_private() is immediately followed by a
call to filemap_release_folio(), we can get rid of the test in the pair.
There are a couple of sites in mm/vscan.c that this can't so easily be
done. In shrink_folio_list(), there are actually three cases (something
different is done for incompletely invalidated buffers), but
filemap_release_folio() elides two of them.
In shrink_active_list(), we don't have have the folio lock yet, so the
check allows us to avoid locking the page unnecessarily.
A wrapper function to check if a folio needs release is provided for those
places that still need to do it in the mm/ directory. This will acquire
additional parts to the condition in a future patch.
After this, the only remaining caller of folio_has_private() outside of
mm/ is a check in fuse.
Link: https://lkml.kernel.org/r/20230628104852.3391651-1-dhowells@redhat.com
Link: https://lkml.kernel.org/r/20230628104852.3391651-2-dhowells@redhat.com
Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steve French <sfrench@samba.org>
Cc: Shyam Prasad N <nspmangalore@gmail.com>
Cc: Rohith Surabattula <rohiths.msft@gmail.com>
Cc: Dave Wysochanski <dwysocha@redhat.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 1898efcdbed3 ("block: update the stable_writes flag in bdev_add")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ac5efa782041670b63a05c36d92d02a80e50bb63 ]
Replace try_to_release_page() with filemap_release_folio(). This change
is in preparation for the removal of the try_to_release_page() wrapper.
Link: https://lkml.kernel.org/r/20221118073055.55694-4-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 1898efcdbed3 ("block: update the stable_writes flag in bdev_add")
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit c79c5a0a00a9457718056b588f312baadf44e471 upstream.
A process may map only some of the pages in a folio, and might be missed
if it maps the poisoned page but not the head page. Or it might be
unnecessarily hit if it maps the head page, but not the poisoned page.
Link: https://lkml.kernel.org/r/20231218135837.3310403-3-willy@infradead.org
Fixes: 7af446a841 ("HWPOISON, hugetlb: enable error handling path for hugepage")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 39ebd6dce62d8cfe3864e16148927a139f11bc9a upstream.
On 32-bit systems, we'll lose the top bits of index because arithmetic
will be performed in unsigned long instead of unsigned long long. This
affects files over 4GB in size.
Link: https://lkml.kernel.org/r/20231218135837.3310403-4-willy@infradead.org
Fixes: 6100e34b25 ("mm, memory_failure: Teach memory_failure() about dev_pagemap pages")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 6.1.50
NFSv4.2: fix error handling in nfs42_proc_getxattr
NFSv4: fix out path in __nfs4_get_acl_uncached
xprtrdma: Remap Receive buffers after a reconnect
drm/ast: Use drm_aperture_remove_conflicting_pci_framebuffers
fbdev/radeon: use pci aperture helpers
drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers
drm/aperture: Remove primary argument
video/aperture: Only kick vgacon when the pdev is decoding vga
video/aperture: Move vga handling to pci function
PCI: acpiphp: Reassign resources on bridge if necessary
MIPS: cpu-features: Enable octeon_cache by cpu_type
MIPS: cpu-features: Use boot_cpu_type for CPU type based features
jbd2: remove t_checkpoint_io_list
jbd2: remove journal_clean_one_cp_list()
jbd2: fix a race when checking checkpoint buffer busy
can: raw: fix receiver memory leak
can: raw: fix lockdep issue in raw_release()
s390/zcrypt: remove unnecessary (void *) conversions
s390/zcrypt: fix reply buffer calculations for CCA replies
drm/i915: Add the gen12_needs_ccs_aux_inv helper
drm/i915/gt: Ensure memory quiesced before invalidation
drm/i915/gt: Poll aux invalidation register bit on invalidation
drm/i915/gt: Support aux invalidation on all engines
tracing: Fix cpu buffers unavailable due to 'record_disabled' missed
tracing: Fix memleak due to race between current_tracer and trace
octeontx2-af: SDP: fix receive link config
devlink: move code to a dedicated directory
devlink: add missing unregister linecard notification
net: dsa: felix: fix oversize frame dropping for always closed tc-taprio gates
sock: annotate data-races around prot->memory_pressure
dccp: annotate data-races in dccp_poll()
ipvlan: Fix a reference count leak warning in ipvlan_ns_exit()
mlxsw: pci: Set time stamp fields also when its type is MIRROR_UTC
mlxsw: reg: Fix SSPR register layout
mlxsw: Fix the size of 'VIRT_ROUTER_MSB'
selftests: mlxsw: Fix test failure on Spectrum-4
net: dsa: mt7530: fix handling of 802.1X PAE frames
net: bgmac: Fix return value check for fixed_phy_register()
net: bcmgenet: Fix return value check for fixed_phy_register()
net: validate veth and vxcan peer ifindexes
ipv4: fix data-races around inet->inet_id
ice: fix receive buffer size miscalculation
Revert "ice: Fix ice VF reset during iavf initialization"
ice: Fix NULL pointer deref during VF reset
selftests: bonding: do not set port down before adding to bond
can: isotp: fix support for transmission of SF without flow control
igb: Avoid starting unnecessary workqueues
igc: Fix the typo in the PTM Control macro
net/sched: fix a qdisc modification with ambiguous command request
i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()
netfilter: nf_tables: flush pending destroy work before netlink notifier
netfilter: nf_tables: fix out of memory error handling
rtnetlink: Reject negative ifindexes in RTM_NEWLINK
bonding: fix macvlan over alb bond support
KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated
KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that hangs vCPUs
io_uring: get rid of double locking
io_uring: extract a io_msg_install_complete helper
io_uring/msg_ring: move double lock/unlock helpers higher up
io_uring/msg_ring: fix missing lock on overflow for IOPOLL
ASoC: amd: yc: Add VivoBook Pro 15 to quirks list for acp6x
ASoC: cs35l41: Correct amp_gain_tlv values
ibmveth: Use dcbf rather than dcbfl
wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
platform/x86: ideapad-laptop: Add support for new hotkeys found on ThinkBook 14s Yoga ITL
NFSv4: Fix dropped lock for racing OPEN and delegation return
clk: Fix slab-out-of-bounds error in devm_clk_release()
mm,ima,kexec,of: use memblock_free_late from ima_free_kexec_buffer
shmem: fix smaps BUG sleeping while atomic
ALSA: ymfpci: Fix the missing snd_card_free() call at probe error
mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast
mm: add a call to flush_cache_vmap() in vmap_pfn()
mm: memory-failure: fix unexpected return value in soft_offline_page()
NFS: Fix a use after free in nfs_direct_join_group()
nfsd: Fix race to FREE_STATEID and cl_revoked
selinux: set next pointer before attaching to list
batman-adv: Trigger events for auto adjusted MTU
batman-adv: Don't increase MTU when set by user
batman-adv: Do not get eth header before batadv_check_management_packet
batman-adv: Fix TT global entry leak when client roamed back
batman-adv: Fix batadv_v_ogm_aggr_send memory leak
batman-adv: Hold rtnl lock during MTU update via netlink
lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels
riscv: Handle zicsr/zifencei issue between gcc and binutils
riscv: Fix build errors using binutils2.37 toolchains
radix tree: remove unused variable
of: unittest: Fix EXPECT for parse_phandle_with_args_map() test
of: dynamic: Refactor action prints to not use "%pOF" inside devtree_lock
pinctrl: amd: Mask wake bits on probe again
media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus
drm/vmwgfx: Fix shader stage validation
drm/i915/dgfx: Enable d3cold at s2idle
drm/display/dp: Fix the DP DSC Receiver cap size
x86/fpu: Invalidate FPU state correctly on exec()
x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
hwmon: (aquacomputer_d5next) Add selective 200ms delay after sending ctrl report
selftests/net: mv bpf/nat6to4.c to net folder
nfs: use vfs setgid helper
nfsd: use vfs setgid helper
cgroup/cpuset: Rename functions dealing with DEADLINE accounting
sched/cpuset: Bring back cpuset_mutex
sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
cgroup/cpuset: Iterate only if DEADLINE tasks are present
sched/deadline: Create DL BW alloc, free & check overflow interface
cgroup/cpuset: Free DL BW in case can_attach() fails
thunderbolt: Fix Thunderbolt 3 display flickering issue on 2nd hot plug onwards
ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmd
can: raw: add missing refcount for memory leak fix
madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check
scsi: snic: Fix double free in snic_tgt_create()
scsi: core: raid_class: Remove raid_component_add()
clk: Fix undefined reference to `clk_rate_exclusive_{get,put}'
pinctrl: renesas: rzg2l: Fix NULL pointer dereference in rzg2l_dt_subnode_to_map()
pinctrl: renesas: rzv2m: Fix NULL pointer dereference in rzv2m_dt_subnode_to_map()
pinctrl: renesas: rza2: Add lock around pinctrl_generic{{add,remove}_group,{add,remove}_function}
dma-buf/sw_sync: Avoid recursive lock during fence signal
gpio: sim: dispose of irq mappings before destroying the irq_sim domain
gpio: sim: pass the GPIO device's software node to irq domain
ASoC: amd: yc: Fix a non-functional mic on Lenovo 82SJ
maple_tree: disable mas_wr_append() when other readers are possible
ASoC: amd: vangogh: select CONFIG_SND_AMD_ACP_CONFIG
Linux 6.1.50
Change-Id: I9b8e3da5baa106b08b2b90974c19128141817580
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit e2c1ab070fdc81010ec44634838d24fce9ff9e53 upstream.
When page_handle_poison() fails to handle the hugepage or free page in
retry path, soft_offline_page() will return 0 while -EBUSY is expected in
this case.
Consequently the user will think soft_offline_page succeeds while it in
fact failed. So the user will not try again later in this case.
Link: https://lkml.kernel.org/r/20230627112808.1275241-1-linmiaohe@huawei.com
Fixes: b94e02822d ("mm,hwpoison: try to narrow window race for free pages")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
walk_page_range() and friends often operate under write-locked mmap_lock.
With introduction of vma locks, the vmas have to be locked as well during
such walks to prevent concurrent page faults in these areas. Add an
additional member to mm_walk_ops to indicate locking requirements for the
walk.
The change ensures that page walks which prevent concurrent page faults
by write-locking mmap_lock, operate correctly after introduction of
per-vma locks. With per-vma locks page faults can be handled under vma
lock without taking mmap_lock at all, so write locking mmap_lock would
not stop them. The change ensures vmas are properly locked during such
walks.
A sample issue this solves is do_mbind() performing queue_pages_range()
to queue pages for migration. Without this change a concurrent page
can be faulted into the area and be left out of migration.
Link: https://lkml.kernel.org/r/20230804152724.3090321-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Suggested-by: Jann Horn <jannh@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 2ebc368f59eedcef0de7c832fe1d62935cd3a7ff
https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
[surenb: changed locking in break_ksm since it's done differently,
skipped the change in the missing __ksm_del_vma(), skipped the change in
the missing walk_page_range_vma(), removed unused local variables]
Bug: 293665307
Change-Id: Iede9eaa950ea59a268a2e74a8d3022162f0bbd80
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
commit 6da6b1d4a7df8c35770186b53ef65d388398e139 upstream.
After a memory error happens on a clean folio, a process unexpectedly
receives SIGBUS when it accesses the error page. This SIGBUS killing is
pointless and simply degrades the level of RAS of the system, because the
clean folio can be dropped without any data lost on memory error handling
as we do for a clean pagecache.
When memory_failure() is called on a clean folio, try_to_unmap() is called
twice (one from split_huge_page() and one from hwpoison_user_mappings()).
The root cause of the issue is that pte conversion to hwpoisoned entry is
now done in the first call of try_to_unmap() because PageHWPoison is
already set at this point, while it's actually expected to be done in the
second call. This behavior disturbs the error handling operation like
removing pagecache, which results in the malfunction described above.
So convert TTU_IGNORE_HWPOISON into TTU_HWPOISON and set TTU_HWPOISON only
when we really intend to convert pte to hwpoison entry. This can prevent
other callers of try_to_unmap() from accidentally converting to hwpoison
entries.
Link: https://lkml.kernel.org/r/20230221085905.1465385-1-naoya.horiguchi@linux.dev
Fixes: a42634a6c0 ("readahead: Use a folio in read_pages()")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.
Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache. That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned. As [1] states,
this is effectively memory corruption.
The fix is to leave the page in the page cache. If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses. For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.
[1]: commit a760542666 ("mm: shmem: don't truncate page if memory failure happens")
Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We've got a bunch of special swap entries that stores PFN inside the swap
offset fields. To fetch the PFN, normally the user just calls
swp_offset() assuming that'll be the PFN.
Add a helper swp_offset_pfn() to fetch the PFN instead, fetching only the
max possible length of a PFN on the host, meanwhile doing proper check
with MAX_PHYSMEM_BITS to make sure the swap offsets can actually store the
PFNs properly always using the BUILD_BUG_ON() in is_pfn_swap_entry().
One reason to do so is we never tried to sanitize whether swap offset can
really fit for storing PFN. At the meantime, this patch also prepares us
with the future possibility to store more information inside the swp
offset field, so assuming "swp_offset(entry)" to be the PFN will not stand
any more very soon.
Replace many of the swp_offset() callers to use swp_offset_pfn() where
proper. Note that many of the existing users are not candidates for the
replacement, e.g.:
(1) When the swap entry is not a pfn swap entry at all, or,
(2) when we wanna keep the whole swp_offset but only change the swp type.
For the latter, it can happen when fork() triggered on a write-migration
swap entry pte, we may want to only change the migration type from
write->read but keep the rest, so it's not "fetching PFN" but "changing
swap type only". They're left aside so that when there're more
information within the swp offset they'll be carried over naturally in
those cases.
Since at it, dropping hwpoison_entry_to_pfn() because that's exactly what
the new swp_offset_pfn() is about.
Link: https://lkml.kernel.org/r/20220811161331.37055-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
1.Remove meaningless comment in kill_proc(). That doesn't tell anything.
2.Fix the wrong function name get_hwpoison_unless_zero(). It should be
get_page_unless_zero().
3.The gate keeper for free hwpoison page has moved to check_new_page().
Update the corresponding comment.
Link: https://lkml.kernel.org/r/20220830123604.25763-7-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
PageTable can't be handled by memory_failure(). Filter it out explicitly in
hwpoison_user_mappings(). This will also make code more consistent with the
relevant check in unpoison_memory().
Link: https://lkml.kernel.org/r/20220830123604.25763-6-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If vma->vm_mm != t->mm, there's no need to call page_mapped_in_vma() as
add_to_kill() won't be called in this case. Move up the mm check to avoid
possible unneeded calling to page_mapped_in_vma().
Link: https://lkml.kernel.org/r/20220830123604.25763-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use num_poisoned_pages_sub() to combine multiple atomic ops into one. Also
num_poisoned_pages_dec() can be killed as there's no caller now.
Link: https://lkml.kernel.org/r/20220830123604.25763-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It's more recommended to use __PageMovable() to detect non-lru movable
pages. We can avoid bumping page refcnt via isolate_movable_page() for
the isolated lru pages. Also if pages become PageLRU just after they're
checked but before trying to isolate them, isolate_lru_page() will be
called to do the right work.
[linmiaohe@huawei.com: fixes per Naoya Horiguchi]
Link: https://lkml.kernel.org/r/1f7ee86e-7d28-0d8c-e0de-b7a5a94519e8@huawei.com
Link: https://lkml.kernel.org/r/20220830123604.25763-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "A few cleanup patches for memory-failure".
his series contains a few cleanup patches to use __PageMovable() to detect
non-lru movable pages, use num_poisoned_pages_sub() to reduce multiple
atomic ops overheads and so on. More details can be found in the
respective changelogs.
This patch (of 6):
Use ClearPageHWPoison() instead of TestClearPageHWPoison() to clear page
hwpoison flags to avoid unneeded full memory barrier overhead.
Link: https://lkml.kernel.org/r/20220830123604.25763-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220830123604.25763-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The GHES code calls memory_failure_queue() from IRQ context to queue work
into workqueue and schedule it on the current CPU. Then the work is
processed in memory_failure_work_func() by kworker and calls
memory_failure().
When a page is already poisoned, commit a3f5d80ea4 ("mm,hwpoison: send
SIGBUS with error virutal address") make memory_failure() call
kill_accessing_process() that:
- holds mmap locking of current->mm
- does pagetable walk to find the error virtual address
- and sends SIGBUS to the current process with error info.
However, the mm of kworker is not valid, resulting in a null-pointer
dereference. So check mm when killing the accessing process.
[akpm@linux-foundation.org: remove unrelated whitespace alteration]
Link: https://lkml.kernel.org/r/20220914064935.7851-1-xueshuai@linux.alibaba.com
Fixes: a3f5d80ea4 ("mm,hwpoison: send SIGBUS with error virutal address")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Bixuan Cui <cuibixuan@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Open-code the page_handle_poison() into soft_offline_page() and kill
unneeded soft_offline_free_page().
Link: https://lkml.kernel.org/r/20220819033402.156519-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For reserved pages, HWPoison flag will be set without increasing the page
refcnt. So we shouldn't even try to unpoison these pages and thus
decrease the page refcnt unexpectly. Add a PageReserved() check to filter
this case out and remove the below unneeded zero page (zero page is
reserved) check.
Link: https://lkml.kernel.org/r/20220818130016.45313-7-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If try_to_unmap() fails, the hwpoisoned page still resides in the address
space of some processes. We should kill these processes or the hwpoisoned
page might be consumed later. collect_procs() is always called to collect
relevant processes now so they can be killed later if unmap fails.
[linmiaohe@huawei.com: v2]
Link: https://lkml.kernel.org/r/20220823032346.4260-6-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220818130016.45313-6-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
After kill_procs(), tk will be freed without being removed from the
to_kill list. In the next iteration, the freed list entry in the to_kill
list will be accessed, thus leading to use-after-free issue. Adding
list_del() in kill_procs() to fix the issue.
Link: https://lkml.kernel.org/r/20220823032346.4260-5-linmiaohe@huawei.com
Fixes: c36e202495 ("mm: introduce mf_dax_kill_procs() for fsdax case")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When hwpoison_filter() refuses to soft offline a page, the page refcnt
incremented previously by MF_COUNT_INCREASED would have been consumed via
get_hwpoison_page() if ret <= 0. So the put_ref_page() here will put the
extra one. Remove it to fix the issue.
Link: https://lkml.kernel.org/r/20220818130016.45313-4-linmiaohe@huawei.com
Fixes: 9113eaf331 ("mm/memory-failure.c: add hwpoison_filter for soft offline")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When free_raw_hwp_pages() fails its work, the refcnt of the hugetlb page
would have been incremented if ret > 0. Using put_page() to fix refcnt
leaking in this case.
Link: https://lkml.kernel.org/r/20220818130016.45313-3-linmiaohe@huawei.com
Fixes: debb6b9c3fdd ("mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "A few fixup patches for memory-failure", v2.
This series contains a few fixup patches to fix incorrect update of page
refcnt, fix possible use-after-free issue and so on. More details can be
found in the respective changelogs.
This patch (of 6):
When hwpoison_filter() refuses to hwpoison a hugetlb page, the refcnt of
the page would have been incremented if res == 1. Using put_page() to fix
the refcnt leaking in this case.
Link: https://lkml.kernel.org/r/20220823032346.4260-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220818130016.45313-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220818130016.45313-2-linmiaohe@huawei.com
Fixes: 405ce05123 ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Since commit 5d1fd5dc87 ("mm,hwpoison: introduce MF_MSG_UNSPLIT_THP"),
the action_result(,MF_MSG_UNSPLIT_THP,) called to show memory error event
in memory_failure(), so the pr_info() in try_to_split_thp_page() is only
needed in soft_offline_in_use_page().
Meanwhile this could also fix the unexpected prefix for "thp split failed"
due to commit 96f96763de ("mm: memory-failure: convert to pr_fmt()").
Link: https://lkml.kernel.org/r/20220809111813.139690-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In the case where a filesystem is polled to take over the memory failure
and receives -EOPNOTSUPP it indicates that page->index and page->mapping
are valid for reverse mapping the failure address. Introduce
FSDAX_INVALID_PGOFF to distinguish when add_to_kill() is being called from
mf_dax_kill_procs() by a filesytem vs the typical memory_failure() path.
Otherwise, vma_pgoff_address() is called with an invalid fsdax_pgoff which
then trips this failing signature:
kernel BUG at mm/memory-failure.c:319!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 13 PID: 1262 Comm: dax-pmd Tainted: G OE N 6.0.0-rc2+ #62
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:add_to_kill.cold+0x19d/0x209
[..]
Call Trace:
<TASK>
collect_procs.part.0+0x2c4/0x460
memory_failure+0x71b/0xba0
? _printk+0x58/0x73
do_madvise.part.0.cold+0xaf/0xc5
Link: https://lkml.kernel.org/r/166153429427.2758201.14605968329933175594.stgit@dwillia2-xfh.jf.intel.com
Fixes: c36e202495 ("mm: introduce mf_dax_kill_procs() for fsdax case")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Now error handling code is prepared, so remove the blocking code and
enable memory error handling on 1GB hugepage.
Link: https://lkml.kernel.org/r/20220714042420.1847125-9-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently if memory_failure() (modified to remove blocking code with
subsequent patch) is called on a page in some 1GB hugepage, memory error
handling fails and the raw error page gets into leaked state. The impact
is small in production systems (just leaked single 4kB page), but this
limits the testability because unpoison doesn't work for it. We can no
longer create 1GB hugepage on the 1GB physical address range with such
leaked pages, that's not useful when testing on small systems.
When a hwpoison page in a 1GB hugepage is handled, it's caught by the
PageHWPoison check in free_pages_prepare() because the 1GB hugepage is
broken down into raw error pages before coming to this point:
if (unlikely(PageHWPoison(page)) && !order) {
...
return false;
}
Then, the page is not sent to buddy and the page refcount is left 0.
Originally this check is supposed to work when the error page is freed
from page_handle_poison() (that is called from soft-offline), but now we
are opening another path to call it, so the callers of
__page_handle_poison() need to handle the case by considering the return
value 0 as success. Then page refcount for hwpoison is properly
incremented so unpoison works.
Link: https://lkml.kernel.org/r/20220714042420.1847125-8-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
__page_handle_poison() returns bool that shows whether
take_page_off_buddy() has passed or not now. But we will want to
distinguish another case of "dissolve has passed but taking off failed" by
its return value. So change the type of the return value. No functional
change.
Link: https://lkml.kernel.org/r/20220714042420.1847125-7-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If memory_failure() fails to grab page refcount on a hugetlb page because
it's busy, it returns without setting PG_hwpoison on it. This not only
loses a chance of error containment, but breaks the rule that
action_result() should be called only when memory_failure() do any of
handling work (even if that's just setting PG_hwpoison). This
inconsistency could harm code maintainability.
So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case.
Link: https://lkml.kernel.org/r/20220714042420.1847125-6-naoya.horiguchi@linux.dev
Fixes: 405ce05123 ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Raw error info list needs to be removed when hwpoisoned hugetlb is
unpoisoned. And unpoison handler needs to know how many errors there are
in the target hugepage. So add them.
HPageVmemmapOptimized(hpage) and HPageRawHwpUnreliable(hpage)) sometimes
can't be unpoisoned, so skip them.
Link: https://lkml.kernel.org/r/20220714042420.1847125-5-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When handling memory error on a hugetlb page, the error handler tries to
dissolve and turn it into 4kB pages. If it's successfully dissolved,
PageHWPoison flag is moved to the raw error page, so that's all right.
However, dissolve sometimes fails, then the error page is left as
hwpoisoned hugepage. It's useful if we can retry to dissolve it to save
healthy pages, but that's not possible now because the information about
where the raw error pages is lost.
Use the private field of a few tail pages to keep that information. The
code path of shrinking hugepage pool uses this info to try delayed
dissolve. In order to remember multiple errors in a hugepage, a
singly-linked list originated from SUBPAGE_INDEX_HWPOISON-th tail page is
constructed. Only simple operations (adding an entry or clearing all) are
required and the list is assumed not to be very long, so this simple data
structure should be enough.
If we failed to save raw error info, the hwpoison hugepage has errors on
unknown subpage, then this new saving mechanism does not work any more, so
disable saving new raw error info and freeing hwpoison hugepages.
Link: https://lkml.kernel.org/r/20220714042420.1847125-4-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve latency
and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA
jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/
SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE=
=w/UH
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Most of the MM queue. A few things are still pending.
Liam's maple tree rework didn't make it. This has resulted in a few
other minor patch series being held over for next time.
Multi-gen LRU still isn't merged as we were waiting for mapletree to
stabilize. The current plan is to merge MGLRU into -mm soon and to
later reintroduce mapletree, with a view to hopefully getting both
into 6.1-rc1.
Summary:
- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve
latency and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place"
[ XFS merge from hell as per Darrick Wong in
https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]
* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
tools/testing/selftests/vm/hmm-tests.c: fix build
mm: Kconfig: fix typo
mm: memory-failure: convert to pr_fmt()
mm: use is_zone_movable_page() helper
hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
hugetlbfs: cleanup some comments in inode.c
hugetlbfs: remove unneeded header file
hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
hugetlbfs: use helper macro SZ_1{K,M}
mm: cleanup is_highmem()
mm/hmm: add a test for cross device private faults
selftests: add soft-dirty into run_vmtests.sh
selftests: soft-dirty: add test for mprotect
mm/mprotect: fix soft-dirty check in can_change_pte_writable()
mm: memcontrol: fix potential oom_lock recursion deadlock
mm/gup.c: fix formatting in check_and_migrate_movable_page()
xfs: fail dax mount if reflink is enabled on a partition
mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
userfaultfd: don't fail on unrecognized features
hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
...
- Fix an accounting bug that made NR_FILE_DIRTY grow without limit
when running xfstests
- Convert more of mpage to use folios
- Remove add_to_page_cache() and add_to_page_cache_locked()
- Convert find_get_pages_range() to filemap_get_folios()
- Improvements to the read_cache_page() family of functions
- Remove a few unnecessary checks of PageError
- Some straightforward filesystem conversions to use folios
- Split PageMovable users out from address_space_operations into their
own movable_operations
- Convert aops->migratepage to aops->migrate_folio
- Remove nobh support (Christoph Hellwig)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmLpViQACgkQDpNsjXcp
gj5pBgf/f3+K7Hi3qw7aYQCYJQ7IA/bLyE/DLWI59kuiao6wDSve40B9YH9X++Ha
mRLp55bkQS+bwS2xa4jlqrIDJzAfNoWlXaXZHUXGL1C/52ChTF6jaH2cvO9PVlDS
7fLv1hy2LwiIdzpKJkUW7T+kcQGj3QLKqtQ4x8zD0LGMg055yvt/qndHSUi41nWT
/58+6W8Sk4vvRgkpeChFzF1lGLy00+FGT8y5V2kM9uRliFQ7XPCwqB2a3e5jbW6z
C1NXQmRnopCrnOT1TFIhK3DyX6MDIWV5qcikNAmCKFb9fQFPmjDLPt9iSoMGjw2M
Z+UVhJCaU3ISccd0DG5Ra/vzs9/O9Q==
=DgUi
-----END PGP SIGNATURE-----
Merge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache
Pull folio updates from Matthew Wilcox:
- Fix an accounting bug that made NR_FILE_DIRTY grow without limit
when running xfstests
- Convert more of mpage to use folios
- Remove add_to_page_cache() and add_to_page_cache_locked()
- Convert find_get_pages_range() to filemap_get_folios()
- Improvements to the read_cache_page() family of functions
- Remove a few unnecessary checks of PageError
- Some straightforward filesystem conversions to use folios
- Split PageMovable users out from address_space_operations into
their own movable_operations
- Convert aops->migratepage to aops->migrate_folio
- Remove nobh support (Christoph Hellwig)
* tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits)
fs: remove the NULL get_block case in mpage_writepages
fs: don't call ->writepage from __mpage_writepage
fs: remove the nobh helpers
jfs: stop using the nobh helper
ext2: remove nobh support
ntfs3: refactor ntfs_writepages
mm/folio-compat: Remove migration compatibility functions
fs: Remove aops->migratepage()
secretmem: Convert to migrate_folio
hugetlb: Convert to migrate_folio
aio: Convert to migrate_folio
f2fs: Convert to filemap_migrate_folio()
ubifs: Convert to filemap_migrate_folio()
btrfs: Convert btrfs_migratepage to migrate_folio
mm/migrate: Add filemap_migrate_folio()
mm/migrate: Convert migrate_page() to migrate_folio()
nfs: Convert to migrate_folio
btrfs: Convert btree_migratepage to migrate_folio
mm/migrate: Convert expected_page_refs() to folio_expected_refs()
mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio()
...
Use pr_fmt to prefix all pr_<level> output, but unpoison_memory() and
soft_offline_page() are used by error injection, which have own prefixes
like "Unpoison:" and "soft offline:", meanwhile, soft_offline_page() could
be used by memory hotremove, so reset pr_fmt before unpoison_pr_info
definition to keep the original output for them.
[wangkefeng.wang@huawei.com: v3]
Link: https://lkml.kernel.org/r/20220729031919.72331-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20220726081046.10742-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This new function is a variant of mf_generic_kill_procs that accepts a
file, offset pair instead of a struct to support multiple files sharing a
DAX mapping. It is intended to be called by the file systems as part of
the memory_failure handler after the file system performed a reverse
mapping from the storage address to the file and file offset.
Link: https://lkml.kernel.org/r/20220603053738.1218681-6-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.wiliams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When memory-failure occurs, we call this function which is implemented by
each kind of devices. For the fsdax case, pmem device driver implements
it. Pmem device driver will find out the filesystem in which the
corrupted page located in.
With dax_holder notify support, we are able to notify the memory failure
from pmem driver to upper layers. If there is something not support in
the notify routine, memory_failure will fall back to the generic hanlder.
Link: https://lkml.kernel.org/r/20220603053738.1218681-4-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.wiliams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
memory_failure_dev_pagemap code is a bit complex before introduce RMAP
feature for fsdax. So it is needed to factor some helper functions to
simplify these code.
[akpm@linux-foundation.org: fix CONFIG_HUGETLB_PAGE=n build]
[zhengbin13@huawei.com: fix redefinition of mf_generic_kill_procs]
Link: https://lkml.kernel.org/r/20220628112143.1170473-1-zhengbin13@huawei.com
Link: https://lkml.kernel.org/r/20220603053738.1218681-3-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.wiliams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Device memory that is cache coherent from device and CPU point of view.
This is used on platforms that have an advanced system bus (like CAPI or
CXL). Any page of a process can be migrated to such memory. However, no
one should be allowed to pin such memory so that it can always be evicted.
[hch@lst.de: rebased ontop of the refcount changes, remove is_dev_private_or_coherent_page]
Link: https://lkml.kernel.org/r/20220715150521.18165-4-alex.sierra@amd.com
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We might fail to isolate huge page due to e.g. the page is under
migration which cleared HPageMigratable. We should return errno in this
case rather than always return 1 which could confuse the user, i.e. the
caller might think all of the memory is migrated while the hugetlb page is
left behind. We make the prototype of isolate_huge_page consistent with
isolate_lru_page as suggested by Huang Ying and rename isolate_huge_page
to isolate_hugetlb as suggested by Muchun to improve the readability.
Link: https://lkml.kernel.org/r/20220530113016.16663-4-linmiaohe@huawei.com
Fixes: e8db67eb0d ("mm: migrate: move_pages() supports thp migration")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Suggested-by: Huang Ying <ying.huang@intel.com>
Reported-by: kernel test robot <lkp@intel.com> (build error)
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>