Commit Graph

435 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
3ca4271578 Reapply "Merge tag 'android14-6.1.75_r00' into android14-6.1"
This reverts commit 6bad1052c2, it is the
LTS merge that had to previously get reverted due to being merged too
early.

Cc: Todd Kjos <tkjos@google.com>
Change-Id: I31b7d660bd833cf022ac4870f6d01e723fda5182
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2024-04-02 19:49:12 +00:00
Todd Kjos
6bad1052c2 Revert "Merge tag 'android14-6.1.75_r00' into android14-6.1"
This reverts commit 1dbafe61e3.

Reason for revert: Too early. Needs to wait until 2024-03-27

Change-Id: I769b944bd089aa2278659ec87f7ba4ac4e74ee4a
Signed-off-by: Todd Kjos <tkjos@google.com>
2024-03-07 21:18:27 +00:00
Greg Kroah-Hartman
e1b12db2de Merge 6.1.72 into android14-6.1-lts
Changes in 6.1.72
	keys, dns: Fix missing size check of V1 server-list header
	block: Don't invalidate pagecache for invalid falloc modes
	ALSA: hda/realtek: enable SND_PCI_QUIRK for hp pavilion 14-ec1xxx series
	ALSA: hda/realtek: fix mute/micmute LEDs for a HP ZBook
	ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP ProBook 440 G6
	mptcp: prevent tcp diag from closing listener subflows
	Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"
	drm/mgag200: Fix gamma lut not initialized for G200ER, G200EV, G200SE
	cifs: cifs_chan_is_iface_active should be called with chan_lock held
	cifs: do not depend on release_iface for maintaining iface_list
	KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRL
	wifi: iwlwifi: pcie: don't synchronize IRQs from IRQ
	drm/bridge: ti-sn65dsi86: Never store more than msg->size bytes in AUX xfer
	netfilter: use skb_ip_totlen and iph_totlen
	netfilter: nf_tables: set transport offset from mac header for netdev/egress
	nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local
	octeontx2-af: Fix marking couple of structure as __packed
	drm/i915/dp: Fix passing the correct DPCD_REV for drm_dp_set_phy_test_pattern
	ice: Fix link_down_on_close message
	ice: Shut down VSI with "link-down-on-close" enabled
	i40e: Fix filter input checks to prevent config with invalid values
	igc: Report VLAN EtherType matching back to user
	igc: Check VLAN TCI mask
	igc: Check VLAN EtherType mask
	ASoC: fsl_rpmsg: Fix error handler with pm_runtime_enable
	ASoC: mediatek: mt8186: fix AUD_PAD_TOP register and offset
	mlxbf_gige: fix receive packet race condition
	net: sched: em_text: fix possible memory leak in em_text_destroy()
	r8169: Fix PCI error on system resume
	can: raw: add support for SO_MARK
	net-timestamp: extend SOF_TIMESTAMPING_OPT_ID to HW timestamps
	net: annotate data-races around sk->sk_tsflags
	net: annotate data-races around sk->sk_bind_phc
	net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)
	selftests: bonding: do not set port down when adding to bond
	ARM: sun9i: smp: Fix array-index-out-of-bounds read in sunxi_mc_smp_init
	sfc: fix a double-free bug in efx_probe_filters
	net: bcmgenet: Fix FCS generation for fragmented skbuffs
	netfilter: nft_immediate: drop chain reference counter on error
	net: Save and restore msg_namelen in sock_sendmsg
	i40e: fix use-after-free in i40e_aqc_add_filters()
	ASoC: meson: g12a-toacodec: Validate written enum values
	ASoC: meson: g12a-tohdmitx: Validate written enum values
	ASoC: meson: g12a-toacodec: Fix event generation
	ASoC: meson: g12a-tohdmitx: Fix event generation for S/PDIF mux
	i40e: Restore VF MSI-X state during PCI reset
	igc: Fix hicredit calculation
	net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues
	net/smc: fix invalid link access in dumping SMC-R connections
	octeontx2-af: Always configure NIX TX link credits based on max frame size
	octeontx2-af: Re-enable MAC TX in otx2_stop processing
	asix: Add check for usbnet_get_endpoints
	net: ravb: Wait for operating mode to be applied
	bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()
	net: Implement missing SO_TIMESTAMPING_NEW cmsg support
	selftests: secretmem: floor the memory size to the multiple of page_size
	cpu/SMT: Create topology_smt_thread_allowed()
	cpu/SMT: Make SMT control more robust against enumeration failures
	srcu: Fix callbacks acceleration mishandling
	bpf, x64: Fix tailcall infinite loop
	bpf, x86: Simplify the parsing logic of structure parameters
	bpf, x86: save/restore regs with BPF_DW size
	net: Declare MSG_SPLICE_PAGES internal sendmsg() flag
	udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES
	splice, net: Add a splice_eof op to file-ops and socket-ops
	ipv4, ipv6: Use splice_eof() to flush
	udp: introduce udp->udp_flags
	udp: move udp->no_check6_tx to udp->udp_flags
	udp: move udp->no_check6_rx to udp->udp_flags
	udp: move udp->gro_enabled to udp->udp_flags
	udp: move udp->accept_udp_{l4|fraglist} to udp->udp_flags
	udp: lockless UDP_ENCAP_L2TPINUDP / UDP_GRO
	udp: annotate data-races around udp->encap_type
	wifi: iwlwifi: yoyo: swap cdb and jacket bits values
	arm64: dts: qcom: sdm845: align RPMh regulator nodes with bindings
	arm64: dts: qcom: sdm845: Fix PSCI power domain names
	fbdev: imsttfb: Release framebuffer and dealloc cmap on error path
	fbdev: imsttfb: fix double free in probe()
	bpf: decouple prune and jump points
	bpf: remove unnecessary prune and jump points
	bpf: Remove unused insn_cnt argument from visit_[func_call_]insn()
	bpf: clean up visit_insn()'s instruction processing
	bpf: Support new 32bit offset jmp instruction
	bpf: handle ldimm64 properly in check_cfg()
	bpf: fix precision backtracking instruction iteration
	blk-mq: make sure active queue usage is held for bio_integrity_prep()
	net/mlx5: Increase size of irq name buffer
	s390/mm: add missing arch_set_page_dat() call to vmem_crst_alloc()
	s390/cpumf: support user space events for counting
	f2fs: clean up i_compress_flag and i_compress_level usage
	f2fs: convert to use bitmap API
	f2fs: assign default compression level
	f2fs: set the default compress_level on ioctl
	selftests: mptcp: fix fastclose with csum failure
	selftests: mptcp: set FAILING_LINKS in run_tests
	media: camss: sm8250: Virtual channels for CSID
	media: qcom: camss: Fix set CSI2_RX_CFG1_VC_MODE when VC is greater than 3
	ext4: convert move_extent_per_page() to use folios
	khugepage: replace try_to_release_page() with filemap_release_folio()
	memory-failure: convert truncate_error_page() to use folio
	mm: merge folio_has_private()/filemap_release_folio() call pairs
	mm, netfs, fscache: stop read optimisation when folio removed from pagecache
	filemap: add a per-mapping stable writes flag
	block: update the stable_writes flag in bdev_add
	smb: client: fix missing mode bits for SMB symlinks
	net: dpaa2-eth: rearrange variable in dpaa2_eth_get_ethtool_stats
	dpaa2-eth: recycle the RX buffer only after all processing done
	ethtool: don't propagate EOPNOTSUPP from dumps
	bpf, sockmap: af_unix stream sockets need to hold ref for pair sock
	firmware: arm_scmi: Fix frequency truncation by promoting multiplier type
	ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7
	genirq/affinity: Remove the 'firstvec' parameter from irq_build_affinity_masks
	genirq/affinity: Pass affinity managed mask array to irq_build_affinity_masks
	genirq/affinity: Don't pass irq_affinity_desc array to irq_build_affinity_masks
	genirq/affinity: Rename irq_build_affinity_masks as group_cpus_evenly
	genirq/affinity: Move group_cpus_evenly() into lib/
	lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly
	mm/memory_hotplug: add missing mem_hotplug_lock
	mm/memory_hotplug: fix error handling in add_memory_resource()
	net: sched: call tcf_ct_params_free to free params in tcf_ct_init
	netfilter: flowtable: allow unidirectional rules
	netfilter: flowtable: cache info of last offload
	net/sched: act_ct: offload UDP NEW connections
	net/sched: act_ct: Fix promotion of offloaded unreplied tuple
	netfilter: flowtable: GC pushes back packets to classic path
	net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table
	octeontx2-af: Fix pause frame configuration
	octeontx2-af: Support variable number of lmacs
	btrfs: fix qgroup_free_reserved_data int overflow
	btrfs: mark the len field in struct btrfs_ordered_sum as unsigned
	ring-buffer: Fix 32-bit rb_time_read() race with rb_time_cmpxchg()
	firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards
	x86/kprobes: fix incorrect return address calculation in kprobe_emulate_call_indirect
	i2c: core: Fix atomic xfer check for non-preempt config
	mm: fix unmap_mapping_range high bits shift bug
	drm/amdgpu: skip gpu_info fw loading on navi12
	drm/amd/display: add nv12 bounding box
	mmc: meson-mx-sdhc: Fix initialization frozen issue
	mmc: rpmb: fixes pause retune on all RPMB partitions.
	mmc: core: Cancel delayed work before releasing host
	mmc: sdhci-sprd: Fix eMMC init failure after hw reset
	genirq/affinity: Only build SMP-only helper functions on SMP kernels
	f2fs: compress: fix to assign compress_level for lz4 correctly
	net/sched: act_ct: additional checks for outdated flows
	net/sched: act_ct: Always fill offloading tuple iifidx
	bpf: Fix a verifier bug due to incorrect branch offset comparison with cpu=v4
	bpf: syzkaller found null ptr deref in unix_bpf proto add
	media: qcom: camss: Comment CSID dt_id field
	smb3: Replace smb2pdu 1-element arrays with flex-arrays
	Revert "interconnect: qcom: sm8250: Enable sync_state"
	Linux 6.1.72

Change-Id: Id00eb2ae1159d4d5fa0ef914e672c5669cbf5b0a
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2024-01-14 13:26:13 +00:00
Greg Kroah-Hartman
8eac30b25e This is the 6.1.71 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmWYD8QACgkQONu9yGCS
 aT5cEA//UKwVnselP3QHU6yEm2j8Vuq5IOEIqIeYTDTyS7TGP83SsyM4n2KRlTwC
 /vaY3HWNsZHLqsNICPOPSdQn9STa7MYTnf/ackBbPglDnDz/A6mSB3zkXtCKFm6+
 UBmk6Y8pZwpdvk3aa6Z62Kr5bGGHdzvXdiJitERLlD2PFUOZT9/IHSncGnts3TQv
 PjFXy1KVIGsThKbtjtYPpa100RAti5HeLv/NbsaVbuKYMME/QCFmqyNRAp9k2iHx
 3nkze70aoREShEDjaLkcsirzwRKJu7qqNriYLt+wd7HmcD328R2UlTR8L3ZM0xOq
 qxBHnzbFtQyGR7NAudi2pStqwctPhFP6vRz1aJvt+w9tmbeKAWQWMd2pNvG8GhJm
 nxYFGyPLzTgPifK5SELCNIW4WXf8rnrRNgZ+Ph/JIGuhp+603//ATHRlVEwHcnl+
 M0GRbL06nWFVvfdKCYuu0autb9sW5T/vq02cbE5vRVVaziazry8S8EmxYQyOg9X/
 CBAd1XTybVZki9VkIP5zbdvWJL3LhFfsabBFy7TPZor/YCJQDvxzw1iwtY/BPVDT
 MryHjrYwH/n5RvibANRcTbCamMQY4IrJ4X3afJGgh7BK5N5C5ug4HYJ7oG5QB++x
 xC4A5x3L6D9SE/St8hFWghjYcd6lFcjlz1wJ5MyLImwYqfr8DnY=
 =Vt0s
 -----END PGP SIGNATURE-----

Merge 6.1.71 into android14-6.1-lts

Changes in 6.1.71
	ksmbd: replace one-element arrays with flexible-array members
	ksmbd: set SMB2_SESSION_FLAG_ENCRYPT_DATA when enforcing data encryption for this share
	ksmbd: use F_SETLK when unlocking a file
	ksmbd: Fix resource leak in smb2_lock()
	ksmbd: Convert to use sysfs_emit()/sysfs_emit_at() APIs
	ksmbd: Implements sess->rpc_handle_list as xarray
	ksmbd: fix typo, syncronous->synchronous
	ksmbd: Remove duplicated codes
	ksmbd: update Kconfig to note Kerberos support and fix indentation
	ksmbd: Fix spelling mistake "excceed" -> "exceeded"
	ksmbd: Fix parameter name and comment mismatch
	ksmbd: remove unused is_char_allowed function
	ksmbd: delete asynchronous work from list
	ksmbd: set NegotiateContextCount once instead of every inc
	ksmbd: avoid duplicate negotiate ctx offset increments
	ksmbd: remove unused compression negotiate ctx packing
	fs: introduce lock_rename_child() helper
	ksmbd: fix racy issue from using ->d_parent and ->d_name
	ksmbd: fix uninitialized pointer read in ksmbd_vfs_rename()
	ksmbd: fix uninitialized pointer read in smb2_create_link()
	ksmbd: call putname after using the last component
	ksmbd: fix posix_acls and acls dereferencing possible ERR_PTR()
	ksmbd: add mnt_want_write to ksmbd vfs functions
	ksmbd: remove unused ksmbd_tree_conn_share function
	ksmbd: use kzalloc() instead of __GFP_ZERO
	ksmbd: return a literal instead of 'err' in ksmbd_vfs_kern_path_locked()
	ksmbd: Change the return value of ksmbd_vfs_query_maximal_access to void
	ksmbd: use kvzalloc instead of kvmalloc
	ksmbd: Replace the ternary conditional operator with min()
	ksmbd: Use struct_size() helper in ksmbd_negotiate_smb_dialect()
	ksmbd: Replace one-element array with flexible-array member
	ksmbd: Fix unsigned expression compared with zero
	ksmbd: check if a mount point is crossed during path lookup
	ksmbd: switch to use kmemdup_nul() helper
	ksmbd: add support for read compound
	ksmbd: fix wrong interim response on compound
	ksmbd: fix `force create mode' and `force directory mode'
	ksmbd: Fix one kernel-doc comment
	ksmbd: add missing calling smb2_set_err_rsp() on error
	ksmbd: remove experimental warning
	ksmbd: remove unneeded mark_inode_dirty in set_info_sec()
	ksmbd: fix passing freed memory 'aux_payload_buf'
	ksmbd: return invalid parameter error response if smb2 request is invalid
	ksmbd: check iov vector index in ksmbd_conn_write()
	ksmbd: fix race condition with fp
	ksmbd: fix race condition from parallel smb2 logoff requests
	ksmbd: fix race condition from parallel smb2 lock requests
	ksmbd: fix race condition between tree conn lookup and disconnect
	ksmbd: fix wrong error response status by using set_smb2_rsp_status()
	ksmbd: fix Null pointer dereferences in ksmbd_update_fstate()
	ksmbd: fix potential double free on smb2_read_pipe() error path
	ksmbd: Remove unused field in ksmbd_user struct
	ksmbd: reorganize ksmbd_iov_pin_rsp()
	ksmbd: fix kernel-doc comment of ksmbd_vfs_setxattr()
	ksmbd: fix recursive locking in vfs helpers
	ksmbd: fix missing RDMA-capable flag for IPoIB device in ksmbd_rdma_capable_netdev()
	ksmbd: add support for surrogate pair conversion
	ksmbd: no need to wait for binded connection termination at logoff
	ksmbd: fix kernel-doc comment of ksmbd_vfs_kern_path_locked()
	ksmbd: prevent memory leak on error return
	ksmbd: fix possible deadlock in smb2_open
	ksmbd: separately allocate ci per dentry
	ksmbd: move oplock handling after unlock parent dir
	ksmbd: release interim response after sending status pending response
	ksmbd: move setting SMB2_FLAGS_ASYNC_COMMAND and AsyncId
	ksmbd: don't update ->op_state as OPLOCK_STATE_NONE on error
	ksmbd: set epoch in create context v2 lease
	ksmbd: set v2 lease capability
	ksmbd: downgrade RWH lease caching state to RH for directory
	ksmbd: send v2 lease break notification for directory
	ksmbd: lazy v2 lease break on smb2_write()
	ksmbd: avoid duplicate opinfo_put() call on error of smb21_lease_break_ack()
	ksmbd: fix wrong allocation size update in smb2_open()
	ARM: dts: Fix occasional boot hang for am3 usb
	usb: fotg210-hcd: delete an incorrect bounds test
	spi: Introduce spi_get_device_match_data() helper
	iio: imu: adis16475: add spi_device_id table
	nfsd: separate nfsd_last_thread() from nfsd_put()
	nfsd: call nfsd_last_thread() before final nfsd_put()
	linux/export: Ensure natural alignment of kcrctab array
	spi: Reintroduce spi_set_cs_timing()
	spi: Add APIs in spi core to set/get spi->chip_select and spi->cs_gpiod
	spi: atmel: Fix clock issue when using devices with different polarities
	block: renumber QUEUE_FLAG_HW_WC
	ksmbd: fix slab-out-of-bounds in smb_strndup_from_utf16()
	platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe
	mm/filemap: avoid buffered read/write race to read inconsistent data
	mm: migrate high-order folios in swap cache correctly
	mm/memory-failure: cast index to loff_t before shifting it
	mm/memory-failure: check the mapcount of the precise page
	ring-buffer: Fix wake ups when buffer_percent is set to 100
	tracing: Fix blocked reader of snapshot buffer
	ring-buffer: Remove useless update to write_stamp in rb_try_to_discard()
	netfilter: nf_tables: skip set commit for deleted/destroyed sets
	ring-buffer: Fix slowpath of interrupted event
	NFSD: fix possible oops when nfsd/pool_stats is closed.
	spi: Constify spi parameters of chip select APIs
	device property: Allow const parameter to dev_fwnode()
	kallsyms: Make module_kallsyms_on_each_symbol generally available
	tracing/kprobes: Fix symbol counting logic by looking at modules as well
	Revert "platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe"
	Linux 6.1.71

Change-Id: I7bc16d981b90e8e0b633628438f79fce898ad15a
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2024-01-14 11:21:18 +00:00
David Howells
bceff380f3 mm: merge folio_has_private()/filemap_release_folio() call pairs
[ Upstream commit 0201ebf274a306a6ebb95e5dc2d6a0a27c737cac ]

Patch series "mm, netfs, fscache: Stop read optimisation when folio
removed from pagecache", v7.

This fixes an optimisation in fscache whereby we don't read from the cache
for a particular file until we know that there's data there that we don't
have in the pagecache.  The problem is that I'm no longer using PG_fscache
(aka PG_private_2) to indicate that the page is cached and so I don't get
a notification when a cached page is dropped from the pagecache.

The first patch merges some folio_has_private() and
filemap_release_folio() pairs and introduces a helper,
folio_needs_release(), to indicate if a release is required.

The second patch is the actual fix.  Following Willy's suggestions[1], it
adds an AS_RELEASE_ALWAYS flag to an address_space that will make
filemap_release_folio() always call ->release_folio(), even if
PG_private/PG_private_2 aren't set.  folio_needs_release() is altered to
add a check for this.

This patch (of 2):

Make filemap_release_folio() check folio_has_private().  Then, in most
cases, where a call to folio_has_private() is immediately followed by a
call to filemap_release_folio(), we can get rid of the test in the pair.

There are a couple of sites in mm/vscan.c that this can't so easily be
done.  In shrink_folio_list(), there are actually three cases (something
different is done for incompletely invalidated buffers), but
filemap_release_folio() elides two of them.

In shrink_active_list(), we don't have have the folio lock yet, so the
check allows us to avoid locking the page unnecessarily.

A wrapper function to check if a folio needs release is provided for those
places that still need to do it in the mm/ directory.  This will acquire
additional parts to the condition in a future patch.

After this, the only remaining caller of folio_has_private() outside of
mm/ is a check in fuse.

Link: https://lkml.kernel.org/r/20230628104852.3391651-1-dhowells@redhat.com
Link: https://lkml.kernel.org/r/20230628104852.3391651-2-dhowells@redhat.com
Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steve French <sfrench@samba.org>
Cc: Shyam Prasad N <nspmangalore@gmail.com>
Cc: Rohith Surabattula <rohiths.msft@gmail.com>
Cc: Dave Wysochanski <dwysocha@redhat.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 1898efcdbed3 ("block: update the stable_writes flag in bdev_add")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-10 17:10:31 +01:00
Vishal Moola (Oracle)
8b6b3ecf0c memory-failure: convert truncate_error_page() to use folio
[ Upstream commit ac5efa782041670b63a05c36d92d02a80e50bb63 ]

Replace try_to_release_page() with filemap_release_folio().  This change
is in preparation for the removal of the try_to_release_page() wrapper.

Link: https://lkml.kernel.org/r/20221118073055.55694-4-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 1898efcdbed3 ("block: update the stable_writes flag in bdev_add")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-10 17:10:31 +01:00
Matthew Wilcox (Oracle)
4ee9d9291b mm/memory-failure: check the mapcount of the precise page
commit c79c5a0a00a9457718056b588f312baadf44e471 upstream.

A process may map only some of the pages in a folio, and might be missed
if it maps the poisoned page but not the head page.  Or it might be
unnecessarily hit if it maps the head page, but not the poisoned page.

Link: https://lkml.kernel.org/r/20231218135837.3310403-3-willy@infradead.org
Fixes: 7af446a841 ("HWPOISON, hugetlb: enable error handling path for hugepage")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-05 15:18:39 +01:00
Matthew Wilcox (Oracle)
fb21c9780a mm/memory-failure: cast index to loff_t before shifting it
commit 39ebd6dce62d8cfe3864e16148927a139f11bc9a upstream.

On 32-bit systems, we'll lose the top bits of index because arithmetic
will be performed in unsigned long instead of unsigned long long.  This
affects files over 4GB in size.

Link: https://lkml.kernel.org/r/20231218135837.3310403-4-willy@infradead.org
Fixes: 6100e34b25 ("mm, memory_failure: Teach memory_failure() about dev_pagemap pages")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-05 15:18:39 +01:00
Greg Kroah-Hartman
0910193fd6 Merge 6.1.50 into android14-6.1-lts
Changes in 6.1.50
	NFSv4.2: fix error handling in nfs42_proc_getxattr
	NFSv4: fix out path in __nfs4_get_acl_uncached
	xprtrdma: Remap Receive buffers after a reconnect
	drm/ast: Use drm_aperture_remove_conflicting_pci_framebuffers
	fbdev/radeon: use pci aperture helpers
	drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers
	drm/aperture: Remove primary argument
	video/aperture: Only kick vgacon when the pdev is decoding vga
	video/aperture: Move vga handling to pci function
	PCI: acpiphp: Reassign resources on bridge if necessary
	MIPS: cpu-features: Enable octeon_cache by cpu_type
	MIPS: cpu-features: Use boot_cpu_type for CPU type based features
	jbd2: remove t_checkpoint_io_list
	jbd2: remove journal_clean_one_cp_list()
	jbd2: fix a race when checking checkpoint buffer busy
	can: raw: fix receiver memory leak
	can: raw: fix lockdep issue in raw_release()
	s390/zcrypt: remove unnecessary (void *) conversions
	s390/zcrypt: fix reply buffer calculations for CCA replies
	drm/i915: Add the gen12_needs_ccs_aux_inv helper
	drm/i915/gt: Ensure memory quiesced before invalidation
	drm/i915/gt: Poll aux invalidation register bit on invalidation
	drm/i915/gt: Support aux invalidation on all engines
	tracing: Fix cpu buffers unavailable due to 'record_disabled' missed
	tracing: Fix memleak due to race between current_tracer and trace
	octeontx2-af: SDP: fix receive link config
	devlink: move code to a dedicated directory
	devlink: add missing unregister linecard notification
	net: dsa: felix: fix oversize frame dropping for always closed tc-taprio gates
	sock: annotate data-races around prot->memory_pressure
	dccp: annotate data-races in dccp_poll()
	ipvlan: Fix a reference count leak warning in ipvlan_ns_exit()
	mlxsw: pci: Set time stamp fields also when its type is MIRROR_UTC
	mlxsw: reg: Fix SSPR register layout
	mlxsw: Fix the size of 'VIRT_ROUTER_MSB'
	selftests: mlxsw: Fix test failure on Spectrum-4
	net: dsa: mt7530: fix handling of 802.1X PAE frames
	net: bgmac: Fix return value check for fixed_phy_register()
	net: bcmgenet: Fix return value check for fixed_phy_register()
	net: validate veth and vxcan peer ifindexes
	ipv4: fix data-races around inet->inet_id
	ice: fix receive buffer size miscalculation
	Revert "ice: Fix ice VF reset during iavf initialization"
	ice: Fix NULL pointer deref during VF reset
	selftests: bonding: do not set port down before adding to bond
	can: isotp: fix support for transmission of SF without flow control
	igb: Avoid starting unnecessary workqueues
	igc: Fix the typo in the PTM Control macro
	net/sched: fix a qdisc modification with ambiguous command request
	i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()
	netfilter: nf_tables: flush pending destroy work before netlink notifier
	netfilter: nf_tables: fix out of memory error handling
	rtnetlink: Reject negative ifindexes in RTM_NEWLINK
	bonding: fix macvlan over alb bond support
	KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated
	KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that hangs vCPUs
	io_uring: get rid of double locking
	io_uring: extract a io_msg_install_complete helper
	io_uring/msg_ring: move double lock/unlock helpers higher up
	io_uring/msg_ring: fix missing lock on overflow for IOPOLL
	ASoC: amd: yc: Add VivoBook Pro 15 to quirks list for acp6x
	ASoC: cs35l41: Correct amp_gain_tlv values
	ibmveth: Use dcbf rather than dcbfl
	wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
	platform/x86: ideapad-laptop: Add support for new hotkeys found on ThinkBook 14s Yoga ITL
	NFSv4: Fix dropped lock for racing OPEN and delegation return
	clk: Fix slab-out-of-bounds error in devm_clk_release()
	mm,ima,kexec,of: use memblock_free_late from ima_free_kexec_buffer
	shmem: fix smaps BUG sleeping while atomic
	ALSA: ymfpci: Fix the missing snd_card_free() call at probe error
	mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast
	mm: add a call to flush_cache_vmap() in vmap_pfn()
	mm: memory-failure: fix unexpected return value in soft_offline_page()
	NFS: Fix a use after free in nfs_direct_join_group()
	nfsd: Fix race to FREE_STATEID and cl_revoked
	selinux: set next pointer before attaching to list
	batman-adv: Trigger events for auto adjusted MTU
	batman-adv: Don't increase MTU when set by user
	batman-adv: Do not get eth header before batadv_check_management_packet
	batman-adv: Fix TT global entry leak when client roamed back
	batman-adv: Fix batadv_v_ogm_aggr_send memory leak
	batman-adv: Hold rtnl lock during MTU update via netlink
	lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels
	riscv: Handle zicsr/zifencei issue between gcc and binutils
	riscv: Fix build errors using binutils2.37 toolchains
	radix tree: remove unused variable
	of: unittest: Fix EXPECT for parse_phandle_with_args_map() test
	of: dynamic: Refactor action prints to not use "%pOF" inside devtree_lock
	pinctrl: amd: Mask wake bits on probe again
	media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
	PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus
	drm/vmwgfx: Fix shader stage validation
	drm/i915/dgfx: Enable d3cold at s2idle
	drm/display/dp: Fix the DP DSC Receiver cap size
	x86/fpu: Invalidate FPU state correctly on exec()
	x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
	hwmon: (aquacomputer_d5next) Add selective 200ms delay after sending ctrl report
	selftests/net: mv bpf/nat6to4.c to net folder
	nfs: use vfs setgid helper
	nfsd: use vfs setgid helper
	cgroup/cpuset: Rename functions dealing with DEADLINE accounting
	sched/cpuset: Bring back cpuset_mutex
	sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
	cgroup/cpuset: Iterate only if DEADLINE tasks are present
	sched/deadline: Create DL BW alloc, free & check overflow interface
	cgroup/cpuset: Free DL BW in case can_attach() fails
	thunderbolt: Fix Thunderbolt 3 display flickering issue on 2nd hot plug onwards
	ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmd
	can: raw: add missing refcount for memory leak fix
	madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check
	scsi: snic: Fix double free in snic_tgt_create()
	scsi: core: raid_class: Remove raid_component_add()
	clk: Fix undefined reference to `clk_rate_exclusive_{get,put}'
	pinctrl: renesas: rzg2l: Fix NULL pointer dereference in rzg2l_dt_subnode_to_map()
	pinctrl: renesas: rzv2m: Fix NULL pointer dereference in rzv2m_dt_subnode_to_map()
	pinctrl: renesas: rza2: Add lock around pinctrl_generic{{add,remove}_group,{add,remove}_function}
	dma-buf/sw_sync: Avoid recursive lock during fence signal
	gpio: sim: dispose of irq mappings before destroying the irq_sim domain
	gpio: sim: pass the GPIO device's software node to irq domain
	ASoC: amd: yc: Fix a non-functional mic on Lenovo 82SJ
	maple_tree: disable mas_wr_append() when other readers are possible
	ASoC: amd: vangogh: select CONFIG_SND_AMD_ACP_CONFIG
	Linux 6.1.50

Change-Id: I9b8e3da5baa106b08b2b90974c19128141817580
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-09-18 09:52:46 +00:00
Miaohe Lin
bdc544a87d mm: memory-failure: fix unexpected return value in soft_offline_page()
commit e2c1ab070fdc81010ec44634838d24fce9ff9e53 upstream.

When page_handle_poison() fails to handle the hugepage or free page in
retry path, soft_offline_page() will return 0 while -EBUSY is expected in
this case.

Consequently the user will think soft_offline_page succeeds while it in
fact failed.  So the user will not try again later in this case.

Link: https://lkml.kernel.org/r/20230627112808.1275241-1-linmiaohe@huawei.com
Fixes: b94e02822d ("mm,hwpoison: try to narrow window race for free pages")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30 16:11:06 +02:00
Suren Baghdasaryan
3c187b4a12 BACKPORT: FROMGIT: mm: enable page walking API to lock vmas during the walk
walk_page_range() and friends often operate under write-locked mmap_lock.
With introduction of vma locks, the vmas have to be locked as well during
such walks to prevent concurrent page faults in these areas.  Add an
additional member to mm_walk_ops to indicate locking requirements for the
walk.

The change ensures that page walks which prevent concurrent page faults
by write-locking mmap_lock, operate correctly after introduction of
per-vma locks.  With per-vma locks page faults can be handled under vma
lock without taking mmap_lock at all, so write locking mmap_lock would
not stop them.  The change ensures vmas are properly locked during such
walks.

A sample issue this solves is do_mbind() performing queue_pages_range()
to queue pages for migration.  Without this change a concurrent page
can be faulted into the area and be left out of migration.

Link: https://lkml.kernel.org/r/20230804152724.3090321-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Suggested-by: Jann Horn <jannh@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 2ebc368f59eedcef0de7c832fe1d62935cd3a7ff
https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
[surenb: changed locking in break_ksm since it's done differently,
skipped the change in the missing __ksm_del_vma(),  skipped the change in
the missing walk_page_range_vma(), removed unused local variables]

Bug: 293665307
Change-Id: Iede9eaa950ea59a268a2e74a8d3022162f0bbd80
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-08-16 16:55:02 +00:00
Naoya Horiguchi
deab8114fb mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON
commit 6da6b1d4a7df8c35770186b53ef65d388398e139 upstream.

After a memory error happens on a clean folio, a process unexpectedly
receives SIGBUS when it accesses the error page.  This SIGBUS killing is
pointless and simply degrades the level of RAS of the system, because the
clean folio can be dropped without any data lost on memory error handling
as we do for a clean pagecache.

When memory_failure() is called on a clean folio, try_to_unmap() is called
twice (one from split_huge_page() and one from hwpoison_user_mappings()).
The root cause of the issue is that pte conversion to hwpoisoned entry is
now done in the first call of try_to_unmap() because PageHWPoison is
already set at this point, while it's actually expected to be done in the
second call.  This behavior disturbs the error handling operation like
removing pagecache, which results in the malfunction described above.

So convert TTU_IGNORE_HWPOISON into TTU_HWPOISON and set TTU_HWPOISON only
when we really intend to convert pte to hwpoison entry.  This can prevent
other callers of try_to_unmap() from accidentally converting to hwpoison
entries.

Link: https://lkml.kernel.org/r/20230221085905.1465385-1-naoya.horiguchi@linux.dev
Fixes: a42634a6c0 ("readahead: Use a folio in read_pages()")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10 09:34:25 +01:00
James Houghton
8625147caf hugetlbfs: don't delete error page from pagecache
This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.

Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache.  That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned.  As [1] states,
this is effectively memory corruption.

The fix is to leave the page in the page cache.  If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses.  For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.

[1]: commit a760542666 ("mm: shmem: don't truncate page if memory failure happens")

Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08 15:57:22 -08:00
Matthew Wilcox (Oracle)
0c826c0b6a rmap: remove page_unlock_anon_vma_read()
This was simply an alias for anon_vma_unlock_read() since 2011.

Link: https://lkml.kernel.org/r/20220902194653.1739778-56-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:02:54 -07:00
Peter Xu
0d206b5d2e mm/swap: add swp_offset_pfn() to fetch PFN from swap entry
We've got a bunch of special swap entries that stores PFN inside the swap
offset fields.  To fetch the PFN, normally the user just calls
swp_offset() assuming that'll be the PFN.

Add a helper swp_offset_pfn() to fetch the PFN instead, fetching only the
max possible length of a PFN on the host, meanwhile doing proper check
with MAX_PHYSMEM_BITS to make sure the swap offsets can actually store the
PFNs properly always using the BUILD_BUG_ON() in is_pfn_swap_entry().

One reason to do so is we never tried to sanitize whether swap offset can
really fit for storing PFN.  At the meantime, this patch also prepares us
with the future possibility to store more information inside the swp
offset field, so assuming "swp_offset(entry)" to be the PFN will not stand
any more very soon.

Replace many of the swp_offset() callers to use swp_offset_pfn() where
proper.  Note that many of the existing users are not candidates for the
replacement, e.g.:

  (1) When the swap entry is not a pfn swap entry at all, or,
  (2) when we wanna keep the whole swp_offset but only change the swp type.

For the latter, it can happen when fork() triggered on a write-migration
swap entry pte, we may want to only change the migration type from
write->read but keep the rest, so it's not "fetching PFN" but "changing
swap type only".  They're left aside so that when there're more
information within the swp offset they'll be carried over naturally in
those cases.

Since at it, dropping hwpoison_entry_to_pfn() because that's exactly what
the new swp_offset_pfn() is about.

Link: https://lkml.kernel.org/r/20220811161331.37055-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:05 -07:00
Miaohe Lin
9cf2819159 mm, hwpoison: cleanup some obsolete comments
1.Remove meaningless comment in kill_proc(). That doesn't tell anything.
2.Fix the wrong function name get_hwpoison_unless_zero(). It should be
get_page_unless_zero().
3.The gate keeper for free hwpoison page has moved to check_new_page().
Update the corresponding comment.

Link: https://lkml.kernel.org/r/20220830123604.25763-7-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:04 -07:00
Miaohe Lin
b680dae9a8 mm, hwpoison: check PageTable() explicitly in hwpoison_user_mappings()
PageTable can't be handled by memory_failure(). Filter it out explicitly in
hwpoison_user_mappings(). This will also make code more consistent with the
relevant check in unpoison_memory().

Link: https://lkml.kernel.org/r/20220830123604.25763-6-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:04 -07:00
Miaohe Lin
36537a67d3 mm, hwpoison: avoid unneeded page_mapped_in_vma() overhead in collect_procs_anon()
If vma->vm_mm != t->mm, there's no need to call page_mapped_in_vma() as
add_to_kill() won't be called in this case. Move up the mm check to avoid
possible unneeded calling to page_mapped_in_vma().

Link: https://lkml.kernel.org/r/20220830123604.25763-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:04 -07:00
Miaohe Lin
21c9e90ab9 mm, hwpoison: use num_poisoned_pages_sub() to decrease num_poisoned_pages
Use num_poisoned_pages_sub() to combine multiple atomic ops into one. Also
num_poisoned_pages_dec() can be killed as there's no caller now.

Link: https://lkml.kernel.org/r/20220830123604.25763-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:04 -07:00
Miaohe Lin
da29499124 mm, hwpoison: use __PageMovable() to detect non-lru movable pages
It's more recommended to use __PageMovable() to detect non-lru movable
pages. We can avoid bumping page refcnt via isolate_movable_page() for
the isolated lru pages. Also if pages become PageLRU just after they're
checked but before trying to isolate them, isolate_lru_page() will be
called to do the right work.

[linmiaohe@huawei.com: fixes per Naoya Horiguchi]
  Link: https://lkml.kernel.org/r/1f7ee86e-7d28-0d8c-e0de-b7a5a94519e8@huawei.com
Link: https://lkml.kernel.org/r/20220830123604.25763-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:03 -07:00
Miaohe Lin
2fe62e2226 mm, hwpoison: use ClearPageHWPoison() in memory_failure()
Patch series "A few cleanup patches for memory-failure".

his series contains a few cleanup patches to use __PageMovable() to detect
non-lru movable pages, use num_poisoned_pages_sub() to reduce multiple
atomic ops overheads and so on.  More details can be found in the
respective changelogs.


This patch (of 6):

Use ClearPageHWPoison() instead of TestClearPageHWPoison() to clear page
hwpoison flags to avoid unneeded full memory barrier overhead.

Link: https://lkml.kernel.org/r/20220830123604.25763-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220830123604.25763-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:03 -07:00
Andrew Morton
6d751329e7 Merge branch 'mm-hotfixes-stable' into mm-stable 2022-09-26 13:13:15 -07:00
Shuai Xue
77677cdbc2 mm,hwpoison: check mm when killing accessing process
The GHES code calls memory_failure_queue() from IRQ context to queue work
into workqueue and schedule it on the current CPU.  Then the work is
processed in memory_failure_work_func() by kworker and calls
memory_failure().

When a page is already poisoned, commit a3f5d80ea4 ("mm,hwpoison: send
SIGBUS with error virutal address") make memory_failure() call
kill_accessing_process() that:

    - holds mmap locking of current->mm
    - does pagetable walk to find the error virtual address
    - and sends SIGBUS to the current process with error info.

However, the mm of kworker is not valid, resulting in a null-pointer
dereference.  So check mm when killing the accessing process.

[akpm@linux-foundation.org: remove unrelated whitespace alteration]
Link: https://lkml.kernel.org/r/20220914064935.7851-1-xueshuai@linux.alibaba.com
Fixes: a3f5d80ea4 ("mm,hwpoison: send SIGBUS with error virutal address")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Bixuan Cui <cuibixuan@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 12:14:34 -07:00
Kefeng Wang
48309e1f6f mm: memory-failure: kill __soft_offline_page()
Squash the __soft_offline_page() into soft_offline_in_use_page() and kill
__soft_offline_page().

[wangkefeng.wang@huawei.com: update hpage when try_to_split_thp_page() succeeds]
  Link: https://lkml.kernel.org/r/20220830104654.28234-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20220819033402.156519-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:25:58 -07:00
Kefeng Wang
7adb45887c mm: memory-failure: kill soft_offline_free_page()
Open-code the page_handle_poison() into soft_offline_page() and kill
unneeded soft_offline_free_page().

Link: https://lkml.kernel.org/r/20220819033402.156519-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:25:58 -07:00
Miaohe Lin
e9ff3ba7ff mm, hwpoison: avoid trying to unpoison reserved page
For reserved pages, HWPoison flag will be set without increasing the page
refcnt.  So we shouldn't even try to unpoison these pages and thus
decrease the page refcnt unexpectly.  Add a PageReserved() check to filter
this case out and remove the below unneeded zero page (zero page is
reserved) check.

Link: https://lkml.kernel.org/r/20220818130016.45313-7-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:25:58 -07:00
Miaohe Lin
0792a4a619 mm, hwpoison: kill procs if unmap fails
If try_to_unmap() fails, the hwpoisoned page still resides in the address
space of some processes.  We should kill these processes or the hwpoisoned
page might be consumed later.  collect_procs() is always called to collect
relevant processes now so they can be killed later if unmap fails.

[linmiaohe@huawei.com: v2]
  Link: https://lkml.kernel.org/r/20220823032346.4260-6-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220818130016.45313-6-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:25:57 -07:00
Miaohe Lin
54f9555d40 mm, hwpoison: fix possible use-after-free in mf_dax_kill_procs()
After kill_procs(), tk will be freed without being removed from the
to_kill list.  In the next iteration, the freed list entry in the to_kill
list will be accessed, thus leading to use-after-free issue.  Adding
list_del() in kill_procs() to fix the issue.

Link: https://lkml.kernel.org/r/20220823032346.4260-5-linmiaohe@huawei.com
Fixes: c36e202495 ("mm: introduce mf_dax_kill_procs() for fsdax case")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:25:57 -07:00
Miaohe Lin
12f1dbcf8f mm, hwpoison: fix extra put_page() in soft_offline_page()
When hwpoison_filter() refuses to soft offline a page, the page refcnt
incremented previously by MF_COUNT_INCREASED would have been consumed via
get_hwpoison_page() if ret <= 0.  So the put_ref_page() here will put the
extra one.  Remove it to fix the issue.

Link: https://lkml.kernel.org/r/20220818130016.45313-4-linmiaohe@huawei.com
Fixes: 9113eaf331 ("mm/memory-failure.c: add hwpoison_filter for soft offline")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:25:57 -07:00
Miaohe Lin
6bbabd041d mm, hwpoison: fix page refcnt leaking in unpoison_memory()
When free_raw_hwp_pages() fails its work, the refcnt of the hugetlb page
would have been incremented if ret > 0.  Using put_page() to fix refcnt
leaking in this case.

Link: https://lkml.kernel.org/r/20220818130016.45313-3-linmiaohe@huawei.com
Fixes: debb6b9c3fdd ("mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:25:57 -07:00
Miaohe Lin
f36a5543a7 mm, hwpoison: fix page refcnt leaking in try_memory_failure_hugetlb()
Patch series "A few fixup patches for memory-failure", v2.

This series contains a few fixup patches to fix incorrect update of page
refcnt, fix possible use-after-free issue and so on.  More details can be
found in the respective changelogs.

This patch (of 6):

When hwpoison_filter() refuses to hwpoison a hugetlb page, the refcnt of
the page would have been incremented if res == 1.  Using put_page() to fix
the refcnt leaking in this case.

Link: https://lkml.kernel.org/r/20220823032346.4260-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220818130016.45313-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220818130016.45313-2-linmiaohe@huawei.com
Fixes: 405ce05123 ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:25:57 -07:00
Kefeng Wang
2ace36f0f5 mm: memory-failure: cleanup try_to_split_thp_page()
Since commit 5d1fd5dc87 ("mm,hwpoison: introduce MF_MSG_UNSPLIT_THP"),
the action_result(,MF_MSG_UNSPLIT_THP,) called to show memory error event
in memory_failure(), so the pr_info() in try_to_split_thp_page() is only
needed in soft_offline_in_use_page().

Meanwhile this could also fix the unexpected prefix for "thp split failed"
due to commit 96f96763de ("mm: memory-failure: convert to pr_fmt()").

Link: https://lkml.kernel.org/r/20220809111813.139690-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:25:48 -07:00
Dan Williams
ac87ca0ea0 mm/memory-failure: fall back to vma_address() when ->notify_failure() fails
In the case where a filesystem is polled to take over the memory failure
and receives -EOPNOTSUPP it indicates that page->index and page->mapping
are valid for reverse mapping the failure address.  Introduce
FSDAX_INVALID_PGOFF to distinguish when add_to_kill() is being called from
mf_dax_kill_procs() by a filesytem vs the typical memory_failure() path.

Otherwise, vma_pgoff_address() is called with an invalid fsdax_pgoff which
then trips this failing signature:

 kernel BUG at mm/memory-failure.c:319!
 invalid opcode: 0000 [#1] PREEMPT SMP PTI
 CPU: 13 PID: 1262 Comm: dax-pmd Tainted: G           OE    N 6.0.0-rc2+ #62
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 RIP: 0010:add_to_kill.cold+0x19d/0x209
 [..]
 Call Trace:
  <TASK>
  collect_procs.part.0+0x2c4/0x460
  memory_failure+0x71b/0xba0
  ? _printk+0x58/0x73
  do_madvise.part.0.cold+0xaf/0xc5

Link: https://lkml.kernel.org/r/166153429427.2758201.14605968329933175594.stgit@dwillia2-xfh.jf.intel.com
Fixes: c36e202495 ("mm: introduce mf_dax_kill_procs() for fsdax case")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 16:22:30 -07:00
Dan Williams
65d3440e8d mm/memory-failure: fix detection of memory_failure() handlers
Some pagemap types, like MEMORY_DEVICE_GENERIC (device-dax) do not even
have pagemap ops which results in crash signatures like this:

  BUG: kernel NULL pointer dereference, address: 0000000000000010
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 8000000205073067 P4D 8000000205073067 PUD 2062b3067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP PTI
  CPU: 22 PID: 4535 Comm: device-dax Tainted: G           OE    N 6.0.0-rc2+ #59
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:memory_failure+0x667/0xba0
 [..]
  Call Trace:
   <TASK>
   ? _printk+0x58/0x73
   do_madvise.part.0.cold+0xaf/0xc5

Check for ops before checking if the ops have a memory_failure()
handler.

Link: https://lkml.kernel.org/r/166153428781.2758201.1990616683438224741.stgit@dwillia2-xfh.jf.intel.com
Fixes: 33a8f7f2b3 ("pagemap,pmem: introduce ->memory_failure()")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 16:22:29 -07:00
Naoya Horiguchi
6f4614886b mm, hwpoison: enable memory error handling on 1GB hugepage
Now error handling code is prepared, so remove the blocking code and
enable memory error handling on 1GB hugepage.

Link: https://lkml.kernel.org/r/20220714042420.1847125-9-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08 18:06:44 -07:00
Naoya Horiguchi
ceaf8fbea7 mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage
Currently if memory_failure() (modified to remove blocking code with
subsequent patch) is called on a page in some 1GB hugepage, memory error
handling fails and the raw error page gets into leaked state.  The impact
is small in production systems (just leaked single 4kB page), but this
limits the testability because unpoison doesn't work for it.  We can no
longer create 1GB hugepage on the 1GB physical address range with such
leaked pages, that's not useful when testing on small systems.

When a hwpoison page in a 1GB hugepage is handled, it's caught by the
PageHWPoison check in free_pages_prepare() because the 1GB hugepage is
broken down into raw error pages before coming to this point:

        if (unlikely(PageHWPoison(page)) && !order) {
                ...
                return false;
        }

Then, the page is not sent to buddy and the page refcount is left 0.

Originally this check is supposed to work when the error page is freed
from page_handle_poison() (that is called from soft-offline), but now we
are opening another path to call it, so the callers of
__page_handle_poison() need to handle the case by considering the return
value 0 as success.  Then page refcount for hwpoison is properly
incremented so unpoison works.

Link: https://lkml.kernel.org/r/20220714042420.1847125-8-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08 18:06:44 -07:00
Naoya Horiguchi
7453bf621c mm, hwpoison: make __page_handle_poison returns int
__page_handle_poison() returns bool that shows whether
take_page_off_buddy() has passed or not now.  But we will want to
distinguish another case of "dissolve has passed but taking off failed" by
its return value.  So change the type of the return value.  No functional
change.

Link: https://lkml.kernel.org/r/20220714042420.1847125-7-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08 18:06:44 -07:00
Naoya Horiguchi
38f6d29397 mm, hwpoison: set PG_hwpoison for busy hugetlb pages
If memory_failure() fails to grab page refcount on a hugetlb page because
it's busy, it returns without setting PG_hwpoison on it.  This not only
loses a chance of error containment, but breaks the rule that
action_result() should be called only when memory_failure() do any of
handling work (even if that's just setting PG_hwpoison).  This
inconsistency could harm code maintainability.

So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case.

Link: https://lkml.kernel.org/r/20220714042420.1847125-6-naoya.horiguchi@linux.dev
Fixes: 405ce05123 ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08 18:06:44 -07:00
Naoya Horiguchi
ac5fcde0a9 mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage
Raw error info list needs to be removed when hwpoisoned hugetlb is
unpoisoned.  And unpoison handler needs to know how many errors there are
in the target hugepage.  So add them.

HPageVmemmapOptimized(hpage) and HPageRawHwpUnreliable(hpage)) sometimes
can't be unpoisoned, so skip them.

Link: https://lkml.kernel.org/r/20220714042420.1847125-5-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08 18:06:44 -07:00
Naoya Horiguchi
161df60e9e mm, hwpoison, hugetlb: support saving mechanism of raw error pages
When handling memory error on a hugetlb page, the error handler tries to
dissolve and turn it into 4kB pages.  If it's successfully dissolved,
PageHWPoison flag is moved to the raw error page, so that's all right. 
However, dissolve sometimes fails, then the error page is left as
hwpoisoned hugepage.  It's useful if we can retry to dissolve it to save
healthy pages, but that's not possible now because the information about
where the raw error pages is lost.

Use the private field of a few tail pages to keep that information.  The
code path of shrinking hugepage pool uses this info to try delayed
dissolve.  In order to remember multiple errors in a hugepage, a
singly-linked list originated from SUBPAGE_INDEX_HWPOISON-th tail page is
constructed.  Only simple operations (adding an entry or clearing all) are
required and the list is assumed not to be very long, so this simple data
structure should be enough.

If we failed to save raw error info, the hwpoison hugepage has errors on
unknown subpage, then this new saving mechanism does not work any more, so
disable saving new raw error info and freeing hwpoison hugepages.

Link: https://lkml.kernel.org/r/20220714042420.1847125-4-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08 18:06:44 -07:00
Linus Torvalds
6614a3c316 - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
 
 - Some kmemleak fixes from Patrick Wang and Waiman Long
 
 - DAMON updates from SeongJae Park
 
 - memcg debug/visibility work from Roman Gushchin
 
 - vmalloc speedup from Uladzislau Rezki
 
 - more folio conversion work from Matthew Wilcox
 
 - enhancements for coherent device memory mapping from Alex Sierra
 
 - addition of shared pages tracking and CoW support for fsdax, from
   Shiyang Ruan
 
 - hugetlb optimizations from Mike Kravetz
 
 - Mel Gorman has contributed some pagealloc changes to improve latency
   and realtime behaviour.
 
 - mprotect soft-dirty checking has been improved by Peter Xu
 
 - Many other singleton patches all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA
 jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/
 SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE=
 =w/UH
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Most of the MM queue. A few things are still pending.

  Liam's maple tree rework didn't make it. This has resulted in a few
  other minor patch series being held over for next time.

  Multi-gen LRU still isn't merged as we were waiting for mapletree to
  stabilize. The current plan is to merge MGLRU into -mm soon and to
  later reintroduce mapletree, with a view to hopefully getting both
  into 6.1-rc1.

  Summary:

   - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
     Lin, Yang Shi, Anshuman Khandual and Mike Rapoport

   - Some kmemleak fixes from Patrick Wang and Waiman Long

   - DAMON updates from SeongJae Park

   - memcg debug/visibility work from Roman Gushchin

   - vmalloc speedup from Uladzislau Rezki

   - more folio conversion work from Matthew Wilcox

   - enhancements for coherent device memory mapping from Alex Sierra

   - addition of shared pages tracking and CoW support for fsdax, from
     Shiyang Ruan

   - hugetlb optimizations from Mike Kravetz

   - Mel Gorman has contributed some pagealloc changes to improve
     latency and realtime behaviour.

   - mprotect soft-dirty checking has been improved by Peter Xu

   - Many other singleton patches all over the place"

 [ XFS merge from hell as per Darrick Wong in

   https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]

* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
  tools/testing/selftests/vm/hmm-tests.c: fix build
  mm: Kconfig: fix typo
  mm: memory-failure: convert to pr_fmt()
  mm: use is_zone_movable_page() helper
  hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
  hugetlbfs: cleanup some comments in inode.c
  hugetlbfs: remove unneeded header file
  hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
  hugetlbfs: use helper macro SZ_1{K,M}
  mm: cleanup is_highmem()
  mm/hmm: add a test for cross device private faults
  selftests: add soft-dirty into run_vmtests.sh
  selftests: soft-dirty: add test for mprotect
  mm/mprotect: fix soft-dirty check in can_change_pte_writable()
  mm: memcontrol: fix potential oom_lock recursion deadlock
  mm/gup.c: fix formatting in check_and_migrate_movable_page()
  xfs: fail dax mount if reflink is enabled on a partition
  mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
  userfaultfd: don't fail on unrecognized features
  hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
  ...
2022-08-05 16:32:45 -07:00
Linus Torvalds
f00654007f Folio changes for 6.0
- Fix an accounting bug that made NR_FILE_DIRTY grow without limit
    when running xfstests
 
  - Convert more of mpage to use folios
 
  - Remove add_to_page_cache() and add_to_page_cache_locked()
 
  - Convert find_get_pages_range() to filemap_get_folios()
 
  - Improvements to the read_cache_page() family of functions
 
  - Remove a few unnecessary checks of PageError
 
  - Some straightforward filesystem conversions to use folios
 
  - Split PageMovable users out from address_space_operations into their
    own movable_operations
 
  - Convert aops->migratepage to aops->migrate_folio
 
  - Remove nobh support (Christoph Hellwig)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmLpViQACgkQDpNsjXcp
 gj5pBgf/f3+K7Hi3qw7aYQCYJQ7IA/bLyE/DLWI59kuiao6wDSve40B9YH9X++Ha
 mRLp55bkQS+bwS2xa4jlqrIDJzAfNoWlXaXZHUXGL1C/52ChTF6jaH2cvO9PVlDS
 7fLv1hy2LwiIdzpKJkUW7T+kcQGj3QLKqtQ4x8zD0LGMg055yvt/qndHSUi41nWT
 /58+6W8Sk4vvRgkpeChFzF1lGLy00+FGT8y5V2kM9uRliFQ7XPCwqB2a3e5jbW6z
 C1NXQmRnopCrnOT1TFIhK3DyX6MDIWV5qcikNAmCKFb9fQFPmjDLPt9iSoMGjw2M
 Z+UVhJCaU3ISccd0DG5Ra/vzs9/O9Q==
 =DgUi
 -----END PGP SIGNATURE-----

Merge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache

Pull folio updates from Matthew Wilcox:

 - Fix an accounting bug that made NR_FILE_DIRTY grow without limit
   when running xfstests

 - Convert more of mpage to use folios

 - Remove add_to_page_cache() and add_to_page_cache_locked()

 - Convert find_get_pages_range() to filemap_get_folios()

 - Improvements to the read_cache_page() family of functions

 - Remove a few unnecessary checks of PageError

 - Some straightforward filesystem conversions to use folios

 - Split PageMovable users out from address_space_operations into
   their own movable_operations

 - Convert aops->migratepage to aops->migrate_folio

 - Remove nobh support (Christoph Hellwig)

* tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits)
  fs: remove the NULL get_block case in mpage_writepages
  fs: don't call ->writepage from __mpage_writepage
  fs: remove the nobh helpers
  jfs: stop using the nobh helper
  ext2: remove nobh support
  ntfs3: refactor ntfs_writepages
  mm/folio-compat: Remove migration compatibility functions
  fs: Remove aops->migratepage()
  secretmem: Convert to migrate_folio
  hugetlb: Convert to migrate_folio
  aio: Convert to migrate_folio
  f2fs: Convert to filemap_migrate_folio()
  ubifs: Convert to filemap_migrate_folio()
  btrfs: Convert btrfs_migratepage to migrate_folio
  mm/migrate: Add filemap_migrate_folio()
  mm/migrate: Convert migrate_page() to migrate_folio()
  nfs: Convert to migrate_folio
  btrfs: Convert btree_migratepage to migrate_folio
  mm/migrate: Convert expected_page_refs() to folio_expected_refs()
  mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio()
  ...
2022-08-03 10:35:43 -07:00
Kefeng Wang
96f96763de mm: memory-failure: convert to pr_fmt()
Use pr_fmt to prefix all pr_<level> output, but unpoison_memory() and
soft_offline_page() are used by error injection, which have own prefixes
like "Unpoison:" and "soft offline:", meanwhile, soft_offline_page() could
be used by memory hotremove, so reset pr_fmt before unpoison_pr_info
definition to keep the original output for them.

[wangkefeng.wang@huawei.com: v3]
  Link: https://lkml.kernel.org/r/20220729031919.72331-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20220726081046.10742-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-29 18:07:20 -07:00
Shiyang Ruan
c36e202495 mm: introduce mf_dax_kill_procs() for fsdax case
This new function is a variant of mf_generic_kill_procs that accepts a
file, offset pair instead of a struct to support multiple files sharing a
DAX mapping.  It is intended to be called by the file systems as part of
the memory_failure handler after the file system performed a reverse
mapping from the storage address to the file and file offset.

Link: https://lkml.kernel.org/r/20220603053738.1218681-6-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.wiliams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:14:30 -07:00
Shiyang Ruan
33a8f7f2b3 pagemap,pmem: introduce ->memory_failure()
When memory-failure occurs, we call this function which is implemented by
each kind of devices.  For the fsdax case, pmem device driver implements
it.  Pmem device driver will find out the filesystem in which the
corrupted page located in.

With dax_holder notify support, we are able to notify the memory failure
from pmem driver to upper layers.  If there is something not support in
the notify routine, memory_failure will fall back to the generic hanlder.

Link: https://lkml.kernel.org/r/20220603053738.1218681-4-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.wiliams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:14:30 -07:00
Shiyang Ruan
00cc790e00 mm: factor helpers for memory_failure_dev_pagemap
memory_failure_dev_pagemap code is a bit complex before introduce RMAP
feature for fsdax.  So it is needed to factor some helper functions to
simplify these code.

[akpm@linux-foundation.org: fix CONFIG_HUGETLB_PAGE=n build]
[zhengbin13@huawei.com: fix redefinition of mf_generic_kill_procs]
  Link: https://lkml.kernel.org/r/20220628112143.1170473-1-zhengbin13@huawei.com
Link: https://lkml.kernel.org/r/20220603053738.1218681-3-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.wiliams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:14:30 -07:00
Alex Sierra
f25cbb7a95 mm: add zone device coherent type memory support
Device memory that is cache coherent from device and CPU point of view. 
This is used on platforms that have an advanced system bus (like CAPI or
CXL).  Any page of a process can be migrated to such memory.  However, no
one should be allowed to pin such memory so that it can always be evicted.

[hch@lst.de: rebased ontop of the refcount changes, remove is_dev_private_or_coherent_page]
Link: https://lkml.kernel.org/r/20220715150521.18165-4-alex.sierra@amd.com
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:14:27 -07:00
Matthew Wilcox (Oracle)
75fa68a5d8 mm/swap: convert delete_from_swap_cache() to take a folio
All but one caller already has a folio, so convert it to use a folio.

Link: https://lkml.kernel.org/r/20220617175020.717127-22-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03 18:08:48 -07:00
Miaohe Lin
7ce82f4c3f mm/migration: return errno when isolate_huge_page failed
We might fail to isolate huge page due to e.g.  the page is under
migration which cleared HPageMigratable.  We should return errno in this
case rather than always return 1 which could confuse the user, i.e.  the
caller might think all of the memory is migrated while the hugetlb page is
left behind.  We make the prototype of isolate_huge_page consistent with
isolate_lru_page as suggested by Huang Ying and rename isolate_huge_page
to isolate_hugetlb as suggested by Muchun to improve the readability.

Link: https://lkml.kernel.org/r/20220530113016.16663-4-linmiaohe@huawei.com
Fixes: e8db67eb0d ("mm: migrate: move_pages() supports thp migration")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Suggested-by: Huang Ying <ying.huang@intel.com>
Reported-by: kernel test robot <lkp@intel.com> (build error)
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03 18:08:37 -07:00
Matthew Wilcox (Oracle)
6ffcd825e7 mm: Remove __delete_from_page_cache()
This wrapper is no longer used.  Remove it and all references to it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-29 08:51:05 -04:00