lineage-22.1
1241 Commits
Author | SHA1 | Message | Date
---|---|---|---
liangjlee
|
29a00abe43 |
ANDROID: mm: Add restricted vendor hook in do_read_fault()
This patch adds a restricted vendor hook in do_read_fault() for tracking which files and offsets are faulted. Bug: 336736235 Change-Id: I425690e58550c4ac44912daa10b5eac0728bfb4e Signed-off-by: liangjlee <liangjlee@google.com> |
||
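For context, AOSP restricted vendor hooks are declared as tracepoints and invoked from core kernel code. A minimal sketch of the mechanism this commit uses follows; the hook's exact TP_PROTO here is an assumption for illustration, not the patch's actual signature:

```c
/* include/trace/hooks/mm.h -- hypothetical signature, for illustration */
#include <trace/hooks/vendor_hooks.h>

DECLARE_RESTRICTED_HOOK(android_rvh_do_read_fault,
	TP_PROTO(struct vm_fault *vmf),
	TP_ARGS(vmf), 1);

/* mm/memory.c, inside do_read_fault(): a registered vendor module can
 * inspect vmf->vma->vm_file and vmf->pgoff to record which file and
 * offset faulted. */
trace_android_rvh_do_read_fault(vmf);
```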
Dezhi Huang
|
c7fcb9bf9a |
ANDROID: add vendor hook in do_read_fault to tune fault_around_bytes
With this vendor hook, an OEM can dynamically adjust fault_around_bytes to balance memory usage and performance. Bug: 340749845 Change-Id: I429f4302caf44a769696ccec84e9cc13ea8892ea Signed-off-by: Dezhi Huang <huangdezhi@hihonor.com> |
||
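A hedged sketch of the consumer side: an OEM module registers a handler on such a hook and rewrites fault_around_bytes. The hook name, handler signature, and SZ_16K policy below are illustrative assumptions, not this patch's actual interface:

```c
#include <linux/module.h>
#include <linux/sizes.h>
#include <trace/hooks/mm.h>

/* Hypothetical handler for a hook assumed to pass a pointer to
 * fault_around_bytes; the first argument is the private data supplied
 * at registration, per the vendor-hook convention. */
static void oem_tune_fault_around(void *data, unsigned long *fault_around_bytes)
{
	/* illustrative policy: shrink the window to save memory */
	if (*fault_around_bytes > SZ_16K)
		*fault_around_bytes = SZ_16K;
}

static int __init oem_mm_tuner_init(void)
{
	/* hook name is an assumption, not taken from the patch */
	return register_trace_android_vh_tune_fault_around(oem_tune_fault_around,
							    NULL);
}
module_init(oem_mm_tuner_init);
```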
Kalesh Singh
|
1537dbe21b |
ANDROID: 16K: Exclude ELF padding for fault around range
Userspace apps often analyze memory consumption by the use of mm rss_stat counters -- via the kmem/rss_stat trace event or from /proc/<pid>/statm. rss_stat counters are only updated when the PTEs are updated. What this means is that pages can be present in the page cache from readahead but not visible to userspace (not attributed to the app) as there is no corresponding VMA (PTEs) for the respective page cache pages. A side effect of the loader now extending ELF LOAD segments to be contiguously mapped in the virtual address space is that the VMA is extended to cover the padding pages. When filesystems such as f2fs and ext4, which implement vm_ops->map_pages(), attempt to perform a do_fault_around(), the extent of the fault around is restricted by the area of the enclosing VMA. Since the loader extends LOAD segment VMAs to be contiguously mapped, the extent of the fault around is also increased. As a result, the PTEs corresponding to the padding pages are updated and reflected in the rss_stat counters. Userspace application developers are rarely aware of this nuance in the kernel's memory accounting. To avoid apparent regressions in memory usage to userspace, restrict the fault around range to only valid data pages (i.e. exclude the padding pages at the end of the VMA). Bug: 330117029 Bug: 327600007 Bug: 330767927 Bug: 328266487 Bug: 329803029 Change-Id: I2c7a39ec1b040be2b9fb47801f95042f5dbf869d Signed-off-by: Kalesh Singh <kaleshsingh@google.com> |
||
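A sketch of the clamping idea described above. The helper vma_data_pages() ("pages of the VMA backed by real file data") is an assumption for illustration, not necessarily the patch's implementation:

```c
/* Clamp the fault-around window so it never populates PTEs for the
 * zero-filled ELF padding at the tail of an extended VMA. */
static pgoff_t fault_around_data_end(struct vm_fault *vmf, pgoff_t end_pgoff)
{
	struct vm_area_struct *vma = vmf->vma;
	/* assumed helper: number of pages that map real file data */
	pgoff_t data_end = vma->vm_pgoff + vma_data_pages(vma);

	return min(end_pgoff, data_end - 1);
}
```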
Greg Kroah-Hartman
|
3ca4271578 |
Reapply "Merge tag 'android14-6.1.75_r00' into android14-6.1"
This reverts commit
|
||
Todd Kjos
|
6bad1052c2 |
Revert "Merge tag 'android14-6.1.75_r00' into android14-6.1"
This reverts commit
|
||
Greg Kroah-Hartman
|
e1b12db2de |
Merge 6.1.72 into android14-6.1-lts
Changes in 6.1.72 keys, dns: Fix missing size check of V1 server-list header block: Don't invalidate pagecache for invalid falloc modes ALSA: hda/realtek: enable SND_PCI_QUIRK for hp pavilion 14-ec1xxx series ALSA: hda/realtek: fix mute/micmute LEDs for a HP ZBook ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP ProBook 440 G6 mptcp: prevent tcp diag from closing listener subflows Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()" drm/mgag200: Fix gamma lut not initialized for G200ER, G200EV, G200SE cifs: cifs_chan_is_iface_active should be called with chan_lock held cifs: do not depend on release_iface for maintaining iface_list KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRL wifi: iwlwifi: pcie: don't synchronize IRQs from IRQ drm/bridge: ti-sn65dsi86: Never store more than msg->size bytes in AUX xfer netfilter: use skb_ip_totlen and iph_totlen netfilter: nf_tables: set transport offset from mac header for netdev/egress nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local octeontx2-af: Fix marking couple of structure as __packed drm/i915/dp: Fix passing the correct DPCD_REV for drm_dp_set_phy_test_pattern ice: Fix link_down_on_close message ice: Shut down VSI with "link-down-on-close" enabled i40e: Fix filter input checks to prevent config with invalid values igc: Report VLAN EtherType matching back to user igc: Check VLAN TCI mask igc: Check VLAN EtherType mask ASoC: fsl_rpmsg: Fix error handler with pm_runtime_enable ASoC: mediatek: mt8186: fix AUD_PAD_TOP register and offset mlxbf_gige: fix receive packet race condition net: sched: em_text: fix possible memory leak in em_text_destroy() r8169: Fix PCI error on system resume can: raw: add support for SO_MARK net-timestamp: extend SOF_TIMESTAMPING_OPT_ID to HW timestamps net: annotate data-races around sk->sk_tsflags net: annotate data-races around sk->sk_bind_phc net: Implement missing getsockopt(SO_TIMESTAMPING_NEW) selftests: bonding: do not set port down when adding to bond ARM: sun9i: smp: Fix array-index-out-of-bounds read in sunxi_mc_smp_init sfc: fix a double-free bug in efx_probe_filters net: bcmgenet: Fix FCS generation for fragmented skbuffs netfilter: nft_immediate: drop chain reference counter on error net: Save and restore msg_namelen in sock_sendmsg i40e: fix use-after-free in i40e_aqc_add_filters() ASoC: meson: g12a-toacodec: Validate written enum values ASoC: meson: g12a-tohdmitx: Validate written enum values ASoC: meson: g12a-toacodec: Fix event generation ASoC: meson: g12a-tohdmitx: Fix event generation for S/PDIF mux i40e: Restore VF MSI-X state during PCI reset igc: Fix hicredit calculation net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues net/smc: fix invalid link access in dumping SMC-R connections octeontx2-af: Always configure NIX TX link credits based on max frame size octeontx2-af: Re-enable MAC TX in otx2_stop processing asix: Add check for usbnet_get_endpoints net: ravb: Wait for operating mode to be applied bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters() net: Implement missing SO_TIMESTAMPING_NEW cmsg support selftests: secretmem: floor the memory size to the multiple of page_size cpu/SMT: Create topology_smt_thread_allowed() cpu/SMT: Make SMT control more robust against enumeration failures srcu: Fix callbacks acceleration mishandling bpf, x64: Fix tailcall infinite loop bpf, x86: Simplify the parsing logic of structure parameters bpf, x86: save/restore regs with BPF_DW size net: Declare MSG_SPLICE_PAGES internal sendmsg() flag udp: 
Convert udp_sendpage() to use MSG_SPLICE_PAGES splice, net: Add a splice_eof op to file-ops and socket-ops ipv4, ipv6: Use splice_eof() to flush udp: introduce udp->udp_flags udp: move udp->no_check6_tx to udp->udp_flags udp: move udp->no_check6_rx to udp->udp_flags udp: move udp->gro_enabled to udp->udp_flags udp: move udp->accept_udp_{l4|fraglist} to udp->udp_flags udp: lockless UDP_ENCAP_L2TPINUDP / UDP_GRO udp: annotate data-races around udp->encap_type wifi: iwlwifi: yoyo: swap cdb and jacket bits values arm64: dts: qcom: sdm845: align RPMh regulator nodes with bindings arm64: dts: qcom: sdm845: Fix PSCI power domain names fbdev: imsttfb: Release framebuffer and dealloc cmap on error path fbdev: imsttfb: fix double free in probe() bpf: decouple prune and jump points bpf: remove unnecessary prune and jump points bpf: Remove unused insn_cnt argument from visit_[func_call_]insn() bpf: clean up visit_insn()'s instruction processing bpf: Support new 32bit offset jmp instruction bpf: handle ldimm64 properly in check_cfg() bpf: fix precision backtracking instruction iteration blk-mq: make sure active queue usage is held for bio_integrity_prep() net/mlx5: Increase size of irq name buffer s390/mm: add missing arch_set_page_dat() call to vmem_crst_alloc() s390/cpumf: support user space events for counting f2fs: clean up i_compress_flag and i_compress_level usage f2fs: convert to use bitmap API f2fs: assign default compression level f2fs: set the default compress_level on ioctl selftests: mptcp: fix fastclose with csum failure selftests: mptcp: set FAILING_LINKS in run_tests media: camss: sm8250: Virtual channels for CSID media: qcom: camss: Fix set CSI2_RX_CFG1_VC_MODE when VC is greater than 3 ext4: convert move_extent_per_page() to use folios khugepage: replace try_to_release_page() with filemap_release_folio() memory-failure: convert truncate_error_page() to use folio mm: merge folio_has_private()/filemap_release_folio() call pairs mm, netfs, fscache: stop read optimisation when folio removed from pagecache filemap: add a per-mapping stable writes flag block: update the stable_writes flag in bdev_add smb: client: fix missing mode bits for SMB symlinks net: dpaa2-eth: rearrange variable in dpaa2_eth_get_ethtool_stats dpaa2-eth: recycle the RX buffer only after all processing done ethtool: don't propagate EOPNOTSUPP from dumps bpf, sockmap: af_unix stream sockets need to hold ref for pair sock firmware: arm_scmi: Fix frequency truncation by promoting multiplier type ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 genirq/affinity: Remove the 'firstvec' parameter from irq_build_affinity_masks genirq/affinity: Pass affinity managed mask array to irq_build_affinity_masks genirq/affinity: Don't pass irq_affinity_desc array to irq_build_affinity_masks genirq/affinity: Rename irq_build_affinity_masks as group_cpus_evenly genirq/affinity: Move group_cpus_evenly() into lib/ lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly mm/memory_hotplug: add missing mem_hotplug_lock mm/memory_hotplug: fix error handling in add_memory_resource() net: sched: call tcf_ct_params_free to free params in tcf_ct_init netfilter: flowtable: allow unidirectional rules netfilter: flowtable: cache info of last offload net/sched: act_ct: offload UDP NEW connections net/sched: act_ct: Fix promotion of offloaded unreplied tuple netfilter: flowtable: GC pushes back packets to classic path net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table octeontx2-af: Fix pause frame configuration octeontx2-af: 
Support variable number of lmacs btrfs: fix qgroup_free_reserved_data int overflow btrfs: mark the len field in struct btrfs_ordered_sum as unsigned ring-buffer: Fix 32-bit rb_time_read() race with rb_time_cmpxchg() firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards x86/kprobes: fix incorrect return address calculation in kprobe_emulate_call_indirect i2c: core: Fix atomic xfer check for non-preempt config mm: fix unmap_mapping_range high bits shift bug drm/amdgpu: skip gpu_info fw loading on navi12 drm/amd/display: add nv12 bounding box mmc: meson-mx-sdhc: Fix initialization frozen issue mmc: rpmb: fixes pause retune on all RPMB partitions. mmc: core: Cancel delayed work before releasing host mmc: sdhci-sprd: Fix eMMC init failure after hw reset genirq/affinity: Only build SMP-only helper functions on SMP kernels f2fs: compress: fix to assign compress_level for lz4 correctly net/sched: act_ct: additional checks for outdated flows net/sched: act_ct: Always fill offloading tuple iifidx bpf: Fix a verifier bug due to incorrect branch offset comparison with cpu=v4 bpf: syzkaller found null ptr deref in unix_bpf proto add media: qcom: camss: Comment CSID dt_id field smb3: Replace smb2pdu 1-element arrays with flex-arrays Revert "interconnect: qcom: sm8250: Enable sync_state" Linux 6.1.72 Change-Id: Id00eb2ae1159d4d5fa0ef914e672c5669cbf5b0a Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
Jiajun Xie
|
dafdeb7b91 |
mm: fix unmap_mapping_range high bits shift bug
commit 9eab0421fa94a3dde0d1f7e36ab3294fc306c99d upstream. The bug happens when the highest bit of holebegin is 1: suppose holebegin is 0x8000000111111000; after the shift, hba would be 0xfff8000000111111, and vma_interval_tree_foreach would then fail to look it up or lead to the wrong result. Example of the erroneous call sequence:
- mmap(..., offset=0x8000000111111000)
  |- syscall(mmap, ... unsigned long, off):
  |- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT);
Here pgoff is correctly shifted to 0x8000000111111, but passing 0x8000000111111000 as holebegin to unmap then causes the wrong result, as shown below:
- unmap_mapping_range(..., loff_t const holebegin)
  |- pgoff_t hba = holebegin >> PAGE_SHIFT; /* hba = 0xfff8000000111111 unexpectedly */
The issue happens in heterogeneous computing, where the device (e.g. a gpu) and the host share the same virtual address space. A simple workflow pattern which hits the issue is:
/* host */ 1. userspace first mmaps a file-backed VA range with a specified offset, e.g. (offset=0x800..., mmap returns: va_a) 2. userspace writes some data to the corresponding sys page, e.g. (va_a = 0xAABB)
/* device */ 3. the gpu workload touches the VA, triggers a gpu fault and notifies the host.
/* host */ 4. having received the gpu fault notification, the host will: 4.1 unmap host pages and also take care of the cpu tlb (using unmap_mapping_range with offset=0x800...) 4.2 migrate the sys page to the device 4.3 set up the device page table and resolve the device fault.
/* device */ 5. the gpu workload continues; it accesses va_a and gets 0xAABB. 6. the gpu workload continues; it writes 0xBBCC to va_a.
/* host */ 7. userspace accesses va_a; as expected, it will: 7.1 trigger a cpu vm fault. 7.2 the driver handles the fault and migrates the gpu-local page back to the host. 8. userspace then correctly gets 0xBBCC from va_a. 9. done.
But in step 4.1, if we hit the bug this patch mentions, userspace never triggers the cpu fault and still gets the old value: 0xAABB. Making holebegin unsigned first fixes the bug. Link: https://lkml.kernel.org/r/20231220052839.26970-1-jiajun.xie.sh@gmail.com Signed-off-by: Jiajun Xie <jiajun.xie.sh@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
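The arithmetic is easy to reproduce in a standalone program. This is plain userspace C illustrating the sign-extension, not kernel code (right-shifting a negative signed value is arithmetic on gcc/clang):

```c
#include <stdio.h>

#define PAGE_SHIFT 12

int main(void)
{
	long long holebegin = (long long)0x8000000111111000ULL;

	/* buggy: the shift sign-extends, smearing the top bit over hba */
	printf("signed shift:   %#llx\n",
	       (unsigned long long)(holebegin >> PAGE_SHIFT));
	/* fixed: make holebegin unsigned before shifting */
	printf("unsigned shift: %#llx\n",
	       (unsigned long long)holebegin >> PAGE_SHIFT);
	return 0;
}
/* prints 0xfff8000000111111, then 0x8000000111111 */
```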
Peng Zhang
|
ed9b660cd1 |
BACKPORT: FROMGIT fork: use __mt_dup() to duplicate maple tree in dup_mmap()
In dup_mmap(), using __mt_dup() to duplicate the old maple tree and then directly replacing the entries of VMAs in the new maple tree can result in better performance. __mt_dup() uses DFS pre-order to duplicate the maple tree, so it is efficient. The average time complexity of __mt_dup() is O(n), where n is the number of VMAs. The proof of the time complexity is provided in the commit log that introduces __mt_dup(). After duplicating the maple tree, each element is traversed and replaced (ignoring the cases of deletion, which are rare). Since it is only a replacement operation for each element, this process is also O(n). Analyzing the exact time complexity of the previous algorithm is challenging because each insertion can involve appending to a node, pushing data to adjacent nodes, or even splitting nodes. The frequency of each action is difficult to calculate. The worst-case scenario for a single insertion is when the tree undergoes splitting at every level. If we consider each insertion as the worst-case scenario, we can determine that the upper bound of the time complexity is O(n*log(n)), although this is a loose upper bound. However, based on the test data, it appears that the actual time complexity is likely to be O(n). As the entire maple tree is duplicated using __mt_dup(), if dup_mmap() fails, there will be a portion of VMAs that have not been duplicated in the maple tree. To handle this, we mark the failure point with XA_ZERO_ENTRY. In exit_mmap(), if this marker is encountered, stop releasing VMAs that have not been duplicated after this point. There is a "spawn" test in byte-unixbench[1], which can be used to test the performance of fork(). I modified it slightly to make it work with different numbers of VMAs. Below are the test results. The first row shows the number of VMAs. The second and third rows show the number of fork() calls per ten seconds, corresponding to next-20231006 and this patchset, respectively; the fourth row is the relative improvement. The test results were obtained with CPU binding to avoid scheduler load balancing that could cause unstable results. There are still some fluctuations in the test results, but at least they are better than the original performance.
VMAs          | 21     | 121   | 221    | 421    | 821    | 1621   | 3221   | 6421   | 12821  | 25621  | 51221
next-20231006 | 112100 | 76261 | 54227  | 34035  | 20195  | 11112  | 6017   | 3161   | 1606   | 802    | 393
this patchset | 114558 | 83067 | 65008  | 45824  | 28751  | 16072  | 8922   | 4747   | 2436   | 1233   | 599
improvement   | 2.19%  | 8.92% | 19.88% | 34.64% | 42.37% | 44.64% | 48.28% | 50.17% | 51.68% | 53.74% | 52.42%
[1] https://github.com/kdlucas/byte-unixbench/tree/master Link: https://lkml.kernel.org/r/20231027033845.90608-11-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Suggested-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S.
Tsirkin <mst@redhat.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit d2406291483775ecddaee929231a39c70c08fda2 https://git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable) [surenb: open-coded vma_iter_clear_gfp(), vma_iter_bulk_store(); replaced vma_next() with mas_find()] Bug: 308042511 Change-Id: I42d6620e8ce6a0b16211c231a9b72ba16ba9c0d2 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
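A compressed sketch of the failure-point convention the message describes (simplified; the real tear-down in exit_mmap() carries more state):

```c
/* On dup_mmap() failure, the failure point in the new tree is marked
 * with XA_ZERO_ENTRY; tear-down stops at the marker, since nothing
 * beyond it was ever duplicated. */
mas_for_each(&mas, vma, ULONG_MAX) {
	if (xa_is_zero(vma))
		break;		/* dup_mmap() failed here */
	remove_vma(vma);
}
```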
Matthew Wilcox (Oracle)
|
4a518d8633 |
UPSTREAM: mm: handle write faults to RO pages under the VMA lock
I think this is a pretty rare occurrence, but for consistency handle faults with the VMA lock held the same way that we handle other faults with the VMA lock held. Link: https://lkml.kernel.org/r/20231006195318.4087158-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 4a68fef16df9d88d528094116f8bbd2dbfa62089) Bug: 293665307 Change-Id: I69cec218c8a1fe14df3268722e6b1be6dffe7978 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Matthew Wilcox (Oracle)
|
c1da94fa44 |
UPSTREAM: mm: handle read faults under the VMA lock
Most file-backed faults are already handled through ->map_pages(), but if we need to do I/O we'll come this way. Since filemap_fault() is now safe to be called under the VMA lock, we can handle these faults under the VMA lock now. Link: https://lkml.kernel.org/r/20231006195318.4087158-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 12214eba1992642eee5813a9cc9f626e5b2d1815) Bug: 293665307 Change-Id: Iee48af98b866d88d88ec01143eb26389ab373b6b Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Matthew Wilcox (Oracle)
|
6541fffd92 |
UPSTREAM: mm: handle COW faults under the VMA lock
If the page is not currently present in the page tables, we need to call the page fault handler to find out which page we're supposed to COW, so we need to both check that there is already an anon_vma and that the fault handler doesn't need the mmap_lock. Link: https://lkml.kernel.org/r/20231006195318.4087158-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 4de8c93a4751e10737b6af65db42c743228c67a6) Bug: 293665307 Change-Id: If749a6f8fcf69d83bbf872c1d45865d1b1b77ea0 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Matthew Wilcox (Oracle)
|
c7fa581a79 |
UPSTREAM: mm: handle shared faults under the VMA lock
There are many implementations of ->fault and some of them depend on mmap_lock being held. All vm_ops that implement ->map_pages() end up calling filemap_fault(), which I have audited to be sure it does not rely on mmap_lock. So (for now) key off ->map_pages existing as a flag to indicate that it's safe to call ->fault while only holding the vma lock. Link: https://lkml.kernel.org/r/20231006195318.4087158-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 4ed4379881aa62588aba6442a9f362a8cf7624e6) Bug: 293665307 Change-Id: Ifb5ab3df5d05fb182d0cb52820fa24e28e2d6496 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Matthew Wilcox (Oracle)
|
95af8a80bb |
BACKPORT: mm: call wp_page_copy() under the VMA lock
It is usually safe to call wp_page_copy() under the VMA lock. The only unsafe situation is when no anon_vma has been allocated for this VMA, and we have to look at adjacent VMAs to determine if their anon_vma can be shared. Since this happens only for the first COW of a page in this VMA, the majority of calls to wp_page_copy() do not need to fall back to the mmap_sem. Add vmf_anon_prepare() as an alternative to anon_vma_prepare() which will return RETRY if we currently hold the VMA lock and need to allocate an anon_vma. This lets us drop the check in do_wp_page(). Link: https://lkml.kernel.org/r/20231006195318.4087158-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 164b06f238b986317131e6b61b2f22aabcbc2cc0) [surenb: resolved merge conflicts due to folio/page differences] Bug: 293665307 Change-Id: I39bdc247b375bd3dae8078b52c60fd4ce12e1850 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
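The helper this message introduces is small; a sketch close to the upstream shape (modulo the 6.1 backport's folio/page conflict resolutions):

```c
static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;

	if (likely(vma->anon_vma))
		return 0;
	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
		/* allocating an anon_vma may inspect adjacent VMAs,
		 * which needs mmap_lock -- retry under it instead */
		vma_end_read(vma);
		return VM_FAULT_RETRY;
	}
	if (__anon_vma_prepare(vma))
		return VM_FAULT_OOM;
	return 0;
}
```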
Matthew Wilcox
|
9c4bc457ab |
UPSTREAM: mm/memory.c: fix mismerge
Fix a build issue. Link: https://lkml.kernel.org/r/ZNerqcNS4EBJA/2v@casper.infradead.org Fixes: 4aaa60dad4d1 ("mm: allow per-VMA locks on file-backed VMAs") Signed-off-by: Matthew Wilcox <willy@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308121909.XNYBtqNI-lkp@intel.com/ Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 08dff2810e8feb3096bf5c8242ab1649d1e8b1a4) Bug: 293665307 Change-Id: I07ce19f29c44831cdcf709fe1ce122d1963f0be2 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Greg Kroah-Hartman
|
c259cc9cb4 |
This is the 6.1.57 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmUlrb4ACgkQONu9yGCS aT4b+hAAgvFC6P+XmyyNXJ9ISHLkgSlcIAdatb+qeOCUtdiWHqfxIha13FdnCdhL WS2c/O9ORfAzjFwnYWF6LBwH8ArxRSkAXrGCMuCkEFBP3cG/j2HD+XLAAYEuBjjb sf1fw8e8VSgaPEOnwXie5rTfAY4VnZKEtZjAxjyIQnJKVVKfxQRb8CyaWDPzPD0Z tL/iABt7UWNHZayHTHsh0YhF2UhXtOjHinWigEarcZQEvOB2qRQtFl71cnqosi+t 3ZRZzepH7/Fx3v6/H/6PNq+GSI/ZzhOiCQolVV5YcMGHXsW9cP6arjLUxco5pzpk pEg0vdMq47JOZYQ2pIewG4t7+NLmFIxCRFnKQVbxeFNSY9c1jhd8g5lhx9YEXwjT BzMtV5DnZoaoMdq2P1STw/+RVYrDI1Lm6jqfgw/D27b7LzQ13VsGM9BJ1rCs8Hm7 UhWyjwFcgo0vhpfML1RF0RtT9Mo5SOnpGPfpbFdjg8jdXlGknNH0QsH+EY/BpF8l h77P5BvoNIjsIN3B1YunfXtFXhx3h0sI8zZrqHR+zhOeWGsXcqQ5mZ/lYdYKkKuH R8LRB7shPndF4xdRX0uRXwomcXhs+60eA5xEvE9u0CqqdpXfQN5oTwixfCm2C8MS O5Fc7hfvK11XtR3ja+y3KRhiNG3YsfW2PXnlOfZxMZ6iPqXtA/o= =5/pn -----END PGP SIGNATURE----- Merge 6.1.57 into android14-6.1-lts Changes in 6.1.57 spi: zynqmp-gqspi: fix clock imbalance on probe failure ASoC: soc-utils: Export snd_soc_dai_is_dummy() symbol ASoC: tegra: Fix redundant PLLA and PLLA_OUT0 updates mptcp: rename timer related helper to less confusing names mptcp: fix dangling connection hang-up mptcp: annotate lockless accesses to sk->sk_err mptcp: move __mptcp_error_report in protocol.c mptcp: process pending subflow error on close ata,scsi: do not issue START STOP UNIT on resume scsi: sd: Differentiate system and runtime start/stop management scsi: sd: Do not issue commands to suspended disks on shutdown scsi: core: Improve type safety of scsi_rescan_device() scsi: Do not attempt to rescan suspended devices ata: libata-scsi: Fix delayed scsi_rescan_device() execution NFS: Cleanup unused rpc_clnt variable NFS: rename nfs_client_kset to nfs_kset NFSv4: Fix a state manager thread deadlock regression mm/memory: add vm_normal_folio() mm/mempolicy: convert queue_pages_pmd() to queue_folios_pmd() mm/mempolicy: convert queue_pages_pte_range() to queue_folios_pte_range() mm/mempolicy: convert migrate_page_add() to migrate_folio_add() mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified mm/page_alloc: always remove pages from temporary list mm/page_alloc: leave IRQs enabled for per-cpu page allocations mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list ring-buffer: remove obsolete comment for free_buffer_page() ring-buffer: Fix bytes info in per_cpu buffer stats btrfs: use struct qstr instead of name and namelen pairs btrfs: setup qstr from dentrys using fscrypt helper btrfs: use struct fscrypt_str instead of struct qstr Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return" arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path net: add sysctl accept_ra_min_rtr_lft net: change accept_ra_min_rtr_lft to affect all RA lifetimes net: release reference to inet6_dev pointer arm64: cpufeature: Fix CLRBHB and BC detection drm/amd/display: Adjust the MST resume flow iommu/arm-smmu-v3: Set TTL invalidation hint better iommu/arm-smmu-v3: Avoid constructing invalid range commands rbd: move rbd_dev_refresh() definition rbd: decouple header read-in from updating rbd_dev->header rbd: decouple parent info read-in from updating rbd_dev rbd: take header_rwsem in rbd_dev_refresh() only when updating block: fix use-after-free of q->q_usage_counter hwmon: (nzxt-smart2) Add device id hwmon: (nzxt-smart2) add another USB ID i40e: fix the wrong PTP frequency calculation scsi: zfcp: Fix a double put in zfcp_port_enqueue() iommu/vt-d: Avoid memory allocation in iommu_suspend() vringh: don't use 
vringh_kiov_advance() in vringh_iov_xfer() net: ethernet: mediatek: disable irq before schedule napi mptcp: userspace pm allow creating id 0 subflow qed/red_ll2: Fix undefined behavior bug in struct qed_ll2_info Bluetooth: hci_codec: Fix leaking content of local_codecs Bluetooth: hci_sync: Fix handling of HCI_QUIRK_STRICT_DUPLICATE_FILTER wifi: mwifiex: Fix tlv_buf_left calculation md/raid5: release batch_last before waiting for another stripe_head PCI: qcom: Fix IPQ8074 enumeration net: replace calls to sock->ops->connect() with kernel_connect() net: prevent rewrite of msg_name in sock_sendmsg() drm/amd: Fix detection of _PR3 on the PCIe root port drm/amd: Fix logic error in sienna_cichlid_update_pcie_parameters() arm64: Add Cortex-A520 CPU part definition arm64: errata: Add Cortex-A520 speculative unprivileged load workaround HID: sony: Fix a potential memory leak in sony_probe() ubi: Refuse attaching if mtd's erasesize is 0 erofs: fix memory leak of LZMA global compressed deduplication wifi: iwlwifi: dbg_ini: fix structure packing wifi: iwlwifi: mvm: Fix a memory corruption issue wifi: cfg80211: hold wiphy lock in auto-disconnect wifi: cfg80211: move wowlan disable under locks wifi: cfg80211: add a work abstraction with special semantics wifi: cfg80211: fix cqm_config access race wifi: cfg80211: add missing kernel-doc for cqm_rssi_work wifi: mwifiex: Fix oob check condition in mwifiex_process_rx_packet leds: Drop BUG_ON check for LED_COLOR_ID_MULTI bpf: Fix tr dereferencing regulator: mt6358: Drop *_SSHUB regulators regulator: mt6358: Use linear voltage helpers for single range regulators regulator: mt6358: split ops for buck and linear range LDO regulators Bluetooth: Delete unused hci_req_prepare_suspend() declaration Bluetooth: ISO: Fix handling of listen for unicast drivers/net: process the result of hdlc_open() and add call of hdlc_close() in uhdlc_close() wifi: mt76: mt76x02: fix MT76x0 external LNA gain handling perf/x86/amd/core: Fix overflow reset on hotplug regmap: rbtree: Fix wrong register marked as in-cache when creating new node wifi: mac80211: fix potential key use-after-free perf/x86/amd: Do not WARN() on every IRQ iommu/mediatek: Fix share pgtable for iova over 4GB regulator/core: regulator_register: set device->class earlier ima: Finish deprecation of IMA_TRUSTED_KEYRING Kconfig scsi: target: core: Fix deadlock due to recursive locking ima: rework CONFIG_IMA dependency block NFSv4: Fix a nfs4_state_manager() race bpf: tcp_read_skb needs to pop skb regardless of seq bpf, sockmap: Do not inc copied_seq when PEEK flag set bpf, sockmap: Reject sk_msg egress redirects to non-TCP sockets modpost: add missing else to the "of" check net: fix possible store tearing in neigh_periodic_work() bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup neighbour: annotate lockless accesses to n->nud_state neighbour: switch to standard rcu, instead of rcu_bh neighbour: fix data-races around n->output ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data() ptp: ocp: Fix error handling in ptp_ocp_device_init net: dsa: mv88e6xxx: Avoid EEPROM timeout when EEPROM is absent ipv6: tcp: add a missing nf_reset_ct() in 3WHS handling net: usb: smsc75xx: Fix uninit-value access in __smsc75xx_read_reg net: nfc: llcp: Add lock when modifying device list net: ethernet: ti: am65-cpsw: Fix error code in am65_cpsw_nuss_init_tx_chns() ibmveth: Remove condition to recompute TCP header checksum. 
netfilter: handle the connecting collision properly in nf_conntrack_proto_sctp selftests: netfilter: Test nf_tables audit logging selftests: netfilter: Extend nft_audit.sh netfilter: nf_tables: Deduplicate nft_register_obj audit logs netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure ipv4: Set offload_failed flag in fibmatch results net: stmmac: dwmac-stm32: fix resume on STM32 MCU tipc: fix a potential deadlock on &tx->lock tcp: fix quick-ack counting to count actual ACKs of new data tcp: fix delayed ACKs for MSS boundary condition sctp: update transport state when processing a dupcook packet sctp: update hb timer immediately after users change hb_interval netlink: split up copies in the ack construction netlink: Fix potential skb memleak in netlink_ack netlink: annotate data-races around sk->sk_err HID: sony: remove duplicate NULL check before calling usb_free_urb() HID: intel-ish-hid: ipc: Disable and reenable ACPI GPE bit intel_idle: add Emerald Rapids Xeon support smb: use kernel_connect() and kernel_bind() parisc: Fix crash with nr_cpus=1 option dm zoned: free dmz->ddev array in dmz_put_zoned_devices RDMA/core: Require admin capabilities to set system parameters of: dynamic: Fix potential memory leak in of_changeset_action() IB/mlx4: Fix the size of a buffer in add_port_entries() gpio: aspeed: fix the GPIO number passed to pinctrl_gpio_set_config() gpio: pxa: disable pinctrl calls for MMP_GPIO RDMA/cma: Initialize ib_sa_multicast structure to 0 when join RDMA/cma: Fix truncation compilation warning in make_cma_ports RDMA/uverbs: Fix typo of sizeof argument RDMA/srp: Do not call scsi_done() from srp_abort() RDMA/siw: Fix connection failure handling RDMA/mlx5: Fix mutex unlocking on error flow for steering anchor creation RDMA/mlx5: Fix NULL string error x86/sev: Use the GHCB protocol when available for SNP CPUID requests ksmbd: fix race condition between session lookup and expire ksmbd: fix uaf in smb20_oplock_break_ack parisc: Restore __ldcw_align for PA-RISC 2.0 processors ipv6: remove nexthop_fib6_nh_bh() vrf: Fix lockdep splat in output path btrfs: fix an error handling path in btrfs_rename() btrfs: fix fscrypt name leak after failure to join log transaction netlink: remove the flex array from struct nlmsghdr btrfs: file_remove_privs needs an exclusive lock in direct io write ipv6: remove one read_lock()/read_unlock() pair in rt6_check_neigh() xen/events: replace evtchn_rwlock with RCU Linux 6.1.57 Change-Id: I2c200264df72a9043d91d31479c91b0d7f94863e Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
Vishal Moola (Oracle)
|
6d6635749d |
mm/memory: add vm_normal_folio()
[ Upstream commit 318e9342fbbb6888d903d86e83865609901a1c65 ] Patch series "Convert deactivate_page() to folio_deactivate()", v4. Deactivate_page() has already been converted to use folios. This patch series modifies the callers of deactivate_page() to use folios. It also introduces vm_normal_folio() to assist with folio conversions, and converts deactivate_page() to folio_deactivate() which takes in a folio. This patch (of 4): Introduce a wrapper function called vm_normal_folio(). This function calls vm_normal_page() and returns the folio of the page found, or null if no page is found. This function allows callers to get a folio from a pte, which will eventually allow them to completely replace their struct page variables with struct folio instead. Link: https://lkml.kernel.org/r/20221221180848.20774-1-vishal.moola@gmail.com Link: https://lkml.kernel.org/r/20221221180848.20774-2-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 24526268f4e3 ("mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified") Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
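As the message says, the wrapper simply folio-izes vm_normal_page(); the upstream implementation is essentially:

```c
struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr,
			      pte_t pte)
{
	struct page *page = vm_normal_page(vma, addr, pte);

	/* return the folio containing the page, or NULL if none found */
	if (page)
		return page_folio(page);
	return NULL;
}
```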
Suren Baghdasaryan
|
4fcc13c1ff |
ANDROID: mm: add missing check in the backport for handling faults under VMA lock
While backporting, a check for vma locking inside do_wp_page() was
missed. Add it.
Fixes:
|
||
Matthew Wilcox (Oracle)
|
3ebafb7b46 |
BACKPORT: FROMGIT: mm: handle faults that merely update the accessed bit under the VMA lock
Move FAULT_FLAG_VMA_LOCK check out of handle_pte_fault(). This should have a significant performance improvement for mmaped files. Write faults (on read-only shared pages) still take the mmap lock as we do not want to audit all the implementations of ->pfn_mkwrite() and ->page_mkwrite(). However write-faults on private mappings are handled under the VMA lock. Link: https://lkml.kernel.org/r/20230724185410.1124082-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 88e2667632d43928d3ed50d0163ecd73aaa2d455 https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable) [surenb: replaced folio_put() with put_page() in wp_page_shared()] Bug: 293665307 Change-Id: I27ac40bb0f7347083f641e0cfc8ab33e182c4c5b Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Matthew Wilcox (Oracle)
|
83ab986324 |
FROMGIT: mm: handle swap and NUMA PTE faults under the VMA lock
Move the FAULT_FLAG_VMA_LOCK check down in handle_pte_fault(). This is probably not a huge win in its own right, but is a nicely separable bit from the next patch. Link: https://lkml.kernel.org/r/20230724185410.1124082-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 51c4fdc72be2287960ab5c1f5beae84f3039fd01 https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable) Bug: 293665307 Change-Id: I6cf9cb1d40c23287ce179a8c435427c3d88d2528 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Matthew Wilcox (Oracle)
|
ffcebdef16 |
FROMGIT: mm: run the fault-around code under the VMA lock
The map_pages fs method should be safe to run under the VMA lock instead of the mmap lock. This should have a measurable reduction in contention on the mmap lock. Link: https://lkml.kernel.org/r/20230724185410.1124082-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 7456c15600264d635293c91df1e0c0b5a1e73578 https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable) Bug: 293665307 Change-Id: Iaa1b0c2deeade361b34118f41b5deb591268a269 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Matthew Wilcox (Oracle)
|
072c35fb69 |
FROMGIT: mm: move FAULT_FLAG_VMA_LOCK check down from do_fault()
Perform the check at the start of do_read_fault(), do_cow_fault() and do_shared_fault() instead. Should be no performance change from the last commit. Link: https://lkml.kernel.org/r/20230724185410.1124082-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 4e105ec567c874c166a8e5a9b2dd849c8ec2055e https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable) Bug: 293665307 Change-Id: I37be370a0378afd094d880bb8e538e4e7874499e Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Matthew Wilcox (Oracle)
|
fa9a8adff0 |
FROMGIT: mm: move FAULT_FLAG_VMA_LOCK check down in handle_pte_fault()
Call do_pte_missing() under the VMA lock ... then immediately retry in do_fault(). Link: https://lkml.kernel.org/r/20230724185410.1124082-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 4c753b25481499cd1cb6a8ddba18bc5585f34296 https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable) Bug: 293665307 Change-Id: I8c8f2feaade7c40daf37b63e43111d22ec147e5f Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Matthew Wilcox (Oracle)
|
dd621869c1 |
BACKPORT: FROMGIT: mm: handle some PMD faults under the VMA lock
Push the VMA_LOCK check down from __handle_mm_fault() to handle_pte_fault(). Once again, we refuse to call ->huge_fault() with the VMA lock held, but we will wait for a PMD migration entry with the VMA lock held, handle NUMA migration and set the accessed bit. We were already doing this for anonymous VMAs, so it should be safe. Link: https://lkml.kernel.org/r/20230724185410.1124082-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit b7b8f56db92f56ce812e305f84aef0404287b534 https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable) [surenb: resolved merge conflicts in create_huge_pmd() and wp_huge_pmd()] Bug: 293665307 Change-Id: I3ec9042b2e39a5caf6b6f3a478bf9ba337012aa4 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Matthew Wilcox (Oracle)
|
8594d6a30f |
BACKPORT: FROMGIT: mm: handle PUD faults under the VMA lock
Postpone checking the VMA_LOCK flag until we've attempted to handle faults on PUDs. There's a mild upside to this patch in that we'll allocate the page tables while under the VMA lock rather than the mmap lock, reducing the hold time on the mmap lock, since the retry will find the page tables already populated. The real purpose here is to make a commit that shows we don't call ->huge_fault under the VMA lock. We do now handle setting the accessed bit on a PUD fault under the VMA lock, but that doesn't seem likely to be a measurable difference. Link: https://lkml.kernel.org/r/20230724185410.1124082-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 3c04dd18ba57c6753a7ddc6e6c902550a7ac54d9 https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable) [surenb: resolved merge conflicts in wp_huge_pud()] Bug: 293665307 Change-Id: Ife20ed7de6444c0e424e12f9fdcdc8f8ecaed2aa Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Matthew Wilcox (Oracle)
|
66cbbe6b31 |
FROMGIT: mm: move FAULT_FLAG_VMA_LOCK check from handle_mm_fault()
Handle a little more of the page fault path outside the mmap sem. The hugetlb path doesn't need to check whether the VMA is anonymous; the VM_HUGETLB flag is only set on hugetlbfs VMAs. There should be no performance change from the previous commit; this is simply a step to ease bisection of any problems. Link: https://lkml.kernel.org/r/20230724185410.1124082-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 51db5e8974cafee10b2252efa78f89af7d60cd11 https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable) Bug: 293665307 Change-Id: I300c7105fa3530e8eb05862cb3f66b7adac99420 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Matthew Wilcox (Oracle)
|
e26044769f |
BACKPORT: FROMGIT: mm: allow per-VMA locks on file-backed VMAs
Remove the TCP layering violation by allowing per-VMA locks on all VMAs. The fault path will immediately fail in handle_mm_fault(). There may be a small performance reduction from this patch as a little unnecessary work will be done on each page fault. See later patches for the improvement. Link: https://lkml.kernel.org/r/20230724185410.1124082-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 698dcd77360a3ce15dfc6fe55f9b5572ad4c4291 https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable) [surenb: skip tcp-related changes] Bug: 293665307 Change-Id: I73d9d1e4f96419d4723a920fc5960e806749c368 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Suren Baghdasaryan
|
250f19771f |
FROMGIT: mm: handle userfaults under VMA lock
Enable handle_userfault to operate under VMA lock by releasing VMA lock instead of mmap_lock and retrying. Note that FAULT_FLAG_RETRY_NOWAIT should never be used when handling faults under per-VMA lock protection because that would break the assumption that lock is dropped on retry. Link: https://lkml.kernel.org/r/20230630211957.1341547-7-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hillf Danton <hdanton@sina.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Minchan Kim <minchan@google.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit c3c986f59c814edecc096a049d67e5791083388b https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable) Bug: 161210518 Change-Id: I9df667dae39024e5473252d7347ec7929f7f999e Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
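The release-the-right-lock pattern this enables boils down to a helper along these lines (a sketch of the idea; this series added such a helper upstream):

```c
static inline void release_fault_lock(struct vm_fault *vmf)
{
	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
		vma_end_read(vmf->vma);		/* per-VMA lock path */
	else
		mmap_read_unlock(vmf->vma->vm_mm);
}
```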
Suren Baghdasaryan
|
e704d0e4f9 |
FROMGIT: mm: handle swap page faults under per-VMA lock
When page fault is handled under per-VMA lock protection, all swap page faults are retried with mmap_lock because folio_lock_or_retry has to drop and reacquire mmap_lock if folio could not be immediately locked. Follow the same pattern as mmap_lock to drop per-VMA lock when waiting for folio and retrying once folio is available. With this obstacle removed, enable do_swap_page to operate under per-VMA lock protection. Drivers implementing ops->migrate_to_ram might still rely on mmap_lock, therefore we have to fall back to mmap_lock in that particular case. Note that the only time do_swap_page calls synchronous swap_readpage is when SWP_SYNCHRONOUS_IO is set, which is only set for QUEUE_FLAG_SYNCHRONOUS devices: brd, zram and nvdimms (both btt and pmem). Therefore we don't sleep in this path, and there's no need to drop the mmap or per-VMA lock. Link: https://lkml.kernel.org/r/20230630211957.1341547-6-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hillf Danton <hdanton@sina.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Minchan Kim <minchan@google.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit cc989adb5544594d8c12893eda3c6df8682de11b https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable) Bug: 161210518 Change-Id: I5d80f435b2dbdc3f3d02be056e893f6fedbc7a98 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Suren Baghdasaryan
|
f8a65b694b |
FROMGIT: mm: change folio_lock_or_retry to use vm_fault directly
Change folio_lock_or_retry to accept vm_fault struct and return the vm_fault_t directly. Link: https://lkml.kernel.org/r/20230630211957.1341547-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Acked-by: Peter Xu <peterx@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hillf Danton <hdanton@sina.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Minchan Kim <minchan@google.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit af27bb856a0a29a0673aabe163e4774df67a8bcd https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable) Bug: 161210518 Change-Id: I9d203e801f0d5517fba8430f9ab82d4063b517f3 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
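A sketch of the new call shape after this change (caller-side label is assumed):

```c
/* folio_lock_or_retry() now takes the vm_fault and returns a
 * vm_fault_t: 0 when the folio was locked, VM_FAULT_RETRY when it
 * dropped the mmap or per-VMA lock and the fault must be retried. */
vm_fault_t ret = folio_lock_or_retry(folio, vmf);

if (ret)
	goto out_release;	/* lock already dropped */
```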
Suren Baghdasaryan
|
693d905ec0 |
BACKPORT: FROMGIT: mm: drop per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED
handle_mm_fault returning VM_FAULT_RETRY or VM_FAULT_COMPLETED means mmap_lock has been released. However with per-VMA locks behavior is different and the caller should still release it. To make the rules consistent for the caller, drop the per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED. Currently the only path returning VM_FAULT_RETRY under per-VMA locks is do_swap_page and no path returns VM_FAULT_COMPLETED for now. Link: https://lkml.kernel.org/r/20230630211957.1341547-4-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hillf Danton <hdanton@sina.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Minchan Kim <minchan@google.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 5197d920745dd42eae023986dbf053107ac238db https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable) [surenb: add the code from missing sanitize_fault_flags directly into handle_mm_fault, add the fix for riscv] Bug: 161210518 Change-Id: Iefd4e49bda940c457a70ecf40d074ad532959759 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
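On the caller side, the rule this commit establishes looks roughly like this in an arch fault handler (a sketch, not any specific architecture's code):

```c
fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);

/* RETRY/COMPLETED mean the core already dropped the per-VMA lock,
 * mirroring the long-standing mmap_lock convention */
if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
	vma_end_read(vma);
```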
Suren Baghdasaryan
|
ad18923856 |
FROMGIT: mm: replace mmap with vma write lock assertions when operating on a vma
Vma write lock assertion always includes mmap write lock assertion and additional vma lock checks when per-VMA locks are enabled. Replace weaker mmap_assert_write_locked() assertions with stronger vma_assert_write_locked() ones when we are operating on a vma which is expected to be locked. Link: https://lkml.kernel.org/r/20230804152724.3090321-4-surenb@google.com Suggested-by: Jann Horn <jannh@google.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 928a31b91cf64aa99a8999dcd66bec0ad02f64ef https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable) Bug: 293665307 Change-Id: I861db0510612f571f2ca44e0a9d7e01274d4eb36 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
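The substitution itself is mechanical; for a function whose vma argument must be write-locked:

```c
/* before: only proves the mmap lock is held for writing */
mmap_assert_write_locked(vma->vm_mm);

/* after: same mmap assertion, plus the per-VMA write-lock check when
 * per-VMA locks are enabled */
vma_assert_write_locked(vma);
```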
Jann Horn
|
890b1aabb1 |
BACKPORT: mm: lock_vma_under_rcu() must check vma->anon_vma under vma lock
lock_vma_under_rcu() tries to guarantee that __anon_vma_prepare() can't be called in the VMA-locked page fault path by ensuring that vma->anon_vma is set. However, this check happens before the VMA is locked, which means a concurrent move_vma() can call unlink_anon_vmas(), which disassociates the VMA's anon_vma. This means we can get UAF in the following scenario:
  THREAD 1                                  THREAD 2
  ========                                  ========
  <page fault>
    lock_vma_under_rcu()
      rcu_read_lock()
      mas_walk()
      check vma->anon_vma
                                            mremap() syscall
                                              move_vma()
                                                vma_start_write()
                                                unlink_anon_vmas()
                                            <syscall end>
    handle_mm_fault()
      __handle_mm_fault()
        handle_pte_fault()
          do_pte_missing()
            do_anonymous_page()
              anon_vma_prepare()
                __anon_vma_prepare()
                  find_mergeable_anon_vma()
                    mas_walk() [looks up VMA X]
                                            munmap() syscall (deletes VMA X)
                    reusable_anon_vma() [called on freed VMA X]
This is a security bug if you can hit it, although an attacker would have to win two races at once where the first race window is only a few instructions wide. This patch is based on some previous discussion with Linus Torvalds on the security list. Cc: stable@vger.kernel.org Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Bug: 293665307 (cherry picked from commit 657b5146955eba331e01b9a6ae89ce2e716ba306) [surenb: removed vma_is_tcp() call not present in 6.1] Change-Id: I4bd91e1db337ff35eb7c1d436f4372944556dd7d Signed-off-by: Suren Baghdasaryan <surenb@google.com> |

||
Linus Torvalds
|
188ce9572f |
BACKPORT: mm: always expand the stack with the mmap write lock held
commit 8d7071af890768438c14db6172cc8f9f4d04e184 upstream This finishes the job of always holding the mmap write lock when extending the user stack vma, and removes the 'write_locked' argument from the vm helper functions again. For some cases, we just avoid expanding the stack at all: drivers and page pinning really shouldn't be extending any stacks. Let's see if any strange users really wanted that. It's worth noting that architectures that weren't converted to the new lock_mm_and_find_vma() helper function are left using the legacy "expand_stack()" function, but it has been changed to drop the mmap_lock and take it for writing while expanding the vma. This makes it fairly straightforward to convert the remaining architectures. As a result of dropping and re-taking the lock, the calling conventions for this function have also changed, since the old vma may no longer be valid. So it will now return the new vma if successful, and NULL - and the lock dropped - if the area could not be extended. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [6.1: Patch drivers/iommu/io-pgfault.c instead] Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [surenb: change in io-pgfault.c was done in iommu-sva.c] Change-Id: Icdcdded08d7ad4eda8fae1120a3c8b3d957516c1 (cherry picked from commit 8d7071af890768438c14db6172cc8f9f4d04e184) Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
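A sketch of the changed calling convention described above (error handling trimmed; this is an illustration, not any one arch's code):

```c
struct vm_area_struct *vma;

mmap_read_lock(mm);
vma = find_vma(mm, address);
if (!vma || vma->vm_start > address) {
	/* may drop mmap_lock and retake it for writing while expanding */
	vma = expand_stack(mm, address);
	if (!vma)
		return -EFAULT;	/* could not extend; lock already dropped */
}
/* vma (possibly the newly expanded one) is valid, lock held again */
```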
Liam R. Howlett
|
1afccd4255 |
UPSTREAM: mm: make find_extend_vma() fail if write lock not held
commit f440fa1ac955e2898893f9301568435eb5cdfc4b upstream. Make calls to extend_vma() and find_extend_vma() fail if the write lock is required. To avoid making this a flag-day event, this still allows the old read-locking case for the trivial situations, and passes in a flag to say "is it write-locked". That way write-lockers can say "yes, I'm being careful", and legacy users will continue to work in all the common cases until they have been fully converted to the new world order. Co-Developed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Change-Id: If12d2d68429b6d71393f02d5ed7e6939c3cd5405 (cherry picked from commit f440fa1ac955e2898893f9301568435eb5cdfc4b) Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
Linus Torvalds
|
cf70cb4f1f |
UPSTREAM: mm: make the page fault mmap locking killable
commit eda0047296a16d65a7f2bc60a408f70d178b2014 upstream. This is done as a separate patch from introducing the new lock_mm_and_find_vma() helper, because while it's an obvious change, it's not what x86 used to do in this area. We already abort the page fault on fatal signals anyway, so why should we wait for the mmap lock only to then abort later? With the new helper function that returns without the lock held on failure anyway, this is particularly easy and straightforward. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Change-Id: I9730b4543265a20253cbfc02de135cc77927f821 (cherry picked from commit eda0047296a16d65a7f2bc60a408f70d178b2014) Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
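The behavioural change, as a hedged fragment of an architecture fault path (simplified; the real x86 code also handles in-kernel faults via the exception tables before sleeping on the lock):

```c
	/*
	 * Taking mmap_lock is now killable: a fatal signal aborts the
	 * fault right here instead of waiting for the lock only to
	 * abort later anyway.
	 */
	if (mmap_read_lock_killable(mm))
		return;		/* task is being killed; nothing to do */
```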
Linus Torvalds
|
3803ae4a28 |
BACKPORT: mm: introduce new 'lock_mm_and_find_vma()' page fault helper
commit c2508ec5a58db67093f4fb8bf89a9a7c53a109e9 upstream.
.. and make x86 use it.
This basically extracts the existing x86 "find and expand faulting vma"
code, but extends it to also take the mmap lock for writing in case we
actually do need to expand the vma.
We've historically short-circuited that case, and have some rather ugly
special logic to serialize the stack segment expansion (since we only
hold the mmap lock for reading) that doesn't match the normal VM
locking.
That slight violation of locking worked well, right up until it didn't:
the maple tree code really does want proper locking even for simple
extension of an existing vma.
So extract the code for "look up the vma of the fault" from x86, fix it
up to do the necessary write locking, and make it available as a helper
function for other architectures that can use the common helper.
Note: I say "common helper", but it really only handles the normal
stack-grows-down case. Which is all architectures except for PA-RISC
and IA64. So some rare architectures can't use the helper, but if they
care they'll just need to open-code this logic.
It's also worth pointing out that this code really would like to have an
optimistic "mmap_upgrade_trylock()" to make it quicker to go from a
read-lock (for the common case) to taking the write lock (for having to
extend the vma) in the normal single-threaded situation where there is
no other locking activity.
But that _is_ all the very uncommon special case, so while it would be
nice to have such an operation, it probably doesn't matter in reality.
I did put in the skeleton code for such a possible future expansion,
even if it only acts as pseudo-documentation for what we're doing.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[surenb: this one is taken from 6.4.y stable branch]
Change-Id: I6e16e6751245ac24adcbe78114bc57c726463acb
(cherry-picked from commit
|
||
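A condensed sketch of the helper's flow as the message describes it; the in-tree mm/memory.c version adds a killable, exception-table-aware locking step and is structured with gotos, so treat this as an outline rather than the literal code:

```c
struct vm_area_struct *lock_mm_and_find_vma(struct mm_struct *mm,
					    unsigned long addr,
					    struct pt_regs *regs)
{
	struct vm_area_struct *vma;

	mmap_read_lock(mm);			/* common case: read lock */
	vma = find_vma(mm, addr);
	if (likely(vma && vma->vm_start <= addr))
		return vma;			/* fault hits an existing vma */

	if (!vma || !(vma->vm_flags & VM_GROWSDOWN)) {
		mmap_read_unlock(mm);
		return NULL;			/* not a stack-growth fault */
	}

	/*
	 * Expanding the stack needs the write lock. There is no real
	 * mmap_upgrade_trylock() yet (only skeleton code), so drop the
	 * read lock, take the write lock, and redo the lookup.
	 */
	mmap_read_unlock(mm);
	mmap_write_lock(mm);
	vma = find_vma(mm, addr);
	if (!vma || (vma->vm_start > addr &&
		     (!(vma->vm_flags & VM_GROWSDOWN) ||
		      expand_stack(vma, addr)))) {
		mmap_write_unlock(mm);
		return NULL;
	}
	mmap_write_downgrade(mm);	/* continue the fault read-locked */
	return vma;
}
```

Note that expand_stack() here is the vma-based API of this point in the series; the later commits above rework its locking and signature.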
Tony Luck
|
53048f151c |
BACKPORT: mm, hwpoison: when copy-on-write hits poison, take page offline
commit d302c2398ba269e788a4f37ae57c07a7fcabaa42 upstream.
Cannot call memory_failure() directly from the fault handler because
mmap_lock (and others) are held.
It is important, but not urgent, to mark the source page as h/w poisoned
and unmap it from other tasks.
Use memory_failure_queue() to request a call to memory_failure() for the
page with the error.
Also provide a stub version for CONFIG_MEMORY_FAILURE=n
Link: https://lkml.kernel.org/r/20221021200120.175753-3-tony.luck@intel.com
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Due to missing commits
e591ef7d96d6e ("mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage")
5033091de814a ("mm/hwpoison: introduce per-memory_block hwpoison counter")
The impact of e591ef7d96d6e is its introduction of an additional flag in
__get_huge_page_for_hwpoison() that serves as an indication a hwpoisoned
hugetlb page should have its migratable bit cleared.
The impact of 5033091de814a is contextual.
Resolve by ignoring both missing commits. - jane]
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ica2c1970fe3cdfa9dc7d3f288e1e6a90378a9764
(cherry-picked from commit
|
||
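The deferral this message describes, in a small hedged sketch; memory_failure_queue() is the real API, while the wrapper and its name are illustrative only:

```c
#include <linux/mm.h>

/*
 * From a context that holds mmap_lock (and other locks), calling
 * memory_failure() directly would be unsafe. Queue the poisoned pfn
 * instead; a workqueue later invokes memory_failure(pfn, flags) for it.
 */
static void report_cow_poison(struct page *src)	/* illustrative name */
{
	memory_failure_queue(page_to_pfn(src), 0);
}
```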
Tony Luck
|
a2dff37b0c |
UPSTREAM: mm, hwpoison: try to recover from copy-on write faults
commit a873dfe1032a132bf89f9e19a6ac44f5a0b78754 upstream.
Patch series "Copy-on-write poison recovery", v3.
Part 1 deals with the process that triggered the copy on write fault with
a store to a shared read-only page. That process is sent a SIGBUS with
the usual machine check decoration to specify the virtual address of the
lost page, together with the scope.
Part 2 sets up to asynchronously take the page with the uncorrected error
offline to prevent additional machine check faults. H/t to Miaohe Lin
<linmiaohe@huawei.com> and Shuai Xue <xueshuai@linux.alibaba.com> for
pointing me to the existing function to queue a call to memory_failure().
On x86 there is some duplicate reporting (because the error is also
signalled by the memory controller as well as by the core that triggered
the machine check). Console logs look like this:
This patch (of 2):
If the kernel is copying a page as the result of a copy-on-write
fault and runs into an uncorrectable error, Linux will crash because
it does not have recovery code for this case where poison is consumed
by the kernel.
It is easy to set up a test case. Just inject an error into a private
page, fork(2), and have the child process write to the page.
I wrapped that neatly into a test at:
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git
just enable ACPI error injection and run:
# ./einj_mem-uc -f copy-on-write
Add a new copy_user_highpage_mc() function that uses copy_mc_to_kernel()
on architectures where that is available (currently x86 and powerpc).
When an error is detected during the page copy, return VM_FAULT_HWPOISON
to the caller of wp_page_copy(). This propagates up the call stack. Both x86
and powerpc have code in their fault handler to deal with this code by
sending a SIGBUS to the application.
Note that this patch avoids a system crash and signals the process that
triggered the copy-on-write action. It does not take any action for the
memory error that is still in the shared page. To handle that a call to
memory_failure() is needed. But this cannot be done from wp_page_copy()
because it holds mmap_lock(). Perhaps the architecture fault handlers
can deal with this loose end in a subsequent patch?
On Intel/x86 this loose end will often be handled automatically because
the memory controller provides an additional notification of the h/w
poison in memory, the handler for this will call memory_failure(). This
isn't a 100% solution. If there are multiple errors, not all may be
logged in this way.
[tony.luck@intel.com: add call to kmsan_unpoison_memory(), per Miaohe Lin]
Link: https://lkml.kernel.org/r/20221031201029.102123-2-tony.luck@intel.com
Link: https://lkml.kernel.org/r/20221021200120.175753-1-tony.luck@intel.com
Link: https://lkml.kernel.org/r/20221021200120.175753-2-tony.luck@intel.com
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I7c35cd47de59611fcc0550b0a7fd4e3911bbb110
(cherry-picked from commit
|
||
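A hedged reconstruction of the helper named above, based on this message's description (the in-tree version lives in mm/memory.c and may differ in detail). copy_mc_to_kernel() returns the number of bytes left uncopied, so a non-zero result means poison was hit during the copy; on architectures without machine-check-safe copies it falls back to a plain memcpy() and returns 0:

```c
static inline int copy_user_highpage_mc(struct page *to, struct page *from,
					unsigned long addr,
					struct vm_area_struct *vma)
{
	unsigned long ret;
	char *vfrom = kmap_local_page(from);
	char *vto = kmap_local_page(to);

	/* Machine-check-safe copy: reports poison instead of crashing. */
	ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
	if (!ret)
		/* Tell KMSAN the destination page is now initialized. */
		kmsan_unpoison_memory(page_address(to), PAGE_SIZE);

	kunmap_local(vto);
	kunmap_local(vfrom);
	return ret;
}
```

A non-zero return is then translated into VM_FAULT_HWPOISON on the wp_page_copy() path, which the x86 and powerpc fault handlers turn into a SIGBUS for the faulting process.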
Peter Collingbourne
|
50fb32197f |
mm: call arch_swap_restore() from do_swap_page()
commit 6dca4ac6fc91fd41ea4d6c4511838d37f4e0eab2 upstream. Commit |
||
Linus Torvalds
|
e6bbad7571 |
mm: always expand the stack with the mmap write lock held
commit 8d7071af890768438c14db6172cc8f9f4d04e184 upstream.
This finishes the job of always holding the mmap write lock when
extending the user stack vma, and removes the 'write_locked' argument
from the vm helper functions again.
For some cases, we just avoid expanding the stack at all: drivers and
page pinning really shouldn't be extending any stacks. Let's see if any
strange users really wanted that.
It's worth noting that architectures that weren't converted to the new
lock_mm_and_find_vma() helper function are left using the legacy
"expand_stack()" function, but it has been changed to drop the mmap_lock
and take it for writing while expanding the vma. This makes it fairly
straightforward to convert the remaining architectures.
As a result of dropping and re-taking the lock, the calling conventions
for this function have also changed, since the old vma may no longer be
valid. So it will now return the new vma if successful, and NULL - and
the lock dropped - if the area could not be extended.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[6.1: Patch drivers/iommu/io-pgfault.c instead]
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Liam R. Howlett
|
6a6b5616c3 |
mm: make find_extend_vma() fail if write lock not held
commit f440fa1ac955e2898893f9301568435eb5cdfc4b upstream.
Make calls to extend_vma() and find_extend_vma() fail if the write lock
is required.
To avoid making this a flag-day event, this still allows the old
read-locking case for the trivial situations, and passes in a flag to
say "is it write-locked". That way write-lockers can say "yes, I'm
being careful", and legacy users will continue to work in all the
common cases until they have been fully converted to the new world
order.
Co-Developed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Linus Torvalds
|
755aa1bc6a |
mm: make the page fault mmap locking killable
commit eda0047296a16d65a7f2bc60a408f70d178b2014 upstream.
This is done as a separate patch from introducing the new
lock_mm_and_find_vma() helper, because while it's an obvious change,
it's not what x86 used to do in this area.
We already abort the page fault on fatal signals anyway, so why should
we wait for the mmap lock only to then abort later? With the new helper
function that returns without the lock held on failure anyway, this is
particularly easy and straightforward.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Linus Torvalds
|
d6a5c7a1a6 |
mm: introduce new 'lock_mm_and_find_vma()' page fault helper
commit c2508ec5a58db67093f4fb8bf89a9a7c53a109e9 upstream.
.. and make x86 use it.
This basically extracts the existing x86 "find and expand faulting vma"
code, but extends it to also take the mmap lock for writing in case we
actually do need to expand the vma.
We've historically short-circuited that case, and have some rather ugly
special logic to serialize the stack segment expansion (since we only
hold the mmap lock for reading) that doesn't match the normal VM
locking.
That slight violation of locking worked well, right up until it didn't:
the maple tree code really does want proper locking even for simple
extension of an existing vma.
So extract the code for "look up the vma of the fault" from x86, fix it
up to do the necessary write locking, and make it available as a helper
function for other architectures that can use the common helper.
Note: I say "common helper", but it really only handles the normal
stack-grows-down case. Which is all architectures except for PA-RISC
and IA64. So some rare architectures can't use the helper, but if they
care they'll just need to open-code this logic.
It's also worth pointing out that this code really would like to have an
optimistic "mmap_upgrade_trylock()" to make it quicker to go from a
read-lock (for the common case) to taking the write lock (for having to
extend the vma) in the normal single-threaded situation where there is
no other locking activity.
But that _is_ all the very uncommon special case, so while it would be
nice to have such an operation, it probably doesn't matter in reality.
I did put in the skeleton code for such a possible future expansion,
even if it only acts as pseudo-documentation for what we're doing.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[6.1: Ignore CONFIG_PER_VMA_LOCK context]
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Tony Luck
|
84f077802e |
mm, hwpoison: when copy-on-write hits poison, take page offline
commit d302c2398ba269e788a4f37ae57c07a7fcabaa42 upstream.
Cannot call memory_failure() directly from the fault handler because
mmap_lock (and others) are held.
It is important, but not urgent, to mark the source page as h/w poisoned
and unmap it from other tasks.
Use memory_failure_queue() to request a call to memory_failure() for the
page with the error.
Also provide a stub version for CONFIG_MEMORY_FAILURE=n
Link: https://lkml.kernel.org/r/20221021200120.175753-3-tony.luck@intel.com
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Due to missing commits
e591ef7d96d6e ("mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage")
5033091de814a ("mm/hwpoison: introduce per-memory_block hwpoison counter")
The impact of e591ef7d96d6e is its introduction of an additional flag in
__get_huge_page_for_hwpoison() that serves as an indication a hwpoisoned
hugetlb page should have its migratable bit cleared.
The impact of 5033091de814a is contextual.
Resolve by ignoring both missing commits. - jane]
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Tony Luck
|
4af5960d7c |
mm, hwpoison: try to recover from copy-on write faults
commit a873dfe1032a132bf89f9e19a6ac44f5a0b78754 upstream.
Patch series "Copy-on-write poison recovery", v3.
Part 1 deals with the process that triggered the copy on write fault
with a store to a shared read-only page. That process is sent a SIGBUS
with the usual machine check decoration to specify the virtual address
of the lost page, together with the scope.
Part 2 sets up to asynchronously take the page with the uncorrected
error offline to prevent additional machine check faults. H/t to Miaohe
Lin <linmiaohe@huawei.com> and Shuai Xue <xueshuai@linux.alibaba.com>
for pointing me to the existing function to queue a call to
memory_failure().
On x86 there is some duplicate reporting (because the error is also
signalled by the memory controller as well as by the core that triggered
the machine check). Console logs look like this:
This patch (of 2):
If the kernel is copying a page as the result of a copy-on-write
fault and runs into an uncorrectable error, Linux will crash because
it does not have recovery code for this case where poison is consumed
by the kernel.
It is easy to set up a test case. Just inject an error into a private
page, fork(2), and have the child process write to the page.
I wrapped that neatly into a test at:
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git
just enable ACPI error injection and run:
# ./einj_mem-uc -f copy-on-write
Add a new copy_user_highpage_mc() function that uses copy_mc_to_kernel()
on architectures where that is available (currently x86 and powerpc).
When an error is detected during the page copy, return VM_FAULT_HWPOISON
to the caller of wp_page_copy(). This propagates up the call stack. Both
x86 and powerpc have code in their fault handler to deal with this code
by sending a SIGBUS to the application.
Note that this patch avoids a system crash and signals the process that
triggered the copy-on-write action. It does not take any action for the
memory error that is still in the shared page. To handle that a call to
memory_failure() is needed. But this cannot be done from wp_page_copy()
because it holds mmap_lock(). Perhaps the architecture fault handlers
can deal with this loose end in a subsequent patch?
On Intel/x86 this loose end will often be handled automatically because
the memory controller provides an additional notification of the h/w
poison in memory, the handler for this will call memory_failure(). This
isn't a 100% solution. If there are multiple errors, not all may be
logged in this way.
[tony.luck@intel.com: add call to kmsan_unpoison_memory(), per Miaohe Lin]
Link: https://lkml.kernel.org/r/20221031201029.102123-2-tony.luck@intel.com
Link: https://lkml.kernel.org/r/20221021200120.175753-1-tony.luck@intel.com
Link: https://lkml.kernel.org/r/20221021200120.175753-2-tony.luck@intel.com
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Suren Baghdasaryan
|
a264d8efcb |
BACKPORT: mm: do not increment pgfault stats when page fault handler retries
If the page fault handler requests a retry, we will count the fault
multiple times. This is a relatively harmless problem as the retry paths
are not often requested, and the only user-visible problem is that the
fault counter will be slightly higher than it should be. Nevertheless,
userspace only took one fault, and should not see the fact that the kernel
had to retry the fault multiple times.
Move page fault accounting into mm_account_fault() and skip incomplete
faults which will be accounted upon completion.
Link: https://lkml.kernel.org/r/20230419175836.3857458-1-surenb@google.com
Fixes:
|
||
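The accounting rule, simplified from what the patch does in mm_account_fault() (the real function takes more parameters and also drives the perf major/minor counters; this sketch keeps only the retry logic):

```c
static void mm_account_fault(struct mm_struct *mm, vm_fault_t ret)
{
	/* Incomplete faults are accounted when they finally complete. */
	if (ret & VM_FAULT_RETRY)
		return;

	/* PGFAULT counters record both successful and failed faults. */
	count_vm_event(PGFAULT);
	count_memcg_event_mm(mm, PGFAULT);
}
```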
Suren Baghdasaryan
|
ebbbcdfeaf |
UPSTREAM: mm: introduce per-VMA lock statistics
Add a new CONFIG_PER_VMA_LOCK_STATS config option to dump extra
statistics about handling page fault under VMA lock.
Link: https://lkml.kernel.org/r/20230227173632.3292573-29-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 52f238653e452e0fda61e880f263a173d219acd1)
Bug: 161210518
Change-Id: I1bc9ab9bc0307af26e0c51ba12f9ad561af5b6c8
Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
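The counting hook this option adds, sketched per the upstream patch (counter names taken from the patch; compiled out entirely when CONFIG_PER_VMA_LOCK_STATS=n):

```c
#ifdef CONFIG_PER_VMA_LOCK_STATS
#define count_vm_vma_lock_event(x) count_vm_event(x)
#else
#define count_vm_vma_lock_event(x) do {} while (0)
#endif

/* Used in the fault path, e.g. when a VMA-locked fault must be
 * redone under mmap_lock: */
count_vm_vma_lock_event(VMA_LOCK_RETRY);
```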
Suren Baghdasaryan
|
4e4c6989ae |
UPSTREAM: mm: prevent userfaults to be handled under per-vma lock
Due to the possibility of handle_userfault dropping mmap_lock, avoid
fault handling under VMA lock and retry holding mmap_lock. This can be
handled more gracefully in the future.
Link: https://lkml.kernel.org/r/20230227173632.3292573-28-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 444eeb17437a0ef526c606e9141a415d3b7dfddd)
Bug: 161210518
Change-Id: I383603d637497ea9917ad08908530f91052a17cc
Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
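The fallback, sketched as the early bail-out this message implies in handle_userfault(); the exact placement is an assumption:

```c
vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
{
	/*
	 * handle_userfault() may drop mmap_lock, which a fault handled
	 * under a per-VMA lock never took. Bail out so the caller
	 * retries the whole fault holding mmap_lock instead.
	 */
	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
		return VM_FAULT_RETRY;

	/* ... normal userfaultfd wait/wake-up handling ... */
}
```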
Suren Baghdasaryan
|
6e306e82ac |
UPSTREAM: mm: prevent do_swap_page from handling page faults under VMA lock
Due to the possibility of do_swap_page dropping mmap_lock, abort fault
handling under VMA lock and retry holding mmap_lock. This can be
handled more gracefully in the future.
Link: https://lkml.kernel.org/r/20230227173632.3292573-27-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 17c05f18e54158a3eed0c22c85b7a756b63dcc01)
Bug: 161210518
Change-Id: I047f4d0e0ca3b3bf9505e5cda2da768c88bed20e
Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
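The corresponding early bail-out in do_swap_page(), abbreviated to the lines the change is about ('ret' and the 'out' label are the function's existing locals):

```c
	/*
	 * Swap-in may block and drop mmap_lock (I/O, userfaults), so it
	 * is not attempted under a per-VMA lock; retry under mmap_lock.
	 */
	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
		ret = VM_FAULT_RETRY;
		goto out;
	}
```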
Suren Baghdasaryan
|
c06661eab5 |
UPSTREAM: mm: fall back to mmap_lock if vma->anon_vma is not yet set
When vma->anon_vma is not set, page fault handler will set it by either
reusing anon_vma of an adjacent VMA if VMAs are compatible or by
allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
a compatible adjacent VMA and that requires not only the faulting VMA
to be stable but also the tree structure and other VMAs inside that
tree. Therefore locking just the faulting VMA is not enough for this
search. Fall back to taking mmap_lock when vma->anon_vma is not set.
This situation happens only on the first page fault and should not
affect overall performance.
Link: https://lkml.kernel.org/r/20230227173632.3292573-25-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 2ac0af1b66e3b66307f53b1cc446514308ec466d)
Bug: 161210518
Change-Id: Iafacad5bda7bb138b290f38421a22d828051b067
Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
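The added check, sketched as a fragment of the VMA-locked fault lookup path (surrounding code and the exact bail-out label are elided and should be treated as assumptions):

```c
	/*
	 * find_mergeable_anon_vma() inspects adjacent VMAs, which are
	 * not locked here, so refuse VMA-locked handling and let the
	 * first fault on this VMA be retried under mmap_lock.
	 */
	if (unlikely(vma_is_anonymous(vma) && !vma->anon_vma))
		goto inval;	/* caller falls back to the mmap_lock path */
```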