lineage-22.0
431 Commits
Author | SHA1 | Message
---|---|---
Zichun Zheng | 98a66e87c1 | ANDROID: Export symbols to do reverse mapping within memcg in kernel modules.
Export the symbols below to do reverse mapping within memcg: root_mem_cgroup page_referenced Bug: 296526618 Change-Id: Ia9c5876bd97d3f13c92b28af2ca5e74b3f91bd5a Signed-off-by: Zichun Zheng <zhengzichun@oppo.com> |
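A minimal sketch of how a vendor module might use the two exported symbols; the module and helper here are hypothetical, and only `root_mem_cgroup` plus the 5.10 `page_referenced()` signature come from the change itself.

```c
#include <linux/module.h>
#include <linux/memcontrol.h>
#include <linux/rmap.h>

/* Hypothetical helper in an out-of-tree module: ask the rmap how many
 * recently-referenced mappings a page has, scoped to the root memcg. */
static int vendor_check_referenced(struct page *page)
{
	unsigned long vm_flags;

	return page_referenced(page, 0 /* page not locked */,
			       root_mem_cgroup, &vm_flags);
}
MODULE_LICENSE("GPL");
```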
Lee Jones | e36eef3783 | Revert "Revert "mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse""
This reverts commit 4f35cec76058557d9eaec0d501d03c7657eb56b4 and does so in an abi-safe way. This is done by adding the new fields only to the end of the structure and this structure is only passed around to other functions as a pointer, the internal structure layout is only touched by the core kernel, so adding it to the end is safe. Update ABI using The Button: Leaf changes summary: 1 artifact changed Changed leaf types summary: 1 leaf type changed Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable 'struct anon_vma at rmap.h:33:1' changed: type size changed from 832 to 960 (in bits) 2 data member insertions: 'unsigned long int num_children', at offset 832 (in bits) at rmap.h:74:1 'unsigned long int num_active_vmas', at offset 896 (in bits) at rmap.h:76:1 5406 impacted interfaces Bug: 260678056 Bug: 253167854 Change-Id: Ib1d45625cbc2e0b21330ca3dc2aa7aff34666d31 Signed-off-by: Lee Jones <joneslee@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
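A sketch of the layout trick the message describes: existing members of `struct anon_vma` keep their offsets and the two restored counters are appended at the end, which is why modules that only pass the struct around by pointer keep working (member list abbreviated; offsets taken from the ABI diff above).

```c
struct anon_vma {
	struct anon_vma *root;
	struct rw_semaphore rwsem;
	atomic_t refcount;
	unsigned degree;		/* old field kept so existing offsets don't move */
	struct anon_vma *parent;
	struct rb_root_cached rb_root;

	/* restored fix, appended only at the end for ABI safety */
	unsigned long num_children;	/* offset 832 bits in the ABI report */
	unsigned long num_active_vmas;	/* offset 896 bits in the ABI report */
};
```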
Minchan Kim | c35cda5280 | BACKPORT: mm: don't be stuck to rmap lock on reclaim path
The rmap locks(i_mmap_rwsem and anon_vma->root->rwsem) could be contended under memory pressure if processes keep working on their vmas(e.g., fork, mmap, munmap). It makes reclaim path stuck. In our real workload traces, we see kswapd is waiting the lock for 300ms+(worst case, a sec) and it makes other processes entering direct reclaim, which were also stuck on the lock. This patch makes lru aging path try_lock mode like shink_page_list so the reclaim context will keep working with next lru pages without being stuck. if it found the rmap lock contended, it rotates the page back to head of lru in both active/inactive lrus to make them consistent behavior, which is basic starting point rather than adding more heristic. Since this patch introduces a new "contended" field as out-param along with try_lock in-param in rmap_walk_control, it's not immutable any longer if the try_lock is set so remove const keywords on rmap related functions. Since rmap walking is already expensive operation, I doubt the const would help sizable benefit( And we didn't have it until 5.17). In a heavy app workload in Android, trace shows following statistics. It almost removes rmap lock contention from reclaim path. Martin Liu reported: Before: max_dur(ms) min_dur(ms) max-min(dur)ms avg_dur(ms) sum_dur(ms) count blocked_function 1632 0 1631 151.542173 31672 209 page_lock_anon_vma_read 601 0 601 145.544681 28817 198 rmap_walk_file After: max_dur(ms) min_dur(ms) max-min(dur)ms avg_dur(ms) sum_dur(ms) count blocked_function NaN NaN NaN NaN NaN 0.0 NaN 0 0 0 0.127645 1 12 rmap_walk_file [minchan@kernel.org: add comment, per Matthew] Link: https://lkml.kernel.org/r/YnNqeB5tUf6LZ57b@google.com Link: https://lkml.kernel.org/r/20220510215423.164547-1-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: John Dias <joaodias@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Martin Liu <liumartin@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: folio->page (cherry picked from commit 6d4675e601357834dadd2ba1d803f6484596015c) Bug: 239681156 Bug: 252333201 Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: I0c63e0291120c8a1b5f2d83b8a7b210cb56c27a2 Signed-off-by: chenxin <chenxinxin@xiaomi.corp-partner.google.com> |
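The interface change boils down to two new members of `rmap_walk_control` (abbreviated sketch after the upstream commit this backports; callers set `try_lock`, the walker reports back through `contended`).

```c
struct rmap_walk_control {
	void *arg;
	/*
	 * Reclaim opts in to trylock so it never sleeps on i_mmap_rwsem or
	 * anon_vma->root->rwsem; on contention the page is rotated back to
	 * the LRU head instead.
	 */
	bool try_lock;
	/* Out-param: set when try_lock was requested and the lock was contended. */
	bool contended;

	bool (*rmap_one)(struct page *page, struct vm_area_struct *vma,
			 unsigned long addr, void *arg);
	int (*done)(struct page *page);
	struct anon_vma *(*anon_lock)(struct page *page,
				      struct rmap_walk_control *rwc);
	bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
};
```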
Peifeng Li | f50f24e781 | ANDROID: vendor_hooks: Add hooks for lookaround
Add hooks for support lookaround in memory reclamation. - android_vh_test_clear_look_around_ref - android_vh_check_page_look_around_ref - android_vh_look_around_migrate_page - android_vh_look_around Bug: 241079328 Signed-off-by: Peifeng Li <lipeifeng@oppo.com> Change-Id: I9a606ae71d2f1303df3b02403b30bc8fdc9d06dd |
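For readers unfamiliar with ACK vendor hooks, the pattern behind these hook commits looks roughly like this; the hook name is taken from the list above, but the TP_PROTO arguments and the handler are illustrative guesses rather than the exact upstream definition.

```c
/* Hook point declared in a trace/hooks/*.h header: */
DECLARE_HOOK(android_vh_test_clear_look_around_ref,
	TP_PROTO(struct page *page),
	TP_ARGS(page));

/* A vendor module attaches its handler at load time: */
static void look_around_ref_handler(void *data, struct page *page)
{
	/* vendor-specific look-around bookkeeping */
}

static int __init vendor_mod_init(void)
{
	return register_trace_android_vh_test_clear_look_around_ref(
			look_around_ref_handler, NULL);
}
```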
Peifeng Li | 1f8f6d59a2 | ANDROID: vendor_hook: Add hook to not be stuck on rmap lock in kswapd or direct_reclaim
Add hooks to support trylock in rmaplock when reclaiming in kswapd or direct_reclaim, in order to avoid wait lock for a long time. - android_vh_handle_failed_page_trylock - android_vh_page_trylock_set - android_vh_page_trylock_clear - android_vh_page_trylock_get_result - android_vh_do_page_trylock Bug: 240003372 Signed-off-by: Peifeng Li <lipeifeng@oppo.com> Change-Id: I0f605b35ae41f15b3ca7bc72cd5f003175c318a5 |
Peifeng Li | 3f775b9367 | ANDROID: vendor_hooks: account page-mapcount
Support five hooks as follows to account the amount of multi-mapped pages in kernel: - android_vh_show_mapcount_pages - android_vh_do_traversal_lruvec - android_vh_update_page_mapcount - android_vh_add_page_to_lrulist - android_vh_del_page_from_lrulist Bug: 236578020 Signed-off-by: Peifeng Li <lipeifeng@oppo.com> Change-Id: Ia2c7015aab442be7dbb496b8b630b9dff59ab935 |
Greg Kroah-Hartman | 9c2a5eef8f | Merge tag 'android12-5.10.117_r00' into 'android12-5.10'
This is the merge of the upstream LTS release of 5.10.117 into the android12-5.10 branch. It contains the following commits: |
Bing Han | 1aa26f0017 | ANDROID: vendor_hook: Add hook in page_referenced_one()
Add android_vh_page_referenced_one_end at the end of function page_referenced_one to update the status that whether the page need to be reclaimed to a specified swap location. Bug: 234214858 Signed-off-by: Bing Han <bing.han@transsion.com> Change-Id: Ia06a229956328ef776da5d163708dcb011a327fb |
Greg Kroah-Hartman | 5dadf6321c | This is the 5.10.111 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmJXHgUACgkQONu9yGCS aT7BohAAx7alIKg1d4gbIHhO6eimWLWLj95ncyeq6xtNT+qKqdYgp+w8xKAJ8QLG sG9sbGcoWYkgOLcSy4rztgh9HBQuGvY6vLFqygRw5HXN1iAirYlr7DJCCKRc1pPZ E5ASOzbkfmBw9HI/w41up5vosSkNAf1qqbL9lJxfx11ms5t7s/11gYg+xSH61NUI gBE4GyJSq91p161F4ql+dJqYrU+gAIY9zKVSAqB97z9D3d01tZkr4LGNjbqtu7Kb 3d+vjiKfMda09X16US3nx9PaxikfQn5IB8JA9mpWgI+Q7H6R9Ri+rQxnv/ghpEPc U9BvK9p7+zYu6dyNUZYbGCsHAQ3WFoatJPO+JTxXllJ99ORrN85WfvFMWZq49f3k XxYMbECcLJfsYUJycKcPJJfGFLfxw2cDfJmzNJEvzX9KK6ObZxSeYeVHjrdC8XwA WZlt1zNObE2IyH3pkqSyxKubnpu4Z0UdxDdIeowsdI9iD7oGlhtfzhTVa+JbxnuY HHtHIKweeyYeUTRIKe1w/ZE24LQjF0fpy9M2ZGxzy6YQTqFsjGNzsfcPBWWFOXqp XGTgaKoIqA+1ov2nVGk8j1BOTHKqYx4gwhb5Y58kX8hvHQGr6d3eoyMKui7wPT5f 9RjU/9+eZ1DT8LLnUZFNJHIrhwIUCR/wEY1Y9gPi58RE8Wj9TlY= =3r9Y -----END PGP SIGNATURE----- Merge 5.10.111 into android12-5.10-lts Changes in 5.10.111 ubifs: Rectify space amount budget for mkdir/tmpfile operations gfs2: Check for active reservation in gfs2_release gfs2: Fix gfs2_release for non-writers regression gfs2: gfs2_setattr_size error path fix rtc: wm8350: Handle error for wm8350_register_irq KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs KVM: x86/emulator: Emulate RDPID only if it is enabled in guest drm: Add orientation quirk for GPD Win Max ath5k: fix OOB in ath5k_eeprom_read_pcal_info_5111 drm/amd/display: Add signal type check when verify stream backends same drm/amd/amdgpu/amdgpu_cs: fix refcount leak of a dma_fence obj usb: gadget: tegra-xudc: Do not program SPARAM usb: gadget: tegra-xudc: Fix control endpoint's definitions ptp: replace snprintf with sysfs_emit powerpc: dts: t104xrdb: fix phy type for FMAN 4/5 ath11k: fix kernel panic during unload/load ath11k modules ath11k: mhi: use mhi_sync_power_up() bpf: Make dst_port field in struct bpf_sock 16-bit wide scsi: mvsas: Replace snprintf() with sysfs_emit() scsi: bfa: Replace snprintf() with sysfs_emit() power: supply: axp20x_battery: properly report current when discharging mt76: dma: initialize skip_unmap in mt76_dma_rx_fill cfg80211: don't add non transmitted BSS to 6GHz scanned channels libbpf: Fix build issue with llvm-readelf ipv6: make mc_forwarding atomic powerpc: Set crashkernel offset to mid of RMA region drm/amdgpu: Fix recursive locking warning PCI: aardvark: Fix support for MSI interrupts iommu/arm-smmu-v3: fix event handling soft lockup usb: ehci: add pci device support for Aspeed platforms PCI: endpoint: Fix alignment fault error in copy tests tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH. 
PCI: pciehp: Add Qualcomm quirk for Command Completed erratum power: supply: axp288-charger: Set Vhold to 4.4V iwlwifi: mvm: Correctly set fragmented EBS ipv4: Invalidate neighbour for broadcast address upon address addition dm ioctl: prevent potential spectre v1 gadget dm: requeue IO if mapping table not yet available drm/amdkfd: make CRAT table missing message informational only scsi: pm8001: Fix pm80xx_pci_mem_copy() interface scsi: pm8001: Fix pm8001_mpi_task_abort_resp() scsi: pm8001: Fix task leak in pm8001_send_abort_all() scsi: pm8001: Fix tag leaks on error scsi: pm8001: Fix memory leak in pm8001_chip_fw_flash_update_req() mt76: mt7615: Fix assigning negative values to unsigned variable scsi: aha152x: Fix aha152x_setup() __setup handler return value scsi: hisi_sas: Free irq vectors in order for v3 HW net/smc: correct settings of RMB window update limit mips: ralink: fix a refcount leak in ill_acc_of_setup() macvtap: advertise link netns via netlink tuntap: add sanity checks about msg_controllen in sendmsg Bluetooth: Fix not checking for valid hdev on bt_dev_{info,warn,err,dbg} Bluetooth: use memset avoid memory leaks bnxt_en: Eliminate unintended link toggle during FW reset PCI: endpoint: Fix misused goto label MIPS: fix fortify panic when copying asm exception handlers powerpc/secvar: fix refcount leak in format_show() scsi: libfc: Fix use after free in fc_exch_abts_resp() can: isotp: set default value for N_As to 50 micro seconds net: account alternate interface name memory net: limit altnames to 64k total net: sfp: add 2500base-X quirk for Lantech SFP module usb: dwc3: omap: fix "unbalanced disables for smps10_out1" on omap5evm xtensa: fix DTC warning unit_address_format MIPS: ingenic: correct unit node address Bluetooth: Fix use after free in hci_send_acl netlabel: fix out-of-bounds memory accesses ceph: fix memory leak in ceph_readdir when note_last_dentry returns error init/main.c: return 1 from handled __setup() functions minix: fix bug when opening a file with O_DIRECT clk: si5341: fix reported clk_rate when output divider is 2 staging: vchiq_core: handle NULL result of find_service_by_handle phy: amlogic: meson8b-usb2: Use dev_err_probe() staging: wfx: fix an error handling in wfx_init_common() w1: w1_therm: fixes w1_seq for ds28ea00 sensors NFSv4.2: fix reference count leaks in _nfs42_proc_copy_notify() NFSv4: Protect the state recovery thread against direct reclaim xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32 clk: ti: Preserve node in ti_dt_clocks_register() clk: Enforce that disjoints limits are invalid SUNRPC/call_alloc: async tasks mustn't block waiting for memory SUNRPC/xprt: async tasks mustn't block waiting for memory SUNRPC: remove scheduling boost for "SWAPPER" tasks. NFS: swap IO handling is slightly different for O_DIRECT IO NFS: swap-out must always use STABLE writes. 
x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy serial: samsung_tty: do not unlock port->lock for uart_write_wakeup() virtio_console: eliminate anonymous module_init & module_exit jfs: prevent NULL deref in diFree SUNRPC: Fix socket waits for write buffer space NFS: nfsiod should not block forever in mempool_alloc() NFS: Avoid writeback threads getting stuck in mempool_alloc() parisc: Fix CPU affinity for Lasi, WAX and Dino chips parisc: Fix patch code locking and flushing mm: fix race between MADV_FREE reclaim and blkdev direct IO read Revert "hv: utils: add PTP_1588_CLOCK to Kconfig to fix build" drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire() Drivers: hv: vmbus: Fix potential crash on module unload Revert "NFSv4: Handle the special Linux file open access mode" NFSv4: fix open failure with O_ACCMODE flag scsi: zorro7xx: Fix a resource leak in zorro7xx_remove_one() net/tls: fix slab-out-of-bounds bug in decrypt_internal ice: Clear default forwarding VSI during VSI release net: ipv4: fix route with nexthop object delete warning net: stmmac: Fix unset max_speed difference between DT and non-DT platforms drm/imx: imx-ldb: Check for null pointer after calling kmemdup drm/imx: Fix memory leak in imx_pd_connector_get_modes bnxt_en: reserve space inside receive page for skb_shared_info sfc: Do not free an empty page_ring RDMA/mlx5: Don't remove cache MRs when a delay is needed IB/rdmavt: add lock to call to rvt_error_qp to prevent a race condition dpaa2-ptp: Fix refcount leak in dpaa2_ptp_probe ice: Set txq_teid to ICE_INVAL_TEID on ring creation ice: Do not skip not enabled queues in ice_vc_dis_qs_msg ipv6: Fix stats accounting in ip6_pkt_drop ice: synchronize_rcu() when terminating rings net: openvswitch: don't send internal clone attribute to the userspace. 
net: openvswitch: fix leak of nested actions rxrpc: fix a race in rxrpc_exit_net() net: phy: mscc-miim: reject clause 45 register accesses qede: confirm skb is allocated before using spi: bcm-qspi: fix MSPI only access with bcm_qspi_exec_mem_op() bpf: Support dual-stack sockets in bpf_tcp_check_syncookie drbd: Fix five use after free bugs in get_initial_state io_uring: don't touch scm_fp_list after queueing skb SUNRPC: Handle ENOMEM in call_transmit_status() SUNRPC: Handle low memory situations in call_status() SUNRPC: svc_tcp_sendmsg() should handle errors from xdr_alloc_bvec() iommu/omap: Fix regression in probe for NULL pointer dereference perf: arm-spe: Fix perf report --mem-mode perf tools: Fix perf's libperf_print callback perf session: Remap buf if there is no space for event arm64: Add part number for Arm Cortex-A78AE Revert "mmc: sdhci-xenon: fix annoying 1.8V regulator warning" mmc: mmci: stm32: correctly check all elements of sg list mmc: renesas_sdhi: don't overwrite TAP settings when HS400 tuning is complete lz4: fix LZ4_decompress_safe_partial read out of bound mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0) mm/mempolicy: fix mpol_new leak in shared_policy_replace io_uring: fix race between timeout flush and removal x86/pm: Save the MSR validity status at context setup x86/speculation: Restore speculation related MSRs during S3 resume btrfs: fix qgroup reserve overflow the qgroup limit btrfs: prevent subvol with swapfile from being deleted arm64: patch_text: Fixup last cpu should be master RDMA/hfi1: Fix use-after-free bug for mm struct gpio: Restrict usage of GPIO chip irq members before initialization ata: sata_dwc_460ex: Fix crash due to OOB write perf: qcom_l2_pmu: fix an incorrect NULL check on list iterator irqchip/gic-v3: Fix GICR_CTLR.RWP polling drm/amdgpu/smu10: fix SoC/fclk units in auto mode drm/nouveau/pmu: Add missing callbacks for Tegra devices drm/amdkfd: Create file descriptor after client is added to smi_clients list perf build: Don't use -ffat-lto-objects in the python feature test when building with clang-13 perf python: Fix probing for some clang command line options tools build: Filter out options and warnings not supported by clang tools build: Use $(shell ) instead of `` to get embedded libperl's ccopts dmaengine: Revert "dmaengine: shdma: Fix runtime PM imbalance on error" ubsan: remove CONFIG_UBSAN_OBJECT_SIZE mm: don't skip swap entry even if zap_details specified cgroup: Use open-time credentials for process migraton perm checks selftests/cgroup: Fix build on older distros selftests: cgroup: Make cg_create() use 0755 for permission instead of 0644 selftests: cgroup: Test open-time credential usage for migration checks selftests: cgroup: Test open-time cgroup namespace usage for migration checks arm64: module: remove (NOLOAD) from linker script Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb() irqchip/gic, gic-v3: Prevent GSI to SGI translations mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit Linux 5.10.111 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I9b4c1d30ae226b865494df03d871db2a2b9281c7 |
Mauricio Faria de Oliveira | 3c3fbfa6dd | mm: fix race between MADV_FREE reclaim and blkdev direct IO read
commit 6c8e2a256915a223f6289f651d6b926cd7135c9e upstream. Problem: ======= Userspace might read the zero-page instead of actual data from a direct IO read on a block device if the buffers have been called madvise(MADV_FREE) on earlier (this is discussed below) due to a race between page reclaim on MADV_FREE and blkdev direct IO read. - Race condition: ============== During page reclaim, the MADV_FREE page check in try_to_unmap_one() checks if the page is not dirty, then discards its rmap PTE(s) (vs. remap back if the page is dirty). However, after try_to_unmap_one() returns to shrink_page_list(), it might keep the page _anyway_ if page_ref_freeze() fails (it expects exactly _one_ page reference, from the isolation for page reclaim). Well, blkdev_direct_IO() gets references for all pages, and on READ operations it only sets them dirty _later_. So, if MADV_FREE'd pages (i.e., not dirty) are used as buffers for direct IO read from block devices, and page reclaim happens during __blkdev_direct_IO[_simple]() exactly AFTER bio_iov_iter_get_pages() returns, but BEFORE the pages are set dirty, the situation happens. The direct IO read eventually completes. Now, when userspace reads the buffers, the PTE is no longer there and the page fault handler do_anonymous_page() services that with the zero-page, NOT the data! A synthetic reproducer is provided. - Page faults: =========== If page reclaim happens BEFORE bio_iov_iter_get_pages() the issue doesn't happen, because that faults-in all pages as writeable, so do_anonymous_page() sets up a new page/rmap/PTE, and that is used by direct IO. The userspace reads don't fault as the PTE is there (thus zero-page is not used/setup). But if page reclaim happens AFTER it / BEFORE setting pages dirty, the PTE is no longer there; the subsequent page faults can't help: The data-read from the block device probably won't generate faults due to DMA (no MMU) but even in the case it wouldn't use DMA, that happens on different virtual addresses (not user-mapped addresses) because `struct bio_vec` stores `struct page` to figure addresses out (which are different from user-mapped addresses) for the read. Thus userspace reads (to user-mapped addresses) still fault, then do_anonymous_page() gets another `struct page` that would address/ map to other memory than the `struct page` used by `struct bio_vec` for the read. (The original `struct page` is not available, since it wasn't freed, as page_ref_freeze() failed due to more page refs. And even if it were available, its data cannot be trusted anymore.) Solution: ======== One solution is to check for the expected page reference count in try_to_unmap_one(). There should be one reference from the isolation (that is also checked in shrink_page_list() with page_ref_freeze()) plus one or more references from page mapping(s) (put in discard: label). Further references mean that rmap/PTE cannot be unmapped/nuked. (Note: there might be more than one reference from mapping due to fork()/clone() without CLONE_VM, which use the same `struct page` for references, until the copy-on-write page gets copied.) So, additional page references (e.g., from direct IO read) now prevent the rmap/PTE from being unmapped/dropped; similarly to the page is not freed per shrink_page_list()/page_ref_freeze()). - Races and Barriers: ================== The new check in try_to_unmap_one() should be safe in races with bio_iov_iter_get_pages() in get_user_pages() fast and slow paths, as it's done under the PTE lock. 
The fast path doesn't take the lock, but it checks if the PTE has changed and if so, it drops the reference and leaves the page for the slow path (which does take that lock). The fast path requires synchronization w/ full memory barrier: it writes the page reference count first then it reads the PTE later, while try_to_unmap() writes PTE first then it reads page refcount. And a second barrier is needed, as the page dirty flag should not be read before the page reference count (as in __remove_mapping()). (This can be a load memory barrier only; no writes are involved.) Call stack/comments: - try_to_unmap_one() - page_vma_mapped_walk() - map_pte() # see pte_offset_map_lock(): pte_offset_map() spin_lock() - ptep_get_and_clear() # write PTE - smp_mb() # (new barrier) GUP fast path - page_ref_count() # (new check) read refcount - page_vma_mapped_walk_done() # see pte_unmap_unlock(): pte_unmap() spin_unlock() - bio_iov_iter_get_pages() - __bio_iov_iter_get_pages() - iov_iter_get_pages() - get_user_pages_fast() - internal_get_user_pages_fast() # fast path - lockless_pages_from_mm() - gup_{pgd,p4d,pud,pmd,pte}_range() ptep = pte_offset_map() # not _lock() pte = ptep_get_lockless(ptep) page = pte_page(pte) try_grab_compound_head(page) # inc refcount # (RMW/barrier # on success) if (pte_val(pte) != pte_val(*ptep)) # read PTE put_compound_head(page) # dec refcount # go slow path # slow path - __gup_longterm_unlocked() - get_user_pages_unlocked() - __get_user_pages_locked() - __get_user_pages() - follow_{page,p4d,pud,pmd}_mask() - follow_page_pte() ptep = pte_offset_map_lock() pte = *ptep page = vm_normal_page(pte) try_grab_page(page) # inc refcount pte_unmap_unlock() - Huge Pages: ========== Regarding transparent hugepages, that logic shouldn't change, as MADV_FREE (aka lazyfree) pages are PageAnon() && !PageSwapBacked() (madvise_free_pte_range() -> mark_page_lazyfree() -> lru_lazyfree_fn()) thus should reach shrink_page_list() -> split_huge_page_to_list() before try_to_unmap[_one](), so it deals with normal pages only. (And in case unlikely/TTU_SPLIT_HUGE_PMD/split_huge_pmd_address() happens, which should not or be rare, the page refcount should be greater than mapcount: the head page is referenced by tail pages. That also prevents checking the head `page` then incorrectly call page_remove_rmap(subpage) for a tail page, that isn't even in the shrink_page_list()'s page_list (an effect of split huge pmd/pmvw), as it might happen today in this unlikely scenario.) MADV_FREE'd buffers: =================== So, back to the "if MADV_FREE pages are used as buffers" note. The case is arguable, and subject to multiple interpretations. The madvise(2) manual page on the MADV_FREE advice value says: 1) 'After a successful MADV_FREE ... data will be lost when the kernel frees the pages.' 2) 'the free operation will be canceled if the caller writes into the page' / 'subsequent writes ... will succeed and then [the] kernel cannot free those dirtied pages' 3) 'If there is no subsequent write, the kernel can free the pages at any time.' Thoughts, questions, considerations... respectively: 1) Since the kernel didn't actually free the page (page_ref_freeze() failed), should the data not have been lost? (on userspace read.) 2) Should writes performed by the direct IO read be able to cancel the free operation? - Should the direct IO read be considered as 'the caller' too, as it's been requested by 'the caller'? 
- Should the bio technique to dirty pages on return to userspace (bio_check_pages_dirty() is called/used by __blkdev_direct_IO()) be considered in another/special way here? 3) Should an upcoming write from a previously requested direct IO read be considered as a subsequent write, so the kernel should not free the pages? (as it's known at the time of page reclaim.) And lastly: Technically, the last point would seem a reasonable consideration and balance, as the madvise(2) manual page apparently (and fairly) seem to assume that 'writes' are memory access from the userspace process (not explicitly considering writes from the kernel or its corner cases; again, fairly).. plus the kernel fix implementation for the corner case of the largely 'non-atomic write' encompassed by a direct IO read operation, is relatively simple; and it helps. Reproducer: ========== @ test.c (simplified, but works) #define _GNU_SOURCE #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/mman.h> int main() { int fd, i; char *buf; fd = open(DEV, O_RDONLY | O_DIRECT); buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); for (i = 0; i < BUF_SIZE; i += PAGE_SIZE) buf[i] = 1; // init to non-zero madvise(buf, BUF_SIZE, MADV_FREE); read(fd, buf, BUF_SIZE); for (i = 0; i < BUF_SIZE; i += PAGE_SIZE) printf("%p: 0x%x\n", &buf[i], buf[i]); return 0; } @ block/fops.c (formerly fs/block_dev.c) +#include <linux/swap.h> ... ... __blkdev_direct_IO[_simple](...) { ... + if (!strcmp(current->comm, "good")) + shrink_all_memory(ULONG_MAX); + ret = bio_iov_iter_get_pages(...); + + if (!strcmp(current->comm, "bad")) + shrink_all_memory(ULONG_MAX); ... } @ shell # NUM_PAGES=4 # PAGE_SIZE=$(getconf PAGE_SIZE) # yes | dd of=test.img bs=${PAGE_SIZE} count=${NUM_PAGES} # DEV=$(losetup -f --show test.img) # gcc -DDEV=\"$DEV\" \ -DBUF_SIZE=$((PAGE_SIZE * NUM_PAGES)) \ -DPAGE_SIZE=${PAGE_SIZE} \ test.c -o test # od -tx1 $DEV 0000000 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a * 0040000 # mv test good # ./good 0x7f7c10418000: 0x79 0x7f7c10419000: 0x79 0x7f7c1041a000: 0x79 0x7f7c1041b000: 0x79 # mv good bad # ./bad 0x7fa1b8050000: 0x0 0x7fa1b8051000: 0x0 0x7fa1b8052000: 0x0 0x7fa1b8053000: 0x0 Note: the issue is consistent on v5.17-rc3, but it's intermittent with the support of MADV_FREE on v4.5 (60%-70% error; needs swap). [wrap do_direct_IO() in do_blockdev_direct_IO() @ fs/direct-io.c]. - v5.17-rc3: # for i in {1..1000}; do ./good; done \ | cut -d: -f2 | sort | uniq -c 4000 0x79 # mv good bad # for i in {1..1000}; do ./bad; done \ | cut -d: -f2 | sort | uniq -c 4000 0x0 # free | grep Swap Swap: 0 0 0 - v4.5: # for i in {1..1000}; do ./good; done \ | cut -d: -f2 | sort | uniq -c 4000 0x79 # mv good bad # for i in {1..1000}; do ./bad; done \ | cut -d: -f2 | sort | uniq -c 2702 0x0 1298 0x79 # swapoff -av swapoff /swap # for i in {1..1000}; do ./bad; done \ | cut -d: -f2 | sort | uniq -c 4000 0x79 Ceph/TCMalloc: ============= For documentation purposes, the use case driving the analysis/fix is Ceph on Ubuntu 18.04, as the TCMalloc library there still uses MADV_FREE to release unused memory to the system from the mmap'ed page heap (might be committed back/used again; it's not munmap'ed.) - PageHeap::DecommitSpan() -> TCMalloc_SystemRelease() -> madvise() - PageHeap::CommitSpan() -> TCMalloc_SystemCommit() -> do nothing. 
Note: TCMalloc switched back to MADV_DONTNEED a few commits after the release in Ubuntu 18.04 (google-perftools/gperftools 2.5), so the issue just 'disappeared' on Ceph on later Ubuntu releases but is still present in the kernel, and can be hit by other use cases. The observed issue seems to be the old Ceph bug #22464 [1], where checksum mismatches are observed (and instrumentation with buffer dumps shows zero-pages read from mmap'ed/MADV_FREE'd page ranges). The issue in Ceph was reasonably deemed a kernel bug (comment #50) and mostly worked around with a retry mechanism, but other parts of Ceph could still hit that (rocksdb). Anyway, it's less likely to be hit again as TCMalloc switched out of MADV_FREE by default. (Some kernel versions/reports from the Ceph bug, and relation with the MADV_FREE introduction/changes; TCMalloc versions not checked.) - 4.4 good - 4.5 (madv_free: introduction) - 4.9 bad - 4.10 good? maybe a swapless system - 4.12 (madv_free: no longer free instantly on swapless systems) - 4.13 bad [1] https://tracker.ceph.com/issues/22464 Thanks: ====== Several people contributed to analysis/discussions/tests/reproducers in the first stages when drilling down on ceph/tcmalloc/linux kernel: - Dan Hill - Dan Streetman - Dongdong Tao - Gavin Guo - Gerald Yang - Heitor Alves de Siqueira - Ioanna Alifieraki - Jay Vosburgh - Matthew Ruffell - Ponnuvel Palaniyappan Reviews, suggestions, corrections, comments: - Minchan Kim - Yu Zhao - Huang, Ying - John Hubbard - Christoph Hellwig [mfo@canonical.com: v4] Link: https://lkml.kernel.org/r/20220209202659.183418-1-mfo@canonical.comLink: https://lkml.kernel.org/r/20220131230255.789059-1-mfo@canonical.com Fixes: |
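The heart of the fix, slightly simplified from the upstream commit: in the lazyfree branch of `try_to_unmap_one()`, only discard the PTE when the page's reference count is exactly the isolation reference plus its mappings; otherwise remap it.

```c
if (!PageSwapBacked(page)) {		/* MADV_FREE (lazyfree) page */
	int ref_count, map_count;

	smp_mb();			/* pairs with the GUP fast path */
	ref_count = page_ref_count(page);
	map_count = page_mapcount(page);
	smp_rmb();			/* read refcount before the dirty bit */

	/* Only the isolation ref plus the rmap refs: safe to discard. */
	if (ref_count == 1 + map_count && !PageDirty(page)) {
		dec_mm_counter(mm, MM_ANONPAGES);
		goto discard;
	}

	/* Extra references (e.g. in-flight direct IO): remap the PTE. */
	set_pte_at(mm, address, pvmw.pte, pteval);
	SetPageSwapBacked(page);
	ret = false;
	page_vma_mapped_walk_done(&pvmw);
	break;
}
```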
Greg Kroah-Hartman | 639159686b | Merge branch 'android12-5.10' into android12-5.10-lts
Sync up with android12-5.10 for the following commits: |
Jiewen Wang | 955f917251 | ANDROID: vendor_hooks: Add hook in try_to_unmap_one()
Add hook in try_to_unmap_one() to trace this function for debug memory swap bugs. Bug: 198385827 Change-Id: I1fdbe60e09bb491b949e06a07133710453ecca03 Signed-off-by: Jiewen Wang <jiewen.wang@vivo.com> |
Greg Kroah-Hartman | 194be71cc6 | Linux 5.10.47
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAmDcbDgACgkQ3qZv95d3 LNwUuQ//VDlmBPk/3w1FYvg9N9q/t1GHkVJXmD8TY/ClLdJtgxPYeoRu1VNLR/xf Y2kwZEF07yMA88RME56Zwt3p+LBbacrp5MoNdzEA48kb7auGBPk1HIscBg2PXC+C AnlC/O4/NAW6Okb+lLFL7XFM4xrlDBkNr5yTz2HmQSQC3JFfov0FcrON3KKTL5Bi aeyWjhn1NnhkKCDaKUl7kKlCQ2x7buu/YmvJK2OdGmuLZVywzto76RS+Xx3X0CnK pPkmciZSS7Gxi6UJel/zza0UwlKg5+IhFzfYVt0nsTFMjk/4QStAoevu7TFnbVD3 yM7nzJpAQmVLs/X9sC0rgg9rCBUyp9d4ddba8bCUqaxpPQfMObWI8S5F8tfzBnK/ h8P5xfs8u4O1AzpRr+YSN2Hbvak47e/4c5UOvvYj6z8NaIEb7DGbaUv/JE5YRVZ0 ZEVZ1auEpHcSVAGz6DUwuwzc8Rk0NdskES5DD53QxNLXDoF1CVdsD44xUuJH1/HA //3S1SWxvwF9UQ+w+sk/Z6pzUj+CdFigou3QzwB+vAZ04n0JawRSuqyijkCqFOP5 88iCMgxZ5qAYJ1TzQ6gV7cA0tSbteBF/HERNmdadyGvse9KtxkaBAfFmvIpd9D8J fTepYVuneP9cGqaGDUtsqHzM5YLrIkCSxABjgIvpNTt0eJL5c2k= =TKhf -----END PGP SIGNATURE----- Merge 5.10.47 into android12-5.10-lts Changes in 5.10.47 module: limit enabling module.sig_enforce Revert "drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue." Revert "drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover full doorbell." drm: add a locked version of drm_is_current_master drm/nouveau: wait for moving fence after pinning v2 drm/radeon: wait for moving fence after pinning drm/amdgpu: wait for moving fence after pinning ARM: 9081/1: fix gcc-10 thumb2-kernel regression mmc: meson-gx: use memcpy_to/fromio for dram-access-quirk MIPS: generic: Update node names to avoid unit addresses arm64: Ignore any DMA offsets in the max_zone_phys() calculation arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required spi: spi-nxp-fspi: move the register operation after the clock enable Revert "PCI: PM: Do not read power state in pci_enable_device_flags()" drm/vc4: hdmi: Move the HSM clock enable to runtime_pm drm/vc4: hdmi: Make sure the controller is powered in detect x86/entry: Fix noinstr fail in __do_fast_syscall_32() x86/xen: Fix noinstr fail in exc_xen_unknown_trap() locking/lockdep: Improve noinstr vs errors perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic context perf/x86/intel/lbr: Zero the xstate buffer on allocation dmaengine: zynqmp_dma: Fix PM reference leak in zynqmp_dma_alloc_chan_resourc() dmaengine: stm32-mdma: fix PM reference leak in stm32_mdma_alloc_chan_resourc() dmaengine: xilinx: dpdma: Add missing dependencies to Kconfig dmaengine: xilinx: dpdma: Limit descriptor IDs to 16 bits mac80211: remove warning in ieee80211_get_sband() mac80211_hwsim: drop pending frames on stop cfg80211: call cfg80211_leave_ocb when switching away from OCB dmaengine: rcar-dmac: Fix PM reference leak in rcar_dmac_probe() dmaengine: mediatek: free the proper desc in desc_free handler dmaengine: mediatek: do not issue a new desc if one is still current dmaengine: mediatek: use GFP_NOWAIT instead of GFP_ATOMIC in prep_dma net: ipv4: Remove unneed BUG() function mac80211: drop multicast fragments net: ethtool: clear heap allocations for ethtool function inet: annotate data race in inet_send_prepare() and inet_dgram_connect() ping: Check return value of function 'ping_queue_rcv_skb' net: annotate data race in sock_error() inet: annotate date races around sk->sk_txhash net/packet: annotate data race in packet_sendmsg() net: phy: dp83867: perform soft reset and retain established link riscv32: Use medany C model for modules net: caif: fix memory leak in ldisc_open net/packet: annotate accesses to po->bind net/packet: annotate accesses to po->ifindex r8152: Avoid memcpy() over-reading of ETH_SS_STATS sh_eth: Avoid memcpy() over-reading of 
ETH_SS_STATS r8169: Avoid memcpy() over-reading of ETH_SS_STATS KVM: selftests: Fix kvm_check_cap() assertion net: qed: Fix memcpy() overflow of qed_dcbx_params() mac80211: reset profile_periodicity/ema_ap mac80211: handle various extensible elements correctly recordmcount: Correct st_shndx handling PCI: Add AMD RS690 quirk to enable 64-bit DMA net: ll_temac: Add memory-barriers for TX BD access net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY perf/x86: Track pmu in per-CPU cpu_hw_events pinctrl: stm32: fix the reported number of GPIO lines per bank i2c: i801: Ensure that SMBHSTSTS_INUSE_STS is cleared when leaving i801_access gpiolib: cdev: zero padding during conversion to gpioline_info_changed scsi: sd: Call sd_revalidate_disk() for ioctl(BLKRRPART) nilfs2: fix memory leak in nilfs_sysfs_delete_device_group s390/stack: fix possible register corruption with stack switch helper KVM: do not allow mapping valid but non-reference-counted pages i2c: robotfuzz-osif: fix control-request directions ceph: must hold snap_rwsem when filling inode for async create kthread_worker: split code for canceling the delayed work timer kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() x86/fpu: Preserve supervisor states in sanitize_restored_user_xstate() x86/fpu: Make init_fpstate correct with optimized XSAVE mm: add VM_WARN_ON_ONCE_PAGE() macro mm/rmap: remove unneeded semicolon in page_not_mapped() mm/rmap: use page_not_mapped in try_to_unmap() mm, thp: use head page in __migration_entry_wait() mm/thp: fix __split_huge_pmd_locked() on shmem migration entry mm/thp: make is_huge_zero_pmd() safe and quicker mm/thp: try_to_unmap() use TTU_SYNC for safe splitting mm/thp: fix vma_address() if virtual address below file offset mm/thp: fix page_address_in_vma() on file THP tails mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split mm: page_vma_mapped_walk(): use page for pvmw->page mm: page_vma_mapped_walk(): settle PageHuge on entry mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block mm: page_vma_mapped_walk(): crossing page table boundary mm: page_vma_mapped_walk(): add a level of indentation mm: page_vma_mapped_walk(): use goto instead of while (1) mm: page_vma_mapped_walk(): get vma_address_end() earlier mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() mm, futex: fix shared futex pgoff on shmem huge page KVM: SVM: Call SEV Guest Decommission if ASID binding fails swiotlb: manipulate orig_addr when tlb_addr has offset netfs: fix test for whether we can skip read when writing beyond EOF Revert "drm: add a locked version of drm_is_current_master" certs: Add EFI_CERT_X509_GUID support for dbx entries certs: Move load_system_certificate_list to a common function certs: Add ability to preload revocation certs integrity: Load mokx variables into the blacklist keyring Linux 5.10.47 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I68f731ad78a5db003c41093e4faf59f6f9f2e446 |
Jue Wang | 38cda6b5ab | mm/thp: fix page_address_in_vma() on file THP tails
commit 31657170deaf1d8d2f6a1955fbc6fa9d228be036 upstream.
Anon THP tails were already supported, but memory-failure may need to
use page_address_in_vma() on file THP tails, which its page->mapping
check did not permit: fix it.
hughd adds: no current usage is known to hit the issue, but this does
fix a subtle trap in a general helper: best fixed in stable sooner than
later.
Link: https://lkml.kernel.org/r/a0d9b53-bf5d-8bab-ac5-759dc61819c1@google.com
Fixes:
Hugh Dickins | 37ffe9f4d7 | mm/thp: fix vma_address() if virtual address below file offset
commit 494334e43c16d63b878536a26505397fce6ff3a2 upstream. Running certain tests with a DEBUG_VM kernel would crash within hours, on the total_mapcount BUG() in split_huge_page_to_list(), while trying to free up some memory by punching a hole in a shmem huge page: split's try_to_unmap() was unable to find all the mappings of the page (which, on a !DEBUG_VM kernel, would then keep the huge page pinned in memory). When that BUG() was changed to a WARN(), it would later crash on the VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma) in mm/internal.h:vma_address(), used by rmap_walk_file() for try_to_unmap(). vma_address() is usually correct, but there's a wraparound case when the vm_start address is unusually low, but vm_pgoff not so low: vma_address() chooses max(start, vma->vm_start), but that decides on the wrong address, because start has become almost ULONG_MAX. Rewrite vma_address() to be more careful about vm_pgoff; move the VM_BUG_ON_VMA() out of it, returning -EFAULT for errors, so that it can be safely used from page_mapped_in_vma() and page_address_in_vma() too. Add vma_address_end() to apply similar care to end address calculation, in page_vma_mapped_walk() and page_mkclean_one() and try_to_unmap_one(); though it raises a question of whether callers would do better to supply pvmw->end to page_vma_mapped_walk() - I chose not, for a smaller patch. An irritation is that their apparent generality breaks down on KSM pages, which cannot be located by the page->index that page_to_pgoff() uses: as commit |
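The rewritten helper, roughly as it lands upstream: compute the address from `vm_pgoff`, and return `-EFAULT` instead of tripping a VM_BUG_ON when the page is not mapped by the vma.

```c
static inline unsigned long
vma_address(struct page *page, struct vm_area_struct *vma)
{
	pgoff_t pgoff = page_to_pgoff(page);
	unsigned long address;

	if (pgoff >= vma->vm_pgoff) {
		address = vma->vm_start +
			((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
		/* Check for address beyond vma (or wrapped through 0) */
		if (address < vma->vm_start || address >= vma->vm_end)
			address = -EFAULT;
	} else if (PageHead(page) &&
		   pgoff + compound_nr(page) - 1 >= vma->vm_pgoff) {
		/* Test above avoids possibility of wrap to 0 on 32-bit */
		address = vma->vm_start;
	} else {
		address = -EFAULT;
	}
	return address;
}
```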
Hugh Dickins | 66be14a926 | mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
commit 732ed55823fc3ad998d43b86bf771887bcc5ec67 upstream.
Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE
(!unmap_success): with dump_page() showing mapcount:1, but then its raw
struct page output showing _mapcount ffffffff i.e. mapcount 0.
And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed,
it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)),
and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG():
all indicative of some mapcount difficulty in development here perhaps.
But the !CONFIG_DEBUG_VM path handles the failures correctly and
silently.
I believe the problem is that once a racing unmap has cleared pte or
pmd, try_to_unmap_one() may skip taking the page table lock, and emerge
from try_to_unmap() before the racing task has reached decrementing
mapcount.
Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that
follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding
TTU_SYNC to the options, and passing that from unmap_page().
When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same
for both: the slight overhead added should rarely matter, except perhaps
if splitting sparsely-populated multiply-mapped shmem. Once confident
that bugs are fixed, TTU_SYNC here can be removed, and the race
tolerated.
Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com
Fixes:
Miaohe Lin | bfd90b56d7 | mm/rmap: use page_not_mapped in try_to_unmap()
[ Upstream commit b7e188ec98b1644ff70a6d3624ea16aadc39f5e0 ] page_mapcount_is_zero() calculates accurately how many mappings a hugepage has in order to check against 0 only. This is a waste of cpu time. We can do this via page_not_mapped() to save some possible atomic_read cycles. Remove the function page_mapcount_is_zero() as it's not used anymore and move page_not_mapped() above try_to_unmap() to avoid identifier undeclared compilation error. Link: https://lkml.kernel.org/r/20210130084904.35307-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
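The resulting helper and its hook-up in `try_to_unmap()` are tiny (simplified sketch of the 5.10 code after this patch):

```c
/* Walk-termination check: stop the rmap walk once the page is unmapped. */
static int page_not_mapped(struct page *page)
{
	return !page_mapped(page);
}

/* Simplified: how try_to_unmap() wires it up as the .done callback. */
bool try_to_unmap(struct page *page, enum ttu_flags flags)
{
	struct rmap_walk_control rwc = {
		.rmap_one  = try_to_unmap_one,
		.arg       = (void *)flags,
		.done      = page_not_mapped,
		.anon_lock = page_lock_anon_vma_read,
	};

	if (flags & TTU_RMAP_LOCKED)
		rmap_walk_locked(page, &rwc);
	else
		rmap_walk(page, &rwc);

	return !page_mapcount(page);
}
```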
Miaohe Lin | ff81af8259 | mm/rmap: remove unneeded semicolon in page_not_mapped()
[ Upstream commit e0af87ff7afcde2660be44302836d2d5618185af ] Remove extra semicolon without any functional change intended. Link: https://lkml.kernel.org/r/20210127093425.39640-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Laurent Dufour | a1dbf20e8e | FROMLIST: mm: introduce __page_add_new_anon_rmap()
When dealing with speculative page fault handler, we may race with VMA being split or merged. In this case the vma->vm_start and vm->vm_end fields may not match the address the page fault is occurring. This can only happens when the VMA is split but in that case, the anon_vma pointer of the new VMA will be the same as the original one, because in __split_vma the new->anon_vma is set to src->anon_vma when *new = *vma. So even if the VMA boundaries are not correct, the anon_vma pointer is still valid. If the VMA has been merged, then the VMA in which it has been merged must have the same anon_vma pointer otherwise the merge can't be done. So in all the case we know that the anon_vma is valid, since we have checked before starting the speculative page fault that the anon_vma pointer is valid for this VMA and since there is an anon_vma this means that at one time a page has been backed and that before the VMA is cleaned, the page table lock would have to be grab to clean the PTE, and the anon_vma field is checked once the PTE is locked. This patch introduce a new __page_add_new_anon_rmap() service which doesn't check for the VMA boundaries, and create a new inline one which do the check. When called from a page fault handler, if this is not a speculative one, there is a guarantee that vm_start and vm_end match the faulting address, so this check is useless. In the context of the speculative page fault handler, this check may be wrong but anon_vma is still valid as explained above. Change-Id: I72c47830181579f8c9618df879077d321653b5f1 Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Link: https://lore.kernel.org/lkml/1523975611-15978-17-git-send-email-ldufour@linux.vnet.ibm.com/ Bug: 161210518 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
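A sketch of the resulting split (simplified): the `__`-prefixed worker skips the boundary assertion, and a thin inline wrapper keeps it for the ordinary page-fault path.

```c
/* Does the real work; makes no assumption that @address still falls inside
 * @vma's boundaries, which may be stale under speculative fault handling. */
void __page_add_new_anon_rmap(struct page *page, struct vm_area_struct *vma,
			      unsigned long address, bool compound);

/* Ordinary (non-speculative) faults: boundaries are stable, keep the check. */
static inline void page_add_new_anon_rmap(struct page *page,
					  struct vm_area_struct *vma,
					  unsigned long address, bool compound)
{
	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
	__page_add_new_anon_rmap(page, vma, address, compound);
}
```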
Shakeel Butt | dd156e3fca | mm/rmap: always do TTU_IGNORE_ACCESS
[ Upstream commit 013339df116c2ee0d796dd8bfb8f293a2030c063 ] Since commit |
Mike Kravetz | 336bf30eb7 | hugetlbfs: fix anon huge page migration race
Qian Cai reported the following BUG in [1]
LTP: starting move_pages12
BUG: unable to handle page fault for address: ffffffffffffffe0
...
RIP: 0010:anon_vma_interval_tree_iter_first+0xa2/0x170 avc_start_pgoff at mm/interval_tree.c:63
Call Trace:
rmap_walk_anon+0x141/0xa30 rmap_walk_anon at mm/rmap.c:1864
try_to_unmap+0x209/0x2d0 try_to_unmap at mm/rmap.c:1763
migrate_pages+0x1005/0x1fb0
move_pages_and_store_status.isra.47+0xd7/0x1a0
__x64_sys_move_pages+0xa5c/0x1100
do_syscall_64+0x5f/0x310
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Hugh Dickins diagnosed this as a migration bug caused by code introduced
to use i_mmap_rwsem for pmd sharing synchronization. Specifically, the
routine unmap_and_move_huge_page() is always passing the TTU_RMAP_LOCKED
flag to try_to_unmap() while holding i_mmap_rwsem. This is wrong for
anon pages as the anon_vma_lock should be held in this case. Further
analysis suggested that i_mmap_rwsem was not required to he held at all
when calling try_to_unmap for anon pages as an anon page could never be
part of a shared pmd mapping.
Discussion also revealed that the hack in hugetlb_page_mapping_lock_write
to drop page lock and acquire i_mmap_rwsem is wrong. There is no way to
keep mapping valid while dropping page lock.
This patch does the following:
- Do not take i_mmap_rwsem and set TTU_RMAP_LOCKED for anon pages when
calling try_to_unmap.
- Remove the hacky code in hugetlb_page_mapping_lock_write. The routine
will now simply do a 'trylock' while still holding the page lock. If
the trylock fails, it will return NULL. This could impact the
callers:
- migration calling code will receive -EAGAIN and retry up to the
hard coded limit (10).
- memory error code will treat the page as BUSY. This will force
killing (SIGKILL) instead of SIGBUS any mapping tasks.
Do note that this change in behavior only happens when there is a
race. None of the standard kernel testing suites actually hit this
race, but it is possible.
[1] https://lore.kernel.org/lkml/20200708012044.GC992@lca.pw/
[2] https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/
Fixes:
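After this change the locking helper is deliberately simple (sketch): with the page lock already held, only a trylock of `i_mmap_rwsem` is attempted, and the caller must cope with `NULL`.

```c
struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage)
{
	struct address_space *mapping = page_mapping(hpage);

	if (!mapping)
		return mapping;

	/* The caller holds the page lock, so sleeping on the rwsem here
	 * would invert the lock order; a failed trylock returns NULL and
	 * the caller retries (migration) or treats the page as busy. */
	if (i_mmap_trylock_write(mapping))
		return mapping;

	return NULL;
}
```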
Matthew Wilcox (Oracle) | 5eaf35ab12 | mm/rmap: fix assumptions of THP size
Ask the page what size it is instead of assuming it's PMD size. Do this for anon pages as well as file pages for when someone decides to support that. Leave the assumption alone for pages which are PMD mapped; we don't currently grow THPs beyond PMD size, so we don't need to change this code yet. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: SeongJae Park <sjpark@amazon.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Link: https://lkml.kernel.org/r/20200908195539.25896-9-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
Alistair Popple | ad7df764b7 | mm/rmap: fixup copying of soft dirty and uffd ptes
During memory migration a pte is temporarily replaced with a migration swap pte. Some pte bits from the existing mapping such as the soft-dirty and uffd write-protect bits are preserved by copying these to the temporary migration swap pte. However these bits are not stored at the same location for swap and non-swap ptes. Therefore testing these bits requires using the appropriate helper function for the given pte type. Unfortunately several code locations were found where the wrong helper function is being used to test soft_dirty and uffd_wp bits which leads to them getting incorrectly set or cleared during page-migration. Fix these by using the correct tests based on pte type. Fixes: |
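The class of bug being fixed, sketched: which accessor is valid depends on whether the source pte is present or already a swap/migration entry, and the bits must be re-applied with the matching `_swp_` helpers when the migration entry is built.

```c
bool soft_dirty, uffd_wp;

if (pte_present(pte)) {
	soft_dirty = pte_soft_dirty(pte);	/* present pte accessors */
	uffd_wp    = pte_uffd_wp(pte);
} else {
	soft_dirty = pte_swp_soft_dirty(pte);	/* swap/migration entry accessors */
	uffd_wp    = pte_swp_uffd_wp(pte);
}

/* When writing the migration swap pte, use the swap-side setters: */
swp_pte = swp_entry_to_pte(entry);
if (soft_dirty)
	swp_pte = pte_swp_mksoft_dirty(swp_pte);
if (uffd_wp)
	swp_pte = pte_swp_mkuffd_wp(swp_pte);
```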
Qian Cai | 9c1177b62a | mm/rmap: annotate a data race at tlb_flush_batched
mm->tlb_flush_batched could be accessed concurrently as noticed by KCSAN, BUG: KCSAN: data-race in flush_tlb_batched_pending / try_to_unmap_one write to 0xffff93f754880bd0 of 1 bytes by task 822 on cpu 6: try_to_unmap_one+0x59a/0x1ab0 set_tlb_ubc_flush_pending at mm/rmap.c:635 (inlined by) try_to_unmap_one at mm/rmap.c:1538 rmap_walk_anon+0x296/0x650 rmap_walk+0xdf/0x100 try_to_unmap+0x18a/0x2f0 shrink_page_list+0xef6/0x2870 shrink_inactive_list+0x316/0x880 shrink_lruvec+0x8dc/0x1380 shrink_node+0x317/0xd80 balance_pgdat+0x652/0xd90 kswapd+0x396/0x8d0 kthread+0x1e0/0x200 ret_from_fork+0x27/0x50 read to 0xffff93f754880bd0 of 1 bytes by task 6364 on cpu 4: flush_tlb_batched_pending+0x29/0x90 flush_tlb_batched_pending at mm/rmap.c:682 change_p4d_range+0x5dd/0x1030 change_pte_range at mm/mprotect.c:44 (inlined by) change_pmd_range at mm/mprotect.c:212 (inlined by) change_pud_range at mm/mprotect.c:240 (inlined by) change_p4d_range at mm/mprotect.c:260 change_protection+0x222/0x310 change_prot_numa+0x3e/0x60 task_numa_work+0x219/0x350 task_work_run+0xed/0x140 prepare_exit_to_usermode+0x2cc/0x2e0 ret_from_intr+0x32/0x42 Reported by Kernel Concurrency Sanitizer on: CPU: 4 PID: 6364 Comm: mtest01 Tainted: G W L 5.5.0-next-20200210+ #5 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 flush_tlb_batched_pending() is under PTL but the write is not, but mm->tlb_flush_batched is only a bool type, so the value is unlikely to be shattered. Thus, mark it as an intentional data race by using the data race macro. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Marco Elver <elver@google.com> Link: http://lkml.kernel.org/r/1581450783-8262-1-git-send-email-cai@lca.pw Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
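The annotation itself is a one-line change on the reader side; roughly, in `flush_tlb_batched_pending()`:

```c
void flush_tlb_batched_pending(struct mm_struct *mm)
{
	/*
	 * The writer in set_tlb_ubc_flush_pending() runs under the PTL, this
	 * reader does not; the field is a plain bool, so a torn value is not
	 * possible. data_race() tells KCSAN the race is intentional.
	 */
	if (data_race(mm->tlb_flush_batched)) {
		flush_tlb_mm(mm);

		/* Don't let the clear be reordered before the flush. */
		barrier();
		mm->tlb_flush_batched = false;
	}
}
```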
Matthew Wilcox (Oracle) | 6c357848b4 | mm: replace hpage_nr_pages with thp_nr_pages
The thp prefix is more frequently used than hpage and we should be consistent between the various functions. [akpm@linux-foundation.org: fix mm/migrate.c] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
Mike Kravetz | 34ae204f18 | hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem
Commit |
Michel Lespinasse | c1e8d7c6a7 | mmap locking API: convert mmap_sem comments
Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
Johannes Weiner | 468c398233 | mm: memcontrol: switch to native NR_ANON_THPS counter
With rmap memcg locking already in place for NR_ANON_MAPPED, it's just a small step to remove the MEMCG_RSS_HUGE wart and switch memcg to the native NR_ANON_THPS accounting sites. [hannes@cmpxchg.org: fixes] Link: http://lkml.kernel.org/r/20200512121750.GA397968@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> [build-tested] Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Balbir Singh <bsingharora@gmail.com> Link: http://lkml.kernel.org/r/20200508183105.225460-12-hannes@cmpxchg.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
Johannes Weiner | be5d0a74c6 | mm: memcontrol: switch to native NR_ANON_MAPPED counter
Memcg maintains a private MEMCG_RSS counter. This divergence from the generic VM accounting means unnecessary code overhead, and creates a dependency for memcg that page->mapping is set up at the time of charging, so that page types can be told apart. Convert the generic accounting sites to mod_lruvec_page_state and friends to maintain the per-cgroup vmstat counter of NR_ANON_MAPPED. We use lock_page_memcg() to stabilize page->mem_cgroup during rmap changes, the same way we do for NR_FILE_MAPPED. With the previous patch removing MEMCG_CACHE and the private NR_SHMEM counter, this patch finally eliminates the need to have page->mapping set up at charge time. However, we need to have page->mem_cgroup set up by the time rmap runs and does the accounting, so switch the commit and the rmap callbacks around. v2: fix temporary accounting bug by switching rmap<->commit (Joonsoo) Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Balbir Singh <bsingharora@gmail.com> Link: http://lkml.kernel.org/r/20200508183105.225460-11-hannes@cmpxchg.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
Palmer Dabbelt | 4708f31885 | mm: prevent a warning when casting void* -> enum
I recently build the RISC-V port with LLVM trunk, which has introduced a new warning when casting from a pointer to an enum of a smaller size. This patch simply casts to a long in the middle to stop the warning. I'd be surprised this is the only one in the kernel, but it's the only one I saw. Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200227211741.83165-1-palmer@dabbelt.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
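The shape of the fix (illustrative; the surrounding function is a stand-in): rmap passes its `ttu_flags` to walk callbacks through a `void *`, and casting through a pointer-width integer first keeps newer clang quiet.

```c
static bool unmap_one_sketch(struct page *page, struct vm_area_struct *vma,
			     unsigned long address, void *arg)
{
	/* was: (enum ttu_flags)arg -- warns about casting void* to a
	 * smaller enum; widening through 'long' first avoids it. */
	enum ttu_flags flags = (enum ttu_flags)(long)arg;

	return !(flags & TTU_IGNORE_MLOCK);
}
```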
Peter Xu | f45ec5ff16 | userfaultfd: wp: support swap and page migration
For either swap and page migration, we all use the bit 2 of the entry to identify whether this entry is uffd write-protected. It plays a similar role as the existing soft dirty bit in swap entries but only for keeping the uffd-wp tracking for a specific PTE/PMD. Something special here is that when we want to recover the uffd-wp bit from a swap/migration entry to the PTE bit we'll also need to take care of the _PAGE_RW bit and make sure it's cleared, otherwise even with the _PAGE_UFFD_WP bit we can't trap it at all. In change_pte_range() we do nothing for uffd if the PTE is a swap entry. That can lead to data mismatch if the page that we are going to write protect is swapped out when sending the UFFDIO_WRITEPROTECT. This patch also applies/removes the uffd-wp bit even for the swap entries. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@fb.com> Link: http://lkml.kernel.org/r/20200220163112.11409-11-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
Matthew Wilcox (Oracle) | 396bcc5299 | mm: remove CONFIG_TRANSPARENT_HUGE_PAGECACHE
Commit |
Li Xinhai | 23ab76bf90 | Revert "mm/rmap.c: reuse mergeable anon_vma as parent when fork"
This reverts commit |
Mike Kravetz | c0d0381ade | hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
Patch series "hugetlbfs: use i_mmap_rwsem for more synchronization", v2. While discussing the issue with huge_pte_offset [1], I remembered that there were more outstanding hugetlb races. These issues are: 1) For shared pmds, huge PTE pointers returned by huge_pte_alloc can become invalid via a call to huge_pmd_unshare by another thread. 2) hugetlbfs page faults can race with truncation causing invalid global reserve counts and state. A previous attempt was made to use i_mmap_rwsem in this manner as described at [2]. However, those patches were reverted starting with [3] due to locking issues. To effectively use i_mmap_rwsem to address the above issues it needs to be held (in read mode) during page fault processing. However, during fault processing we need to lock the page we will be adding. Lock ordering requires we take page lock before i_mmap_rwsem. Waiting until after taking the page lock is too late in the fault process for the synchronization we want to do. To address this lock ordering issue, the following patches change the lock ordering for hugetlb pages. This is not too invasive as hugetlbfs processing is done separate from core mm in many places. However, I don't really like this idea. Much ugliness is contained in the new routine hugetlb_page_mapping_lock_write() of patch 1. The only other way I can think of to address these issues is by catching all the races. After catching a race, cleanup, backout, retry ... etc, as needed. This can get really ugly, especially for huge page reservations. At one time, I started writing some of the reservation backout code for page faults and it got so ugly and complicated I went down the path of adding synchronization to avoid the races. Any other suggestions would be welcome. [1] https://lore.kernel.org/linux-mm/1582342427-230392-1-git-send-email-longpeng2@huawei.com/ [2] https://lore.kernel.org/linux-mm/20181222223013.22193-1-mike.kravetz@oracle.com/ [3] https://lore.kernel.org/linux-mm/20190103235452.29335-1-mike.kravetz@oracle.com [4] https://lore.kernel.org/linux-mm/1584028670.7365.182.camel@lca.pw/ [5] https://lore.kernel.org/lkml/20200312183142.108df9ac@canb.auug.org.au/ This patch (of 2): While looking at BUGs associated with invalid huge page map counts, it was discovered and observed that a huge pte pointer could become 'invalid' and point to another task's page table. Consider the following: A task takes a page fault on a shared hugetlbfs file and calls huge_pte_alloc to get a ptep. Suppose the returned ptep points to a shared pmd. Now, another task truncates the hugetlbfs file. As part of truncation, it unmaps everyone who has the file mapped. If the range being truncated is covered by a shared pmd, huge_pmd_unshare will be called. For all but the last user of the shared pmd, huge_pmd_unshare will clear the pud pointing to the pmd. If the task in the middle of the page fault is not the last user, the ptep returned by huge_pte_alloc now points to another task's page table or worse. This leads to bad things such as incorrect page map/reference counts or invalid memory references. To fix, expand the use of i_mmap_rwsem as follows: - i_mmap_rwsem is held in read mode whenever huge_pmd_share is called. huge_pmd_share is only called via huge_pte_alloc, so callers of huge_pte_alloc take i_mmap_rwsem before calling. In addition, callers of huge_pte_alloc continue to hold the semaphore until finished with the ptep. - i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called. 
One problem with this scheme is that it requires taking i_mmap_rwsem before taking the page lock during page faults. This is not the order specified in the rest of mm code. Handling of hugetlbfs pages is mostly isolated today. Therefore, we use this alternative locking order for PageHuge() pages. mapping->i_mmap_rwsem hugetlb_fault_mutex (hugetlbfs specific page fault mutex) page->flags PG_locked (lock_page) To help with lock ordering issues, hugetlb_page_mapping_lock_write() is introduced to write lock the i_mmap_rwsem associated with a page. In most cases it is easy to get address_space via vma->vm_file->f_mapping. However, in the case of migration or memory errors for anon pages we do not have an associated vma. A new routine _get_hugetlb_page_mapping() will use anon_vma to get address_space in these cases. Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Prakash Sangappa <prakash.sangappa@oracle.com> Link: http://lkml.kernel.org/r/20200316205756.146666-2-mike.kravetz@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
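A minimal sketch of the fault-path ordering described in the entry above, assuming a simplified hugetlb fault handler. The helpers used (i_mmap_lock_read(), huge_pte_alloc(), hstate_vma()) are real kernel interfaces, but their exact signatures vary across kernel versions and this is an illustration, not the verbatim upstream code.

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/hugetlb.h>

static vm_fault_t hugetlb_fault_sketch(struct vm_area_struct *vma,
					unsigned long address)
{
	struct address_space *mapping = vma->vm_file->f_mapping;
	struct hstate *h = hstate_vma(vma);
	pte_t *ptep;

	/*
	 * Take i_mmap_rwsem in read mode *before* huge_pte_alloc() so a
	 * concurrent huge_pmd_unshare() (which needs the semaphore in
	 * write mode) cannot clear the pud and invalidate the returned
	 * ptep underneath us.
	 */
	i_mmap_lock_read(mapping);

	ptep = huge_pte_alloc(vma->vm_mm, address, huge_page_size(h));
	if (!ptep) {
		i_mmap_unlock_read(mapping);
		return VM_FAULT_OOM;
	}

	/* ... fault handling that dereferences ptep happens here ... */

	/* Keep the semaphore held until we are completely done with ptep. */
	i_mmap_unlock_read(mapping);
	return 0;
}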
Anshuman Khandual
|
222100eed2 |
mm/vma: make is_vma_temporary_stack() available for general use
Currently the declaration and definition for is_vma_temporary_stack() are scattered. Let's make the is_vma_temporary_stack() helper available for general use and also drop the declaration from (include/linux/huge_mm.h) which is no longer required. While at it, rename this as vma_is_temporary_stack() in line with existing helpers. This should not cause any functional change; a sketch of the renamed helper follows this entry.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1582782965-3274-4-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
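A sketch of the renamed helper. The body mirrors the long-standing is_vma_temporary_stack() check (a stack VMA whose setup, e.g. during execve(), is not yet complete); it is illustrative rather than verbatim.

#include <linux/mm.h>

static inline bool vma_is_temporary_stack(struct vm_area_struct *vma)
{
	int maybe_stack = vma->vm_flags & (VM_GROWSDOWN | VM_GROWSUP);

	if (!maybe_stack)
		return false;

	/* VM_STACK_INCOMPLETE_SETUP marks a stack that is still being set up. */
	if ((vma->vm_flags & VM_STACK_INCOMPLETE_SETUP) ==
						VM_STACK_INCOMPLETE_SETUP)
		return true;

	return false;
}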
John Hubbard
|
47e29d32af |
mm/gup: page->hpage_pinned_refcount: exact pin counts for huge pages
For huge pages (and in fact, any compound page), the GUP_PIN_COUNTING_BIAS scheme tends to overflow too easily, because each tail page increments the head page->_refcount by GUP_PIN_COUNTING_BIAS (1024). That limits the number of huge pages that can be pinned.

This patch removes that limitation, by using an exact form of pin counting for compound pages of order > 1. The "order > 1" is required because this approach uses the 3rd struct page in the compound page, and order 1 compound pages only have two pages, so that won't work there.

A new struct page field, hpage_pinned_refcount, has been added, replacing a padding field in the union (so no new space is used).

This enhancement also has a useful side effect: huge pages and compound pages (of order > 1) do not suffer from the "potential false positives" problem that is discussed in the page_dma_pinned() comment block. That is because these compound pages have extra space for tracking things, so they get exact pin counts instead of overloading page->_refcount. A sketch of the scheme follows this entry.

Documentation/core-api/pin_user_pages.rst is updated accordingly.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200211001536.1027652-8-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
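An illustrative outline of how a GUP pin is recorded under the scheme above, assuming a kernel that already carries the hpage_pinned_refcount field and the GUP_PIN_COUNTING_BIAS constant from this series. This is not the verbatim upstream helper; it only shows the two counting paths.

#include <linux/mm.h>

/* 'page' is assumed to be the head page of the compound (or an order-0 page). */
static void grab_page_refs_sketch(struct page *page, int refs)
{
	if (PageCompound(page) && compound_order(page) > 1) {
		/*
		 * Exact pin counting: the counter lives in the 3rd struct
		 * page of the compound (page[2].hpage_pinned_refcount), so
		 * _refcount only takes one real reference per pin.
		 */
		page_ref_add(page, refs);
		atomic_add(refs, &page[2].hpage_pinned_refcount);
	} else {
		/*
		 * Order-0 and order-1 pages keep the old scheme: overload
		 * _refcount with a large bias (GUP_PIN_COUNTING_BIAS == 1024)
		 * per pin, which is what overflows too easily for huge pages.
		 */
		page_ref_add(page, refs * GUP_PIN_COUNTING_BIAS);
	}
}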
Kirill A. Shutemov
|
f1fe80d4ae |
mm, thp: do not queue fully unmapped pages for deferred split
Adding fully unmapped pages into deferred split queue is not productive: these pages are about to be freed or they are pinned and cannot be split anyway.

Link: http://lkml.kernel.org/r/20190913091849.11151-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Yang Shi
|
30c4638285 |
mm/rmap.c: use VM_BUG_ON_PAGE() in __page_check_anon_rmap()
__page_check_anon_rmap() just calls two BUG_ON()s protected by an #ifdef CONFIG_DEBUG_VM block; the #ifdef can be eliminated by using VM_BUG_ON_PAGE() instead (see the sketch after this entry).

Link: http://lkml.kernel.org/r/1573157346-111316-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
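The pattern this cleanup applies, sketched with one of the checks in __page_check_anon_rmap(); the exact condition shown is illustrative.

/* Before: the assertion needs an explicit #ifdef and gives no page context. */
#ifdef CONFIG_DEBUG_VM
	BUG_ON(page_to_pgoff(page) != linear_page_index(vma, address));
#endif

/* After: VM_BUG_ON_PAGE() compiles away when CONFIG_DEBUG_VM is off and
 * dumps the offending struct page when the assertion fires. */
	VM_BUG_ON_PAGE(page_to_pgoff(page) != linear_page_index(vma, address), page);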
Miles Chen
|
091e429954 |
mm/rmap.c: fix outdated comment in page_get_anon_vma()
Replace DESTROY_BY_RCU with SLAB_TYPESAFE_BY_RCU because
SLAB_DESTROY_BY_RCU has been renamed to SLAB_TYPESAFE_BY_RCU by commit
|
||
Wei Yang
|
4e4a9eb921 |
mm/rmap.c: reuse mergeable anon_vma as parent when fork
In __anon_vma_prepare(), we will try to find anon_vma if it is possible to reuse it. While on fork, the logic is different. Since commit |
||
Wei Yang
|
47b390d23b |
mm/rmap.c: don't reuse anon_vma if we just want a copy
Before commit |
||
Ben Dooks
|
444f84fd2a |
mm: include <linux/huge_mm.h> for is_vma_temporary_stack
Include <linux/huge_mm.h> for the definition of is_vma_temporary_stack to fix the following sparse warning:

mm/rmap.c:1673:6: warning: symbol 'is_vma_temporary_stack' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20191009151155.27763-1-ben.dooks@codethink.co.uk
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Song Liu
|
99cb0dbd47 |
mm,thp: add read-only THP support for (non-shmem) FS
This patch is (hopefully) the first step to enable THP for non-shmem filesystems.

This patch enables an application to put part of its text sections to THP via madvise, for example:

    madvise((void *)0x600000, 0x200000, MADV_HUGEPAGE);

We tried to reuse the logic for THP on tmpfs.

Currently, write is not supported for non-shmem THP. khugepaged will only process vma with VM_DENYWRITE. sys_mmap() ignores VM_DENYWRITE requests (see ksys_mmap_pgoff). The only way to create vma with VM_DENYWRITE is execve(). This requirement limits non-shmem THP to text sections.

The next patch will handle writes, which would only happen when all the vmas with VM_DENYWRITE are unmapped.

An EXPERIMENTAL config, READ_ONLY_THP_FOR_FS, is added to gate this feature.

[songliubraving@fb.com: fix build without CONFIG_SHMEM]
Link: http://lkml.kernel.org/r/F53407FB-96CC-42E8-9862-105C92CC2B98@fb.com
[songliubraving@fb.com: fix double unlock in collapse_file()]
Link: http://lkml.kernel.org/r/B960CBFA-8EFC-4DA4-ABC5-1977FFF2CA57@fb.com
Link: http://lkml.kernel.org/r/20190801184244.3169074-7-songliubraving@fb.com
Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Rik van Riel <riel@surriel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Matthew Wilcox (Oracle)
|
d8c6546b1a |
mm: introduce compound_nr()
Replace 1 << compound_order(page) with compound_nr(page). Minor improvements in readability.

Link: http://lkml.kernel.org/r/20190721104612.19120-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
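What the conversion described above looks like at a call site (variable name illustrative):

/* Before */
unsigned long nr_pages = 1 << compound_order(page);

/* After: same value, clearer intent. */
unsigned long nr_pages = compound_nr(page);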
Matthew Wilcox (Oracle)
|
a50b854e07 |
mm: introduce page_size()
Patch series "Make working with compound pages easier", v2. These three patches add three helpers and convert the appropriate places to use them. This patch (of 3): It's unnecessarily hard to find out the size of a potentially huge page. Replace 'PAGE_SIZE << compound_order(page)' with page_size(page). Link: http://lkml.kernel.org/r/20190721104612.19120-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
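A sketch of the helper and a call site; the body mirrors the expression named above (PAGE_SIZE << compound_order(page)) and is illustrative rather than verbatim.

/* Returns the number of bytes in this potentially compound page. */
static inline unsigned long page_size(struct page *page)
{
	return PAGE_SIZE << compound_order(page);
}

/* Caller side: works for both base pages and huge pages. */
unsigned long bytes = page_size(page);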
YueHaibing
|
1f18b29669 |
mm/rmap.c: remove set but not used variable 'cstart'
Fixes gcc '-Wunused-but-set-variable' warning:
mm/rmap.c: In function page_mkclean_one:
mm/rmap.c:906:17: warning: variable cstart set but not used [-Wunused-but-set-variable]
It is not used any more since
commit
|
||
Ralph Campbell
|
1de13ee592 |
mm/hmm: fix bad subpage pointer in try_to_unmap_one
When migrating an anonymous private page to a ZONE_DEVICE private page,
the source page->mapping and page->index fields are copied to the
destination ZONE_DEVICE struct page and the page_mapcount() is
increased. This is so rmap_walk() can be used to unmap and migrate the
page back to system memory.
However, try_to_unmap_one() computes the subpage pointer from a swap pte,
which yields an invalid page pointer, and a kernel panic results, such
as:
BUG: unable to handle page fault for address: ffffea1fffffffc8
Currently, only single pages can be migrated to device private memory so
no subpage computation is needed and it can be set to "page".
[rcampbell@nvidia.com: add comment]
Link: http://lkml.kernel.org/r/20190724232700.23327-4-rcampbell@nvidia.com
Link: http://lkml.kernel.org/r/20190719192955.30462-4-rcampbell@nvidia.com
Fixes:
|
||
Huang Shijie
|
059d8442ea |
mm/rmap.c: use the pra.mapcount to do the check
We already have pra.mapcount, so there is no need to call page_mapped(), which may do some complicated computation for a compound page (see the sketch after this entry).

Link: http://lkml.kernel.org/r/20190404054828.2731-1-sjhuang@iluvatar.ai
Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
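A sketch of the change in page_referenced(), assuming pra.mapcount was already initialised from total_mapcount(page); field and helper names follow the commit but the fragment is illustrative, not verbatim.

	struct page_referenced_arg pra = {
		.mapcount = total_mapcount(page),
		.memcg = memcg,
	};

	/*
	 * Before: if (!page_mapped(page)) return 0;
	 * After: reuse the snapshot already taken into pra.mapcount instead
	 * of recomputing the mapped state for a possibly compound page.
	 */
	if (!pra.mapcount)
		return 0;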
Jérôme Glisse
|
7269f99993 |
mm/mmu_notifier: use correct mmu_notifier events for each invalidation
This updates each existing invalidation to use the correct mmu notifier event that represents what is happening to the CPU page table. See the patch which introduced the events for the rationale behind this. An illustrative example of one such conversion follows this entry.

Link: http://lkml.kernel.org/r/20190326164747.24405-7-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
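An illustrative fragment of one such conversion in an rmap unmap path: the caller announces MMU_NOTIFY_CLEAR (page table entries being cleared) rather than the catch-all MMU_NOTIFY_UNMAP. The range arguments are illustrative; the signature matches the one introduced by the "contextual information" patch below.

	struct mmu_notifier_range range;

	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
				address, address + PAGE_SIZE);
	mmu_notifier_invalidate_range_start(&range);
	/* ... clear the pte and update rmap state ... */
	mmu_notifier_invalidate_range_end(&range);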
Jérôme Glisse
|
6f4f13e8d9 |
mm/mmu_notifier: contextual information for event triggering invalidation
CPU page table updates can happen for many reasons, not only as a result of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also as a result of kernel activities (memory compression, reclaim, migration, ...).

Users of the mmu notifier API track changes to the CPU page table and take specific action for them. While the current API only provides the range of virtual addresses affected by the change, it does not say why the change is happening.

This patchset does the initial mechanical conversion of all the places that call mmu_notifier_range_init to also provide the default MMU_NOTIFY_UNMAP event as well as the vma if it is known (most invalidations happen against a given vma). Passing down the vma allows the users of mmu notifier to inspect the new vma page protection.

MMU_NOTIFY_UNMAP is always the safe default, as users of mmu notifier should assume that every mapping for the range is going away when that event happens. A later patch converts the mm call paths to use more appropriate events for each call.

This is done as 2 patches so that no call site is forgotten, especially as it uses the following coccinelle patch:

%<----------------------------------------------------------------------
@@
identifier I1, I2, I3, I4;
@@
static inline void mmu_notifier_range_init(struct mmu_notifier_range *I1,
+enum mmu_notifier_event event,
+unsigned flags,
+struct vm_area_struct *vma,
struct mm_struct *I2, unsigned long I3, unsigned long I4)
{ ... }

@@
@@
-#define mmu_notifier_range_init(range, mm, start, end)
+#define mmu_notifier_range_init(range, event, flags, vma, mm, start, end)

@@
expression E1, E3, E4;
identifier I1;
@@
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, 0, I1,
I1->vm_mm, E3, E4)
...>

@@
expression E1, E2, E3, E4;
identifier FN, VMA;
@@
FN(..., struct vm_area_struct *VMA, ...) {
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, 0, VMA,
E2, E3, E4)
...>
}

@@
expression E1, E2, E3, E4;
identifier FN, VMA;
@@
FN(...) {
struct vm_area_struct *VMA;
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, 0, VMA,
E2, E3, E4)
...>
}

@@
expression E1, E2, E3, E4;
identifier FN;
@@
FN(...) {
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, 0, NULL,
E2, E3, E4)
...>
}
---------------------------------------------------------------------->%

Applied with:

spatch --all-includes --sp-file mmu-notifier.spatch fs/proc/task_mmu.c --in-place
spatch --sp-file mmu-notifier.spatch --dir kernel/events/ --in-place
spatch --sp-file mmu-notifier.spatch --dir mm --in-place

A short before/after example of a converted call site follows this entry.

Link: http://lkml.kernel.org/r/20190326164747.24405-6-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
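What the mechanical conversion produces at a typical call site where the vma is known; the arguments are illustrative and follow directly from the coccinelle patch above.

/* Before */
	mmu_notifier_range_init(&range, vma->vm_mm, start, end);

/* After: default event, no flags, and the vma passed down. */
	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
				start, end);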