08ed4cb090
141 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Greg Kroah-Hartman
|
08ed4cb090 |
Merge 5.10.67 into android12-5.10-lts
Changes in 5.10.67 rtc: tps65910: Correct driver module alias io_uring: limit fixed table size by RLIMIT_NOFILE io_uring: place fixed tables under memcg limits io_uring: add ->splice_fd_in checks io_uring: fail links of cancelled timeouts io-wq: fix wakeup race when adding new work btrfs: wake up async_delalloc_pages waiters after submit btrfs: reset replace target device to allocation state on close blk-zoned: allow zone management send operations without CAP_SYS_ADMIN blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN PCI/MSI: Skip masking MSI-X on Xen PV powerpc/perf/hv-gpci: Fix counter value parsing xen: fix setting of max_pfn in shared_info 9p/xen: Fix end of loop tests for list_for_each_entry ceph: fix dereference of null pointer cf selftests/ftrace: Fix requirement check of README file tools/thermal/tmon: Add cross compiling support clk: socfpga: agilex: fix the parents of the psi_ref_clk clk: socfpga: agilex: fix up s2f_user0_clk representation clk: socfpga: agilex: add the bypass register for s2f_usr0 clock pinctrl: stmfx: Fix hazardous u8[] to unsigned long cast pinctrl: ingenic: Fix incorrect pull up/down info soc: qcom: aoss: Fix the out of bound usage of cooling_devs soc: aspeed: lpc-ctrl: Fix boundary check for mmap soc: aspeed: p2a-ctrl: Fix boundary check for mmap arm64: mm: Fix TLBI vs ASID rollover arm64: head: avoid over-mapping in map_memory iio: ltc2983: fix device probe wcn36xx: Ensure finish scan is not requested before start scan crypto: public_key: fix overflow during implicit conversion block: bfq: fix bfq_set_next_ioprio_data() power: supply: max17042: handle fails of reading status register dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc() crypto: ccp - shutdown SEV firmware on kexec VMCI: fix NULL pointer dereference when unmapping queue pair media: uvc: don't do DMA on stack media: rc-loopback: return number of emitters rather than error s390/qdio: fix roll-back after timeout on ESTABLISH ccw s390/qdio: cancel the ESTABLISH ccw after timeout Revert "dmaengine: imx-sdma: refine to load context only once" dmaengine: imx-sdma: remove duplicated sdma_load_context libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs ARM: 9105/1: atags_to_fdt: don't warn about stack size f2fs: fix to do sanity check for sb/cp fields correctly PCI/portdrv: Enable Bandwidth Notification only if port supports it PCI: Restrict ASMedia ASM1062 SATA Max Payload Size Supported PCI: Return ~0 data on pciconfig_read() CAP_SYS_ADMIN failure PCI: xilinx-nwl: Enable the clock through CCF PCI: aardvark: Configure PCIe resources from 'ranges' DT property PCI: Export pci_pio_to_address() for module use PCI: aardvark: Fix checking for PIO status PCI: aardvark: Fix masking and unmasking legacy INTx interrupts HID: input: do not report stylus battery state as "full" f2fs: quota: fix potential deadlock pinctrl: remove empty lines in pinctrl subsystem pinctrl: armada-37xx: Correct PWM pins definitions scsi: bsg: Remove support for SCSI_IOCTL_SEND_COMMAND clk: rockchip: drop GRF dependency for rk3328/rk3036 pll types IB/hfi1: Adjust pkey entry in index 0 RDMA/iwcm: Release resources if iw_cm module initialization fails docs: Fix infiniband uverbs minor number scsi: BusLogic: Use %X for u32 sized integer rather than %lX pinctrl: samsung: Fix pinctrl bank pin count vfio: Use config not menuconfig for VFIO_NOIOMMU scsi: ufs: Fix memory corruption by ufshcd_read_desc_param() cpuidle: pseries: Fixup CEDE0 latency only for POWER10 onwards powerpc/stacktrace: Include linux/delay.h RDMA/efa: Remove double QP type assignment RDMA/mlx5: Delete not-available udata check cpuidle: pseries: Mark pseries_idle_proble() as __init f2fs: reduce the scope of setting fsck tag when de->name_len is zero openrisc: don't printk() unconditionally dma-debug: fix debugfs initialization order NFSv4/pNFS: Fix a layoutget livelock loop NFSv4/pNFS: Always allow update of a zero valued layout barrier NFSv4/pnfs: The layout barrier indicate a minimal value for the seqid SUNRPC: Fix potential memory corruption SUNRPC/xprtrdma: Fix reconnection locking SUNRPC query transport's source port sunrpc: Fix return value of get_srcport() scsi: fdomain: Fix error return code in fdomain_probe() pinctrl: single: Fix error return code in pcs_parse_bits_in_pinctrl_entry() powerpc/numa: Consider the max NUMA node for migratable LPAR scsi: smartpqi: Fix an error code in pqi_get_raid_map() scsi: qedi: Fix error codes in qedi_alloc_global_queues() scsi: qedf: Fix error codes in qedf_alloc_global_queues() powerpc/config: Renable MTD_PHYSMAP_OF iommu/vt-d: Update the virtual command related registers HID: i2c-hid: Fix Elan touchpad regression clk: imx8m: fix clock tree update of TF-A managed clocks KVM: PPC: Book3S HV: Fix copy_tofrom_guest routines scsi: ufs: ufs-exynos: Fix static checker warning KVM: PPC: Book3S HV Nested: Reflect guest PMU in-use to L0 when guest SPRs are live platform/x86: dell-smbios-wmi: Add missing kfree in error-exit from run_smbios_call powerpc/smp: Update cpu_core_map on all PowerPc systems RDMA/hns: Fix QP's resp incomplete assignment fscache: Fix cookie key hashing clk: at91: clk-generated: Limit the requested rate to our range KVM: PPC: Fix clearing never mapped TCEs in realmode soc: mediatek: cmdq: add address shift in jump f2fs: fix to account missing .skipped_gc_rwsem f2fs: fix unexpected ENOENT comes from f2fs_map_blocks() f2fs: fix to unmap pages from userspace process in punch_hole() f2fs: deallocate compressed pages when error happens f2fs: should put a page beyond EOF when preparing a write MIPS: Malta: fix alignment of the devicetree buffer kbuild: Fix 'no symbols' warning when CONFIG_TRIM_UNUSD_KSYMS=y userfaultfd: prevent concurrent API initialization drm/vc4: hdmi: Set HD_CTL_WHOLSMP and HD_CTL_CHALIGN_SET drm/amdgpu: Fix amdgpu_ras_eeprom_init() ASoC: atmel: ATMEL drivers don't need HAS_DMA media: dib8000: rewrite the init prbs logic libbpf: Fix reuse of pinned map on older kernel x86/hyperv: fix for unwanted manipulation of sched_clock when TSC marked unstable crypto: mxs-dcp - Use sg_mapping_iter to copy data PCI: Use pci_update_current_state() in pci_enable_device_flags() tipc: keep the skb in rcv queue until the whole data is read net: phy: Fix data type in DP83822 dp8382x_disable_wol() iio: dac: ad5624r: Fix incorrect handling of an optional regulator. iavf: do not override the adapter state in the watchdog task iavf: fix locking of critical sections ARM: dts: qcom: apq8064: correct clock names video: fbdev: kyro: fix a DoS bug by restricting user input netlink: Deal with ESRCH error in nlmsg_notify() Smack: Fix wrong semantics in smk_access_entry() drm: avoid blocking in drm_clients_info's rcu section drm: serialize drm_file.master with a new spinlock drm: protect drm_master pointers in drm_lease.c rcu: Fix macro name CONFIG_TASKS_RCU_TRACE igc: Check if num of q_vectors is smaller than max before array access usb: host: fotg210: fix the endpoint's transactional opportunities calculation usb: host: fotg210: fix the actual_length of an iso packet usb: gadget: u_ether: fix a potential null pointer dereference USB: EHCI: ehci-mv: improve error handling in mv_ehci_enable() usb: gadget: composite: Allow bMaxPower=0 if self-powered staging: board: Fix uninitialized spinlock when attaching genpd tty: serial: jsm: hold port lock when reporting modem line changes bus: fsl-mc: fix mmio base address for child DPRCs selftests: firmware: Fix ignored return val of asprintf() warn drm/amd/display: Fix timer_per_pixel unit error media: hantro: vp8: Move noisy WARN_ON to vpu_debug media: platform: stm32: unprepare clocks at handling errors in probe media: atomisp: Fix runtime PM imbalance in atomisp_pci_probe media: atomisp: pci: fix error return code in atomisp_pci_probe() nfp: fix return statement in nfp_net_parse_meta() ethtool: improve compat ioctl handling drm/amdgpu: Fix a printing message drm/amd/amdgpu: Update debugfs link_settings output link_rate field in hex bpf/tests: Fix copy-and-paste error in double word test bpf/tests: Do not PASS tests without actually testing the result drm/bridge: nwl-dsi: Avoid potential multiplication overflow on 32-bit arm64: dts: allwinner: h6: tanix-tx6: Fix regulator node names video: fbdev: asiliantfb: Error out if 'pixclock' equals zero video: fbdev: kyro: Error out if 'pixclock' equals zero video: fbdev: riva: Error out if 'pixclock' equals zero ipv4: ip_output.c: Fix out-of-bounds warning in ip_copy_addrs() flow_dissector: Fix out-of-bounds warnings s390/jump_label: print real address in a case of a jump label bug s390: make PCI mio support a machine flag serial: 8250: Define RX trigger levels for OxSemi 950 devices xtensa: ISS: don't panic in rs_init hvsi: don't panic on tty_register_driver failure serial: 8250_pci: make setup_port() parameters explicitly unsigned staging: ks7010: Fix the initialization of the 'sleep_status' structure samples: bpf: Fix tracex7 error raised on the missing argument libbpf: Fix race when pinning maps in parallel ata: sata_dwc_460ex: No need to call phy_exit() befre phy_init() Bluetooth: skip invalid hci_sync_conn_complete_evt workqueue: Fix possible memory leaks in wq_numa_init() ARM: dts: stm32: Set {bitclock,frame}-master phandles on DHCOM SoM ARM: dts: stm32: Set {bitclock,frame}-master phandles on ST DKx ARM: dts: stm32: Update AV96 adv7513 node per dtbs_check bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler() ARM: dts: at91: use the right property for shutdown controller arm64: tegra: Fix Tegra194 PCIe EP compatible string ASoC: Intel: bytcr_rt5640: Move "Platform Clock" routes to the maps for the matching in-/output ASoC: Intel: update sof_pcm512x quirks media: imx258: Rectify mismatch of VTS value media: imx258: Limit the max analogue gain to 480 media: v4l2-dv-timings.c: fix wrong condition in two for-loops media: TDA1997x: fix tda1997x_query_dv_timings() return value media: tegra-cec: Handle errors of clk_prepare_enable() gfs2: Fix glock recursion in freeze_go_xmote_bh arm64: dts: qcom: sdm630: Rewrite memory map arm64: dts: qcom: sdm630: Fix TLMM node and pinctrl configuration serial: 8250_omap: Handle optional overrun-throttle-ms property ARM: dts: imx53-ppd: Fix ACHC entry arm64: dts: qcom: ipq8074: fix pci node reg property arm64: dts: qcom: sdm660: use reg value for memory node arm64: dts: qcom: ipq6018: drop '0x' from unit address arm64: dts: qcom: sdm630: don't use underscore in node name arm64: dts: qcom: msm8994: don't use underscore in node name arm64: dts: qcom: msm8996: don't use underscore in node name arm64: dts: qcom: sm8250: Fix epss_l3 unit address nvmem: qfprom: Fix up qfprom_disable_fuse_blowing() ordering net: ethernet: stmmac: Do not use unreachable() in ipq806x_gmac_probe() drm/msm: mdp4: drop vblank get/put from prepare/complete_commit drm/msm/dsi: Fix DSI and DSI PHY regulator config from SDM660 drm: xlnx: zynqmp_dpsub: Call pm_runtime_get_sync before setting pixel clock drm: xlnx: zynqmp: release reset to DP controller before accessing DP registers thunderbolt: Fix port linking by checking all adapters drm/amd/display: fix missing writeback disablement if plane is removed drm/amd/display: fix incorrect CM/TF programming sequence in dwb selftests/bpf: Fix xdp_tx.c prog section name drm/vmwgfx: fix potential UAF in vmwgfx_surface.c Bluetooth: schedule SCO timeouts with delayed_work Bluetooth: avoid circular locks in sco_sock_connect drm/msm/dp: return correct edid checksum after corrupted edid checksum read net/mlx5: Fix variable type to match 64bit gpu: drm: amd: amdgpu: amdgpu_i2c: fix possible uninitialized-variable access in amdgpu_i2c_router_select_ddc_port() drm/display: fix possible null-pointer dereference in dcn10_set_clock() mac80211: Fix monitor MTU limit so that A-MSDUs get through ARM: tegra: acer-a500: Remove bogus USB VBUS regulators ARM: tegra: tamonten: Fix UART pad setting arm64: tegra: Fix compatible string for Tegra132 CPUs arm64: dts: ls1046a: fix eeprom entries nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data nvme: code command_id with a genctr for use-after-free validation Bluetooth: Fix handling of LE Enhanced Connection Complete opp: Don't print an error if required-opps is missing serial: sh-sci: fix break handling for sysrq iomap: pass writeback errors to the mapping tcp: enable data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD rpc: fix gss_svc_init cleanup on failure selftests/bpf: Fix flaky send_signal test hwmon: (pmbus/ibm-cffps) Fix write bits for LED control staging: rts5208: Fix get_ms_information() heap buffer size net: Fix offloading indirect devices dependency on qdisc order creation kselftest/arm64: mte: Fix misleading output when skipping tests kselftest/arm64: pac: Fix skipping of tests on systems without PAC gfs2: Don't call dlm after protocol is unmounted usb: chipidea: host: fix port index underflow and UBSAN complains lockd: lockd server-side shouldn't set fl_ops drm/exynos: Always initialize mapping in exynos_drm_register_dma() rtl8xxxu: Fix the handling of TX A-MPDU aggregation rtw88: use read_poll_timeout instead of fixed sleep rtw88: wow: build wow function only if CONFIG_PM is on rtw88: wow: fix size access error of probe request octeontx2-pf: Fix NIX1_RX interface backpressure m68knommu: only set CONFIG_ISA_DMA_API for ColdFire sub-arch btrfs: tree-log: check btrfs_lookup_data_extent return value soundwire: intel: fix potential race condition during power down ASoC: Intel: Skylake: Fix module configuration for KPB and MIXER ASoC: Intel: Skylake: Fix passing loadable flag for module of: Don't allow __of_attached_node_sysfs() without CONFIG_SYSFS mmc: sdhci-of-arasan: Modified SD default speed to 19MHz for ZynqMP mmc: sdhci-of-arasan: Check return value of non-void funtions mmc: rtsx_pci: Fix long reads when clock is prescaled selftests/bpf: Enlarge select() timeout for test_maps mmc: core: Return correct emmc response in case of ioctl error cifs: fix wrong release in sess_alloc_buffer() failed path Revert "USB: xhci: fix U1/U2 handling for hardware with XHCI_INTEL_HOST quirk set" usb: musb: musb_dsps: request_irq() after initializing musb usbip: give back URBs for unsent unlink requests during cleanup usbip:vhci_hcd USB port can get stuck in the disabled state ASoC: rockchip: i2s: Fix regmap_ops hang ASoC: rockchip: i2s: Fixup config for DAIFMT_DSP_A/B drm/amdkfd: Account for SH/SE count when setting up cu masks. nfsd: fix crash on LOCKT on reexported NFSv3 iwlwifi: pcie: free RBs during configure iwlwifi: mvm: fix a memory leak in iwl_mvm_mac_ctxt_beacon_changed iwlwifi: mvm: avoid static queue number aliasing iwlwifi: mvm: fix access to BSS elements iwlwifi: fw: correctly limit to monitor dump iwlwifi: mvm: Fix scan channel flags settings net/mlx5: DR, fix a potential use-after-free bug net/mlx5: DR, Enable QP retransmission parport: remove non-zero check on count selftests/bpf: Fix potential unreleased lock wcn36xx: Fix missing frame timestamp for beacon/probe-resp ath9k: fix OOB read ar9300_eeprom_restore_internal ath9k: fix sleeping in atomic context net: fix NULL pointer reference in cipso_v4_doi_free fix array-index-out-of-bounds in taprio_change net: w5100: check return value after calling platform_get_resource() net: hns3: clean up a type mismatch warning fs/io_uring Don't use the return value from import_iovec(). io_uring: remove duplicated io_size from rw parisc: fix crash with signals and alloca ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup() scsi: BusLogic: Fix missing pr_cont() use scsi: qla2xxx: Changes to support kdump kernel scsi: qla2xxx: Sync queue idx with queue_pair_map idx cpufreq: powernv: Fix init_chip_info initialization in numa=off s390/pv: fix the forcing of the swiotlb hugetlb: fix hugetlb cgroup refcounting during vma split mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled mm/hugetlb: initialize hugetlb_usage in mm_init mm,vmscan: fix divide by zero in get_scan_count memcg: enable accounting for pids in nested pid namespaces libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbind platform/chrome: cros_ec_proto: Send command again when timeout occurs lib/test_stackinit: Fix static initializer test net: dsa: lantiq_gswip: fix maximum frame length drm/mgag200: Select clock in PLL update functions drm/msi/mdp4: populate priv->kms in mdp4_kms_init drm/dp_mst: Fix return code on sideband message failure drm/panfrost: Make sure MMU context lifetime is not bound to panfrost_priv drm/amdgpu: Fix BUG_ON assert drm/amd/display: Update number of DCN3 clock states drm/amd/display: Update bounding box states (v2) drm/panfrost: Simplify lock_region calculation drm/panfrost: Use u64 for size in lock_region drm/panfrost: Clamp lock region to Bifrost minimum fanotify: limit number of event merge attempts Linux 5.10.67 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ic8df59518265d0cdf724e93e8922cde48fc85ce9 |
||
Nadav Amit
|
6afd1e053d |
userfaultfd: prevent concurrent API initialization
[ Upstream commit 22e5fe2a2a279d9a6fcbdfb4dffe73821bef1c90 ]
userfaultfd assumes that the enabled features are set once and never
changed after UFFDIO_API ioctl succeeded.
However, currently, UFFDIO_API can be called concurrently from two
different threads, succeed on both threads and leave userfaultfd's
features in non-deterministic state. Theoretically, other uffd operations
(ioctl's and page-faults) can be dispatched while adversely affected by
such changes of features.
Moreover, the writes to ctx->state and ctx->features are not ordered,
which can - theoretically, again - let userfaultfd_ioctl() think that
userfaultfd API completed, while the features are still not initialized.
To avoid races, it is arguably best to get rid of ctx->state. Since there
are only 2 states, record the API initialization in ctx->features as the
uppermost bit and remove ctx->state.
Link: https://lkml.kernel.org/r/20210808020724.1022515-3-namit@vmware.com
Fixes:
|
||
Greg Kroah-Hartman
|
e4cac2c332 |
This is the 5.10.54 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmEBT00ACgkQONu9yGCS aT7svA/9HCRwW+pK3UpK1+0FK7gGH8DA3jSONj775zVEKhboDZNIwZsDqG0Ly+jm /JejWXPKZlekaDXgzBfZY3H59xgij/VwYHe8p7cdxfi1TlhmAQwFjLNZnWav8as6 IyNkpsDJn8fMXmfHDi2u3cb8wrVi/aQDzTlwu88cUtREyZCaaYlo0Fdv9MJyhww/ p6LWPYQoZ8TmFY+Y/2ORVxFos2UVuU0hhhMdGt9LrX2WNEGRNUZUqmbhcXYfdsX0 ckSHbijIcWdcka3nQ6yOvdxw75rTqd8c/bP0y+yAteeJ0CykjVnI2cdK+M2ZEi4j /JqpGJrRWhsZf5MiO8b3k+I1K62JDa1GYBQ9Amp8FKKzjYLPTNeFAP9IsMyDc4oi oW98XM7XzoSEU9t/FSAIGT0hYK9k+lnPxw623LhxD6x3VPynnNAnQsLr+HirOgG6 mZ79L4ZFu3lUvVsCuCgKn/uxwDopUNlhqo5B2/4M2kSWwe2Xu5bExpGc2bT9xCOP 6fF9DmvmpG1UPGCXrOqaxemyEPmHqmyjKJpxDt6vZqlOL9vqHez4WmEEM1C+E2NZ 5VKKbBk/KZDxNX9EiFOtI2HRFb1cghoI2Hcb/QjRoB9Dv3a6cHgjxDl0eKm8SiDN +1ytV0IFH3fT4aRiXJ7I3GBwkjKcDaX0sjYwtnCx9s5XZmm9PRQ= =HAyL -----END PGP SIGNATURE----- Merge 5.10.54 into android12-5.10-lts Changes in 5.10.54 igc: Fix use-after-free error during reset igb: Fix use-after-free error during reset igc: change default return of igc_read_phy_reg() ixgbe: Fix an error handling path in 'ixgbe_probe()' igc: Fix an error handling path in 'igc_probe()' igb: Fix an error handling path in 'igb_probe()' fm10k: Fix an error handling path in 'fm10k_probe()' e1000e: Fix an error handling path in 'e1000_probe()' iavf: Fix an error handling path in 'iavf_probe()' igb: Check if num of q_vectors is smaller than max before array access igb: Fix position of assignment to *ring gve: Fix an error handling path in 'gve_probe()' net: add kcov handle to skb extensions bonding: fix suspicious RCU usage in bond_ipsec_add_sa() bonding: fix null dereference in bond_ipsec_add_sa() ixgbevf: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops bonding: fix suspicious RCU usage in bond_ipsec_del_sa() bonding: disallow setting nested bonding + ipsec offload bonding: Add struct bond_ipesc to manage SA bonding: fix suspicious RCU usage in bond_ipsec_offload_ok() bonding: fix incorrect return value of bond_ipsec_offload_ok() ipv6: fix 'disable_policy' for fwd packets stmmac: platform: Fix signedness bug in stmmac_probe_config_dt() selftests: icmp_redirect: remove from checking for IPv6 route get selftests: icmp_redirect: IPv6 PMTU info should be cleared after redirect pwm: sprd: Ensure configuring period and duty_cycle isn't wrongly skipped cxgb4: fix IRQ free race during driver unload mptcp: fix warning in __skb_flow_dissect() when do syn cookie for subflow join nvme-pci: do not call nvme_dev_remove_admin from nvme_remove KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM perf inject: Fix dso->nsinfo refcounting perf map: Fix dso->nsinfo refcounting perf probe: Fix dso->nsinfo refcounting perf env: Fix sibling_dies memory leak perf test session_topology: Delete session->evlist perf test event_update: Fix memory leak of evlist perf dso: Fix memory leak in dso__new_map() perf test maps__merge_in: Fix memory leak of maps perf env: Fix memory leak of cpu_pmu_caps perf report: Free generated help strings for sort option perf script: Fix memory 'threads' and 'cpus' leaks on exit perf lzma: Close lzma stream on exit perf probe-file: Delete namelist in del_events() on the error path perf data: Close all files in close_dir() perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set ASoC: wm_adsp: Correct wm_coeff_tlv_get handling spi: imx: add a check for speed_hz before calculating the clock spi: stm32: fixes pm_runtime calls in probe/remove regulator: hi6421: Use correct variable type for regmap api val argument regulator: hi6421: Fix getting wrong drvdata spi: mediatek: fix fifo rx mode ASoC: rt5631: Fix regcache sync errors on resume bpf, test: fix NULL pointer dereference on invalid expected_attach_type bpf: Fix tail_call_reachable rejection for interpreter when jit failed xdp, net: Fix use-after-free in bpf_xdp_link_release timers: Fix get_next_timer_interrupt() with no timers pending liquidio: Fix unintentional sign extension issue on left shift of u16 s390/bpf: Perform r1 range checking before accessing jit->seen_reg[r1] bpf, sockmap: Fix potential memory leak on unlikely error case bpf, sockmap, tcp: sk_prot needs inuse_idx set for proc stats bpf, sockmap, udp: sk_prot needs inuse_idx set for proc stats bpftool: Check malloc return value in mount_bpffs_for_pin net: fix uninit-value in caif_seqpkt_sendmsg usb: hso: fix error handling code of hso_create_net_device dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable} efi/tpm: Differentiate missing and invalid final event log table. net: decnet: Fix sleeping inside in af_decnet KVM: PPC: Book3S: Fix CONFIG_TRANSACTIONAL_MEM=n crash KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak net: sched: fix memory leak in tcindex_partial_destroy_work sctp: trim optlen when it's a huge value in sctp_setsockopt netrom: Decrease sock refcount when sock timers expire scsi: iscsi: Fix iface sysfs attr detection scsi: target: Fix protect handling in WRITE SAME(32) spi: cadence: Correct initialisation of runtime PM again ACPI: Kconfig: Fix table override from built-in initrd bnxt_en: don't disable an already disabled PCI device bnxt_en: Refresh RoCE capabilities in bnxt_ulp_probe() bnxt_en: Add missing check for BNXT_STATE_ABORT_ERR in bnxt_fw_rset_task() bnxt_en: Validate vlan protocol ID on RX packets bnxt_en: Check abort error state in bnxt_half_open_nic() net: hisilicon: rename CACHE_LINE_MASK to avoid redefinition net/tcp_fastopen: fix data races around tfo_active_disable_stamp ALSA: hda: intel-dsp-cfg: add missing ElkhartLake PCI ID net: hns3: fix possible mismatches resp of mailbox net: hns3: fix rx VLAN offload state inconsistent issue spi: spi-bcm2835: Fix deadlock net/sched: act_skbmod: Skip non-Ethernet packets ipv6: fix another slab-out-of-bounds in fib6_nh_flush_exceptions ceph: don't WARN if we're still opening a session to an MDS nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING Revert "USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem" afs: Fix tracepoint string placement with built-in AFS r8169: Avoid duplicate sysfs entry creation error nvme: set the PRACT bit when using Write Zeroes with T10 PI sctp: update active_key for asoc when old key is being replaced tcp: disable TFO blackhole logic by default net: dsa: sja1105: make VID 4095 a bridge VLAN too net: sched: cls_api: Fix the the wrong parameter drm/panel: raspberrypi-touchscreen: Prevent double-free cifs: only write 64kb at a time when fallocating a small region of a file cifs: fix fallocate when trying to allocate a hole. proc: Avoid mixing integer types in mem_rw() mmc: core: Don't allocate IDA for OF aliases s390/ftrace: fix ftrace_update_ftrace_func implementation s390/boot: fix use of expolines in the DMA code ALSA: usb-audio: Add missing proc text entry for BESPOKEN type ALSA: usb-audio: Add registration quirk for JBL Quantum headsets ALSA: sb: Fix potential ABBA deadlock in CSP driver ALSA: hda/realtek: Fix pop noise and 2 Front Mic issues on a machine ALSA: hdmi: Expose all pins on MSI MS-7C94 board ALSA: pcm: Call substream ack() method upon compat mmap commit ALSA: pcm: Fix mmap capability check Revert "usb: renesas-xhci: Fix handling of unknown ROM state" usb: xhci: avoid renesas_usb_fw.mem when it's unusable xhci: Fix lost USB 2 remote wake KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state usb: hub: Disable USB 3 device initiated lpm if exit latency is too high usb: hub: Fix link power management max exit latency (MEL) calculations USB: usb-storage: Add LaCie Rugged USB3-FW to IGNORE_UAS usb: max-3421: Prevent corruption of freed memory usb: renesas_usbhs: Fix superfluous irqs happen after usb_pkt_pop() USB: serial: option: add support for u-blox LARA-R6 family USB: serial: cp210x: fix comments for GE CS1000 USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick usb: gadget: Fix Unbalanced pm_runtime_enable in tegra_xudc_probe usb: dwc2: gadget: Fix GOUTNAK flow for Slave mode. usb: dwc2: gadget: Fix sending zero length packet in DDMA mode. usb: typec: stusb160x: register role switch before interrupt registration firmware/efi: Tell memblock about EFI iomem reservations tracepoints: Update static_call before tp_funcs when adding a tracepoint tracing/histogram: Rename "cpu" to "common_cpu" tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop. tracing: Synthetic event field_pos is an index not a boolean btrfs: check for missing device in btrfs_trim_fs media: ngene: Fix out-of-bounds bug in ngene_command_config_free_buf() ixgbe: Fix packet corruption due to missing DMA sync bus: mhi: core: Validate channel ID when processing command completions posix-cpu-timers: Fix rearm racing against process tick selftest: use mmap instead of posix_memalign to allocate memory io_uring: explicitly count entries for poll reqs io_uring: remove double poll entry on arm failure userfaultfd: do not untag user pointers memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions hugetlbfs: fix mount mode command line processing rbd: don't hold lock_rwsem while running_list is being drained rbd: always kick acquire on "acquired" and "released" notifications misc: eeprom: at24: Always append device id even if label property is set. nds32: fix up stack guard gap driver core: Prevent warning when removing a device link from unregistered consumer drm: Return -ENOTTY for non-drm ioctls drm/amdgpu: update golden setting for sienna_cichlid net: dsa: mv88e6xxx: enable SerDes RX stats for Topaz net: dsa: mv88e6xxx: enable SerDes PCS register dump via ethtool -d on Topaz PCI: Mark AMD Navi14 GPU ATS as broken bonding: fix build issue skbuff: Release nfct refcount on napi stolen or re-used skbs Documentation: Fix intiramfs script name perf inject: Close inject.output on exit usb: ehci: Prevent missed ehci interrupts with edge-triggered MSI drm/i915/gvt: Clear d3_entered on elsp cmd submission. sfc: ensure correct number of XDP queues xhci: add xhci_get_virt_ep() helper skbuff: Fix build with SKB extensions disabled Linux 5.10.54 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ifd2823b47ab1544cd1f168b138624ffe060a471e |
||
Peter Collingbourne
|
0b591c020d |
userfaultfd: do not untag user pointers
commit e71e2ace5721a8b921dca18b045069e7bb411277 upstream. Patch series "userfaultfd: do not untag user pointers", v5. If a user program uses userfaultfd on ranges of heap memory, it may end up passing a tagged pointer to the kernel in the range.start field of the UFFDIO_REGISTER ioctl. This can happen when using an MTE-capable allocator, or on Android if using the Tagged Pointers feature for MTE readiness [1]. When a fault subsequently occurs, the tag is stripped from the fault address returned to the application in the fault.address field of struct uffd_msg. However, from the application's perspective, the tagged address *is* the memory address, so if the application is unaware of memory tags, it may get confused by receiving an address that is, from its point of view, outside of the bounds of the allocation. We observed this behavior in the kselftest for userfaultfd [2] but other applications could have the same problem. Address this by not untagging pointers passed to the userfaultfd ioctls. Instead, let the system call fail. Also change the kselftest to use mmap so that it doesn't encounter this problem. [1] https://source.android.com/devices/tech/debug/tagged-pointers [2] tools/testing/selftests/vm/userfaultfd.c This patch (of 2): Do not untag pointers passed to the userfaultfd ioctls. Instead, let the system call fail. This will provide an early indication of problems with tag-unaware userspace code instead of letting the code get confused later, and is consistent with how we decided to handle brk/mmap/mremap in commit |
||
Axel Rasmussen
|
d672123ec4 |
BACKPORT: FROMGIT: userfaultfd: support minor fault handling for shmem
Patch series "userfaultfd: support minor fault handling for shmem", v2. Overview ======== See my original series [1] for a detailed overview of minor fault handling in general. The feature in this series works exactly like the hugetblfs version (from userspace's perspective). I'm sending this as a separate series because: - The original minor fault handling series has a full set of R-Bs, and seems close to being merged. So, it seems reasonable to start looking at this next step, which extends the basic functionality. - shmem is different enough that this series may require some additional work before it's ready, and I don't want to delay the original series unnecessarily by bundling them together. Use Case ======== In some cases it is useful to have VM memory backed by tmpfs instead of hugetlbfs. So, this feature will be used to support the same VM live migration use case described in my original series. Additionally, Android folks (Lokesh Gidra <lokeshgidra@google.com>) hope to optimize the Android Runtime garbage collector using this feature: "The plan is to use userfaultfd for concurrently compacting the heap. With this feature, the heap can be shared-mapped at another location where the GC-thread(s) could continue the compaction operation without the need to invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads get faults on the heap, UFFDIO_CONTINUE can be used to resume execution. Furthermore, this feature enables updating references in the 'non-moving' portion of the heap efficiently. Without this feature, uneccessary page copying (ioctl(UFFDIO_COPY)) would be required." [1] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen@google.com/T/#t This patch (of 5): Modify the userfaultfd register API to allow registering shmem VMAs in minor mode. Modify the shmem mcopy implementation to support UFFDIO_CONTINUE in order to resolve such faults. Combine the shmem mcopy handler functions into a single shmem_mcopy_atomic_pte, which takes a mode parameter. This matches how the hugetlbfs implementation is structured, and lets us remove a good chunk of boilerplate. Link: https://lkml.kernel.org/r/20210302000133.272579-1-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20210302000133.272579-2-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shaohua Li <shli@fb.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Wang Qing <wangqing@vivo.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Upton <oupton@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit 4cc6e15679966aa49afc5b114c3c83ba0ac39b05 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Link: https://lore.kernel.org/patchwork/patch/1388146/ Conflicts: mm/shmem.c (1. Manual rebase 2. Enclosed shmem_copy_atomic_pte() with CONFIG_USERFAULTFD to avoid compile erros when USERFAULTFD is not enabled.) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: Idcd822b2a124a089121b9ad8c65061f6979126ec |
||
Axel Rasmussen
|
4a5cf92412 |
BACKPORT: FROMGIT: userfaultfd: add UFFDIO_CONTINUE ioctl
This ioctl is how userspace ought to resolve "minor" userfaults. The idea is, userspace is notified that a minor fault has occurred. It might change the contents of the page using its second non-UFFD mapping, or not. Then, it calls UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". Note that it doesn't make much sense to use UFFDIO_{COPY,ZEROPAGE} for MINOR registered VMAs. ZEROPAGE maps the VMA to the zero page; but in the minor fault case, we already have some pre-existing underlying page. Likewise, UFFDIO_COPY isn't useful if we have a second non-UFFD mapping. We'd just use memcpy() or similar instead. It turns out hugetlb_mcopy_atomic_pte() already does very close to what we want, if an existing page is provided via `struct page **pagep`. We already special-case the behavior a bit for the UFFDIO_ZEROPAGE case, so just extend that design: add an enum for the three modes of operation, and make the small adjustments needed for the MCOPY_ATOMIC_CONTINUE case. (Basically, look up the existing page, and avoid adding the existing page to the page cache or calling set_page_huge_active() on it.) Link: https://lkml.kernel.org/r/20210301222728.176417-5-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: David Rientjes <rientjes@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Price <steven.price@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit 14ea86439abaf3423cd9b6712ed5ce8451d2d181 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Link: https://lore.kernel.org/patchwork/patch/1388136/ Conflicts: mm/hugetlb.c (8f251a3d5ce3bdea73bd045ed35db64f32e0d0d9 is not cherry-picked yet so switched SetHPageMigratable() to set_active_huge_page()) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: I45b62959dcb1d343154cb831113a26e47e77c8af |
||
Axel Rasmussen
|
4d3dd339de |
BACKPORT: FROMGIT: userfaultfd: add minor fault registration mode
Patch series "userfaultfd: add minor fault handling", v9. Overview ======== This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS. When enabled (via the UFFDIO_API ioctl), this feature means that any hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also* get events for "minor" faults. By "minor" fault, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s) (shared memory). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea is, userspace resolves the fault by either a) doing nothing if the contents are already correct, or b) updating the underlying contents using the second, non-UFFD mapping (via memcpy/memset or similar, or something fancier like RDMA, or etc...). In either case, userspace issues UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". Use Case ======== Consider the use case of VM live migration (e.g. under QEMU/KVM): 1. While a VM is still running, we copy the contents of its memory to a target machine. The pages are populated on the target by writing to the non-UFFD mapping, using the setup described above. The VM is still running (and therefore its memory is likely changing), so this may be repeated several times, until we decide the target is "up to date enough". 2. We pause the VM on the source, and start executing on the target machine. During this gap, the VM's user(s) will *see* a pause, so it is desirable to minimize this window. 3. Between the last time any page was copied from the source to the target, and when the VM was paused, the contents of that page may have changed - and therefore the copy we have on the target machine is out of date. Although we can keep track of which pages are out of date, for VMs with large amounts of memory, it is "slow" to transfer this information to the target machine. We want to resume execution before such a transfer would complete. 4. So, the guest begins executing on the target machine. The first time it touches its memory (via the UFFD-registered mapping), userspace wants to intercept this fault. Userspace checks whether or not the page is up to date, and if not, copies the updated page from the source machine, via the non-UFFD mapping. Finally, whether a copy was performed or not, userspace issues a UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". We don't have to do all of the final updates on-demand. The userfaultfd manager can, in the background, also copy over updated pages once it receives the map of which pages are up-to-date or not. Interaction with Existing APIs ============================== Because this is a feature, a registered VMA could potentially receive both missing and minor faults. I spent some time thinking through how the existing API interacts with the new feature: UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault: - For non-shared memory or shmem, -EINVAL is returned. - For hugetlb, -EFAULT is returned. UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults. Without modifications, the existing codepath assumes a new page needs to be allocated. This is okay, since userspace must have a second non-UFFD-registered mapping anyway, thus there isn't much reason to want to use these in any case (just memcpy or memset or similar). - If UFFDIO_COPY is used on a minor fault, -EEXIST is returned. - If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case). - UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns -ENOENT in that case (regardless of the kind of fault). Future Work =========== This series only supports hugetlbfs. I have a second series in flight to support shmem as well, extending the functionality. This series is more mature than the shmem support at this point, and the functionality works fully on hugetlbfs, so this series can be merged first and then shmem support will follow. This patch (of 6): This feature allows userspace to intercept "minor" faults. By "minor" faults, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. This commit adds the new registration mode, and sets the relevant flag on the VMAs being registered. In the hugetlb fault path, if we find that we have huge_pte_none(), but find_lock_page() does indeed find an existing page, then we have a "minor" fault, and if the VMA has the userfaultfd registration flag, we call into userfaultfd to handle it. This is implemented as a new registration mode, instead of an API feature. This is because the alternative implementation has significant drawbacks [1]. However, doing it this was requires we allocate a VM_* flag for the new registration mode. On 32-bit systems, there are no unused bits, so this feature is only supported on architectures with CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in MINOR mode on 32-bit architectures, we return -EINVAL. [1] https://lore.kernel.org/patchwork/patch/1380226/ Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Steven Price <steven.price@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit 82a150ec394f6b944e26786b907fc0deab5b2064 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Link: https://lore.kernel.org/patchwork/patch/1388132/ Conflicts: arch/arm64/Kconfig fs/userfaultfd.c mm/hugetlb.c (All related to SPF feature. Resolved by manual rebase) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: I43b37272d531341439ceaa03213d0e2415e04688 |
||
Peter Xu
|
343cacfa06 |
FROMGIT: hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp
Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because userfaultfd-wp is always based on pgtable entries, so they cannot be shared. Walk the hugetlb range and unshare all such mappings if there is, right before UFFDIO_REGISTER will succeed and return to userspace. This will pair with want_pmd_share() in hugetlb code so that huge pmd sharing is completely disabled for userfaultfd-wp registered range. Link: https://lkml.kernel.org/r/20210218231206.15524-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: David Rientjes <rientjes@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Price <steven.price@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit 267bda5c9993856b86f91a998df632b29cf517e2 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Link: https://lore.kernel.org/patchwork/patch/1382208/ Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: I99d541ce45aaf924fa912f00dafa4caefe307755 |
||
Lokesh Gidra
|
6a6bc06393 |
UPSTREAM: userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob
With this change, when the knob is set to 0, it allows unprivileged users to call userfaultfd, like when it is set to 1, but with the restriction that page faults from only user-mode can be handled. In this mode, an unprivileged user (without SYS_CAP_PTRACE capability) must pass UFFD_USER_MODE_ONLY to userfaultd or the API will fail with EPERM. This enables administrators to reduce the likelihood that an attacker with access to userfaultfd can delay faulting kernel code to widen timing windows for other exploits. The default value of this knob is changed to 0. This is required for correct functioning of pipe mutex. However, this will fail postcopy live migration, which will be unnoticeable to the VM guests. To avoid this, set 'vm.userfault = 1' in /sys/sysctl.conf. The main reason this change is desirable as in the short term is that the Android userland will behave as with the sysctl set to zero. So without this commit, any Linux binary using userfaultfd to manage its memory would behave differently if run within the Android userland. For more details, refer to Andrea's reply [1]. [1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/ Link: https://lkml.kernel.org/r/20201120030411.2690816-3-lokeshgidra@google.com Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Xu <peterx@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Daniel Colascione <dancol@dancol.org> Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: <calin@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Shaohua Li <shli@fb.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Nitin Gupta <nigupta@nvidia.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Daniel Colascione <dancol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit d0d4730ac2e404a5b0da9a87ef38c73e51cb1664) Bug: 160737021 Bug: 169683130 Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Change-Id: I08b7080b49ca626c5ab41bb2621fa21fa9a928a2 |
||
Lokesh Gidra
|
b8af1f96cc |
UPSTREAM: userfaultfd: add UFFD_USER_MODE_ONLY
Patch series "Control over userfaultfd kernel-fault handling", v6. This patch series is split from [1]. The other series enables SELinux support for userfaultfd file descriptors so that its creation and movement can be controlled. It has been demonstrated on various occasions that suspending kernel code execution for an arbitrary amount of time at any access to userspace memory (copy_from_user()/copy_to_user()/...) can be exploited to change the intended behavior of the kernel. For instance, handling page faults in kernel-mode using userfaultfd has been exploited in [2, 3]. Likewise, FUSE, which is similar to userfaultfd in this respect, has been exploited in [4, 5] for similar outcome. This small patch series adds a new flag to userfaultfd(2) that allows callers to give up the ability to handle kernel-mode faults with the resulting UFFD file object. It then adds a 'user-mode only' option to the unprivileged_userfaultfd sysctl knob to require unprivileged callers to use this new flag. The purpose of this new interface is to decrease the chance of an unprivileged userfaultfd user taking advantage of userfaultfd to enhance security vulnerabilities by lengthening the race window in kernel code. [1] https://lore.kernel.org/lkml/20200211225547.235083-1-dancol@google.com/ [2] https://duasynt.com/blog/linux-kernel-heap-spray [3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit [4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html [5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808 This patch (of 2): userfaultfd handles page faults from both user and kernel code. Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the resulting userfaultfd object refuse to handle faults from kernel mode, treating these faults as if SIGBUS were always raised, causing the kernel code to fail with EFAULT. A future patch adds a knob allowing administrators to give some processes the ability to create userfaultfd file objects only if they pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes will exploit userfaultfd's ability to delay kernel page faults to open timing windows for future exploits. Link: https://lkml.kernel.org/r/20201120030411.2690816-1-lokeshgidra@google.com Link: https://lkml.kernel.org/r/20201120030411.2690816-2-lokeshgidra@google.com Signed-off-by: Daniel Colascione <dancol@google.com> Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <calin@google.com> Cc: Daniel Colascione <dancol@dancol.org> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nitin Gupta <nigupta@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shaohua Li <shli@fb.com> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 37cd0575b8510159992d279c530c05f872990b02) Bug: 160737021 Bug: 169683130 Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Change-Id: I19ff309b616c7a4a247e8c8427a87caffb1b2df9 |
||
Daniel Colascione
|
dbc935c62b |
UPSTREAM: userfaultfd: use secure anon inodes for userfaultfd
This change gives userfaultfd file descriptors a real security context, allowing policy to act on them. Signed-off-by: Daniel Colascione <dancol@google.com> [LG: Remove owner inode from userfaultfd_ctx] [LG: Use anon_inode_getfd_secure() in userfaultfd syscall] [LG: Use inode of file in userfaultfd_read() in resolve_userfault_fork()] Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Paul Moore <paul@paul-moore.com> (cherry picked from commit b537900f1598b67bcb8acac20da73c6e26ebbf99) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: Ib2973ca3650a8defe15eded13294a3fb25356b9d |
||
Laurent Dufour
|
9cfe16897f |
FROMLIST: mm: protect VMA modifications using VMA sequence count
The VMA sequence count has been introduced to allow fast detection of VMA modification when running a page fault handler without holding the mmap_sem. This patch provides protection against the VMA modification done in : - madvise() - mpol_rebind_policy() - vma_replace_policy() - change_prot_numa() - mlock(), munlock() - mprotect() - mmap_region() - collapse_huge_page() - userfaultd registering services In addition, VMA fields which will be read during the speculative fault path needs to be written using WRITE_ONCE to prevent write to be split and intermediate values to be pushed to other CPUs. Change-Id: Ic36046b7254e538b6baf7144c50ae577ee7f2074 Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Link: https://lore.kernel.org/lkml/1523975611-15978-10-git-send-email-ldufour@linux.vnet.ibm.com/ Bug: 161210518 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
Greg Kroah-Hartman
|
05d2a661fd |
Merge 54a4c789ca ("Merge tag 'docs/v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media") into android-mainline
Steps on the way to 5.10-rc1 Resolves conflicts in: fs/userfaultfd.c Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ie3fe3c818f1f6565cfd4fa551de72d2b72ef60af |
||
Jann Horn
|
4d45e75a99 |
mm: remove the now-unnecessary mmget_still_valid() hack
The preceding patches have ensured that core dumping properly takes the mmap_lock. Thanks to that, we can now remove mmget_still_valid() and all its users. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Greg Kroah-Hartman
|
d7b0856eac |
Merge 00e4db5125 ("Merge tag 'perf-tools-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux") into android-mainline
Tiny steps on the way to 5.9-rc1. Fixes conflicts in: fs/f2fs/inline.c Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I16d863ae44a51156499458e8c3486587cbe2babe |
||
Linus Torvalds
|
97d052ea3f |
A set of locking fixes and updates:
- Untangle the header spaghetti which causes build failures in various situations caused by the lockdep additions to seqcount to validate that the write side critical sections are non-preemptible. - The seqcount associated lock debug addons which were blocked by the above fallout. seqcount writers contrary to seqlock writers must be externally serialized, which usually happens via locking - except for strict per CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot validate that the lock is held. This new debug mechanism adds the concept of associated locks. sequence count has now lock type variants and corresponding initializers which take a pointer to the associated lock used for writer serialization. If lockdep is enabled the pointer is stored and write_seqcount_begin() has a lockdep assertion to validate that the lock is held. Aside of the type and the initializer no other code changes are required at the seqcount usage sites. The rest of the seqcount API is unchanged and determines the type at compile time with the help of _Generic which is possible now that the minimal GCC version has been moved up. Adding this lockdep coverage unearthed a handful of seqcount bugs which have been addressed already independent of this. While generaly useful this comes with a Trojan Horse twist: On RT kernels the write side critical section can become preemtible if the writers are serialized by an associated lock, which leads to the well known reader preempts writer livelock. RT prevents this by storing the associated lock pointer independent of lockdep in the seqcount and changing the reader side to block on the lock when a reader detects that a writer is in the write side critical section. - Conversion of seqcount usage sites to associated types and initializers. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8xmPYTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoTuQEACyzQCjU8PgehPp9oMqWzaX2fcVyuZO QU2yw6gmz2oTz3ZHUNwdW8UnzGh2OWosK3kDruoD9FtSS51lER1/ISfSPCGfyqxC KTjOcB1Kvxwq/3LcCx7Zi3ZxWApat74qs3EhYhKtEiQ2Y9xv9rLq8VV1UWAwyxq0 eHpjlIJ6b6rbt+ARslaB7drnccOsdK+W/roNj4kfyt+gezjBfojGRdMGQNMFcpnv shuTC+vYurAVIiVA/0IuizgHfwZiXOtVpjVoEWaxg6bBH6HNuYMYzdSa/YrlDkZs n/aBI/Xkvx+Eacu8b1Zwmbzs5EnikUK/2dMqbzXKUZK61eV4hX5c2xrnr1yGWKTs F/juh69Squ7X6VZyKVgJ9RIccVueqwR2EprXWgH3+RMice5kjnXH4zURp0GHALxa DFPfB6fawcH3Ps87kcRFvjgm6FBo0hJ1AxmsW1dY4ACFB9azFa2euW+AARDzHOy2 VRsUdhL9CGwtPjXcZ/9Rhej6fZLGBXKr8uq5QiMuvttp4b6+j9FEfBgD4S6h8csl AT2c2I9LcbWqyUM9P4S7zY/YgOZw88vHRuDH7tEBdIeoiHfrbSBU7EQ9jlAKq/59 f+Htu2Io281c005g7DEeuCYvpzSYnJnAitj5Lmp/kzk2Wn3utY1uIAVszqwf95Ul 81ppn2KlvzUK8g== =7Gj+ -----END PGP SIGNATURE----- Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Thomas Gleixner: "A set of locking fixes and updates: - Untangle the header spaghetti which causes build failures in various situations caused by the lockdep additions to seqcount to validate that the write side critical sections are non-preemptible. - The seqcount associated lock debug addons which were blocked by the above fallout. seqcount writers contrary to seqlock writers must be externally serialized, which usually happens via locking - except for strict per CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot validate that the lock is held. This new debug mechanism adds the concept of associated locks. sequence count has now lock type variants and corresponding initializers which take a pointer to the associated lock used for writer serialization. If lockdep is enabled the pointer is stored and write_seqcount_begin() has a lockdep assertion to validate that the lock is held. Aside of the type and the initializer no other code changes are required at the seqcount usage sites. The rest of the seqcount API is unchanged and determines the type at compile time with the help of _Generic which is possible now that the minimal GCC version has been moved up. Adding this lockdep coverage unearthed a handful of seqcount bugs which have been addressed already independent of this. While generally useful this comes with a Trojan Horse twist: On RT kernels the write side critical section can become preemtible if the writers are serialized by an associated lock, which leads to the well known reader preempts writer livelock. RT prevents this by storing the associated lock pointer independent of lockdep in the seqcount and changing the reader side to block on the lock when a reader detects that a writer is in the write side critical section. - Conversion of seqcount usage sites to associated types and initializers" * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) locking/seqlock, headers: Untangle the spaghetti monster locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header x86/headers: Remove APIC headers from <asm/smp.h> seqcount: More consistent seqprop names seqcount: Compress SEQCNT_LOCKNAME_ZERO() seqlock: Fold seqcount_LOCKNAME_init() definition seqlock: Fold seqcount_LOCKNAME_t definition seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g hrtimer: Use sequence counter with associated raw spinlock kvm/eventfd: Use sequence counter with associated spinlock userfaultfd: Use sequence counter with associated spinlock NFSv4: Use sequence counter with associated spinlock iocost: Use sequence counter with associated spinlock raid5: Use sequence counter with associated spinlock vfs: Use sequence counter with associated spinlock timekeeping: Use sequence counter with associated raw spinlock xfrm: policy: Use sequence counters with associated lock netfilter: nft_set_rbtree: Use sequence counter with associated rwlock netfilter: conntrack: Use sequence counter with associated spinlock sched: tasks: Use sequence counter with associated spinlock ... |
||
Greg Kroah-Hartman
|
003fccf2e7 |
Merge 99f6cf61f1 ("Merge branch 'mtd/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux") into android-mainline
steps along the way to 5.9-rc1 Change-Id: I3090afff778aaa50064b4d8cce21cd7d8bf746a4 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
Linus Torvalds
|
f9bf352224 |
userfaultfd: simplify fault handling
Instead of waiting in a loop for the userfaultfd condition to become true, just wait once and return VM_FAULT_RETRY. We've already dropped the mmap lock, we know we can't really successfully handle the fault at this point and the caller will have to retry anyway. So there's no point in making the wait any more complicated than it needs to be - just schedule away. And once you don't have that complexity with explicit looping, you can also just lose all the 'userfaultfd_signal_pending()' complexity, because once we've set the correct process sleeping state, and don't loop, the act of scheduling itself will be checking if there are any pending signals before going to sleep. We can also drop the VM_FAULT_MAJOR games, since we'll be treating all retried faults as major soon anyway (series to regularize and share more of fault handling across architectures in a separate series by Peter Xu, and in the meantime we won't worry about the possible minor - I'll be here all week, try the veal - accounting difference). Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Ahmed S. Darwish
|
2ca97ac8bd |
userfaultfd: Use sequence counter with associated spinlock
A sequence counter write side critical section must be protected by some form of locking to serialize writers. A plain seqcount_t does not contain the information of which lock must be held when entering a write side critical section. Use the new seqcount_spinlock_t data type, which allows to associate a spinlock with the sequence counter. This enables lockdep to verify that the spinlock used for writer serialization is held when the write side critical section is entered. If lockdep is disabled this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200720155530.1173732-23-a.darwish@linutronix.de |
||
Greg Kroah-Hartman
|
a253db8915 |
Merge ad57a1022f ("Merge tag 'exfat-for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat") into android-mainline
Steps on the way to 5.8-rc1. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I4bc42f572167ea2f815688b4d1eb6124b6d260d4 |
||
Michel Lespinasse
|
c1e8d7c6a7 |
mmap locking API: convert mmap_sem comments
Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Michel Lespinasse
|
3e4e28c5a8 |
mmap locking API: convert mmap_sem API comments
Convert comments that reference old mmap_sem APIs to reference corresponding new mmap locking APIs instead. Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Michel Lespinasse
|
42fc541404 |
mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
Add new APIs to assert that mmap_sem is held. Using this instead of rwsem_is_locked and lockdep_assert_held[_write] makes the assertions more tolerant of future changes to the lock type. Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-10-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Michel Lespinasse
|
d8ed45c5dc |
mmap locking API: use coccinelle to convert mmap_sem rwsem call sites
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Greg Kroah-Hartman
|
2724136fc5 |
Merge 5d30bcacd9 ("Merge tag '9p-for-5.7-2' of git://github.com/martinetd/linux") into android-mainline
Baby steps on the way to 5.7-rc1 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I89095a90046a14eab189aab257a75b3dfdb5b1db |
||
Peter Xu
|
14819305e0 |
userfaultfd: wp: declare _UFFDIO_WRITEPROTECT conditionally
Only declare _UFFDIO_WRITEPROTECT if the user specified UFFDIO_REGISTER_MODE_WP and if all the checks passed. Then when the user registers regions with shmem/hugetlbfs we won't expose the new ioctl to them. Even with complete anonymous memory range, we'll only expose the new WP ioctl bit if the register mode has MODE_WP. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@fb.com> Link: http://lkml.kernel.org/r/20200220163112.11409-18-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Peter Xu
|
23080e2783 |
userfaultfd: wp: don't wake up when doing write protect
It does not make sense to try to wake up any waiting thread when we're write-protecting a memory region. Only wake up when resolving a write protected page fault. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@fb.com> Link: http://lkml.kernel.org/r/20200220163112.11409-16-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrea Arcangeli
|
63b2d4174c |
userfaultfd: wp: add the writeprotect API to userfaultfd ioctl
Introduce the new uffd-wp APIs for userspace. Firstly, we'll allow to do UFFDIO_REGISTER with write protection tracking using the new UFFDIO_REGISTER_MODE_WP flag. Note that this flag can co-exist with the existing UFFDIO_REGISTER_MODE_MISSING, in which case the userspace program can not only resolve missing page faults, and at the same time tracking page data changes along the way. Secondly, we introduced the new UFFDIO_WRITEPROTECT API to do page level write protection tracking. Note that we will need to register the memory region with UFFDIO_REGISTER_MODE_WP before that. [peterx@redhat.com: write up the commit message] [peterx@redhat.com: remove useless block, write commit message, check against VM_MAYWRITE rather than VM_WRITE when register] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@fb.com> Link: http://lkml.kernel.org/r/20200220163112.11409-14-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrea Arcangeli
|
72981e0e7b |
userfaultfd: wp: add UFFDIO_COPY_MODE_WP
This allows UFFDIO_COPY to map pages write-protected. [peterx@redhat.com: switch to VM_WARN_ON_ONCE in mfill_atomic_pte; add brackets around "dst_vma->vm_flags & VM_WRITE"; fix wordings in comments and commit messages] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@fb.com> Link: http://lkml.kernel.org/r/20200220163112.11409-6-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Greg Kroah-Hartman
|
47627513ce |
Merge e109f50607 ("Merge tag 'mtd/for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux") into android-mainline
Baby steps on the way to 5.7-rc1 Change-Id: I136ebb5242e3499873dcd5f5178ad7f68512d11c Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
Peter Xu
|
3e69ad081c |
mm/userfaultfd: honor FAULT_FLAG_KILLABLE in fault path
Userfaultfd fault path was by default killable even if the caller does not have FAULT_FLAG_KILLABLE. That makes sense before in that when with gup we don't have FAULT_FLAG_KILLABLE properly set before. Now after previous patch we've got FAULT_FLAG_KILLABLE applied even for gup code so it should also make sense to let userfaultfd to honor the FAULT_FLAG_KILLABLE. Because we're unconditionally setting FAULT_FLAG_KILLABLE in gup code right now, this patch should have no functional change. It also cleaned the code a little bit by introducing some helpers. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160300.9941-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Peter Xu
|
c270a7eedc |
mm: introduce FAULT_FLAG_INTERRUPTIBLE
handle_userfaultfd() is currently the only one place in the kernel page fault procedures that can respond to non-fatal userspace signals. It was trying to detect such an allowance by checking against USER & KILLABLE flags, which was "un-official". In this patch, we introduced a new flag (FAULT_FLAG_INTERRUPTIBLE) to show that the fault handler allows the fault procedure to respond even to non-fatal signals. Meanwhile, add this new flag to the default fault flags so that all the page fault handlers can benefit from the new flag. With that, replacing the userfault check to this one. Since the line is getting even longer, clean up the fault flags a bit too to ease TTY users. Although we've got a new flag and applied it, we shouldn't have any functional change with this patch so far. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220195348.16302-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Peter Xu
|
ef429ee740 |
userfaultfd: don't retake mmap_sem to emulate NOPAGE
This patch removes the risk path in handle_userfault() then we will be sure that the callers of handle_mm_fault() will know that the VMAs might have changed. Meanwhile with previous patch we don't lose responsiveness as well since the core mm code now can handle the nonfatal userspace signals even if we return VM_FAULT_RETRY. Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160234.9646-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Greg Kroah-Hartman
|
d3a196a371 |
Linux 5.5-rc1
-----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl3tf/0eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGlKwH/3fTToujuJfTx5E5 mrARAP65J1L/DxpEKvKRt2bNZo6w13mNd8g7ZPmYChz90bYGvXQSG8hYTU9iAw3O yimSTJlNXDhVAluB53XnDdUxIWC4HUZsNxWJNCeXMuiMcGNsTGX+v3f+x7oHCT0P jI1RSIsFGjgr0RWqZ8U5aJckQo2xABC1TfYw53K66Oc/JLZpSFJFwMgjf1fD5diU HGDA8E2p0u1TQIyNzr86iqMvnlSRYBQwBQn6OgEKCG4Z0NLtXfDF4mqnxsXgLmIH oQoFfxaMKXyGWds7ZxwcGWntALCF41ThfpiJWDIyxjWxFEty4bqTCbDPwwyp7ip0 iuASmTI= =YqO2 -----END PGP SIGNATURE----- Merge 5.5-rc1 into android-mainline Linux 5.5-rc1 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I6f952ebdd40746115165a2f99bab340482f5c237 |
||
Linus Torvalds
|
596cf45cbf |
Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton: "Incoming: - a small number of updates to scripts/, ocfs2 and fs/buffer.c - most of MM I still have quite a lot of material (mostly not MM) staged after linux-next due to -next dependencies. I'll send those across next week as the preprequisites get merged up" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits) mm/page_io.c: annotate refault stalls from swap_readpage mm/Kconfig: fix trivial help text punctuation mm/Kconfig: fix indentation mm/memory_hotplug.c: remove __online_page_set_limits() mm: fix typos in comments when calling __SetPageUptodate() mm: fix struct member name in function comments mm/shmem.c: cast the type of unmap_start to u64 mm: shmem: use proper gfp flags for shmem_writepage() mm/shmem.c: make array 'values' static const, makes object smaller userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register() userfaultfd: wrap the common dst_vma check into an inlined function userfaultfd: remove unnecessary WARN_ON() in __mcopy_atomic_hugetlb() userfaultfd: use vma_pagesize for all huge page size calculation mm/madvise.c: use PAGE_ALIGN[ED] for range checking mm/madvise.c: replace with page_size() in madvise_inject_error() mm/mmap.c: make vma_merge() comment more easy to understand mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops autonuma: reduce cache footprint when scanning page tables autonuma: fix watermark checking in migrate_balanced_pgdat() ... |
||
Mike Rapoport
|
3c1c24d91f |
userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
A while ago Andy noticed (http://lkml.kernel.org/r/CALCETrWY+5ynDct7eU_nDUqx=okQvjm=Y5wJvA4ahBja=CQXGw@mail.gmail.com) that UFFD_FEATURE_EVENT_FORK used by an unprivileged user may have security implications. As the first step of the solution the following patch limits the availably of UFFD_FEATURE_EVENT_FORK only for those having CAP_SYS_PTRACE. The usage of CAP_SYS_PTRACE ensures compatibility with CRIU. Yet, if there are other users of non-cooperative userfaultfd that run without CAP_SYS_PTRACE, they would be broken :( Current implementation of UFFD_FEATURE_EVENT_FORK modifies the file descriptor table from the read() implementation of uffd, which may have security implications for unprivileged use of the userfaultfd. Limit availability of UFFD_FEATURE_EVENT_FORK only for callers that have CAP_SYS_PTRACE. Link: http://lkml.kernel.org/r/1572967777-8812-2-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Nick Kralevich <nnk@google.com> Cc: Nosh Minwalla <nosh@google.com> Cc: Pavel Emelyanov <ovzxemul@gmail.com> Cc: Tim Murray <timmurray@google.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrea Arcangeli
|
9d4678eb17 |
fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register()
If the registration is repeated without VM_UFFD_MISSING or VM_UFFD_WP they need to be cleared. Currently setting UFFDIO_REGISTER_MODE_WP returns -EINVAL, so this patch is a noop until the UFFDIO_REGISTER_MODE_WP support is applied. Link: http://lkml.kernel.org/r/20191004232834.GP13922@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Arnd Bergmann
|
1832f2d8ff |
compat_ioctl: move more drivers to compat_ptr_ioctl
The .ioctl and .compat_ioctl file operations have the same prototype so they can both point to the same function, which works great almost all the time when all the commands are compatible. One exception is the s390 architecture, where a compat pointer is only 31 bit wide, and converting it into a 64-bit pointer requires calling compat_ptr(). Most drivers here will never run in s390, but since we now have a generic helper for it, it's easy enough to use it consistently. I double-checked all these drivers to ensure that all ioctl arguments are used as pointers or are ignored, but are not interpreted as integer values. Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: David Sterba <dsterba@suse.com> Acked-by: Darren Hart (VMware) <dvhart@infradead.org> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> |
||
Greg Kroah-Hartman
|
94139142d9 |
Merge 5.4-rc1-prelrease into android-mainline
To make the 5.4-rc1 merge easier, merge at a prerelease point in time before the final release happens. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: If613d657fd0abf9910c5bf3435a745f01b89765e |
||
Andrey Konovalov
|
7d0325749a |
userfaultfd: untag user pointers
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. userfaultfd code use provided user pointers for vma lookups, which can only by done with untagged pointers. Untag user pointers in validate_range(). Link: http://lkml.kernel.org/r/cdc59ddd7011012ca2e689bc88c3b65b1ea7e413.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jens Wiklander <jens.wiklander@linaro.org> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Greg Kroah-Hartman
|
ad455d87e5 |
Linux 5.3-rc6
-----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl1i2wkeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGcDQIAJINYON5WdDSFDpp htva213hSIxYLix8Dc4cTMk8qT/P2MAj9pPYERuLwIxWZlfbduW6Fxy8bJANZ7k3 4cJ/IbmA5M5ZIaOJTTL45w8H0CMR/4mdPl5rb5k/Wkh449Cj101gZLlh0FEtR5zG uDJecKSuHjH1ikySk6+zmRG5X+lq6wNY8NkuBtfwAwLffFc0ljQHwPUMJ8ojgqt/ p3ChNgtb/I6U6ExITlyktKdP59bAoHAoBiKKFZWw5yJWgXE2q4Sv9nT4Btkr5KdJ 9mnWnSaSLwptNCOtU4tKLwFIZP2WoVXGPNxxq4XLoTEuieXCqmikhc9tSSTwk+Tp CKHN6wU= =JkJ4 -----END PGP SIGNATURE----- Merge 5.3-rc6 into android-mainline Linux 5.3-rc6 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Id10580d48d56054408b3efe0bd1866d67aba2a3d |
||
Oleg Nesterov
|
46d0b24c5e |
userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even if
mm->core_state != NULL.
Otherwise a page fault can see userfaultfd_missing() == T and use an
already freed userfaultfd_ctx.
Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com
Fixes:
|
||
Greg Kroah-Hartman
|
a4bbf3df04 |
Linux 5.2
-----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0idTweHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGZesIAJDKicw2Voyx8K8m 3pXSK+71RuO/d3Y9M51mdfTMKRP4PHR9/4wVZ9wHPwC4dV6wxgsmIYCF69a1Wety LD1MpDCP1DK5wVfPNKVX2xmj7ua6iutPtSsJHzdzM2TlscgsrFKjmUccqJ5JLwL5 c34nqwXWnzzRyI5Ga9cQSlwzAXq0vDHXyML3AnCosSsLX0lKFrHlK1zttdOPNkfj dXRN62g3q+9kVQozzhDXb8atZZ7IkBk8Q0lujpNXW83Ci1VjaVNv3SB8GZTXIlLj U15VdyuwfJDfpBgFBN6/unzVaAB6FFrEKy0jT1aeTyKarMKDKgOnJjn10aKjDNno /bXsKKc= =TVqV -----END PGP SIGNATURE----- Merge 5.2 into android-common Linux 5.2 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
Eric Biggers
|
cbcfa130a9 |
fs/userfaultfd.c: disable irqs for fault_pending and event locks
When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock. This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by userfaultfd_ctx_read(), which in turn can be waiting for userfaultfd_ctx::fault_pending_wqh.lock or userfaultfd_ctx::event_wqh.lock. But elsewhere the fault_pending_wqh and event_wqh locks are taken with IRQs enabled. Since the IRQ handler may take kioctx::ctx_lock, lockdep reports that a deadlock is possible. Fix it by always disabling IRQs when taking the fault_pending_wqh and event_wqh locks. Commit |
||
Greg Kroah-Hartman
|
9a7ed8b83e |
Linux 5.2-rc6
-----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Os1seHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGtx4H/j6i482XzcGFKTBm A7mBoQpy+kLtoUov4EtBAR62OuwI8rsahW9di37QKndPoQrczWaKBmr3De6LCdPe v3pl3O6wBbvH5ru+qBPFX9PdNbDvimEChh7LHxmMxNQq3M+AjZAZVJyfpoiFnx35 Fbge+LZaH/k8HMwZmkMr5t9Mpkip715qKg2o9Bua6dkH0AqlcpLlC8d9a+HIVw/z aAsyGSU8jRwhoAOJsE9bJf0acQ/pZSqmFp0rDKqeFTSDMsbDRKLGq/dgv4nW0RiW s7xqsjb/rdcvirRj3rv9+lcTVkOtEqwk0PVdL9WOf7g4iYrb3SOIZh8ZyViaDSeH VTS5zps= =huBY -----END PGP SIGNATURE----- Merge 5.2-rc6 into android-mainline Linux 5.2-rc6 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
Thomas Gleixner
|
20c8ccb197 |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499
Based on 1 normalized pattern(s): this work is licensed under the terms of the gnu gpl version 2 see the copying file in the top level directory extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 35 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.797835076@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Greg Kroah-Hartman
|
1226c72a32 |
Linux 5.2-rc1
-----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlzh3PgeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGnhoH/jkl4X66KuOXvGXa 9pgFzyEa3Mhqs0j+AaJYmRyoRty1CbAfMLEWgr0JbZy56zm0PhtXOxcu56/tfdtw f5j8OJLDvld10imHXxUrom9zc546Ff90VTOvWmsYznszTz0rV5HLmKgVIIc7ZN8W 6hshDOy/rviUcPAVrLKdZffzgQZlASNS7To7IBE9okT4QHEtER7dgzM/Z0VAg9R1 guCPaN8tje4vq2Lv7+J5T9LOF1RObCbKXNADwXY1rMRK5ao3xS93eDnJ8Vn08utI UECIVfnYsA6pGl6v1ErCl9izx9MoTU3Crle7BRzVbrw7furvB2lJ1R4RGwqRbvcB HovhmHI= =TMir -----END PGP SIGNATURE----- Merge 5.2-rc1 into android-mainline Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
Peter Xu
|
cefdca0a86 |
userfaultfd/sysctl: add vm.unprivileged_userfaultfd
Userfaultfd can be misued to make it easier to exploit existing use-after-free (and similar) bugs that might otherwise only make a short window or race condition available. By using userfaultfd to stall a kernel thread, a malicious program can keep some state that it wrote, stable for an extended period, which it can then access using an existing exploit. While it doesn't cause the exploit itself, and while it's not the only thing that can stall a kernel thread when accessing a memory location, it's one of the few that never needs privilege. We can add a flag, allowing userfaultfd to be restricted, so that in general it won't be useable by arbitrary user programs, but in environments that require userfaultfd it can be turned back on. Add a global sysctl knob "vm.unprivileged_userfaultfd" to control whether userfaultfd is allowed by unprivileged users. When this is set to zero, only privileged users (root user, or users with the CAP_SYS_PTRACE capability) will be able to use the userfaultfd syscalls. Andrea said: : The only difference between the bpf sysctl and the userfaultfd sysctl : this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability : requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement, : because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE : already if it's doing other kind of tracking on processes runtime, in : addition of userfaultfd. In other words both syscalls works only for : root, when the two sysctl are opt-in set to 1. [dgilbert@redhat.com: changelog additions] [akpm@linux-foundation.org: documentation tweak, per Mike] Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Suggested-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Martin Cracauer <cracauer@cons.org> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Todd Kjos
|
0f2cb7cf80 |
Merge branch 'linux-mainline' into android-mainline-tmp
Change-Id: I4380c68c3474026a42ffa9f95c525f9a563ba7a3 |
||
Colin Cross
|
60500a4228 |
ANDROID: mm: add a field to store names for private anonymous memory
Userspace processes often have multiple allocators that each do
anonymous mmaps to get memory. When examining memory usage of
individual processes or systems as a whole, it is useful to be
able to break down the various heaps that were allocated by
each layer and examine their size, RSS, and physical memory
usage.
This patch adds a user pointer to the shared union in
vm_area_struct that points to a null terminated string inside
the user process containing a name for the vma. vmas that
point to the same address will be merged, but vmas that
point to equivalent strings at different addresses will
not be merged.
Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.
The names of named anonymous vmas are shown in /proc/pid/maps
as [anon:<name>] and in /proc/pid/smaps in a new "Name" field
that is only present for named vmas. If the userspace pointer
is no longer valid all or part of the name will be replaced
with "<fault>".
The idea to store a userspace pointer to reduce the complexity
within mm (at the expense of the complexity of reading
/proc/pid/mem) came from Dave Hansen. This results in no
runtime overhead in the mm subsystem other than comparing
the anon_name pointers when considering vma merging. The pointer
is stored in a union with fieds that are only used on file-backed
mappings, so it does not increase memory usage.
Includes fix from Jed Davis <jld@mozilla.com> for typo in
prctl_set_vma_anon_name, which could attempt to set the name
across two vmas at the same time due to a typo, which might
corrupt the vma list. Fix it to use tmp instead of end to limit
the name setting to a single vma at a time.
Bug: 120441514
Change-Id: I9aa7b6b5ef536cd780599ba4e2fba8ceebe8b59f
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
[AmitP: Fix get_user_pages_remote() call to align with upstream commit
|