Commit Graph

141 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
08ed4cb090 Merge 5.10.67 into android12-5.10-lts
Changes in 5.10.67
	rtc: tps65910: Correct driver module alias
	io_uring: limit fixed table size by RLIMIT_NOFILE
	io_uring: place fixed tables under memcg limits
	io_uring: add ->splice_fd_in checks
	io_uring: fail links of cancelled timeouts
	io-wq: fix wakeup race when adding new work
	btrfs: wake up async_delalloc_pages waiters after submit
	btrfs: reset replace target device to allocation state on close
	blk-zoned: allow zone management send operations without CAP_SYS_ADMIN
	blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN
	PCI/MSI: Skip masking MSI-X on Xen PV
	powerpc/perf/hv-gpci: Fix counter value parsing
	xen: fix setting of max_pfn in shared_info
	9p/xen: Fix end of loop tests for list_for_each_entry
	ceph: fix dereference of null pointer cf
	selftests/ftrace: Fix requirement check of README file
	tools/thermal/tmon: Add cross compiling support
	clk: socfpga: agilex: fix the parents of the psi_ref_clk
	clk: socfpga: agilex: fix up s2f_user0_clk representation
	clk: socfpga: agilex: add the bypass register for s2f_usr0 clock
	pinctrl: stmfx: Fix hazardous u8[] to unsigned long cast
	pinctrl: ingenic: Fix incorrect pull up/down info
	soc: qcom: aoss: Fix the out of bound usage of cooling_devs
	soc: aspeed: lpc-ctrl: Fix boundary check for mmap
	soc: aspeed: p2a-ctrl: Fix boundary check for mmap
	arm64: mm: Fix TLBI vs ASID rollover
	arm64: head: avoid over-mapping in map_memory
	iio: ltc2983: fix device probe
	wcn36xx: Ensure finish scan is not requested before start scan
	crypto: public_key: fix overflow during implicit conversion
	block: bfq: fix bfq_set_next_ioprio_data()
	power: supply: max17042: handle fails of reading status register
	dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc()
	crypto: ccp - shutdown SEV firmware on kexec
	VMCI: fix NULL pointer dereference when unmapping queue pair
	media: uvc: don't do DMA on stack
	media: rc-loopback: return number of emitters rather than error
	s390/qdio: fix roll-back after timeout on ESTABLISH ccw
	s390/qdio: cancel the ESTABLISH ccw after timeout
	Revert "dmaengine: imx-sdma: refine to load context only once"
	dmaengine: imx-sdma: remove duplicated sdma_load_context
	libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs
	ARM: 9105/1: atags_to_fdt: don't warn about stack size
	f2fs: fix to do sanity check for sb/cp fields correctly
	PCI/portdrv: Enable Bandwidth Notification only if port supports it
	PCI: Restrict ASMedia ASM1062 SATA Max Payload Size Supported
	PCI: Return ~0 data on pciconfig_read() CAP_SYS_ADMIN failure
	PCI: xilinx-nwl: Enable the clock through CCF
	PCI: aardvark: Configure PCIe resources from 'ranges' DT property
	PCI: Export pci_pio_to_address() for module use
	PCI: aardvark: Fix checking for PIO status
	PCI: aardvark: Fix masking and unmasking legacy INTx interrupts
	HID: input: do not report stylus battery state as "full"
	f2fs: quota: fix potential deadlock
	pinctrl: remove empty lines in pinctrl subsystem
	pinctrl: armada-37xx: Correct PWM pins definitions
	scsi: bsg: Remove support for SCSI_IOCTL_SEND_COMMAND
	clk: rockchip: drop GRF dependency for rk3328/rk3036 pll types
	IB/hfi1: Adjust pkey entry in index 0
	RDMA/iwcm: Release resources if iw_cm module initialization fails
	docs: Fix infiniband uverbs minor number
	scsi: BusLogic: Use %X for u32 sized integer rather than %lX
	pinctrl: samsung: Fix pinctrl bank pin count
	vfio: Use config not menuconfig for VFIO_NOIOMMU
	scsi: ufs: Fix memory corruption by ufshcd_read_desc_param()
	cpuidle: pseries: Fixup CEDE0 latency only for POWER10 onwards
	powerpc/stacktrace: Include linux/delay.h
	RDMA/efa: Remove double QP type assignment
	RDMA/mlx5: Delete not-available udata check
	cpuidle: pseries: Mark pseries_idle_proble() as __init
	f2fs: reduce the scope of setting fsck tag when de->name_len is zero
	openrisc: don't printk() unconditionally
	dma-debug: fix debugfs initialization order
	NFSv4/pNFS: Fix a layoutget livelock loop
	NFSv4/pNFS: Always allow update of a zero valued layout barrier
	NFSv4/pnfs: The layout barrier indicate a minimal value for the seqid
	SUNRPC: Fix potential memory corruption
	SUNRPC/xprtrdma: Fix reconnection locking
	SUNRPC query transport's source port
	sunrpc: Fix return value of get_srcport()
	scsi: fdomain: Fix error return code in fdomain_probe()
	pinctrl: single: Fix error return code in pcs_parse_bits_in_pinctrl_entry()
	powerpc/numa: Consider the max NUMA node for migratable LPAR
	scsi: smartpqi: Fix an error code in pqi_get_raid_map()
	scsi: qedi: Fix error codes in qedi_alloc_global_queues()
	scsi: qedf: Fix error codes in qedf_alloc_global_queues()
	powerpc/config: Renable MTD_PHYSMAP_OF
	iommu/vt-d: Update the virtual command related registers
	HID: i2c-hid: Fix Elan touchpad regression
	clk: imx8m: fix clock tree update of TF-A managed clocks
	KVM: PPC: Book3S HV: Fix copy_tofrom_guest routines
	scsi: ufs: ufs-exynos: Fix static checker warning
	KVM: PPC: Book3S HV Nested: Reflect guest PMU in-use to L0 when guest SPRs are live
	platform/x86: dell-smbios-wmi: Add missing kfree in error-exit from run_smbios_call
	powerpc/smp: Update cpu_core_map on all PowerPc systems
	RDMA/hns: Fix QP's resp incomplete assignment
	fscache: Fix cookie key hashing
	clk: at91: clk-generated: Limit the requested rate to our range
	KVM: PPC: Fix clearing never mapped TCEs in realmode
	soc: mediatek: cmdq: add address shift in jump
	f2fs: fix to account missing .skipped_gc_rwsem
	f2fs: fix unexpected ENOENT comes from f2fs_map_blocks()
	f2fs: fix to unmap pages from userspace process in punch_hole()
	f2fs: deallocate compressed pages when error happens
	f2fs: should put a page beyond EOF when preparing a write
	MIPS: Malta: fix alignment of the devicetree buffer
	kbuild: Fix 'no symbols' warning when CONFIG_TRIM_UNUSD_KSYMS=y
	userfaultfd: prevent concurrent API initialization
	drm/vc4: hdmi: Set HD_CTL_WHOLSMP and HD_CTL_CHALIGN_SET
	drm/amdgpu: Fix amdgpu_ras_eeprom_init()
	ASoC: atmel: ATMEL drivers don't need HAS_DMA
	media: dib8000: rewrite the init prbs logic
	libbpf: Fix reuse of pinned map on older kernel
	x86/hyperv: fix for unwanted manipulation of sched_clock when TSC marked unstable
	crypto: mxs-dcp - Use sg_mapping_iter to copy data
	PCI: Use pci_update_current_state() in pci_enable_device_flags()
	tipc: keep the skb in rcv queue until the whole data is read
	net: phy: Fix data type in DP83822 dp8382x_disable_wol()
	iio: dac: ad5624r: Fix incorrect handling of an optional regulator.
	iavf: do not override the adapter state in the watchdog task
	iavf: fix locking of critical sections
	ARM: dts: qcom: apq8064: correct clock names
	video: fbdev: kyro: fix a DoS bug by restricting user input
	netlink: Deal with ESRCH error in nlmsg_notify()
	Smack: Fix wrong semantics in smk_access_entry()
	drm: avoid blocking in drm_clients_info's rcu section
	drm: serialize drm_file.master with a new spinlock
	drm: protect drm_master pointers in drm_lease.c
	rcu: Fix macro name CONFIG_TASKS_RCU_TRACE
	igc: Check if num of q_vectors is smaller than max before array access
	usb: host: fotg210: fix the endpoint's transactional opportunities calculation
	usb: host: fotg210: fix the actual_length of an iso packet
	usb: gadget: u_ether: fix a potential null pointer dereference
	USB: EHCI: ehci-mv: improve error handling in mv_ehci_enable()
	usb: gadget: composite: Allow bMaxPower=0 if self-powered
	staging: board: Fix uninitialized spinlock when attaching genpd
	tty: serial: jsm: hold port lock when reporting modem line changes
	bus: fsl-mc: fix mmio base address for child DPRCs
	selftests: firmware: Fix ignored return val of asprintf() warn
	drm/amd/display: Fix timer_per_pixel unit error
	media: hantro: vp8: Move noisy WARN_ON to vpu_debug
	media: platform: stm32: unprepare clocks at handling errors in probe
	media: atomisp: Fix runtime PM imbalance in atomisp_pci_probe
	media: atomisp: pci: fix error return code in atomisp_pci_probe()
	nfp: fix return statement in nfp_net_parse_meta()
	ethtool: improve compat ioctl handling
	drm/amdgpu: Fix a printing message
	drm/amd/amdgpu: Update debugfs link_settings output link_rate field in hex
	bpf/tests: Fix copy-and-paste error in double word test
	bpf/tests: Do not PASS tests without actually testing the result
	drm/bridge: nwl-dsi: Avoid potential multiplication overflow on 32-bit
	arm64: dts: allwinner: h6: tanix-tx6: Fix regulator node names
	video: fbdev: asiliantfb: Error out if 'pixclock' equals zero
	video: fbdev: kyro: Error out if 'pixclock' equals zero
	video: fbdev: riva: Error out if 'pixclock' equals zero
	ipv4: ip_output.c: Fix out-of-bounds warning in ip_copy_addrs()
	flow_dissector: Fix out-of-bounds warnings
	s390/jump_label: print real address in a case of a jump label bug
	s390: make PCI mio support a machine flag
	serial: 8250: Define RX trigger levels for OxSemi 950 devices
	xtensa: ISS: don't panic in rs_init
	hvsi: don't panic on tty_register_driver failure
	serial: 8250_pci: make setup_port() parameters explicitly unsigned
	staging: ks7010: Fix the initialization of the 'sleep_status' structure
	samples: bpf: Fix tracex7 error raised on the missing argument
	libbpf: Fix race when pinning maps in parallel
	ata: sata_dwc_460ex: No need to call phy_exit() befre phy_init()
	Bluetooth: skip invalid hci_sync_conn_complete_evt
	workqueue: Fix possible memory leaks in wq_numa_init()
	ARM: dts: stm32: Set {bitclock,frame}-master phandles on DHCOM SoM
	ARM: dts: stm32: Set {bitclock,frame}-master phandles on ST DKx
	ARM: dts: stm32: Update AV96 adv7513 node per dtbs_check
	bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler()
	ARM: dts: at91: use the right property for shutdown controller
	arm64: tegra: Fix Tegra194 PCIe EP compatible string
	ASoC: Intel: bytcr_rt5640: Move "Platform Clock" routes to the maps for the matching in-/output
	ASoC: Intel: update sof_pcm512x quirks
	media: imx258: Rectify mismatch of VTS value
	media: imx258: Limit the max analogue gain to 480
	media: v4l2-dv-timings.c: fix wrong condition in two for-loops
	media: TDA1997x: fix tda1997x_query_dv_timings() return value
	media: tegra-cec: Handle errors of clk_prepare_enable()
	gfs2: Fix glock recursion in freeze_go_xmote_bh
	arm64: dts: qcom: sdm630: Rewrite memory map
	arm64: dts: qcom: sdm630: Fix TLMM node and pinctrl configuration
	serial: 8250_omap: Handle optional overrun-throttle-ms property
	ARM: dts: imx53-ppd: Fix ACHC entry
	arm64: dts: qcom: ipq8074: fix pci node reg property
	arm64: dts: qcom: sdm660: use reg value for memory node
	arm64: dts: qcom: ipq6018: drop '0x' from unit address
	arm64: dts: qcom: sdm630: don't use underscore in node name
	arm64: dts: qcom: msm8994: don't use underscore in node name
	arm64: dts: qcom: msm8996: don't use underscore in node name
	arm64: dts: qcom: sm8250: Fix epss_l3 unit address
	nvmem: qfprom: Fix up qfprom_disable_fuse_blowing() ordering
	net: ethernet: stmmac: Do not use unreachable() in ipq806x_gmac_probe()
	drm/msm: mdp4: drop vblank get/put from prepare/complete_commit
	drm/msm/dsi: Fix DSI and DSI PHY regulator config from SDM660
	drm: xlnx: zynqmp_dpsub: Call pm_runtime_get_sync before setting pixel clock
	drm: xlnx: zynqmp: release reset to DP controller before accessing DP registers
	thunderbolt: Fix port linking by checking all adapters
	drm/amd/display: fix missing writeback disablement if plane is removed
	drm/amd/display: fix incorrect CM/TF programming sequence in dwb
	selftests/bpf: Fix xdp_tx.c prog section name
	drm/vmwgfx: fix potential UAF in vmwgfx_surface.c
	Bluetooth: schedule SCO timeouts with delayed_work
	Bluetooth: avoid circular locks in sco_sock_connect
	drm/msm/dp: return correct edid checksum after corrupted edid checksum read
	net/mlx5: Fix variable type to match 64bit
	gpu: drm: amd: amdgpu: amdgpu_i2c: fix possible uninitialized-variable access in amdgpu_i2c_router_select_ddc_port()
	drm/display: fix possible null-pointer dereference in dcn10_set_clock()
	mac80211: Fix monitor MTU limit so that A-MSDUs get through
	ARM: tegra: acer-a500: Remove bogus USB VBUS regulators
	ARM: tegra: tamonten: Fix UART pad setting
	arm64: tegra: Fix compatible string for Tegra132 CPUs
	arm64: dts: ls1046a: fix eeprom entries
	nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data
	nvme: code command_id with a genctr for use-after-free validation
	Bluetooth: Fix handling of LE Enhanced Connection Complete
	opp: Don't print an error if required-opps is missing
	serial: sh-sci: fix break handling for sysrq
	iomap: pass writeback errors to the mapping
	tcp: enable data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
	rpc: fix gss_svc_init cleanup on failure
	selftests/bpf: Fix flaky send_signal test
	hwmon: (pmbus/ibm-cffps) Fix write bits for LED control
	staging: rts5208: Fix get_ms_information() heap buffer size
	net: Fix offloading indirect devices dependency on qdisc order creation
	kselftest/arm64: mte: Fix misleading output when skipping tests
	kselftest/arm64: pac: Fix skipping of tests on systems without PAC
	gfs2: Don't call dlm after protocol is unmounted
	usb: chipidea: host: fix port index underflow and UBSAN complains
	lockd: lockd server-side shouldn't set fl_ops
	drm/exynos: Always initialize mapping in exynos_drm_register_dma()
	rtl8xxxu: Fix the handling of TX A-MPDU aggregation
	rtw88: use read_poll_timeout instead of fixed sleep
	rtw88: wow: build wow function only if CONFIG_PM is on
	rtw88: wow: fix size access error of probe request
	octeontx2-pf: Fix NIX1_RX interface backpressure
	m68knommu: only set CONFIG_ISA_DMA_API for ColdFire sub-arch
	btrfs: tree-log: check btrfs_lookup_data_extent return value
	soundwire: intel: fix potential race condition during power down
	ASoC: Intel: Skylake: Fix module configuration for KPB and MIXER
	ASoC: Intel: Skylake: Fix passing loadable flag for module
	of: Don't allow __of_attached_node_sysfs() without CONFIG_SYSFS
	mmc: sdhci-of-arasan: Modified SD default speed to 19MHz for ZynqMP
	mmc: sdhci-of-arasan: Check return value of non-void funtions
	mmc: rtsx_pci: Fix long reads when clock is prescaled
	selftests/bpf: Enlarge select() timeout for test_maps
	mmc: core: Return correct emmc response in case of ioctl error
	cifs: fix wrong release in sess_alloc_buffer() failed path
	Revert "USB: xhci: fix U1/U2 handling for hardware with XHCI_INTEL_HOST quirk set"
	usb: musb: musb_dsps: request_irq() after initializing musb
	usbip: give back URBs for unsent unlink requests during cleanup
	usbip:vhci_hcd USB port can get stuck in the disabled state
	ASoC: rockchip: i2s: Fix regmap_ops hang
	ASoC: rockchip: i2s: Fixup config for DAIFMT_DSP_A/B
	drm/amdkfd: Account for SH/SE count when setting up cu masks.
	nfsd: fix crash on LOCKT on reexported NFSv3
	iwlwifi: pcie: free RBs during configure
	iwlwifi: mvm: fix a memory leak in iwl_mvm_mac_ctxt_beacon_changed
	iwlwifi: mvm: avoid static queue number aliasing
	iwlwifi: mvm: fix access to BSS elements
	iwlwifi: fw: correctly limit to monitor dump
	iwlwifi: mvm: Fix scan channel flags settings
	net/mlx5: DR, fix a potential use-after-free bug
	net/mlx5: DR, Enable QP retransmission
	parport: remove non-zero check on count
	selftests/bpf: Fix potential unreleased lock
	wcn36xx: Fix missing frame timestamp for beacon/probe-resp
	ath9k: fix OOB read ar9300_eeprom_restore_internal
	ath9k: fix sleeping in atomic context
	net: fix NULL pointer reference in cipso_v4_doi_free
	fix array-index-out-of-bounds in taprio_change
	net: w5100: check return value after calling platform_get_resource()
	net: hns3: clean up a type mismatch warning
	fs/io_uring Don't use the return value from import_iovec().
	io_uring: remove duplicated io_size from rw
	parisc: fix crash with signals and alloca
	ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup()
	scsi: BusLogic: Fix missing pr_cont() use
	scsi: qla2xxx: Changes to support kdump kernel
	scsi: qla2xxx: Sync queue idx with queue_pair_map idx
	cpufreq: powernv: Fix init_chip_info initialization in numa=off
	s390/pv: fix the forcing of the swiotlb
	hugetlb: fix hugetlb cgroup refcounting during vma split
	mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled
	mm/hugetlb: initialize hugetlb_usage in mm_init
	mm,vmscan: fix divide by zero in get_scan_count
	memcg: enable accounting for pids in nested pid namespaces
	libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbind
	platform/chrome: cros_ec_proto: Send command again when timeout occurs
	lib/test_stackinit: Fix static initializer test
	net: dsa: lantiq_gswip: fix maximum frame length
	drm/mgag200: Select clock in PLL update functions
	drm/msi/mdp4: populate priv->kms in mdp4_kms_init
	drm/dp_mst: Fix return code on sideband message failure
	drm/panfrost: Make sure MMU context lifetime is not bound to panfrost_priv
	drm/amdgpu: Fix BUG_ON assert
	drm/amd/display: Update number of DCN3 clock states
	drm/amd/display: Update bounding box states (v2)
	drm/panfrost: Simplify lock_region calculation
	drm/panfrost: Use u64 for size in lock_region
	drm/panfrost: Clamp lock region to Bifrost minimum
	fanotify: limit number of event merge attempts
	Linux 5.10.67

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic8df59518265d0cdf724e93e8922cde48fc85ce9
2021-09-30 12:21:03 +02:00
Nadav Amit
6afd1e053d userfaultfd: prevent concurrent API initialization
[ Upstream commit 22e5fe2a2a279d9a6fcbdfb4dffe73821bef1c90 ]

userfaultfd assumes that the enabled features are set once and never
changed after UFFDIO_API ioctl succeeded.

However, currently, UFFDIO_API can be called concurrently from two
different threads, succeed on both threads and leave userfaultfd's
features in non-deterministic state.  Theoretically, other uffd operations
(ioctl's and page-faults) can be dispatched while adversely affected by
such changes of features.

Moreover, the writes to ctx->state and ctx->features are not ordered,
which can - theoretically, again - let userfaultfd_ioctl() think that
userfaultfd API completed, while the features are still not initialized.

To avoid races, it is arguably best to get rid of ctx->state.  Since there
are only 2 states, record the API initialization in ctx->features as the
uppermost bit and remove ctx->state.

Link: https://lkml.kernel.org/r/20210808020724.1022515-3-namit@vmware.com
Fixes: 9cd75c3cd4 ("userfaultfd: non-cooperative: add ability to report non-PF events from uffd descriptor")
Signed-off-by: Nadav Amit <namit@vmware.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18 13:40:17 +02:00
Greg Kroah-Hartman
e4cac2c332 This is the 5.10.54 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmEBT00ACgkQONu9yGCS
 aT7svA/9HCRwW+pK3UpK1+0FK7gGH8DA3jSONj775zVEKhboDZNIwZsDqG0Ly+jm
 /JejWXPKZlekaDXgzBfZY3H59xgij/VwYHe8p7cdxfi1TlhmAQwFjLNZnWav8as6
 IyNkpsDJn8fMXmfHDi2u3cb8wrVi/aQDzTlwu88cUtREyZCaaYlo0Fdv9MJyhww/
 p6LWPYQoZ8TmFY+Y/2ORVxFos2UVuU0hhhMdGt9LrX2WNEGRNUZUqmbhcXYfdsX0
 ckSHbijIcWdcka3nQ6yOvdxw75rTqd8c/bP0y+yAteeJ0CykjVnI2cdK+M2ZEi4j
 /JqpGJrRWhsZf5MiO8b3k+I1K62JDa1GYBQ9Amp8FKKzjYLPTNeFAP9IsMyDc4oi
 oW98XM7XzoSEU9t/FSAIGT0hYK9k+lnPxw623LhxD6x3VPynnNAnQsLr+HirOgG6
 mZ79L4ZFu3lUvVsCuCgKn/uxwDopUNlhqo5B2/4M2kSWwe2Xu5bExpGc2bT9xCOP
 6fF9DmvmpG1UPGCXrOqaxemyEPmHqmyjKJpxDt6vZqlOL9vqHez4WmEEM1C+E2NZ
 5VKKbBk/KZDxNX9EiFOtI2HRFb1cghoI2Hcb/QjRoB9Dv3a6cHgjxDl0eKm8SiDN
 +1ytV0IFH3fT4aRiXJ7I3GBwkjKcDaX0sjYwtnCx9s5XZmm9PRQ=
 =HAyL
 -----END PGP SIGNATURE-----

Merge 5.10.54 into android12-5.10-lts

Changes in 5.10.54
	igc: Fix use-after-free error during reset
	igb: Fix use-after-free error during reset
	igc: change default return of igc_read_phy_reg()
	ixgbe: Fix an error handling path in 'ixgbe_probe()'
	igc: Fix an error handling path in 'igc_probe()'
	igb: Fix an error handling path in 'igb_probe()'
	fm10k: Fix an error handling path in 'fm10k_probe()'
	e1000e: Fix an error handling path in 'e1000_probe()'
	iavf: Fix an error handling path in 'iavf_probe()'
	igb: Check if num of q_vectors is smaller than max before array access
	igb: Fix position of assignment to *ring
	gve: Fix an error handling path in 'gve_probe()'
	net: add kcov handle to skb extensions
	bonding: fix suspicious RCU usage in bond_ipsec_add_sa()
	bonding: fix null dereference in bond_ipsec_add_sa()
	ixgbevf: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops
	bonding: fix suspicious RCU usage in bond_ipsec_del_sa()
	bonding: disallow setting nested bonding + ipsec offload
	bonding: Add struct bond_ipesc to manage SA
	bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()
	bonding: fix incorrect return value of bond_ipsec_offload_ok()
	ipv6: fix 'disable_policy' for fwd packets
	stmmac: platform: Fix signedness bug in stmmac_probe_config_dt()
	selftests: icmp_redirect: remove from checking for IPv6 route get
	selftests: icmp_redirect: IPv6 PMTU info should be cleared after redirect
	pwm: sprd: Ensure configuring period and duty_cycle isn't wrongly skipped
	cxgb4: fix IRQ free race during driver unload
	mptcp: fix warning in __skb_flow_dissect() when do syn cookie for subflow join
	nvme-pci: do not call nvme_dev_remove_admin from nvme_remove
	KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM
	perf inject: Fix dso->nsinfo refcounting
	perf map: Fix dso->nsinfo refcounting
	perf probe: Fix dso->nsinfo refcounting
	perf env: Fix sibling_dies memory leak
	perf test session_topology: Delete session->evlist
	perf test event_update: Fix memory leak of evlist
	perf dso: Fix memory leak in dso__new_map()
	perf test maps__merge_in: Fix memory leak of maps
	perf env: Fix memory leak of cpu_pmu_caps
	perf report: Free generated help strings for sort option
	perf script: Fix memory 'threads' and 'cpus' leaks on exit
	perf lzma: Close lzma stream on exit
	perf probe-file: Delete namelist in del_events() on the error path
	perf data: Close all files in close_dir()
	perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set
	ASoC: wm_adsp: Correct wm_coeff_tlv_get handling
	spi: imx: add a check for speed_hz before calculating the clock
	spi: stm32: fixes pm_runtime calls in probe/remove
	regulator: hi6421: Use correct variable type for regmap api val argument
	regulator: hi6421: Fix getting wrong drvdata
	spi: mediatek: fix fifo rx mode
	ASoC: rt5631: Fix regcache sync errors on resume
	bpf, test: fix NULL pointer dereference on invalid expected_attach_type
	bpf: Fix tail_call_reachable rejection for interpreter when jit failed
	xdp, net: Fix use-after-free in bpf_xdp_link_release
	timers: Fix get_next_timer_interrupt() with no timers pending
	liquidio: Fix unintentional sign extension issue on left shift of u16
	s390/bpf: Perform r1 range checking before accessing jit->seen_reg[r1]
	bpf, sockmap: Fix potential memory leak on unlikely error case
	bpf, sockmap, tcp: sk_prot needs inuse_idx set for proc stats
	bpf, sockmap, udp: sk_prot needs inuse_idx set for proc stats
	bpftool: Check malloc return value in mount_bpffs_for_pin
	net: fix uninit-value in caif_seqpkt_sendmsg
	usb: hso: fix error handling code of hso_create_net_device
	dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable}
	efi/tpm: Differentiate missing and invalid final event log table.
	net: decnet: Fix sleeping inside in af_decnet
	KVM: PPC: Book3S: Fix CONFIG_TRANSACTIONAL_MEM=n crash
	KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak
	net: sched: fix memory leak in tcindex_partial_destroy_work
	sctp: trim optlen when it's a huge value in sctp_setsockopt
	netrom: Decrease sock refcount when sock timers expire
	scsi: iscsi: Fix iface sysfs attr detection
	scsi: target: Fix protect handling in WRITE SAME(32)
	spi: cadence: Correct initialisation of runtime PM again
	ACPI: Kconfig: Fix table override from built-in initrd
	bnxt_en: don't disable an already disabled PCI device
	bnxt_en: Refresh RoCE capabilities in bnxt_ulp_probe()
	bnxt_en: Add missing check for BNXT_STATE_ABORT_ERR in bnxt_fw_rset_task()
	bnxt_en: Validate vlan protocol ID on RX packets
	bnxt_en: Check abort error state in bnxt_half_open_nic()
	net: hisilicon: rename CACHE_LINE_MASK to avoid redefinition
	net/tcp_fastopen: fix data races around tfo_active_disable_stamp
	ALSA: hda: intel-dsp-cfg: add missing ElkhartLake PCI ID
	net: hns3: fix possible mismatches resp of mailbox
	net: hns3: fix rx VLAN offload state inconsistent issue
	spi: spi-bcm2835: Fix deadlock
	net/sched: act_skbmod: Skip non-Ethernet packets
	ipv6: fix another slab-out-of-bounds in fib6_nh_flush_exceptions
	ceph: don't WARN if we're still opening a session to an MDS
	nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING
	Revert "USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem"
	afs: Fix tracepoint string placement with built-in AFS
	r8169: Avoid duplicate sysfs entry creation error
	nvme: set the PRACT bit when using Write Zeroes with T10 PI
	sctp: update active_key for asoc when old key is being replaced
	tcp: disable TFO blackhole logic by default
	net: dsa: sja1105: make VID 4095 a bridge VLAN too
	net: sched: cls_api: Fix the the wrong parameter
	drm/panel: raspberrypi-touchscreen: Prevent double-free
	cifs: only write 64kb at a time when fallocating a small region of a file
	cifs: fix fallocate when trying to allocate a hole.
	proc: Avoid mixing integer types in mem_rw()
	mmc: core: Don't allocate IDA for OF aliases
	s390/ftrace: fix ftrace_update_ftrace_func implementation
	s390/boot: fix use of expolines in the DMA code
	ALSA: usb-audio: Add missing proc text entry for BESPOKEN type
	ALSA: usb-audio: Add registration quirk for JBL Quantum headsets
	ALSA: sb: Fix potential ABBA deadlock in CSP driver
	ALSA: hda/realtek: Fix pop noise and 2 Front Mic issues on a machine
	ALSA: hdmi: Expose all pins on MSI MS-7C94 board
	ALSA: pcm: Call substream ack() method upon compat mmap commit
	ALSA: pcm: Fix mmap capability check
	Revert "usb: renesas-xhci: Fix handling of unknown ROM state"
	usb: xhci: avoid renesas_usb_fw.mem when it's unusable
	xhci: Fix lost USB 2 remote wake
	KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow
	KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state
	usb: hub: Disable USB 3 device initiated lpm if exit latency is too high
	usb: hub: Fix link power management max exit latency (MEL) calculations
	USB: usb-storage: Add LaCie Rugged USB3-FW to IGNORE_UAS
	usb: max-3421: Prevent corruption of freed memory
	usb: renesas_usbhs: Fix superfluous irqs happen after usb_pkt_pop()
	USB: serial: option: add support for u-blox LARA-R6 family
	USB: serial: cp210x: fix comments for GE CS1000
	USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
	usb: gadget: Fix Unbalanced pm_runtime_enable in tegra_xudc_probe
	usb: dwc2: gadget: Fix GOUTNAK flow for Slave mode.
	usb: dwc2: gadget: Fix sending zero length packet in DDMA mode.
	usb: typec: stusb160x: register role switch before interrupt registration
	firmware/efi: Tell memblock about EFI iomem reservations
	tracepoints: Update static_call before tp_funcs when adding a tracepoint
	tracing/histogram: Rename "cpu" to "common_cpu"
	tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop.
	tracing: Synthetic event field_pos is an index not a boolean
	btrfs: check for missing device in btrfs_trim_fs
	media: ngene: Fix out-of-bounds bug in ngene_command_config_free_buf()
	ixgbe: Fix packet corruption due to missing DMA sync
	bus: mhi: core: Validate channel ID when processing command completions
	posix-cpu-timers: Fix rearm racing against process tick
	selftest: use mmap instead of posix_memalign to allocate memory
	io_uring: explicitly count entries for poll reqs
	io_uring: remove double poll entry on arm failure
	userfaultfd: do not untag user pointers
	memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions
	hugetlbfs: fix mount mode command line processing
	rbd: don't hold lock_rwsem while running_list is being drained
	rbd: always kick acquire on "acquired" and "released" notifications
	misc: eeprom: at24: Always append device id even if label property is set.
	nds32: fix up stack guard gap
	driver core: Prevent warning when removing a device link from unregistered consumer
	drm: Return -ENOTTY for non-drm ioctls
	drm/amdgpu: update golden setting for sienna_cichlid
	net: dsa: mv88e6xxx: enable SerDes RX stats for Topaz
	net: dsa: mv88e6xxx: enable SerDes PCS register dump via ethtool -d on Topaz
	PCI: Mark AMD Navi14 GPU ATS as broken
	bonding: fix build issue
	skbuff: Release nfct refcount on napi stolen or re-used skbs
	Documentation: Fix intiramfs script name
	perf inject: Close inject.output on exit
	usb: ehci: Prevent missed ehci interrupts with edge-triggered MSI
	drm/i915/gvt: Clear d3_entered on elsp cmd submission.
	sfc: ensure correct number of XDP queues
	xhci: add xhci_get_virt_ep() helper
	skbuff: Fix build with SKB extensions disabled
	Linux 5.10.54

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifd2823b47ab1544cd1f168b138624ffe060a471e
2021-07-28 15:23:47 +02:00
Peter Collingbourne
0b591c020d userfaultfd: do not untag user pointers
commit e71e2ace5721a8b921dca18b045069e7bb411277 upstream.

Patch series "userfaultfd: do not untag user pointers", v5.

If a user program uses userfaultfd on ranges of heap memory, it may end
up passing a tagged pointer to the kernel in the range.start field of
the UFFDIO_REGISTER ioctl.  This can happen when using an MTE-capable
allocator, or on Android if using the Tagged Pointers feature for MTE
readiness [1].

When a fault subsequently occurs, the tag is stripped from the fault
address returned to the application in the fault.address field of struct
uffd_msg.  However, from the application's perspective, the tagged
address *is* the memory address, so if the application is unaware of
memory tags, it may get confused by receiving an address that is, from
its point of view, outside of the bounds of the allocation.  We observed
this behavior in the kselftest for userfaultfd [2] but other
applications could have the same problem.

Address this by not untagging pointers passed to the userfaultfd ioctls.
Instead, let the system call fail.  Also change the kselftest to use
mmap so that it doesn't encounter this problem.

[1] https://source.android.com/devices/tech/debug/tagged-pointers
[2] tools/testing/selftests/vm/userfaultfd.c

This patch (of 2):

Do not untag pointers passed to the userfaultfd ioctls.  Instead, let
the system call fail.  This will provide an early indication of problems
with tag-unaware userspace code instead of letting the code get confused
later, and is consistent with how we decided to handle brk/mmap/mremap
in commit dcde237319 ("mm: Avoid creating virtual address aliases in
brk()/mmap()/mremap()"), as well as being consistent with the existing
tagged address ABI documentation relating to how ioctl arguments are
handled.

The code change is a revert of commit 7d0325749a ("userfaultfd: untag
user pointers") plus some fixups to some additional calls to
validate_range that have appeared since then.

[1] https://source.android.com/devices/tech/debug/tagged-pointers
[2] tools/testing/selftests/vm/userfaultfd.c

Link: https://lkml.kernel.org/r/20210714195437.118982-1-pcc@google.com
Link: https://lkml.kernel.org/r/20210714195437.118982-2-pcc@google.com
Link: https://linux-review.googlesource.com/id/I761aa9f0344454c482b83fcfcce547db0a25501b
Fixes: 63f0c60379 ("arm64: Introduce prctl() options to control the tagged user addresses ABI")
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alistair Delva <adelva@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mitch Phillips <mitchp@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: William McVicker <willmcvicker@google.com>
Cc: <stable@vger.kernel.org>	[5.4]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-28 14:35:46 +02:00
Axel Rasmussen
d672123ec4 BACKPORT: FROMGIT: userfaultfd: support minor fault handling for shmem
Patch series "userfaultfd: support minor fault handling for shmem", v2.

Overview
========

See my original series [1] for a detailed overview of minor fault handling
in general.  The feature in this series works exactly like the hugetblfs
version (from userspace's perspective).

I'm sending this as a separate series because:

- The original minor fault handling series has a full set of R-Bs, and seems
  close to being merged. So, it seems reasonable to start looking at this next
  step, which extends the basic functionality.

- shmem is different enough that this series may require some additional work
  before it's ready, and I don't want to delay the original series
  unnecessarily by bundling them together.

Use Case
========

In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs.  So, this feature will be used to support the same VM live
migration use case described in my original series.

Additionally, Android folks (Lokesh Gidra <lokeshgidra@google.com>) hope
to optimize the Android Runtime garbage collector using this feature:

"The plan is to use userfaultfd for concurrently compacting the heap.
With this feature, the heap can be shared-mapped at another location where
the GC-thread(s) could continue the compaction operation without the need
to invoke userfault ioctl(UFFDIO_COPY) each time.  OTOH, if and when Java
threads get faults on the heap, UFFDIO_CONTINUE can be used to resume
execution.  Furthermore, this feature enables updating references in the
'non-moving' portion of the heap efficiently.  Without this feature,
uneccessary page copying (ioctl(UFFDIO_COPY)) would be required."

[1] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen@google.com/T/#t

This patch (of 5):

Modify the userfaultfd register API to allow registering shmem VMAs in
minor mode.  Modify the shmem mcopy implementation to support
UFFDIO_CONTINUE in order to resolve such faults.

Combine the shmem mcopy handler functions into a single
shmem_mcopy_atomic_pte, which takes a mode parameter.  This matches how
the hugetlbfs implementation is structured, and lets us remove a good
chunk of boilerplate.

Link: https://lkml.kernel.org/r/20210302000133.272579-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20210302000133.272579-2-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Wang Qing <wangqing@vivo.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 4cc6e15679966aa49afc5b114c3c83ba0ac39b05
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388146/

Conflicts:
	mm/shmem.c

(1. Manual rebase
 2. Enclosed shmem_copy_atomic_pte() with CONFIG_USERFAULTFD to avoid
 compile erros when USERFAULTFD is not enabled.)

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Idcd822b2a124a089121b9ad8c65061f6979126ec
2021-04-09 15:36:00 -07:00
Axel Rasmussen
4a5cf92412 BACKPORT: FROMGIT: userfaultfd: add UFFDIO_CONTINUE ioctl
This ioctl is how userspace ought to resolve "minor" userfaults. The
idea is, userspace is notified that a minor fault has occurred. It might
change the contents of the page using its second non-UFFD mapping, or
not. Then, it calls UFFDIO_CONTINUE to tell the kernel "I have ensured
the page contents are correct, carry on setting up the mapping".

Note that it doesn't make much sense to use UFFDIO_{COPY,ZEROPAGE} for
MINOR registered VMAs. ZEROPAGE maps the VMA to the zero page; but in
the minor fault case, we already have some pre-existing underlying page.
Likewise, UFFDIO_COPY isn't useful if we have a second non-UFFD mapping.
We'd just use memcpy() or similar instead.

It turns out hugetlb_mcopy_atomic_pte() already does very close to what
we want, if an existing page is provided via `struct page **pagep`. We
already special-case the behavior a bit for the UFFDIO_ZEROPAGE case, so
just extend that design: add an enum for the three modes of operation,
and make the small adjustments needed for the MCOPY_ATOMIC_CONTINUE
case. (Basically, look up the existing page, and avoid adding the
existing page to the page cache or calling set_page_huge_active() on
it.)

Link: https://lkml.kernel.org/r/20210301222728.176417-5-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 14ea86439abaf3423cd9b6712ed5ce8451d2d181
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388136/

Conflicts:
	mm/hugetlb.c

(8f251a3d5ce3bdea73bd045ed35db64f32e0d0d9 is not cherry-picked yet so
switched SetHPageMigratable() to set_active_huge_page())

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I45b62959dcb1d343154cb831113a26e47e77c8af
2021-04-09 15:35:59 -07:00
Axel Rasmussen
4d3dd339de BACKPORT: FROMGIT: userfaultfd: add minor fault registration mode
Patch series "userfaultfd: add minor fault handling", v9.

Overview
========

This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS.
When enabled (via the UFFDIO_API ioctl), this feature means that any
hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also*
get events for "minor" faults.  By "minor" fault, I mean the following
situation:

Let there exist two mappings (i.e., VMAs) to the same page(s) (shared
memory).  One of the mappings is registered with userfaultfd (in minor
mode), and the other is not.  Via the non-UFFD mapping, the underlying
pages have already been allocated & filled with some contents.  The UFFD
mapping has not yet been faulted in; when it is touched for the first
time, this results in what I'm calling a "minor" fault.  As a concrete
example, when working with hugetlbfs, we have huge_pte_none(), but
find_lock_page() finds an existing page.

We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE.  The idea
is, userspace resolves the fault by either a) doing nothing if the
contents are already correct, or b) updating the underlying contents using
the second, non-UFFD mapping (via memcpy/memset or similar, or something
fancier like RDMA, or etc...).  In either case, userspace issues
UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are
correct, carry on setting up the mapping".

Use Case
========

Consider the use case of VM live migration (e.g. under QEMU/KVM):

1. While a VM is still running, we copy the contents of its memory to a
   target machine. The pages are populated on the target by writing to the
   non-UFFD mapping, using the setup described above. The VM is still running
   (and therefore its memory is likely changing), so this may be repeated
   several times, until we decide the target is "up to date enough".

2. We pause the VM on the source, and start executing on the target machine.
   During this gap, the VM's user(s) will *see* a pause, so it is desirable to
   minimize this window.

3. Between the last time any page was copied from the source to the target, and
   when the VM was paused, the contents of that page may have changed - and
   therefore the copy we have on the target machine is out of date. Although we
   can keep track of which pages are out of date, for VMs with large amounts of
   memory, it is "slow" to transfer this information to the target machine. We
   want to resume execution before such a transfer would complete.

4. So, the guest begins executing on the target machine. The first time it
   touches its memory (via the UFFD-registered mapping), userspace wants to
   intercept this fault. Userspace checks whether or not the page is up to date,
   and if not, copies the updated page from the source machine, via the non-UFFD
   mapping. Finally, whether a copy was performed or not, userspace issues a
   UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents
   are correct, carry on setting up the mapping".

We don't have to do all of the final updates on-demand. The userfaultfd manager
can, in the background, also copy over updated pages once it receives the map of
which pages are up-to-date or not.

Interaction with Existing APIs
==============================

Because this is a feature, a registered VMA could potentially receive both
missing and minor faults.  I spent some time thinking through how the
existing API interacts with the new feature:

UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not
allocate a new page.  If UFFDIO_CONTINUE is used on a non-minor fault:

- For non-shared memory or shmem, -EINVAL is returned.
- For hugetlb, -EFAULT is returned.

UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults.
Without modifications, the existing codepath assumes a new page needs to
be allocated.  This is okay, since userspace must have a second
non-UFFD-registered mapping anyway, thus there isn't much reason to want
to use these in any case (just memcpy or memset or similar).

- If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.
- If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL
  in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case).
- UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns
  -ENOENT in that case (regardless of the kind of fault).

Future Work
===========

This series only supports hugetlbfs.  I have a second series in flight to
support shmem as well, extending the functionality.  This series is more
mature than the shmem support at this point, and the functionality works
fully on hugetlbfs, so this series can be merged first and then shmem
support will follow.

This patch (of 6):

This feature allows userspace to intercept "minor" faults.  By "minor"
faults, I mean the following situation:

Let there exist two mappings (i.e., VMAs) to the same page(s).  One of the
mappings is registered with userfaultfd (in minor mode), and the other is
not.  Via the non-UFFD mapping, the underlying pages have already been
allocated & filled with some contents.  The UFFD mapping has not yet been
faulted in; when it is touched for the first time, this results in what
I'm calling a "minor" fault.  As a concrete example, when working with
hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing
page.

This commit adds the new registration mode, and sets the relevant flag on
the VMAs being registered.  In the hugetlb fault path, if we find that we
have huge_pte_none(), but find_lock_page() does indeed find an existing
page, then we have a "minor" fault, and if the VMA has the userfaultfd
registration flag, we call into userfaultfd to handle it.

This is implemented as a new registration mode, instead of an API feature.
This is because the alternative implementation has significant drawbacks
[1].

However, doing it this was requires we allocate a VM_* flag for the new
registration mode.  On 32-bit systems, there are no unused bits, so this
feature is only supported on architectures with
CONFIG_ARCH_USES_HIGH_VMA_FLAGS.  When attempting to register a VMA in
MINOR mode on 32-bit architectures, we return -EINVAL.

[1] https://lore.kernel.org/patchwork/patch/1380226/

Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 82a150ec394f6b944e26786b907fc0deab5b2064
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388132/

Conflicts:
	arch/arm64/Kconfig
	fs/userfaultfd.c
	mm/hugetlb.c

(All related to SPF feature. Resolved by manual rebase)

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I43b37272d531341439ceaa03213d0e2415e04688
2021-04-09 15:35:59 -07:00
Peter Xu
343cacfa06 FROMGIT: hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp
Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because
userfaultfd-wp is always based on pgtable entries, so they cannot be
shared.

Walk the hugetlb range and unshare all such mappings if there is, right
before UFFDIO_REGISTER will succeed and return to userspace.

This will pair with want_pmd_share() in hugetlb code so that huge pmd
sharing is completely disabled for userfaultfd-wp registered range.

Link: https://lkml.kernel.org/r/20210218231206.15524-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 267bda5c9993856b86f91a998df632b29cf517e2
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1382208/

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I99d541ce45aaf924fa912f00dafa4caefe307755
2021-04-09 15:35:58 -07:00
Lokesh Gidra
6a6bc06393 UPSTREAM: userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob
With this change, when the knob is set to 0, it allows unprivileged users
to call userfaultfd, like when it is set to 1, but with the restriction
that page faults from only user-mode can be handled.  In this mode, an
unprivileged user (without SYS_CAP_PTRACE capability) must pass
UFFD_USER_MODE_ONLY to userfaultd or the API will fail with EPERM.

This enables administrators to reduce the likelihood that an attacker with
access to userfaultfd can delay faulting kernel code to widen timing
windows for other exploits.

The default value of this knob is changed to 0.  This is required for
correct functioning of pipe mutex.  However, this will fail postcopy live
migration, which will be unnoticeable to the VM guests.  To avoid this,
set 'vm.userfault = 1' in /sys/sysctl.conf.

The main reason this change is desirable as in the short term is that the
Android userland will behave as with the sysctl set to zero.  So without
this commit, any Linux binary using userfaultfd to manage its memory would
behave differently if run within the Android userland.  For more details,
refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/

Link: https://lkml.kernel.org/r/20201120030411.2690816-3-lokeshgidra@google.com
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Daniel Colascione <dancol@dancol.org>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: <calin@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Daniel Colascione <dancol@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit d0d4730ac2e404a5b0da9a87ef38c73e51cb1664)

Bug: 160737021
Bug: 169683130
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Change-Id: I08b7080b49ca626c5ab41bb2621fa21fa9a928a2
2021-02-05 13:14:08 +00:00
Lokesh Gidra
b8af1f96cc UPSTREAM: userfaultfd: add UFFD_USER_MODE_ONLY
Patch series "Control over userfaultfd kernel-fault handling", v6.

This patch series is split from [1].  The other series enables SELinux
support for userfaultfd file descriptors so that its creation and movement
can be controlled.

It has been demonstrated on various occasions that suspending kernel code
execution for an arbitrary amount of time at any access to userspace
memory (copy_from_user()/copy_to_user()/...) can be exploited to change
the intended behavior of the kernel.  For instance, handling page faults
in kernel-mode using userfaultfd has been exploited in [2, 3].  Likewise,
FUSE, which is similar to userfaultfd in this respect, has been exploited
in [4, 5] for similar outcome.

This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object.  It then adds a 'user-mode only' option to the
unprivileged_userfaultfd sysctl knob to require unprivileged callers to
use this new flag.

The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to enhance
security vulnerabilities by lengthening the race window in kernel code.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dancol@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808

This patch (of 2):

userfaultfd handles page faults from both user and kernel code.  Add a new
UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the resulting
userfaultfd object refuse to handle faults from kernel mode, treating
these faults as if SIGBUS were always raised, causing the kernel code to
fail with EFAULT.

A future patch adds a knob allowing administrators to give some processes
the ability to create userfaultfd file objects only if they pass
UFFD_USER_MODE_ONLY, reducing the likelihood that these processes will
exploit userfaultfd's ability to delay kernel page faults to open timing
windows for future exploits.

Link: https://lkml.kernel.org/r/20201120030411.2690816-1-lokeshgidra@google.com
Link: https://lkml.kernel.org/r/20201120030411.2690816-2-lokeshgidra@google.com
Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <calin@google.com>
Cc: Daniel Colascione <dancol@dancol.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 37cd0575b8510159992d279c530c05f872990b02)

Bug: 160737021
Bug: 169683130
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Change-Id: I19ff309b616c7a4a247e8c8427a87caffb1b2df9
2021-02-05 13:14:00 +00:00
Daniel Colascione
dbc935c62b UPSTREAM: userfaultfd: use secure anon inodes for userfaultfd
This change gives userfaultfd file descriptors a real security
context, allowing policy to act on them.

Signed-off-by: Daniel Colascione <dancol@google.com>
[LG: Remove owner inode from userfaultfd_ctx]
[LG: Use anon_inode_getfd_secure() in userfaultfd syscall]
[LG: Use inode of file in userfaultfd_read() in resolve_userfault_fork()]
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
(cherry picked from commit b537900f1598b67bcb8acac20da73c6e26ebbf99)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Ib2973ca3650a8defe15eded13294a3fb25356b9d
2021-02-05 11:04:21 +00:00
Laurent Dufour
9cfe16897f FROMLIST: mm: protect VMA modifications using VMA sequence count
The VMA sequence count has been introduced to allow fast detection of
VMA modification when running a page fault handler without holding
the mmap_sem.

This patch provides protection against the VMA modification done in :
	- madvise()
	- mpol_rebind_policy()
	- vma_replace_policy()
	- change_prot_numa()
	- mlock(), munlock()
	- mprotect()
	- mmap_region()
	- collapse_huge_page()
	- userfaultd registering services

In addition, VMA fields which will be read during the speculative fault
path needs to be written using WRITE_ONCE to prevent write to be split
and intermediate values to be pushed to other CPUs.

Change-Id: Ic36046b7254e538b6baf7144c50ae577ee7f2074
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Link: https://lore.kernel.org/lkml/1523975611-15978-10-git-send-email-ldufour@linux.vnet.ibm.com/
Bug: 161210518
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
2021-01-22 17:59:47 +00:00
Greg Kroah-Hartman
05d2a661fd Merge 54a4c789ca ("Merge tag 'docs/v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media") into android-mainline
Steps on the way to 5.10-rc1

Resolves conflicts in:
	fs/userfaultfd.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie3fe3c818f1f6565cfd4fa551de72d2b72ef60af
2020-10-26 09:23:33 +01:00
Jann Horn
4d45e75a99 mm: remove the now-unnecessary mmget_still_valid() hack
The preceding patches have ensured that core dumping properly takes the
mmap_lock.  Thanks to that, we can now remove mmget_still_valid() and all
its users.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:22 -07:00
Greg Kroah-Hartman
d7b0856eac Merge 00e4db5125 ("Merge tag 'perf-tools-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux") into android-mainline
Tiny steps on the way to 5.9-rc1.

Fixes conflicts in:
	fs/f2fs/inline.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I16d863ae44a51156499458e8c3486587cbe2babe
2020-08-11 10:57:46 +02:00
Linus Torvalds
97d052ea3f A set of locking fixes and updates:
- Untangle the header spaghetti which causes build failures in various
     situations caused by the lockdep additions to seqcount to validate that
     the write side critical sections are non-preemptible.
 
   - The seqcount associated lock debug addons which were blocked by the
     above fallout.
 
     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict per
     CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot
     validate that the lock is held.
 
     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored and
     write_seqcount_begin() has a lockdep assertion to validate that the
     lock is held.
 
     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API is
     unchanged and determines the type at compile time with the help of
     _Generic which is possible now that the minimal GCC version has been
     moved up.
 
     Adding this lockdep coverage unearthed a handful of seqcount bugs which
     have been addressed already independent of this.
 
     While generaly useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if the
     writers are serialized by an associated lock, which leads to the well
     known reader preempts writer livelock. RT prevents this by storing the
     associated lock pointer independent of lockdep in the seqcount and
     changing the reader side to block on the lock when a reader detects
     that a writer is in the write side critical section.
 
  - Conversion of seqcount usage sites to associated types and initializers.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8xmPYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTuQEACyzQCjU8PgehPp9oMqWzaX2fcVyuZO
 QU2yw6gmz2oTz3ZHUNwdW8UnzGh2OWosK3kDruoD9FtSS51lER1/ISfSPCGfyqxC
 KTjOcB1Kvxwq/3LcCx7Zi3ZxWApat74qs3EhYhKtEiQ2Y9xv9rLq8VV1UWAwyxq0
 eHpjlIJ6b6rbt+ARslaB7drnccOsdK+W/roNj4kfyt+gezjBfojGRdMGQNMFcpnv
 shuTC+vYurAVIiVA/0IuizgHfwZiXOtVpjVoEWaxg6bBH6HNuYMYzdSa/YrlDkZs
 n/aBI/Xkvx+Eacu8b1Zwmbzs5EnikUK/2dMqbzXKUZK61eV4hX5c2xrnr1yGWKTs
 F/juh69Squ7X6VZyKVgJ9RIccVueqwR2EprXWgH3+RMice5kjnXH4zURp0GHALxa
 DFPfB6fawcH3Ps87kcRFvjgm6FBo0hJ1AxmsW1dY4ACFB9azFa2euW+AARDzHOy2
 VRsUdhL9CGwtPjXcZ/9Rhej6fZLGBXKr8uq5QiMuvttp4b6+j9FEfBgD4S6h8csl
 AT2c2I9LcbWqyUM9P4S7zY/YgOZw88vHRuDH7tEBdIeoiHfrbSBU7EQ9jlAKq/59
 f+Htu2Io281c005g7DEeuCYvpzSYnJnAitj5Lmp/kzk2Wn3utY1uIAVszqwf95Ul
 81ppn2KlvzUK8g==
 =7Gj+
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Thomas Gleixner:
 "A set of locking fixes and updates:

   - Untangle the header spaghetti which causes build failures in
     various situations caused by the lockdep additions to seqcount to
     validate that the write side critical sections are non-preemptible.

   - The seqcount associated lock debug addons which were blocked by the
     above fallout.

     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict
     per CPU seqcounts. As the lock is not part of the seqcount, lockdep
     cannot validate that the lock is held.

     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored
     and write_seqcount_begin() has a lockdep assertion to validate that
     the lock is held.

     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API
     is unchanged and determines the type at compile time with the help
     of _Generic which is possible now that the minimal GCC version has
     been moved up.

     Adding this lockdep coverage unearthed a handful of seqcount bugs
     which have been addressed already independent of this.

     While generally useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if
     the writers are serialized by an associated lock, which leads to
     the well known reader preempts writer livelock. RT prevents this by
     storing the associated lock pointer independent of lockdep in the
     seqcount and changing the reader side to block on the lock when a
     reader detects that a writer is in the write side critical section.

   - Conversion of seqcount usage sites to associated types and
     initializers"

* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  locking/seqlock, headers: Untangle the spaghetti monster
  locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header
  x86/headers: Remove APIC headers from <asm/smp.h>
  seqcount: More consistent seqprop names
  seqcount: Compress SEQCNT_LOCKNAME_ZERO()
  seqlock: Fold seqcount_LOCKNAME_init() definition
  seqlock: Fold seqcount_LOCKNAME_t definition
  seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
  hrtimer: Use sequence counter with associated raw spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  vfs: Use sequence counter with associated spinlock
  timekeeping: Use sequence counter with associated raw spinlock
  xfrm: policy: Use sequence counters with associated lock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  sched: tasks: Use sequence counter with associated spinlock
  ...
2020-08-10 19:07:44 -07:00
Greg Kroah-Hartman
003fccf2e7 Merge 99f6cf61f1 ("Merge branch 'mtd/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux") into android-mainline
steps along the way to 5.9-rc1

Change-Id: I3090afff778aaa50064b4d8cce21cd7d8bf746a4
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-08-06 17:02:44 +02:00
Linus Torvalds
f9bf352224 userfaultfd: simplify fault handling
Instead of waiting in a loop for the userfaultfd condition to become
true, just wait once and return VM_FAULT_RETRY.

We've already dropped the mmap lock, we know we can't really
successfully handle the fault at this point and the caller will have to
retry anyway.  So there's no point in making the wait any more
complicated than it needs to be - just schedule away.

And once you don't have that complexity with explicit looping, you can
also just lose all the 'userfaultfd_signal_pending()' complexity,
because once we've set the correct process sleeping state, and don't
loop, the act of scheduling itself will be checking if there are any
pending signals before going to sleep.

We can also drop the VM_FAULT_MAJOR games, since we'll be treating all
retried faults as major soon anyway (series to regularize and share more
of fault handling across architectures in a separate series by Peter Xu,
and in the meantime we won't worry about the possible minor - I'll be
here all week, try the veal - accounting difference).

Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-03 11:25:16 -07:00
Ahmed S. Darwish
2ca97ac8bd userfaultfd: Use sequence counter with associated spinlock
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-23-a.darwish@linutronix.de
2020-07-29 16:14:28 +02:00
Greg Kroah-Hartman
a253db8915 Merge ad57a1022f ("Merge tag 'exfat-for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat") into android-mainline
Steps on the way to 5.8-rc1.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4bc42f572167ea2f815688b4d1eb6124b6d260d4
2020-06-24 17:54:12 +02:00
Michel Lespinasse
c1e8d7c6a7 mmap locking API: convert mmap_sem comments
Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Michel Lespinasse
3e4e28c5a8 mmap locking API: convert mmap_sem API comments
Convert comments that reference old mmap_sem APIs to reference
corresponding new mmap locking APIs instead.

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Michel Lespinasse
42fc541404 mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
Add new APIs to assert that mmap_sem is held.

Using this instead of rwsem_is_locked and lockdep_assert_held[_write]
makes the assertions more tolerant of future changes to the lock type.

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-10-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Michel Lespinasse
d8ed45c5dc mmap locking API: use coccinelle to convert mmap_sem rwsem call sites
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Greg Kroah-Hartman
2724136fc5 Merge 5d30bcacd9 ("Merge tag '9p-for-5.7-2' of git://github.com/martinetd/linux") into android-mainline
Baby steps on the way to 5.7-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I89095a90046a14eab189aab257a75b3dfdb5b1db
2020-04-10 11:53:08 +02:00
Peter Xu
14819305e0 userfaultfd: wp: declare _UFFDIO_WRITEPROTECT conditionally
Only declare _UFFDIO_WRITEPROTECT if the user specified
UFFDIO_REGISTER_MODE_WP and if all the checks passed.  Then when the user
registers regions with shmem/hugetlbfs we won't expose the new ioctl to
them.  Even with complete anonymous memory range, we'll only expose the
new WP ioctl bit if the register mode has MODE_WP.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-18-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:40 -07:00
Peter Xu
23080e2783 userfaultfd: wp: don't wake up when doing write protect
It does not make sense to try to wake up any waiting thread when we're
write-protecting a memory region.  Only wake up when resolving a write
protected page fault.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-16-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
Andrea Arcangeli
63b2d4174c userfaultfd: wp: add the writeprotect API to userfaultfd ioctl
Introduce the new uffd-wp APIs for userspace.

Firstly, we'll allow to do UFFDIO_REGISTER with write protection tracking
using the new UFFDIO_REGISTER_MODE_WP flag.  Note that this flag can
co-exist with the existing UFFDIO_REGISTER_MODE_MISSING, in which case the
userspace program can not only resolve missing page faults, and at the
same time tracking page data changes along the way.

Secondly, we introduced the new UFFDIO_WRITEPROTECT API to do page level
write protection tracking.  Note that we will need to register the memory
region with UFFDIO_REGISTER_MODE_WP before that.

[peterx@redhat.com: write up the commit message]
[peterx@redhat.com: remove useless block, write commit message, check against
 VM_MAYWRITE rather than VM_WRITE when register]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-14-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
Andrea Arcangeli
72981e0e7b userfaultfd: wp: add UFFDIO_COPY_MODE_WP
This allows UFFDIO_COPY to map pages write-protected.

[peterx@redhat.com: switch to VM_WARN_ON_ONCE in mfill_atomic_pte; add brackets
 around "dst_vma->vm_flags & VM_WRITE"; fix wordings in comments and
 commit messages]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-6-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
Greg Kroah-Hartman
47627513ce Merge e109f50607 ("Merge tag 'mtd/for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux") into android-mainline
Baby steps on the way to 5.7-rc1

Change-Id: I136ebb5242e3499873dcd5f5178ad7f68512d11c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-04-07 16:21:10 +02:00
Peter Xu
3e69ad081c mm/userfaultfd: honor FAULT_FLAG_KILLABLE in fault path
Userfaultfd fault path was by default killable even if the caller does not
have FAULT_FLAG_KILLABLE.  That makes sense before in that when with gup
we don't have FAULT_FLAG_KILLABLE properly set before.  Now after previous
patch we've got FAULT_FLAG_KILLABLE applied even for gup code so it should
also make sense to let userfaultfd to honor the FAULT_FLAG_KILLABLE.

Because we're unconditionally setting FAULT_FLAG_KILLABLE in gup code
right now, this patch should have no functional change.  It also cleaned
the code a little bit by introducing some helpers.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160300.9941-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:30 -07:00
Peter Xu
c270a7eedc mm: introduce FAULT_FLAG_INTERRUPTIBLE
handle_userfaultfd() is currently the only one place in the kernel page
fault procedures that can respond to non-fatal userspace signals.  It was
trying to detect such an allowance by checking against USER & KILLABLE
flags, which was "un-official".

In this patch, we introduced a new flag (FAULT_FLAG_INTERRUPTIBLE) to show
that the fault handler allows the fault procedure to respond even to
non-fatal signals.  Meanwhile, add this new flag to the default fault
flags so that all the page fault handlers can benefit from the new flag.
With that, replacing the userfault check to this one.

Since the line is getting even longer, clean up the fault flags a bit too
to ease TTY users.

Although we've got a new flag and applied it, we shouldn't have any
functional change with this patch so far.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220195348.16302-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:30 -07:00
Peter Xu
ef429ee740 userfaultfd: don't retake mmap_sem to emulate NOPAGE
This patch removes the risk path in handle_userfault() then we will be
sure that the callers of handle_mm_fault() will know that the VMAs might
have changed.  Meanwhile with previous patch we don't lose responsiveness
as well since the core mm code now can handle the nonfatal userspace
signals even if we return VM_FAULT_RETRY.

Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160234.9646-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:29 -07:00
Greg Kroah-Hartman
d3a196a371 Linux 5.5-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl3tf/0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGlKwH/3fTToujuJfTx5E5
 mrARAP65J1L/DxpEKvKRt2bNZo6w13mNd8g7ZPmYChz90bYGvXQSG8hYTU9iAw3O
 yimSTJlNXDhVAluB53XnDdUxIWC4HUZsNxWJNCeXMuiMcGNsTGX+v3f+x7oHCT0P
 jI1RSIsFGjgr0RWqZ8U5aJckQo2xABC1TfYw53K66Oc/JLZpSFJFwMgjf1fD5diU
 HGDA8E2p0u1TQIyNzr86iqMvnlSRYBQwBQn6OgEKCG4Z0NLtXfDF4mqnxsXgLmIH
 oQoFfxaMKXyGWds7ZxwcGWntALCF41ThfpiJWDIyxjWxFEty4bqTCbDPwwyp7ip0
 iuASmTI=
 =YqO2
 -----END PGP SIGNATURE-----

Merge 5.5-rc1 into android-mainline

Linux 5.5-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6f952ebdd40746115165a2f99bab340482f5c237
2019-12-09 12:12:00 +01:00
Linus Torvalds
596cf45cbf Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:
 "Incoming:

   - a small number of updates to scripts/, ocfs2 and fs/buffer.c

   - most of MM

  I still have quite a lot of material (mostly not MM) staged after
  linux-next due to -next dependencies. I'll send those across next week
  as the preprequisites get merged up"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits)
  mm/page_io.c: annotate refault stalls from swap_readpage
  mm/Kconfig: fix trivial help text punctuation
  mm/Kconfig: fix indentation
  mm/memory_hotplug.c: remove __online_page_set_limits()
  mm: fix typos in comments when calling __SetPageUptodate()
  mm: fix struct member name in function comments
  mm/shmem.c: cast the type of unmap_start to u64
  mm: shmem: use proper gfp flags for shmem_writepage()
  mm/shmem.c: make array 'values' static const, makes object smaller
  userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
  fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register()
  userfaultfd: wrap the common dst_vma check into an inlined function
  userfaultfd: remove unnecessary WARN_ON() in __mcopy_atomic_hugetlb()
  userfaultfd: use vma_pagesize for all huge page size calculation
  mm/madvise.c: use PAGE_ALIGN[ED] for range checking
  mm/madvise.c: replace with page_size() in madvise_inject_error()
  mm/mmap.c: make vma_merge() comment more easy to understand
  mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops
  autonuma: reduce cache footprint when scanning page tables
  autonuma: fix watermark checking in migrate_balanced_pgdat()
  ...
2019-12-01 20:36:41 -08:00
Mike Rapoport
3c1c24d91f userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
A while ago Andy noticed
(http://lkml.kernel.org/r/CALCETrWY+5ynDct7eU_nDUqx=okQvjm=Y5wJvA4ahBja=CQXGw@mail.gmail.com)
that UFFD_FEATURE_EVENT_FORK used by an unprivileged user may have
security implications.

As the first step of the solution the following patch limits the availably
of UFFD_FEATURE_EVENT_FORK only for those having CAP_SYS_PTRACE.

The usage of CAP_SYS_PTRACE ensures compatibility with CRIU.

Yet, if there are other users of non-cooperative userfaultfd that run
without CAP_SYS_PTRACE, they would be broken :(

Current implementation of UFFD_FEATURE_EVENT_FORK modifies the file
descriptor table from the read() implementation of uffd, which may have
security implications for unprivileged use of the userfaultfd.

Limit availability of UFFD_FEATURE_EVENT_FORK only for callers that have
CAP_SYS_PTRACE.

Link: http://lkml.kernel.org/r/1572967777-8812-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Nick Kralevich <nnk@google.com>
Cc: Nosh Minwalla <nosh@google.com>
Cc: Pavel Emelyanov <ovzxemul@gmail.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 12:59:10 -08:00
Andrea Arcangeli
9d4678eb17 fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register()
If the registration is repeated without VM_UFFD_MISSING or VM_UFFD_WP they
need to be cleared.  Currently setting UFFDIO_REGISTER_MODE_WP returns
-EINVAL, so this patch is a noop until the UFFDIO_REGISTER_MODE_WP support
is applied.

Link: http://lkml.kernel.org/r/20191004232834.GP13922@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 12:59:10 -08:00
Arnd Bergmann
1832f2d8ff compat_ioctl: move more drivers to compat_ptr_ioctl
The .ioctl and .compat_ioctl file operations have the same prototype so
they can both point to the same function, which works great almost all
the time when all the commands are compatible.

One exception is the s390 architecture, where a compat pointer is only
31 bit wide, and converting it into a 64-bit pointer requires calling
compat_ptr(). Most drivers here will never run in s390, but since we now
have a generic helper for it, it's easy enough to use it consistently.

I double-checked all these drivers to ensure that all ioctl arguments
are used as pointers or are ignored, but are not interpreted as integer
values.

Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: David Sterba <dsterba@suse.com>
Acked-by: Darren Hart (VMware) <dvhart@infradead.org>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-10-23 17:23:44 +02:00
Greg Kroah-Hartman
94139142d9 Merge 5.4-rc1-prelrease into android-mainline
To make the 5.4-rc1 merge easier, merge at a prerelease point in time
before the final release happens.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: If613d657fd0abf9910c5bf3435a745f01b89765e
2019-10-02 17:58:47 +02:00
Andrey Konovalov
7d0325749a userfaultfd: untag user pointers
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.

userfaultfd code use provided user pointers for vma lookups, which can
only by done with untagged pointers.

Untag user pointers in validate_range().

Link: http://lkml.kernel.org/r/cdc59ddd7011012ca2e689bc88c3b65b1ea7e413.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jens Wiklander <jens.wiklander@linaro.org>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:41 -07:00
Greg Kroah-Hartman
ad455d87e5 Linux 5.3-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl1i2wkeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGcDQIAJINYON5WdDSFDpp
 htva213hSIxYLix8Dc4cTMk8qT/P2MAj9pPYERuLwIxWZlfbduW6Fxy8bJANZ7k3
 4cJ/IbmA5M5ZIaOJTTL45w8H0CMR/4mdPl5rb5k/Wkh449Cj101gZLlh0FEtR5zG
 uDJecKSuHjH1ikySk6+zmRG5X+lq6wNY8NkuBtfwAwLffFc0ljQHwPUMJ8ojgqt/
 p3ChNgtb/I6U6ExITlyktKdP59bAoHAoBiKKFZWw5yJWgXE2q4Sv9nT4Btkr5KdJ
 9mnWnSaSLwptNCOtU4tKLwFIZP2WoVXGPNxxq4XLoTEuieXCqmikhc9tSSTwk+Tp
 CKHN6wU=
 =JkJ4
 -----END PGP SIGNATURE-----

Merge 5.3-rc6 into android-mainline

Linux 5.3-rc6

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id10580d48d56054408b3efe0bd1866d67aba2a3d
2019-08-26 16:45:30 +02:00
Oleg Nesterov
46d0b24c5e userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even if
mm->core_state != NULL.

Otherwise a page fault can see userfaultfd_missing() == T and use an
already freed userfaultfd_ctx.

Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com
Fixes: 04f5866e41 ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Greg Kroah-Hartman
a4bbf3df04 Linux 5.2
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0idTweHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGZesIAJDKicw2Voyx8K8m
 3pXSK+71RuO/d3Y9M51mdfTMKRP4PHR9/4wVZ9wHPwC4dV6wxgsmIYCF69a1Wety
 LD1MpDCP1DK5wVfPNKVX2xmj7ua6iutPtSsJHzdzM2TlscgsrFKjmUccqJ5JLwL5
 c34nqwXWnzzRyI5Ga9cQSlwzAXq0vDHXyML3AnCosSsLX0lKFrHlK1zttdOPNkfj
 dXRN62g3q+9kVQozzhDXb8atZZ7IkBk8Q0lujpNXW83Ci1VjaVNv3SB8GZTXIlLj
 U15VdyuwfJDfpBgFBN6/unzVaAB6FFrEKy0jT1aeTyKarMKDKgOnJjn10aKjDNno
 /bXsKKc=
 =TVqV
 -----END PGP SIGNATURE-----

Merge 5.2 into android-common

Linux 5.2

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-07-08 08:24:40 +02:00
Eric Biggers
cbcfa130a9 fs/userfaultfd.c: disable irqs for fault_pending and event locks
When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs
and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock.

This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by
userfaultfd_ctx_read(), which in turn can be waiting for
userfaultfd_ctx::fault_pending_wqh.lock or
userfaultfd_ctx::event_wqh.lock.

But elsewhere the fault_pending_wqh and event_wqh locks are taken with
IRQs enabled.  Since the IRQ handler may take kioctx::ctx_lock, lockdep
reports that a deadlock is possible.

Fix it by always disabling IRQs when taking the fault_pending_wqh and
event_wqh locks.

Commit ae62c16e10 ("userfaultfd: disable irqs when taking the
waitqueue lock") didn't fix this because it only accounted for the
fd_wqh lock, not the other locks nested inside it.

Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org
Fixes: bfe4037e72 ("aio: implement IOCB_CMD_POLL")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reported-by: syzbot+fab6de82892b6b9c6191@syzkaller.appspotmail.com
Reported-by: syzbot+53c0b767f7ca0dc0c451@syzkaller.appspotmail.com
Reported-by: syzbot+a3accb352f9c22041cfa@syzkaller.appspotmail.com
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>	[4.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05 11:12:07 +09:00
Greg Kroah-Hartman
9a7ed8b83e Linux 5.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Os1seHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGtx4H/j6i482XzcGFKTBm
 A7mBoQpy+kLtoUov4EtBAR62OuwI8rsahW9di37QKndPoQrczWaKBmr3De6LCdPe
 v3pl3O6wBbvH5ru+qBPFX9PdNbDvimEChh7LHxmMxNQq3M+AjZAZVJyfpoiFnx35
 Fbge+LZaH/k8HMwZmkMr5t9Mpkip715qKg2o9Bua6dkH0AqlcpLlC8d9a+HIVw/z
 aAsyGSU8jRwhoAOJsE9bJf0acQ/pZSqmFp0rDKqeFTSDMsbDRKLGq/dgv4nW0RiW
 s7xqsjb/rdcvirRj3rv9+lcTVkOtEqwk0PVdL9WOf7g4iYrb3SOIZh8ZyViaDSeH
 VTS5zps=
 =huBY
 -----END PGP SIGNATURE-----

Merge 5.2-rc6 into android-mainline

Linux 5.2-rc6

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-06-23 07:22:28 +02:00
Thomas Gleixner
20c8ccb197 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499
Based on 1 normalized pattern(s):

  this work is licensed under the terms of the gnu gpl version 2 see
  the copying file in the top level directory

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 35 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.797835076@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:53 +02:00
Greg Kroah-Hartman
1226c72a32 Linux 5.2-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlzh3PgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGnhoH/jkl4X66KuOXvGXa
 9pgFzyEa3Mhqs0j+AaJYmRyoRty1CbAfMLEWgr0JbZy56zm0PhtXOxcu56/tfdtw
 f5j8OJLDvld10imHXxUrom9zc546Ff90VTOvWmsYznszTz0rV5HLmKgVIIc7ZN8W
 6hshDOy/rviUcPAVrLKdZffzgQZlASNS7To7IBE9okT4QHEtER7dgzM/Z0VAg9R1
 guCPaN8tje4vq2Lv7+J5T9LOF1RObCbKXNADwXY1rMRK5ao3xS93eDnJ8Vn08utI
 UECIVfnYsA6pGl6v1ErCl9izx9MoTU3Crle7BRzVbrw7furvB2lJ1R4RGwqRbvcB
 HovhmHI=
 =TMir
 -----END PGP SIGNATURE-----

Merge 5.2-rc1 into android-mainline

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-05-20 20:17:24 +02:00
Peter Xu
cefdca0a86 userfaultfd/sysctl: add vm.unprivileged_userfaultfd
Userfaultfd can be misued to make it easier to exploit existing
use-after-free (and similar) bugs that might otherwise only make a
short window or race condition available.  By using userfaultfd to
stall a kernel thread, a malicious program can keep some state that it
wrote, stable for an extended period, which it can then access using an
existing exploit.  While it doesn't cause the exploit itself, and while
it's not the only thing that can stall a kernel thread when accessing a
memory location, it's one of the few that never needs privilege.

We can add a flag, allowing userfaultfd to be restricted, so that in
general it won't be useable by arbitrary user programs, but in
environments that require userfaultfd it can be turned back on.

Add a global sysctl knob "vm.unprivileged_userfaultfd" to control
whether userfaultfd is allowed by unprivileged users.  When this is
set to zero, only privileged users (root user, or users with the
CAP_SYS_PTRACE capability) will be able to use the userfaultfd
syscalls.

Andrea said:

: The only difference between the bpf sysctl and the userfaultfd sysctl
: this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability
: requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement,
: because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE
: already if it's doing other kind of tracking on processes runtime, in
: addition of userfaultfd.  In other words both syscalls works only for
: root, when the two sysctl are opt-in set to 1.

[dgilbert@redhat.com: changelog additions]
[akpm@linux-foundation.org: documentation tweak, per Mike]
Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Todd Kjos
0f2cb7cf80 Merge branch 'linux-mainline' into android-mainline-tmp
Change-Id: I4380c68c3474026a42ffa9f95c525f9a563ba7a3
2019-05-03 12:22:22 -07:00
Colin Cross
60500a4228 ANDROID: mm: add a field to store names for private anonymous memory
Userspace processes often have multiple allocators that each do
anonymous mmaps to get memory.  When examining memory usage of
individual processes or systems as a whole, it is useful to be
able to break down the various heaps that were allocated by
each layer and examine their size, RSS, and physical memory
usage.

This patch adds a user pointer to the shared union in
vm_area_struct that points to a null terminated string inside
the user process containing a name for the vma.  vmas that
point to the same address will be merged, but vmas that
point to equivalent strings at different addresses will
not be merged.

Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.

The names of named anonymous vmas are shown in /proc/pid/maps
as [anon:<name>] and in /proc/pid/smaps in a new "Name" field
that is only present for named vmas.  If the userspace pointer
is no longer valid all or part of the name will be replaced
with "<fault>".

The idea to store a userspace pointer to reduce the complexity
within mm (at the expense of the complexity of reading
/proc/pid/mem) came from Dave Hansen.  This results in no
runtime overhead in the mm subsystem other than comparing
the anon_name pointers when considering vma merging.  The pointer
is stored in a union with fieds that are only used on file-backed
mappings, so it does not increase memory usage.

Includes fix from Jed Davis <jld@mozilla.com> for typo in
prctl_set_vma_anon_name, which could attempt to set the name
across two vmas at the same time due to a typo, which might
corrupt the vma list.  Fix it to use tmp instead of end to limit
the name setting to a single vma at a time.

Bug: 120441514
Change-Id: I9aa7b6b5ef536cd780599ba4e2fba8ceebe8b59f
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
[AmitP: Fix get_user_pages_remote() call to align with upstream commit
        5b56d49fc3 ("mm: add locked parameter to get_user_pages_remote()")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2019-05-03 10:39:58 -07:00