Commit Graph

184 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
af6deae771 This is the 5.4.264 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmV5528ACgkQONu9yGCS
 aT7+DxAAl/t4oGT1Di8mIhCfqsezTj/SQ6HAMaAFpKjGXdgTBX9QavwTHp35qLpv
 xZtibdl611oEDT2B3//cvJu9Qs8sJGlDWxpJF8fkF/NED2EHLRwUAOmhc4fyBCBq
 01GfPRefU9G5E0Nw2g3o07dD0otKQzENh74iAzUr/cyju5TBgS4NC7CljiD6GXP8
 DtCcbz2UMmHF5icyasjw7WoCqKaWUn7KLqU3RyQfmDgnh3z0vSXKs17CFoNk1+R6
 EAkZJkrUsIDO3N6aRkJGvwtbEFDws4S7onpcthckXt6IF/OQ+h8LcHvyTpblVeWH
 qVBXmj1QZD+SwfP5qVtM+RwHHTE3GVJI3+MVTInu0vzgEz6lhNPILzWwso+AIhHM
 +SBZkx4/pO2LbrSgqn+NYWRcBTn4fXNURPd+zLXNJ1ZSnhamWm7kuFK8B36lz0s8
 CB5ngLld1wg6m2NVPneqAeEcD54msGd18iNZvnxfx3c28sb27RgLqLJQKdSrtFMH
 tZj0+3KGkYHzWQmt07iLmg1DPAmZSNzMW7MDJRHlqDCKS2gBKO/LKxWiri2WPh+o
 IvgrfGzEJsyYXi5+JcGC9+RkoPcGaous4VA2kWFJSjXiXPrJC4jknQm3mtdMiTf3
 V9DEOMvot9ELGI9RODaNsk59C2RhOuVimm2XWV9n88UcX38reAQ=
 =gOna
 -----END PGP SIGNATURE-----

Merge 5.4.264 into android11-5.4-lts

Changes in 5.4.264
	hrtimers: Push pending hrtimers away from outgoing CPU earlier
	netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test
	tg3: Move the [rt]x_dropped counters to tg3_napi
	tg3: Increment tx_dropped in tg3_tso_bug()
	kconfig: fix memory leak from range properties
	drm/amdgpu: correct chunk_ptr to a pointer to chunk.
	of: base: Add of_get_cpu_state_node() to get idle states for a CPU node
	ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic
	ACPI/IORT: Make iort_msi_map_rid() PCI agnostic
	of/iommu: Make of_map_rid() PCI agnostic
	of/irq: make of_msi_map_get_device_domain() bus agnostic
	of/irq: Make of_msi_map_rid() PCI bus agnostic
	of: base: Fix some formatting issues and provide missing descriptions
	of: Fix kerneldoc output formatting
	of: Add missing 'Return' section in kerneldoc comments
	of: dynamic: Fix of_reconfig_get_state_change() return value documentation
	ipv6: fix potential NULL deref in fib6_add()
	hv_netvsc: rndis_filter needs to select NLS
	net: arcnet: Fix RESET flag handling
	net: arcnet: com20020 fix error handling
	arcnet: restoring support for multiple Sohard Arcnet cards
	ipv4: ip_gre: Avoid skb_pull() failure in ipgre_xmit()
	net: hns: fix fake link up on xge port
	netfilter: xt_owner: Fix for unsafe access of sk->sk_socket
	tcp: do not accept ACK of bytes we never sent
	bpf: sockmap, updating the sg structure should also update curr
	RDMA/bnxt_re: Correct module description string
	hwmon: (acpi_power_meter) Fix 4.29 MW bug
	ASoC: wm_adsp: fix memleak in wm_adsp_buffer_populate
	tracing: Fix a warning when allocating buffered events fails
	scsi: be2iscsi: Fix a memleak in beiscsi_init_wrb_handle()
	ARM: imx: Check return value of devm_kasprintf in imx_mmdc_perf_init
	ARM: dts: imx: make gpt node name generic
	ARM: dts: imx7: Declare timers compatible with fsl,imx6dl-gpt
	ALSA: pcm: fix out-of-bounds in snd_pcm_state_names
	nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage()
	tracing: Always update snapshot buffer size
	tracing: Fix incomplete locking when disabling buffered events
	tracing: Fix a possible race when disabling buffered events
	packet: Move reference count in packet_sock to atomic_long_t
	arm64: dts: mediatek: mt7622: fix memory node warning check
	arm64: dts: mediatek: mt8173-evb: Fix regulator-fixed node names
	perf/core: Add a new read format to get a number of lost samples
	perf: Fix perf_event_validate_size()
	gpiolib: sysfs: Fix error handling on failed export
	mmc: core: add helpers mmc_regulator_enable/disable_vqmmc
	mmc: sdhci-sprd: Fix vqmmc not shutting down after the card was pulled
	usb: gadget: f_hid: fix report descriptor allocation
	parport: Add support for Brainboxes IX/UC/PX parallel cards
	usb: typec: class: fix typec_altmode_put_partner to put plugs
	ARM: PL011: Fix DMA support
	serial: sc16is7xx: address RX timeout interrupt errata
	serial: 8250_omap: Add earlycon support for the AM654 UART controller
	x86/CPU/AMD: Check vendor in the AMD microcode callback
	KVM: s390/mm: Properly reset no-dat
	nilfs2: fix missing error check for sb_set_blocksize call
	io_uring/af_unix: disable sending io_uring over sockets
	netlink: don't call ->netlink_bind with table lock held
	genetlink: add CAP_NET_ADMIN test for multicast bind
	psample: Require 'CAP_NET_ADMIN' when joining "packets" group
	drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group
	tools headers UAPI: Sync linux/perf_event.h with the kernel sources
	Revert "btrfs: add dmesg output for first mount and last unmount of a filesystem"
	cifs: Fix non-availability of dedup breaking generic/304
	smb: client: fix potential NULL deref in parse_dfs_referrals()
	devcoredump : Serialize devcd_del work
	devcoredump: Send uevent once devcd is ready
	Linux 5.4.264

Change-Id: I32d19db2a0ff0cf6d061fa9c8ca527d0b61dd158
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-12-14 13:03:30 +00:00
Pavel Begunkov
18824f592a io_uring/af_unix: disable sending io_uring over sockets
commit 705318a99a138c29a512a72c3e0043b3cd7f55f4 upstream.

File reference cycles have caused lots of problems for io_uring
in the past, and it still doesn't work exactly right and races with
unix_stream_read_generic(). The safest fix would be to completely
disallow sending io_uring files via sockets via SCM_RIGHT, so there
are no possible cycles invloving registered files and thus rendering
SCM accounting on the io_uring side unnecessary.

Cc:  <stable@vger.kernel.org>
Fixes: 0091bfc81741b ("io_uring/af_unix: defer registered files gc to io_uring release")
Reported-and-suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c716c88321939156909cfa1bd8b0faaf1c804103.1701868795.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-13 18:18:16 +01:00
Greg Kroah-Hartman
0780b1ab09 This is the 5.4.263 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmVyyWIACgkQONu9yGCS
 aT6Y8A//QJPg7pguCawsJGrem3a5dvhi9scNMmfuhKZOKS73JEmt4yudB9IOUjIX
 1c1aBcJo5yYMZq5L9mhXnlgkgqENxE9fI45FtMdwoKiriEQ0w9OBLlfZuKN9lwzC
 tyIigaGE5DD3SqL8e/04LNmMPPdolM38lJ368fYaD3T4d7LfwK0qHJFL8dSg4OFQ
 VaePViMFgbodjtSXoERNjVLaNtSlQDQytiWHMiQX2uf6CIIRbm+zFHn2Se1mUgh3
 WGT9JfXZ+achPw6OLhSIjwL+7vowhn3eRETq4zGkkNSK+rmB6W7zjPhou4SYsmc+
 FAYXvalmhQWWjlmIyZzO7GIVtgx19VuEYB8h5KLvp6DXQ0h0wCBOGgsfIT4icbgW
 wO0R+toWYY3Y79OLRGiMjiL9b60njJYnrm7JrheRD+BIm2jva+Tb7UxhC6QDMfH6
 a8fya8iJDNZWggwpx67JUANdMO8e+2rS4ttNxW0gTZSHhyEjo1HXctKBEmmtXk4s
 HGNV5xUniPnzrP8rduNqePG5B6c3wqOHUwj45L4scGmeC0DzW7E8EBgkHfRcU6CG
 ik9z5nQeDikREfK7cp8OSFtLaEBWSIX57XwHWDTMVPDGTN8EQ6eI7vTnQH3xOhA8
 VWFfwcU6avROM/ih7eJ+X4JvuDKcAGTPeD6oF3II0MLPK2m7ZmE=
 =p/ty
 -----END PGP SIGNATURE-----

Merge 5.4.263 into android11-5.4-lts

Changes in 5.4.263
	driver core: Release all resources during unbind before updating device links
	RDMA/irdma: Prevent zero-length STAG registration
	PCI: keystone: Drop __init from ks_pcie_add_pcie_{ep,port}()
	afs: Make error on cell lookup failure consistent with OpenAFS
	drm/panel: simple: Fix Innolux G101ICE-L01 bus flags
	drm/panel: simple: Fix Innolux G101ICE-L01 timings
	ata: pata_isapnp: Add missing error check for devm_ioport_map()
	drm/rockchip: vop: Fix color for RGB888/BGR888 format on VOP full
	HID: core: store the unique system identifier in hid_device
	HID: fix HID device resource race between HID core and debugging support
	ipv4: Correct/silence an endian warning in __ip_do_redirect
	net: usb: ax88179_178a: fix failed operations during ax88179_reset
	arm/xen: fix xen_vcpu_info allocation alignment
	amd-xgbe: handle corner-case during sfp hotplug
	amd-xgbe: handle the corner-case during tx completion
	amd-xgbe: propagate the correct speed and duplex status
	net: axienet: Fix check for partial TX checksum
	afs: Return ENOENT if no cell DNS record can be found
	afs: Fix file locking on R/O volumes to operate in local mode
	nvmet: remove unnecessary ctrl parameter
	nvmet: nul-terminate the NQNs passed in the connect command
	MIPS: KVM: Fix a build warning about variable set but not used
	ext4: add a new helper to check if es must be kept
	ext4: factor out __es_alloc_extent() and __es_free_extent()
	ext4: use pre-allocated es in __es_insert_extent()
	ext4: use pre-allocated es in __es_remove_extent()
	ext4: using nofail preallocation in ext4_es_remove_extent()
	ext4: using nofail preallocation in ext4_es_insert_delayed_block()
	ext4: using nofail preallocation in ext4_es_insert_extent()
	ext4: fix slab-use-after-free in ext4_es_insert_extent()
	ext4: make sure allocate pending entry not fail
	arm64: cpufeature: Extract capped perfmon fields
	KVM: arm64: limit PMU version to PMUv3 for ARMv8.1
	ACPI: resource: Skip IRQ override on ASUS ExpertBook B1402CVA
	bcache: replace a mistaken IS_ERR() by IS_ERR_OR_NULL() in btree_gc_coalesce()
	s390/dasd: protect device queue against concurrent access
	USB: serial: option: add Luat Air72*U series products
	hv_netvsc: Fix race of register_netdevice_notifier and VF register
	hv_netvsc: Mark VF as slave before exposing it to user-mode
	dm-delay: fix a race between delay_presuspend and delay_bio
	bcache: check return value from btree_node_alloc_replacement()
	bcache: prevent potential division by zero error
	USB: serial: option: add Fibocom L7xx modules
	USB: serial: option: fix FM101R-GL defines
	USB: serial: option: don't claim interface 4 for ZTE MF290
	USB: dwc2: write HCINT with INTMASK applied
	usb: dwc3: set the dma max_seg_size
	USB: dwc3: qcom: fix resource leaks on probe deferral
	USB: dwc3: qcom: fix wakeup after probe deferral
	io_uring: fix off-by one bvec index
	pinctrl: avoid reload of p state in list iteration
	firewire: core: fix possible memory leak in create_units()
	mmc: block: Do not lose cache flush during CQE error recovery
	ALSA: hda: Disable power-save on KONTRON SinglePC
	ALSA: hda/realtek: Headset Mic VREF to 100%
	ALSA: hda/realtek: Add supported ALC257 for ChromeOS
	dm-verity: align struct dm_verity_fec_io properly
	dm verity: don't perform FEC for failed readahead IO
	bcache: revert replacing IS_ERR_OR_NULL with IS_ERR
	powerpc: Don't clobber f0/vs0 during fp|altivec register save
	btrfs: add dmesg output for first mount and last unmount of a filesystem
	btrfs: fix off-by-one when checking chunk map includes logical address
	btrfs: send: ensure send_fd is writable
	btrfs: make error messages more clear when getting a chunk map
	Input: xpad - add HyperX Clutch Gladiate Support
	ipv4: igmp: fix refcnt uaf issue when receiving igmp query packet
	net: stmmac: xgmac: Disable FPE MMC interrupts
	ravb: Fix races between ravb_tx_timeout_work() and net related ops
	net: ravb: Use pm_runtime_resume_and_get()
	net: ravb: Start TX queues after HW initialization succeeded
	smb3: fix touch -h of symlink
	s390/mm: fix phys vs virt confusion in mark_kernel_pXd() functions family
	s390/cmma: fix detection of DAT pages
	mtd: cfi_cmdset_0001: Support the absence of protection registers
	mtd: cfi_cmdset_0001: Byte swap OTP info
	fbdev: stifb: Make the STI next font pointer a 32-bit signed offset
	ima: annotate iint mutex to avoid lockdep false positive warnings
	ovl: skip overlayfs superblocks at global sync
	ima: detect changes to the backing overlay file
	scsi: qla2xxx: Simplify the code for aborting SCSI commands
	scsi: core: Introduce the scsi_cmd_to_rq() function
	scsi: qla2xxx: Use scsi_cmd_to_rq() instead of scsi_cmnd.request
	scsi: qla2xxx: Fix system crash due to bad pointer access
	cpufreq: imx6q: don't warn for disabling a non-existing frequency
	cpufreq: imx6q: Don't disable 792 Mhz OPP unnecessarily
	mmc: cqhci: Increase recovery halt timeout
	mmc: cqhci: Warn of halt or task clear failure
	mmc: cqhci: Fix task clearing in CQE error recovery
	mmc: core: convert comma to semicolon
	mmc: block: Retry commands in CQE error recovery
	Linux 5.4.263

Change-Id: I5187b50207d7ed37d7448664448409ed75106ea1
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-12-11 12:08:17 +00:00
Keith Busch
40532b2913 io_uring: fix off-by one bvec index
commit d6fef34ee4d102be448146f24caf96d7b4a05401 upstream.

If the offset equals the bv_len of the first registered bvec, then the
request does not include any of that first bvec. Skip it so that drivers
don't have to deal with a zero length bvec, which was observed to break
NVMe's PRP list creation.

Cc: stable@vger.kernel.org
Fixes: bd11b3a391 ("io_uring: don't use iov_iter_advance() for fixed buffers")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20231120221831.2646460-1-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08 08:44:26 +01:00
Greg Kroah-Hartman
acebb4758a Merge 5.4.245 into android11-5.4-lts
Changes in 5.4.245
	cdc_ncm: Implement the 32-bit version of NCM Transfer Block
	net: cdc_ncm: Deal with too low values of dwNtbOutMaxSize
	power: supply: bq27xxx: After charger plug in/out wait 0.5s for things to stabilize
	power: supply: core: Refactor power_supply_set_input_current_limit_from_supplier()
	power: supply: bq24190: Call power_supply_changed() after updating input current
	fs: fix undefined behavior in bit shift for SB_NOUSER
	net/mlx5: devcom only supports 2 ports
	net/mlx5: Devcom, serialize devcom registration
	cdc_ncm: Fix the build warning
	io_uring: always grab lock in io_cancel_async_work()
	io_uring: don't drop completion lock before timer is fully initialized
	io_uring: have io_kill_timeout() honor the request references
	bluetooth: Add cmd validity checks at the start of hci_sock_ioctl()
	binder: fix UAF caused by faulty buffer cleanup
	ipv{4,6}/raw: fix output xfrm lookup wrt protocol
	netfilter: ctnetlink: Support offloaded conntrack entry deletion
	Linux 5.4.245

Change-Id: I25e786ed304f80b6ccb3896a8b8d2f16384f0cd6
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-20 16:54:26 +00:00
Jens Axboe
9ba28194ea io_uring: have io_kill_timeout() honor the request references
No upstream commit exists for this patch.

Don't free the request unconditionally, if the request is issued async
then someone else may be holding a submit reference to it.

Reported-and-tested-by: Lee Jones <lee@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-05 08:17:32 +02:00
Jens Axboe
6de3014d4b io_uring: don't drop completion lock before timer is fully initialized
No upstream commit exists for this patch.

If we drop the lock right after adding it to the timeout list, then
someone attempting to kill timeouts will find it in an indeterminate
state. That means that cancelation could attempt to cancel and remove
a timeout, and then io_timeout() proceeds to init and add the timer
afterwards.

Ensure the timeout request is fully setup before we drop the
completion lock, which guards cancelation as well.

Reported-and-tested-by: Lee Jones <lee@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-05 08:17:32 +02:00
Jens Axboe
b0bfceaa8c io_uring: always grab lock in io_cancel_async_work()
No upstream commit exists for this patch.

It's not necessarily safe to check the task_list locklessly, remove
this micro optimization and always grab task_lock before deeming it
empty.

Reported-and-tested-by: Lee Jones <lee@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-05 08:17:32 +02:00
Greg Kroah-Hartman
6b029aa535 This is the 5.4.220 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmNZGM8ACgkQONu9yGCS
 aT6cjQ/+JSj2g4OKD3WLjhnyy3+GJC7GdHvD8dvMkX/DNW+DD+Ja32O00Jfwi7F1
 NMP/AglR4Y5aL3LvCyBR3SLj7Hq8pGOLYpLT8FxtFf7NSCXumUZmnLjRCDUqzovE
 W1ObC5EIJ1WMArZc28ECq5EGqOLuqiRcZyel4yDM71ttJ6AglEgOvhGIZMDDEaIh
 7rTKgplaU0rNwiOrh16PwUjXVd7AW3dkVCN+Mog96hgkrfokCTVj00QHy2DxEFV4
 JKrmrQBSwK36Db02k1+V2kpaKzVflPA1ZHAPee9SfJG50kfEoOOvjg9Yo0csMvqV
 LbYXiDhd04oF37Gf73PNhQyFVdyJYZstw1BOO5M/etYN9CNEGrWC1jR3XculxPdx
 oIN5Cy+9jBBAJOMxMi7Zx2ZSnacaSlKQq1faVFyv9ekA53HFKPKHUwy4jOGcM/rR
 yJw0r+IkCSYv4zTzUc2XM5n+3PXCBtXnrG7yVsihZiHxt4MZvQ5+J/aI88L8vOYa
 5mkt8hQ75cZmWiCQOzR2TcVwy/FoPoGlKUWZIO8XYCDLVNgUyqSyTPhe7+9AU7HK
 rKHTktX7BJ/202xRypqc4tRuOhRZ3W3Htzq9Dmhf0so61D9Ayzrdm7/eiNto+1ru
 nU+V4I740is9x1CMyUU30pHretuhUdz0cuhgpwHeiF2ki/21J6A=
 =JFUC
 -----END PGP SIGNATURE-----

Merge 5.4.220 into android11-5.4-lts

Changes in 5.4.220
	ALSA: oss: Fix potential deadlock at unregistration
	ALSA: rawmidi: Drop register_mutex in snd_rawmidi_free()
	ALSA: usb-audio: Fix potential memory leaks
	ALSA: usb-audio: Fix NULL dererence at error path
	ALSA: hda/realtek: remove ALC289_FIXUP_DUAL_SPK for Dell 5530
	ALSA: hda/realtek: Correct pin configs for ASUS G533Z
	ALSA: hda/realtek: Add quirk for ASUS GV601R laptop
	ALSA: hda/realtek: Add Intel Reference SSID to support headset keys
	mtd: rawnand: atmel: Unmap streaming DMA mappings
	cifs: destage dirty pages before re-reading them for cache=none
	cifs: Fix the error length of VALIDATE_NEGOTIATE_INFO message
	iio: dac: ad5593r: Fix i2c read protocol requirements
	iio: pressure: dps310: Refactor startup procedure
	iio: pressure: dps310: Reset chip after timeout
	usb: add quirks for Lenovo OneLink+ Dock
	can: kvaser_usb: Fix use of uninitialized completion
	can: kvaser_usb_leaf: Fix overread with an invalid command
	can: kvaser_usb_leaf: Fix TX queue out of sync after restart
	can: kvaser_usb_leaf: Fix CAN state after restart
	mmc: sdhci-sprd: Fix minimum clock limit
	fs: dlm: fix race between test_bit() and queue_work()
	fs: dlm: handle -EBUSY first in lock arg validation
	HID: multitouch: Add memory barriers
	quota: Check next/prev free block number after reading from quota file
	ASoC: wcd9335: fix order of Slimbus unprepare/disable
	regulator: qcom_rpm: Fix circular deferral regression
	RISC-V: Make port I/O string accessors actually work
	parisc: fbdev/stifb: Align graphics memory size to 4MB
	riscv: Allow PROT_WRITE-only mmap()
	riscv: Pass -mno-relax only on lld < 15.0.0
	UM: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
	PCI: Sanitise firmware BAR assignments behind a PCI-PCI bridge
	powerpc/boot: Explicitly disable usage of SPE instructions
	fbdev: smscufx: Fix use-after-free in ufx_ops_open()
	btrfs: fix race between quota enable and quota rescan ioctl
	f2fs: increase the limit for reserve_root
	f2fs: fix to do sanity check on destination blkaddr during recovery
	f2fs: fix to do sanity check on summary info
	nilfs2: fix use-after-free bug of struct nilfs_root
	jbd2: wake up journal waiters in FIFO order, not LIFO
	ext4: avoid crash when inline data creation follows DIO write
	ext4: fix null-ptr-deref in ext4_write_info
	ext4: make ext4_lazyinit_thread freezable
	ext4: place buffer head allocation before handle start
	livepatch: fix race between fork and KLP transition
	ftrace: Properly unset FTRACE_HASH_FL_MOD
	ring-buffer: Allow splice to read previous partially read pages
	ring-buffer: Have the shortest_full queue be the shortest not longest
	ring-buffer: Check pending waiters when doing wake ups as well
	ring-buffer: Fix race between reset page and reading page
	media: cedrus: Set the platform driver data earlier
	KVM: x86/emulator: Fix handing of POP SS to correctly set interruptibility
	KVM: nVMX: Unconditionally purge queued/injected events on nested "exit"
	KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS
	gcov: support GCC 12.1 and newer compilers
	drm/nouveau: fix a use-after-free in nouveau_gem_prime_import_sg_table()
	selinux: use "grep -E" instead of "egrep"
	tracing: Disable interrupt or preemption before acquiring arch_spinlock_t
	userfaultfd: open userfaultfds with O_RDONLY
	sh: machvec: Use char[] for section boundaries
	ARM: 9247/1: mm: set readonly for MT_MEMORY_RO with ARM_LPAE
	nfsd: Fix a memory leak in an error handling path
	wifi: ath10k: add peer map clean up for peer delete in ath10k_sta_state()
	wifi: mac80211: allow bw change during channel switch in mesh
	bpftool: Fix a wrong type cast in btf_dumper_int
	x86/resctrl: Fix to restore to original value when re-enabling hardware prefetch register
	wifi: rtl8xxxu: tighten bounds checking in rtl8xxxu_read_efuse()
	spi: qup: add missing clk_disable_unprepare on error in spi_qup_resume()
	spi: qup: add missing clk_disable_unprepare on error in spi_qup_pm_resume_runtime()
	wifi: rtl8xxxu: Fix skb misuse in TX queue selection
	bpf: btf: fix truncated last_member_type_id in btf_struct_resolve
	wifi: rtl8xxxu: gen2: Fix mistake in path B IQ calibration
	net: fs_enet: Fix wrong check in do_pd_setup
	bpf: Ensure correct locking around vulnerable function find_vpid()
	x86/microcode/AMD: Track patch allocation size explicitly
	spi/omap100k:Fix PM disable depth imbalance in omap1_spi100k_probe
	netfilter: nft_fib: Fix for rpath check with VRF devices
	spi: s3c64xx: Fix large transfers with DMA
	vhost/vsock: Use kvmalloc/kvfree for larger packets.
	mISDN: fix use-after-free bugs in l1oip timer handlers
	sctp: handle the error returned from sctp_auth_asoc_init_active_key
	tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited
	net: rds: don't hold sock lock when cancelling work from rds_tcp_reset_callbacks()
	bnx2x: fix potential memory leak in bnx2x_tpa_stop()
	net/ieee802154: reject zero-sized raw_sendmsg()
	once: add DO_ONCE_SLOW() for sleepable contexts
	net: mvpp2: fix mvpp2 debugfs leak
	drm: bridge: adv7511: fix CEC power down control register offset
	drm/mipi-dsi: Detach devices when removing the host
	platform/chrome: fix double-free in chromeos_laptop_prepare()
	platform/chrome: fix memory corruption in ioctl
	platform/x86: msi-laptop: Fix old-ec check for backlight registering
	platform/x86: msi-laptop: Fix resource cleanup
	drm: fix drm_mipi_dbi build errors
	drm/bridge: megachips: Fix a null pointer dereference bug
	ASoC: rsnd: Add check for rsnd_mod_power_on
	ALSA: hda: beep: Simplify keep-power-at-enable behavior
	drm/omap: dss: Fix refcount leak bugs
	mmc: au1xmmc: Fix an error handling path in au1xmmc_probe()
	ASoC: eureka-tlv320: Hold reference returned from of_find_xxx API
	drm/msm/dpu: index dpu_kms->hw_vbif using vbif_idx
	ALSA: dmaengine: increment buffer pointer atomically
	mmc: wmt-sdmmc: Fix an error handling path in wmt_mci_probe()
	ASoC: wm8997: Fix PM disable depth imbalance in wm8997_probe
	ASoC: wm5110: Fix PM disable depth imbalance in wm5110_probe
	ASoC: wm5102: Fix PM disable depth imbalance in wm5102_probe
	ALSA: hda/hdmi: Don't skip notification handling during PM operation
	memory: pl353-smc: Fix refcount leak bug in pl353_smc_probe()
	memory: of: Fix refcount leak bug in of_get_ddr_timings()
	soc: qcom: smsm: Fix refcount leak bugs in qcom_smsm_probe()
	soc: qcom: smem_state: Add refcounting for the 'state->of_node'
	ARM: dts: turris-omnia: Fix mpp26 pin name and comment
	ARM: dts: kirkwood: lsxl: fix serial line
	ARM: dts: kirkwood: lsxl: remove first ethernet port
	ARM: dts: exynos: correct s5k6a3 reset polarity on Midas family
	ARM: Drop CMDLINE_* dependency on ATAGS
	ARM: dts: exynos: fix polarity of VBUS GPIO of Origen
	iio: adc: at91-sama5d2_adc: fix AT91_SAMA5D2_MR_TRACKTIM_MAX
	iio: adc: at91-sama5d2_adc: check return status for pressure and touch
	iio: adc: at91-sama5d2_adc: lock around oversampling and sample freq
	iio: inkern: only release the device node when done with it
	iio: ABI: Fix wrong format of differential capacitance channel ABI.
	clk: meson: Hold reference returned by of_get_parent()
	clk: oxnas: Hold reference returned by of_get_parent()
	clk: berlin: Add of_node_put() for of_get_parent()
	clk: tegra: Fix refcount leak in tegra210_clock_init
	clk: tegra: Fix refcount leak in tegra114_clock_init
	clk: tegra20: Fix refcount leak in tegra20_clock_init
	HSI: omap_ssi: Fix refcount leak in ssi_probe
	HSI: omap_ssi_port: Fix dma_map_sg error check
	media: exynos4-is: fimc-is: Add of_node_put() when breaking out of loop
	tty: xilinx_uartps: Fix the ignore_status
	media: xilinx: vipp: Fix refcount leak in xvip_graph_dma_init
	RDMA/rxe: Fix "kernel NULL pointer dereference" error
	RDMA/rxe: Fix the error caused by qp->sk
	misc: ocxl: fix possible refcount leak in afu_ioctl()
	dyndbg: fix module.dyndbg handling
	dyndbg: let query-modname override actual module name
	mtd: devices: docg3: check the return value of devm_ioremap() in the probe
	RDMA/siw: Always consume all skbuf data in sk_data_ready() upcall.
	ata: fix ata_id_sense_reporting_enabled() and ata_id_has_sense_reporting()
	ata: fix ata_id_has_devslp()
	ata: fix ata_id_has_ncq_autosense()
	ata: fix ata_id_has_dipm()
	mtd: rawnand: meson: fix bit map use in meson_nfc_ecc_correct()
	md/raid5: Ensure stripe_fill happens on non-read IO with journal
	xhci: Don't show warning for reinit on known broken suspend
	usb: gadget: function: fix dangling pnp_string in f_printer.c
	drivers: serial: jsm: fix some leaks in probe
	tty: serial: fsl_lpuart: disable dma rx/tx use flags in lpuart_dma_shutdown
	phy: qualcomm: call clk_disable_unprepare in the error handling
	staging: vt6655: fix some erroneous memory clean-up loops
	firmware: google: Test spinlock on panic path to avoid lockups
	serial: 8250: Fix restoring termios speed after suspend
	scsi: libsas: Fix use-after-free bug in smp_execute_task_sg()
	fsi: core: Check error number after calling ida_simple_get
	mfd: intel_soc_pmic: Fix an error handling path in intel_soc_pmic_i2c_probe()
	mfd: fsl-imx25: Fix an error handling path in mx25_tsadc_setup_irq()
	mfd: lp8788: Fix an error handling path in lp8788_probe()
	mfd: lp8788: Fix an error handling path in lp8788_irq_init() and lp8788_irq_init()
	mfd: fsl-imx25: Fix check for platform_get_irq() errors
	mfd: sm501: Add check for platform_driver_register()
	clk: mediatek: mt8183: mfgcfg: Propagate rate changes to parent
	dmaengine: ioat: stop mod_timer from resurrecting deleted timer in __cleanup()
	spmi: pmic-arb: correct duplicate APID to PPID mapping logic
	clk: bcm2835: fix bcm2835_clock_rate_from_divisor declaration
	clk: ti: dra7-atl: Fix reference leak in of_dra7_atl_clk_probe
	clk: ast2600: BCLK comes from EPLL
	mailbox: bcm-ferxrm-mailbox: Fix error check for dma_map_sg
	powerpc/math_emu/efp: Include module.h
	powerpc/sysdev/fsl_msi: Add missing of_node_put()
	powerpc/pci_dn: Add missing of_node_put()
	powerpc/powernv: add missing of_node_put() in opal_export_attrs()
	x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition
	powerpc/64s: Fix GENERIC_CPU build flags for PPC970 / G5
	powerpc: Fix SPE Power ISA properties for e500v1 platforms
	cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset
	iommu/omap: Fix buffer overflow in debugfs
	crypto: akcipher - default implementation for setting a private key
	crypto: ccp - Release dma channels before dmaengine unrgister
	iommu/iova: Fix module config properly
	kbuild: remove the target in signal traps when interrupted
	crypto: cavium - prevent integer overflow loading firmware
	f2fs: fix race condition on setting FI_NO_EXTENT flag
	ACPI: video: Add Toshiba Satellite/Portege Z830 quirk
	MIPS: BCM47XX: Cast memcmp() of function to (void *)
	powercap: intel_rapl: fix UBSAN shift-out-of-bounds issue
	thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash
	NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
	wifi: brcmfmac: fix invalid address access when enabling SCAN log level
	bpftool: Clear errno after libcap's checks
	openvswitch: Fix double reporting of drops in dropwatch
	openvswitch: Fix overreporting of drops in dropwatch
	tcp: annotate data-race around tcp_md5sig_pool_populated
	wifi: ath9k: avoid uninit memory read in ath9k_htc_rx_msg()
	xfrm: Update ipcomp_scratches with NULL when freed
	wifi: brcmfmac: fix use-after-free bug in brcmf_netdev_start_xmit()
	Bluetooth: L2CAP: initialize delayed works at l2cap_chan_create()
	Bluetooth: hci_sysfs: Fix attempting to call device_add multiple times
	can: bcm: check the result of can_send() in bcm_can_tx()
	wifi: rt2x00: don't run Rt5592 IQ calibration on MT7620
	wifi: rt2x00: set correct TX_SW_CFG1 MAC register for MT7620
	wifi: rt2x00: set VGC gain for both chains of MT7620
	wifi: rt2x00: set SoC wmac clock register
	wifi: rt2x00: correctly set BBP register 86 for MT7620
	net: If sock is dead don't access sock's sk_wq in sk_stream_wait_memory
	Bluetooth: L2CAP: Fix user-after-free
	r8152: Rate limit overflow messages
	drm/nouveau/nouveau_bo: fix potential memory leak in nouveau_bo_alloc()
	drm: Use size_t type for len variable in drm_copy_field()
	drm: Prevent drm_copy_field() to attempt copying a NULL pointer
	drm/amd/display: fix overflow on MIN_I64 definition
	drm/vc4: vec: Fix timings for VEC modes
	drm: panel-orientation-quirks: Add quirk for Anbernic Win600
	platform/x86: msi-laptop: Change DMI match / alias strings to fix module autoloading
	drm/amdgpu: fix initial connector audio value
	mmc: sdhci-msm: add compatible string check for sdm670
	ARM: dts: imx7d-sdb: config the max pressure for tsc2046
	ARM: dts: imx6q: add missing properties for sram
	ARM: dts: imx6dl: add missing properties for sram
	ARM: dts: imx6qp: add missing properties for sram
	ARM: dts: imx6sl: add missing properties for sram
	ARM: dts: imx6sll: add missing properties for sram
	ARM: dts: imx6sx: add missing properties for sram
	btrfs: scrub: try to fix super block errors
	clk: zynqmp: Fix stack-out-of-bounds in strncpy`
	media: cx88: Fix a null-ptr-deref bug in buffer_prepare()
	clk: zynqmp: pll: rectify rate rounding in zynqmp_pll_round_rate
	scsi: 3w-9xxx: Avoid disabling device if failing to enable it
	nbd: Fix hung when signal interrupts nbd_start_device_ioctl()
	power: supply: adp5061: fix out-of-bounds read in adp5061_get_chg_type()
	staging: vt6655: fix potential memory leak
	ata: libahci_platform: Sanity check the DT child nodes number
	bcache: fix set_at_max_writeback_rate() for multiple attached devices
	HID: roccat: Fix use-after-free in roccat_read()
	md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d
	usb: host: xhci: Fix potential memory leak in xhci_alloc_stream_info()
	usb: musb: Fix musb_gadget.c rxstate overflow bug
	Revert "usb: storage: Add quirk for Samsung Fit flash"
	staging: rtl8723bs: fix a potential memory leak in rtw_init_cmd_priv()
	nvme: copy firmware_rev on each init
	nvmet-tcp: add bounds check on Transfer Tag
	usb: idmouse: fix an uninit-value in idmouse_open
	clk: bcm2835: Make peripheral PLLC critical
	perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc
	io_uring/af_unix: defer registered files gc to io_uring release
	net: ieee802154: return -EINVAL for unknown addr type
	Revert "net/ieee802154: reject zero-sized raw_sendmsg()"
	net/ieee802154: don't warn zero-sized raw_sendmsg()
	ext4: continue to expand file system when the target size doesn't reach
	md: Replace snprintf with scnprintf
	efi: libstub: drop pointless get_memory_map() call
	inet: fully convert sk->sk_rx_dst to RCU rules
	thermal: intel_powerclamp: Use first online CPU as control_cpu
	Linux 5.4.220

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I91859d6b79f44ab654cb0c88d0d6c9c46f62131b
2022-10-29 10:45:08 +02:00
Pavel Begunkov
04df9719df io_uring/af_unix: defer registered files gc to io_uring release
[ upstream commit 0091bfc81741b8d3aeb3b7ab8636f911b2de6e80 ]

Instead of putting io_uring's registered files in unix_gc() we want it
to be done by io_uring itself. The trick here is to consider io_uring
registered files for cycle detection but not actually putting them down.
Because io_uring can't register other ring instances, this will remove
all refs to the ring file triggering the ->release path and clean up
with io_ring_ctx_free().

Cc: stable@vger.kernel.org
Fixes: 6b06314c47 ("io_uring: add file set registration")
Reported-and-tested-by: David Bouman <dbouman03@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
[axboe: add kerneldoc comment to skb, fold in skb leak fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-26 13:22:59 +02:00
Greg Kroah-Hartman
d60223937b Revert "io_uring: disable polling pollfree files"
This reverts commit fc78b2fc21.

This breaks the Android api and for now, does not seem to be necessary
due to the lack of io_uring users in this kernel branch.  If io_uring
starts to be used more, it can be brought back in a ABI-safe way.

Bug: 161946584
Bug: 248008710
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2696bd5e1ad61d3ab0e8d06f4ffe46718bb05845
2022-09-21 15:47:16 +02:00
Pavel Begunkov
fc78b2fc21 io_uring: disable polling pollfree files
Older kernels lack io_uring POLLFREE handling. As only affected files
are signalfd and android binder the safest option would be to disable
polling those files via io_uring and hope there are no users.

Fixes: 221c5eb233 ("io_uring: add support for IORING_OP_POLL")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-05 10:27:47 +02:00
Pavel Begunkov
1a623d361f io_uring: fix fs->users overflow
There is a bunch of cases where we can grab req->fs but not put it, this
can be used to cause a controllable overflow with further implications.
Release req->fs in the request free path and make sure we zero the field
to be sure we don't do it twice.

Fixes: cac68d12c5 ("io_uring: grab ->fs as part of async offload")
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-15 14:18:41 +02:00
Nicolai Stange
c4a23c852e io_uring: Fix current->fs handling in io_sq_wq_submit_work()
No upstream commit, this is a fix to a stable 5.4 specific backport.

The intention of backport commit cac68d12c5 ("io_uring: grab ->fs as part
of async offload") as found in the stable 5.4 tree was to make
io_sq_wq_submit_work() to switch the workqueue task's ->fs over to the
submitting task's one during the IO operation.

However, due to a small logic error, this change turned out to not have any
actual effect. From a high level, the relevant code in
io_sq_wq_submit_work() looks like

  old_fs_struct = current->fs;
  do {
          ...
          if (req->fs != current->fs && current->fs != old_fs_struct) {
                  task_lock(current);
                  if (req->fs)
                          current->fs = req->fs;
                  else
                          current->fs = old_fs_struct;
                  task_unlock(current);
          }
          ...
  } while (req);

The if condition is supposed to cover the case that current->fs doesn't
match what's needed for processing the request, but observe how it fails
to ever evaluate to true due to the second clause:
current->fs != old_fs_struct will be false in the first iteration as per
the initialization of old_fs_struct and because this prevents current->fs
from getting replaced, the same follows inductively for all subsequent
iterations.

Fix said if condition such that
- if req->fs is set and doesn't match current->fs, the latter will be
  switched to the former
- or if req->fs is unset, the switch back to the initial old_fs_struct
  will be made, if necessary.

While at it, also correct the condition for the ->fs related cleanup right
before the return of io_sq_wq_submit_work(): currently, old_fs_struct is
restored only if it's non-NULL. It is always non-NULL though and thus, the
if-condition is rendundant. Supposedly, the motivation had been to optimize
and avoid switching current->fs back to the initial old_fs_struct in case
it is found to have the desired value already. Make it so.

Cc: stable@vger.kernel.org # v5.4
Fixes: cac68d12c5 ("io_uring: grab ->fs as part of async offload")
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-30 13:54:10 +01:00
Muchun Song
ee413b2915 io_uring: Fix double list add in io_queue_async_work()
If we queue work in io_poll_wake(), it will leads to list double
add. So we should add the list when the callback func is the
io_sq_wq_submit_work.

The following oops was seen:

    list_add double add: new=ffff9ca6a8f1b0e0, prev=ffff9ca62001cee8,
    next=ffff9ca6a8f1b0e0.
    ------------[ cut here ]------------
    kernel BUG at lib/list_debug.c:31!
    Call Trace:
     <IRQ>
     io_poll_wake+0xf3/0x230
     __wake_up_common+0x91/0x170
     __wake_up_common_lock+0x7a/0xc0
     io_commit_cqring+0xea/0x280
     ? blkcg_iolatency_done_bio+0x2b/0x610
     io_cqring_add_event+0x3e/0x60
     io_complete_rw+0x58/0x80
     dio_complete+0x106/0x250
     blk_update_request+0xa0/0x3b0
     blk_mq_end_request+0x1a/0x110
     blk_mq_complete_request+0xd0/0xe0
     nvme_irq+0x129/0x270 [nvme]
     __handle_irq_event_percpu+0x7b/0x190
     handle_irq_event_percpu+0x30/0x80
     handle_irq_event+0x3c/0x60
     handle_edge_irq+0x91/0x1e0
     do_IRQ+0x4d/0xd0
     common_interrupt+0xf/0xf

Fixes: 1c4404efcf ("io_uring: make sure async workqueue is canceled on exit")
Reported-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-14 10:32:57 +02:00
Muchun Song
efb1cef27d io_uring: Fix remove irrelevant req from the task_list
If the process 0 has been initialized io_uring is complete, and
then fork process 1. If process 1 exits and it leads to delete
all reqs from the task_list. If we kill process 0. We will not
send SIGINT signal to the kworker. So we can not remove the req
from the task_list. The io_sq_wq_submit_work() can do that for
us.

Fixes: 1c4404efcf ("io_uring: make sure async workqueue is canceled on exit")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-14 10:32:57 +02:00
Muchun Song
75524f7533 io_uring: Fix missing smp_mb() in io_cancel_async_work()
The store to req->flags and load req->work_task should not be
reordering in io_cancel_async_work(). We should make sure that
either we store REQ_F_CANCE flag to req->flags or we see the
req->work_task setted in io_sq_wq_submit_work().

Fixes: 1c4404efcf ("io_uring: make sure async workqueue is canceled on exit")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-14 10:32:57 +02:00
Yinyin Zhu
d9e81b2fb3 io_uring: Fix resource leaking when kill the process
The commit

  1c4404efcf2c0> ("<io_uring: make sure async workqueue is canceled on exit>")

doesn't solve the resource leak problem totally! When kworker is doing a
io task for the io_uring, The process which submitted the io task has
received a SIGKILL signal from the user. Then the io_cancel_async_work
function could have sent a SIGINT signal to the kworker, but the judging
condition is wrong. So it doesn't send a SIGINT signal to the kworker,
then caused the resource leaking problem.

Why the juding condition is wrong? The process is a multi-threaded process,
we call the thread of the process which has submitted the io task Thread1.
So the req->task is the current macro of the Thread1. when all the threads
of the process have done exit procedure, the last thread will call the
io_cancel_async_work, but the last thread may not the Thread1, so the task
is not equal and doesn't send the SIGINT signal. To fix this bug, we alter
the task attribute of the req with struct files_struct. And check the files
instead.

Fixes: 1c4404efcf ("io_uring: make sure async workqueue is canceled on exit")
Signed-off-by: Yinyin Zhu <zhuyinyin@bytedance.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-14 10:32:57 +02:00
Xin Yin
54ee77961e io_uring: Fix NULL pointer dereference in io_sq_wq_submit_work()
the commit <1c4404efcf2c0> ("<io_uring: make sure async workqueue
is canceled on exit>") caused a crash in io_sq_wq_submit_work().
when io_ring-wq get a req form async_list, which not have been
added to task_list. Then try to delete the req from task_list will caused
a "NULL pointer dereference".

Ensure add req to async_list and task_list at the sametime.

The crash log looks like this:
[95995.973638] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[95995.979123] pgd = c20c8964
[95995.981803] [00000000] *pgd=1c72d831, *pte=00000000, *ppte=00000000
[95995.988043] Internal error: Oops: 817 [#1] SMP ARM
[95995.992814] Modules linked in: bpfilter(-)
[95995.996898] CPU: 1 PID: 15661 Comm: kworker/u8:5 Not tainted 5.4.56 #2
[95996.003406] Hardware name: Amlogic Meson platform
[95996.008108] Workqueue: io_ring-wq io_sq_wq_submit_work
[95996.013224] PC is at io_sq_wq_submit_work+0x1f4/0x5c4
[95996.018261] LR is at walk_stackframe+0x24/0x40
[95996.022685] pc : [<c059b898>]    lr : [<c030da7c>]    psr: 600f0093
[95996.028936] sp : dc6f7e88  ip : dc6f7df0  fp : dc6f7ef4
[95996.034148] r10: deff9800  r9 : dc1d1694  r8 : dda58b80
[95996.039358] r7 : dc6f6000  r6 : dc6f7ebc  r5 : dc1d1600  r4 : deff99c0
[95996.045871] r3 : 0000cb5d  r2 : 00000000  r1 : ef6b9b80  r0 : c059b88c
[95996.052385] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
[95996.059593] Control: 10c5387d  Table: 22be804a  DAC: 00000055
[95996.065325] Process kworker/u8:5 (pid: 15661, stack limit = 0x78013c69)
[95996.071923] Stack: (0xdc6f7e88 to 0xdc6f8000)
[95996.076268] 7e80:                   dc6f7ecc dc6f7e98 00000000 c1f06c08 de9dc800 deff9a04
[95996.084431] 7ea0: 00000000 dc6f7f7c 00000000 c1f65808 0000080c dc677a00 c1ee9bd0 dc6f7ebc
[95996.092594] 7ec0: dc6f7ebc d085c8f6 c0445a90 dc1d1e00 e008f300 c0288400 e4ef7100 00000000
[95996.100757] 7ee0: c20d45b0 e4ef7115 dc6f7f34 dc6f7ef8 c03725f0 c059b6b0 c0288400 c0288400
[95996.108921] 7f00: c0288400 00000001 c0288418 e008f300 c0288400 e008f314 00000088 c0288418
[95996.117083] 7f20: c1f03d00 dc6f6038 dc6f7f7c dc6f7f38 c0372df8 c037246c dc6f7f5c 00000000
[95996.125245] 7f40: c1f03d00 c1f03d00 c20d3cbe c0288400 dc6f7f7c e1c43880 e4fa7980 00000000
[95996.133409] 7f60: e008f300 c0372d9c e48bbe74 e1c4389c dc6f7fac dc6f7f80 c0379244 c0372da8
[95996.141570] 7f80: 600f0093 e4fa7980 c0379108 00000000 00000000 00000000 00000000 00000000
[95996.149734] 7fa0: 00000000 dc6f7fb0 c03010ac c0379114 00000000 00000000 00000000 00000000
[95996.157897] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[95996.166060] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[95996.174217] Backtrace:
[95996.176662] [<c059b6a4>] (io_sq_wq_submit_work) from [<c03725f0>] (process_one_work+0x190/0x4c0)
[95996.185425]  r10:e4ef7115 r9:c20d45b0 r8:00000000 r7:e4ef7100 r6:c0288400 r5:e008f300
[95996.193237]  r4:dc1d1e00
[95996.195760] [<c0372460>] (process_one_work) from [<c0372df8>] (worker_thread+0x5c/0x5bc)
[95996.203836]  r10:dc6f6038 r9:c1f03d00 r8:c0288418 r7:00000088 r6:e008f314 r5:c0288400
[95996.211647]  r4:e008f300
[95996.214173] [<c0372d9c>] (worker_thread) from [<c0379244>] (kthread+0x13c/0x168)
[95996.221554]  r10:e1c4389c r9:e48bbe74 r8:c0372d9c r7:e008f300 r6:00000000 r5:e4fa7980
[95996.229363]  r4:e1c43880
[95996.231888] [<c0379108>] (kthread) from [<c03010ac>] (ret_from_fork+0x14/0x28)
[95996.239088] Exception stack(0xdc6f7fb0 to 0xdc6f7ff8)
[95996.244127] 7fa0:                                     00000000 00000000 00000000 00000000
[95996.252291] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[95996.260453] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[95996.267054]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0379108
[95996.274866]  r4:e4fa7980 r3:600f0093
[95996.278430] Code: eb3a59e1 e5952098 e5951094 e5812004 (e5821000)

Signed-off-by: Xin Yin <yinxin_1989@aliyun.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-03 11:27:11 +02:00
Guoyu Huang
5de0b5247c io_uring: Fix NULL pointer dereference in loop_rw_iter()
commit 2dd2111d0d383df104b144e0d1f6b5a00cb7cd88 upstream.

loop_rw_iter() does not check whether the file has a read or
write function. This can lead to NULL pointer dereference
when the user passes in a file descriptor that does not have
read or write function.

The crash log looks like this:

[   99.834071] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   99.835364] #PF: supervisor instruction fetch in kernel mode
[   99.836522] #PF: error_code(0x0010) - not-present page
[   99.837771] PGD 8000000079d62067 P4D 8000000079d62067 PUD 79d8c067 PMD 0
[   99.839649] Oops: 0010 [#2] SMP PTI
[   99.840591] CPU: 1 PID: 333 Comm: io_wqe_worker-0 Tainted: G      D           5.8.0 #2
[   99.842622] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[   99.845140] RIP: 0010:0x0
[   99.845840] Code: Bad RIP value.
[   99.846672] RSP: 0018:ffffa1c7c01ebc08 EFLAGS: 00010202
[   99.848018] RAX: 0000000000000000 RBX: ffff92363bd67300 RCX: ffff92363d461208
[   99.849854] RDX: 0000000000000010 RSI: 00007ffdbf696bb0 RDI: ffff92363bd67300
[   99.851743] RBP: ffffa1c7c01ebc40 R08: 0000000000000000 R09: 0000000000000000
[   99.853394] R10: ffffffff9ec692a0 R11: 0000000000000000 R12: 0000000000000010
[   99.855148] R13: 0000000000000000 R14: ffff92363d461208 R15: ffffa1c7c01ebc68
[   99.856914] FS:  0000000000000000(0000) GS:ffff92363dd00000(0000) knlGS:0000000000000000
[   99.858651] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   99.860032] CR2: ffffffffffffffd6 CR3: 000000007ac66000 CR4: 00000000000006e0
[   99.861979] Call Trace:
[   99.862617]  loop_rw_iter.part.0+0xad/0x110
[   99.863838]  io_write+0x2ae/0x380
[   99.864644]  ? kvm_sched_clock_read+0x11/0x20
[   99.865595]  ? sched_clock+0x9/0x10
[   99.866453]  ? sched_clock_cpu+0x11/0xb0
[   99.867326]  ? newidle_balance+0x1d4/0x3c0
[   99.868283]  io_issue_sqe+0xd8f/0x1340
[   99.869216]  ? __switch_to+0x7f/0x450
[   99.870280]  ? __switch_to_asm+0x42/0x70
[   99.871254]  ? __switch_to_asm+0x36/0x70
[   99.872133]  ? lock_timer_base+0x72/0xa0
[   99.873155]  ? switch_mm_irqs_off+0x1bf/0x420
[   99.874152]  io_wq_submit_work+0x64/0x180
[   99.875192]  ? kthread_use_mm+0x71/0x100
[   99.876132]  io_worker_handle_work+0x267/0x440
[   99.877233]  io_wqe_worker+0x297/0x350
[   99.878145]  kthread+0x112/0x150
[   99.878849]  ? __io_worker_unuse+0x100/0x100
[   99.879935]  ? kthread_park+0x90/0x90
[   99.880874]  ret_from_fork+0x22/0x30
[   99.881679] Modules linked in:
[   99.882493] CR2: 0000000000000000
[   99.883324] ---[ end trace 4453745f4673190b ]---
[   99.884289] RIP: 0010:0x0
[   99.884837] Code: Bad RIP value.
[   99.885492] RSP: 0018:ffffa1c7c01ebc08 EFLAGS: 00010202
[   99.886851] RAX: 0000000000000000 RBX: ffff92363acd7f00 RCX: ffff92363d461608
[   99.888561] RDX: 0000000000000010 RSI: 00007ffe040d9e10 RDI: ffff92363acd7f00
[   99.890203] RBP: ffffa1c7c01ebc40 R08: 0000000000000000 R09: 0000000000000000
[   99.891907] R10: ffffffff9ec692a0 R11: 0000000000000000 R12: 0000000000000010
[   99.894106] R13: 0000000000000000 R14: ffff92363d461608 R15: ffffa1c7c01ebc68
[   99.896079] FS:  0000000000000000(0000) GS:ffff92363dd00000(0000) knlGS:0000000000000000
[   99.898017] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   99.899197] CR2: ffffffffffffffd6 CR3: 000000007ac66000 CR4: 00000000000006e0

Fixes: 32960613b7 ("io_uring: correctly handle non ->{read,write}_iter() file_operations")
Cc: stable@vger.kernel.org
Signed-off-by: Guoyu Huang <hgy5945@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19 08:16:29 +02:00
Jens Axboe
3c512bd3db io_uring: set ctx sq/cq entry count earlier
commit bd74048108c179cea0ff52979506164c80f29da7 upstream.

If we hit an earlier error path in io_uring_create(), then we will have
accounted memory, but not set ctx->{sq,cq}_entries yet. Then when the
ring is torn down in error, we use those values to unaccount the memory.

Ensure we set the ctx entries before we're able to hit a potential error
path.

Cc: stable@vger.kernel.org
Reported-by: Tomáš Chaloupka <chalucha@gmail.com>
Tested-by: Tomáš Chaloupka <chalucha@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19 08:16:24 +02:00
Dmitry Vyukov
0b1799662a io_uring: fix sq array offset calculation
[ Upstream commit b36200f543ff07a1cb346aa582349141df2c8068 ]

rings_size() sets sq_offset to the total size of the rings (the returned
value which is used for memory allocation). This is wrong: sq array should
be located within the rings, not after them. Set sq_offset to where it
should be.

Fixes: 75b28affdd ("io_uring: allocate the two rings together")
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Hristo Venev <hristo@venev.name>
Cc: io-uring@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19 08:15:57 +02:00
Liu Yong
a02df82a59 fs/io_uring.c: Fix uninitialized variable is referenced in io_submit_sqe
the commit <a4d61e66ee4a> ("<io_uring: prevent re-read of sqe->opcode>")
caused another vulnerability. After io_get_req(), the sqe_submit struct
in req is not initialized, but the following code defaults that
req->submit.opcode is available.

Signed-off-by: Liu Yong <pkfxxxing@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19 08:15:53 +02:00
Guoyu Huang
e8053c6833 io_uring: Fix use-after-free in io_sq_wq_submit_work()
when ctx->sqo_mm is zero, io_sq_wq_submit_work() frees 'req'
without deleting it from 'task_list'. After that, 'req' is
accessed in io_ring_ctx_wait_and_kill() which lead to
a use-after-free.

Signed-off-by: Guoyu Huang <hgy5945@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-11 15:33:33 +02:00
Jens Axboe
a4d61e66ee io_uring: prevent re-read of sqe->opcode
Liu reports that he can trigger a NULL pointer dereference with
IORING_OP_SENDMSG, by changing the sqe->opcode after we've validated
that the previous opcode didn't need a file and didn't assign one.

Ensure we validate and read the opcode only once.

Reported-by: Liu Yong <pkfxxxing@gmail.com>
Tested-by: Liu Yong <pkfxxxing@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-11 15:33:32 +02:00
Jens Axboe
1c4404efcf io_uring: make sure async workqueue is canceled on exit
Track async work items that we queue, so we can safely cancel them
if the ring is closed or the process exits. Newer kernels handle
this automatically with io-wq, but the old workqueue based setup needs
a bit of special help to get there.

There's no upstream variant of this, as that would require backporting
all the io-wq changes from 5.5 and on. Hence I made a one-off that
ensures that we don't leak memory if we have async work items that
need active cancelation (like socket IO).

Reported-by: Agarwal, Anchal <anchalag@amazon.com>
Tested-by: Agarwal, Anchal <anchalag@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-09 09:37:49 +02:00
Denis Efremov
ab2df991e5 io_uring: use kvfree() in io_sqe_buffer_register()
commit a8c73c1a614f6da6c0b04c393f87447e28cb6de4 upstream.

Use kvfree() to free the pages and vmas, since they are allocated by
kvmalloc_array() in a loop.

Fixes: d4ef647510 ("io_uring: avoid page allocation warnings")
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200605093203.40087-1-efremov@linux.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-17 16:40:26 +02:00
Jens Axboe
ba55015317 io_uring: initialize ctx->sqo_wait earlier
[ Upstream commit 583863ed918136412ddf14de2e12534f17cfdc6f ]

Ensure that ctx->sqo_wait is initialized as soon as the ctx is allocated,
instead of deferring it to the offload setup. This fixes a syzbot
reported lockdep complaint, which is really due to trying to wake_up
on an uninitialized wait queue:

RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441319
RDX: 0000000000000001 RSI: 0000000020000140 RDI: 000000000000047b
RBP: 0000000000010475 R08: 0000000000000001 R09: 00000000004002c8
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402260
R13: 00000000004022f0 R14: 0000000000000000 R15: 0000000000000000
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 1 PID: 7090 Comm: syz-executor222 Not tainted 5.7.0-rc1-next-20200415-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x188/0x20d lib/dump_stack.c:118
 assign_lock_key kernel/locking/lockdep.c:913 [inline]
 register_lock_class+0x1664/0x1760 kernel/locking/lockdep.c:1225
 __lock_acquire+0x104/0x4c50 kernel/locking/lockdep.c:4234
 lock_acquire+0x1f2/0x8f0 kernel/locking/lockdep.c:4934
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x8c/0xbf kernel/locking/spinlock.c:159
 __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:122
 io_cqring_ev_posted+0xa5/0x1e0 fs/io_uring.c:1160
 io_poll_remove_all fs/io_uring.c:4357 [inline]
 io_ring_ctx_wait_and_kill+0x2bc/0x5a0 fs/io_uring.c:7305
 io_uring_create fs/io_uring.c:7843 [inline]
 io_uring_setup+0x115e/0x22b0 fs/io_uring.c:7870
 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x441319
Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9

Reported-by: syzbot+8c91f5d054e998721c57@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-07 13:18:50 +02:00
Jens Axboe
7661469ef5 io_uring: honor original task RLIMIT_FSIZE
commit 4ed734b0d0913e566a9d871e15d24eb240f269f7 upstream.

With the previous fixes for number of files open checking, I added some
debug code to see if we had other spots where we're checking rlimit()
against the async io-wq workers. The only one I found was file size
checking, which we should also honor.

During write and fallocate prep, store the max file size and override
that for the current ask if we're in io-wq worker context.

Cc: stable@vger.kernel.org # 5.1+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17 10:50:16 +02:00
Jens Axboe
38119a6897 io_uring: remove bogus RLIMIT_NOFILE check in file registration
commit c336e992cb1cb1db9ee608dfb30342ae781057ab upstream.

We already checked this limit when the file was opened, and we keep it
open in the file table. Hence when we added unit_inflight to the count
we want to register, we're doubly accounting these files. This results
in -EMFILE for file registration, if we're at half the limit.

Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17 10:50:11 +02:00
Jens Axboe
7eaf718b83 io_uring: fix 32-bit compatability with sendmsg/recvmsg
commit d876836204897b6d7d911f942084f69a1e9d5c4d upstream.

We must set MSG_CMSG_COMPAT if we're in compatability mode, otherwise
the iovec import for these commands will not do the right thing and fail
the command with -EINVAL.

Found by running the test suite compiled as 32-bit.

Cc: stable@vger.kernel.org
Fixes: aa1fa28fc7 ("io_uring: add support for recvmsg()")
Fixes: 0fa03c624d ("io_uring: add support for sendmsg()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-05 16:43:44 +01:00
Jens Axboe
cac68d12c5 io_uring: grab ->fs as part of async offload
[ Upstream commits 9392a27d88b9 and ff002b30181d ]

Ensure that the async work grabs ->fs from the queueing task if the
punted commands needs to do lookups.

We don't have these two commits in 5.4-stable:

ff002b30181d30cdfbca316dadd099c3ca0d739c
9392a27d88b9707145d713654eb26f0c29789e50

because they don't apply with the rework that was done in how io_uring
handles offload. Since there's no io-wq in 5.4, it doesn't make sense to
do two patches. I'm attaching my port of the two for 5.4-stable, it's
been tested. Please queue it up for the next 5.4-stable, thanks!

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-05 16:43:31 +01:00
Stefano Garzarella
8eb92c1228 io_uring: prevent sq_thread from spinning when it should stop
commit 7143b5ac5750f404ff3a594b34fdf3fc2f99f828 upstream.

This patch drops 'cur_mm' before calling cond_resched(), to prevent
the sq_thread from spinning even when the user process is finished.

Before this patch, if the user process ended without closing the
io_uring fd, the sq_thread continues to spin until the
'sq_thread_idle' timeout ends.

In the worst case where the 'sq_thread_idle' parameter is bigger than
INT_MAX, the sq_thread will spin forever.

Fixes: 6c271ce2f1 ("io_uring: add submission polling")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28 17:22:27 +01:00
Xiaoguang Wang
c7deb9612e io_uring: fix __io_iopoll_check deadlock in io_sq_thread
commit c7849be9cc2dd2754c48ddbaca27c2de6d80a95d upstream.

Since commit a3a0e43fd7 ("io_uring: don't enter poll loop if we have
CQEs pending"), if we already events pending, we won't enter poll loop.
In case SETUP_IOPOLL and SETUP_SQPOLL are both enabled, if app has
been terminated and don't reap pending events which are already in cq
ring, and there are some reqs in poll_list, io_sq_thread will enter
__io_iopoll_check(), and find pending events, then return, this loop
will never have a chance to exit.

I have seen this issue in fio stress tests, to fix this issue, let
io_sq_thread call io_iopoll_getevents() with argument 'min' being zero,
and remove __io_iopoll_check().

Fixes: a3a0e43fd7 ("io_uring: don't enter poll loop if we have CQEs pending")
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28 17:22:27 +01:00
Jens Axboe
b29d143a69 Revert "io_uring: only allow submit from owning task"
commit 73e08e711d9c1d79fae01daed4b0e1fee5f8a275 upstream.

This ends up being too restrictive for tasks that willingly fork and
share the ring between forks. Andres reports that this breaks his
postgresql work. Since we're close to 5.5 release, revert this change
for now.

Cc: stable@vger.kernel.org
Fixes: 44d282796f81 ("io_uring: only allow submit from owning task")
Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29 16:45:24 +01:00
Jens Axboe
af2e7c923d io_uring: only allow submit from owning task
commit 44d282796f81eb1debc1d7cb53245b4cb3214cb5 upstream.

If the credentials or the mm doesn't match, don't allow the task to
submit anything on behalf of this ring. The task that owns the ring can
pass the file descriptor to another task, but we don't want to allow
that task to submit an SQE that then assumes the ring mm and creds if
it needs to go async.

Cc: stable@vger.kernel.org
Suggested-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23 08:22:32 +01:00
Pavel Begunkov
0023527474 io_uring: don't wait when under-submitting
[ Upstream commit 7c504e65206a4379ff38fe41d21b32b6c2c3e53e ]

There is no reliable way to submit and wait in a single syscall, as
io_submit_sqes() may under-consume sqes (in case of an early error).
Then it will wait for not-yet-submitted requests, deadlocking the user
in most cases.

Don't wait/poll if can't submit all sqes

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-12 12:21:38 +01:00
Jens Axboe
d1b69aabcd io_uring: use current task creds instead of allocating a new one
commit 0b8c0ec7eedcd8f9f1a1f238d87f9b512b09e71a upstream.

syzbot reports:

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9217 Comm: io_uring-sq Not tainted 5.4.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline]
RIP: 0010:__validate_creds include/linux/cred.h:187 [inline]
RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550
Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c
24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84
c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf
RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318
RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010
RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849
R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000
R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  io_sq_thread+0x1c7/0xa20 fs/io_uring.c:3274
  kthread+0x361/0x430 kernel/kthread.c:255
  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Modules linked in:
---[ end trace f2e1a4307fbe2245 ]---
RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline]
RIP: 0010:__validate_creds include/linux/cred.h:187 [inline]
RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550
Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c
24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84
c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf
RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318
RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010
RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849
R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000
R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

which is caused by slab fault injection triggering a failure in
prepare_creds(). We don't actually need to create a copy of the creds
as we're not modifying it, we just need a reference on the current task
creds. This avoids the failure case as well, and propagates the const
throughout the stack.

Fixes: 181e448d8709 ("io_uring: async workers should inherit the user creds")
Reported-by: syzbot+5320383e16029ba057ff@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[ only use the io_uring.c portion of the patch - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:20:00 +01:00
Jens Axboe
1768acaa6d io_uring: io_allocate_scq_urings() should return a sane state
[ Upstream commit eb065d301e8c83643367bdb0898becc364046bda ]

We currently rely on the ring destroy on cleaning things up in case of
failure, but io_allocate_scq_urings() can leave things half initialized
if only parts of it fails.

Be nice and return with either everything setup in success, or return an
error with things nicely cleaned up.

Reported-by: syzbot+0d818c0d39399188f393@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04 19:18:24 +01:00
Jens Axboe
74dcfcd1d3 io_uring: ensure req->submit is copied when req is deferred
There's an issue with deferred requests through drain, where if we do
need to defer, we're not copying over the sqe_submit state correctly.
This can result in using uninitialized data when we then later go and
submit the deferred request, like this check in __io_submit_sqe():

         if (unlikely(s->index >= ctx->sq_entries))
                 return -EINVAL;

with 's' being uninitialized, we can randomly fail this check. Fix this
by copying sqe_submit state when we defer a request.

Because it was fixed as part of a cleanup series in mainline, before
anyone realized we had this issue. That removed the separate states
of ->index vs ->submit.sqe. That series is not something I was
comfortable putting into stable, hence the much simpler addition.
Here's the patch in the series that fixes the same issue:

commit cf6fd4bd559ee61a4454b161863c8de6f30f8dca
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Mon Nov 25 23:14:39 2019 +0300

    io_uring: inline struct sqe_submit

Reported-by: Andres Freund <andres@anarazel.de>
Reported-by: Tomáš Chaloupka
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-13 08:42:33 +01:00
Jens Axboe
1dec7fcac3 io_uring: fix missing kmap() declaration on powerpc
commit aa4c3967756c6c576a38a23ac511be211462a6b7 upstream.

Christophe reports that current master fails building on powerpc with
this error:

   CC      fs/io_uring.o
fs/io_uring.c: In function ‘loop_rw_iter’:
fs/io_uring.c:1628:21: error: implicit declaration of function ‘kmap’
[-Werror=implicit-function-declaration]
     iovec.iov_base = kmap(iter->bvec->bv_page)
                      ^
fs/io_uring.c:1628:19: warning: assignment makes pointer from integer
without a cast [-Wint-conversion]
     iovec.iov_base = kmap(iter->bvec->bv_page)
                    ^
fs/io_uring.c:1643:4: error: implicit declaration of function ‘kunmap’
[-Werror=implicit-function-declaration]
     kunmap(iter->bvec->bv_page);
     ^

which is caused by a missing highmem.h include. Fix it by including
it.

Fixes: 311ae9e159d8 ("io_uring: fix dead-hung for non-iter fixed rw")
Reported-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-13 08:42:32 +01:00
Jens Axboe
57aabff8cc io_uring: transform send/recvmsg() -ERESTARTSYS to -EINTR
commit 441cdbd5449b4923cd413d3ba748124f91388be9 upstream.

We should never return -ERESTARTSYS to userspace, transform it into
-EINTR.

Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-13 08:42:28 +01:00
Pavel Begunkov
f246eedbaf io_uring: fix dead-hung for non-iter fixed rw
commit 311ae9e159d81a1ec1cf645daf40b39ae5a0bd84 upstream.

Read/write requests to devices without implemented read/write_iter
using fixed buffers can cause general protection fault, which totally
hangs a machine.

io_import_fixed() initialises iov_iter with bvec, but loop_rw_iter()
accesses it as iovec, dereferencing random address.

kmap() page by page in this case

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-13 08:42:27 +01:00
Jens Axboe
8387e3688a io_uring: async workers should inherit the user creds
[ Upstream commit 181e448d8709e517c9c7b523fcd209f24eb38ca7 ]

If we don't inherit the original task creds, then we can confuse users
like fuse that pass creds in the request header. See link below on
identical aio issue.

Link: https://lore.kernel.org/linux-fsdevel/26f0d78e-99ca-2f1b-78b9-433088053a61@scylladb.com/T/#u
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-04 22:30:42 +01:00
Jens Axboe
5e559561a8 io_uring: ensure registered buffer import returns the IO length
A test case was reported where two linked reads with registered buffers
failed the second link always. This is because we set the expected value
of a request in req->result, and if we don't get this result, then we
fail the dependent links. For some reason the registered buffer import
returned -ERROR/0, while the normal import returns -ERROR/length. This
broke linked commands with registered buffers.

Fix this by making io_import_fixed() correctly return the mapped length.

Cc: stable@vger.kernel.org # v5.3
Reported-by: 李通洲 <carter.li@eoitek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-13 16:15:14 -07:00
Pavel Begunkov
5683e5406e io_uring: Fix getting file for timeout
For timeout requests io_uring tries to grab a file with specified fd,
which is usually stdin/fd=0.
Update io_op_needs_file()

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-13 15:25:57 -07:00
Jens Axboe
93bd25bb69 io_uring: make timeout sequence == 0 mean no sequence
Currently we make sequence == 0 be the same as sequence == 1, but that's
not super useful if the intent is really to have a timeout that's just
a pure timeout.

If the user passes in sqe->off == 0, then don't apply any sequence logic
to the request, let it purely be driven by the timeout specified.

Reported-by: 李通洲 <carter.li@eoitek.com>
Reviewed-by: 李通洲 <carter.li@eoitek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-12 00:18:51 -07:00
Jens Axboe
6873e0bd6a io_uring: ensure we clear io_kiocb->result before each issue
We use io_kiocb->result == -EAGAIN as a way to know if we need to
re-submit a polled request, as -EAGAIN reporting happens out-of-line
for IO submission failures. This field is cleared when we originally
allocate the request, but it isn't reset when we retry the submission
from async context. This can cause issues where we think something
needs a re-issue, but we're really just reading stale data.

Reset ->result whenever we re-prep a request for polled submission.

Cc: stable@vger.kernel.org
Fixes: 9e645e1105 ("io_uring: add support for sqe links")
Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-30 14:45:22 -06:00
Jens Axboe
044c1ab399 io_uring: don't touch ctx in setup after ring fd install
syzkaller reported an issue where it looks like a malicious app can
trigger a use-after-free of reading the ctx ->sq_array and ->rings
value right after having installed the ring fd in the process file
table.

Defer ring fd installation until after we're done reading those
values.

Fixes: 75b28affdd ("io_uring: allocate the two rings together")
Reported-by: syzbot+6f03d895a6cd0d06187f@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-28 09:15:33 -06:00
Pavel Begunkov
7b20238d28 io_uring: Fix leaked shadow_req
io_queue_link_head() owns shadow_req after taking it as an argument.
By not freeing it in case of an error, it can leak the request along
with taken ctx->refs.

Reviewed-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-27 21:29:18 -06:00