Commit Graph

9818 Commits

Author SHA1 Message Date
Jia-Ju Bai
1d2eda18f6 btrfs: fix root ref counts in error handling in btrfs_get_root_ref
commit 168a2f776b9762f4021421008512dd7ab7474df1 upstream.

In btrfs_get_root_ref(), when btrfs_insert_fs_root() fails,
btrfs_put_root() can happen for two reasons:

- the root already exists in the tree, in that case it returns the
  reference obtained in btrfs_lookup_fs_root()

- another error so the cleanup is done in the fail label

Calling btrfs_put_root() unconditionally would lead to double decrement
of the root reference possibly freeing it in the second case.

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Fixes: bc44d7c4b2 ("btrfs: push btrfs_grab_fs_root into btrfs_get_fs_root")
CC: stable@vger.kernel.org # 5.10+
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20 09:23:27 +02:00
Josef Bacik
76e086ce7b btrfs: do not warn for free space inode in cow_file_range
[ Upstream commit a7d16d9a07bbcb7dcd5214a1bea75c808830bc0d ]

This is a long time leftover from when I originally added the free space
inode, the point was to catch cases where we weren't honoring the NOCOW
flag.  However there exists a race with relocation, if we allocate our
free space inode in a block group that is about to be relocated, we
could trigger the COW path before the relocation has the opportunity to
find the extents and delete the free space cache.  In production where
we have auto-relocation enabled we're seeing this WARN_ON_ONCE() around
5k times in a 2 week period, so not super common but enough that it's at
the top of our metrics.

We're properly handling the error here, and with us phasing out v1 space
cache anyway just drop the WARN_ON_ONCE.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20 09:23:19 +02:00
Darrick J. Wong
217190dc66 btrfs: fix fallocate to use file_modified to update permissions consistently
[ Upstream commit 05fd9564e9faf0f23b4676385e27d9405cef6637 ]

Since the initial introduction of (posix) fallocate back at the turn of
the century, it has been possible to use this syscall to change the
user-visible contents of files.  This can happen by extending the file
size during a preallocation, or through any of the newer modes (punch,
zero range).  Because the call can be used to change file contents, we
should treat it like we do any other modification to a file -- update
the mtime, and drop set[ug]id privileges/capabilities.

The VFS function file_modified() does all this for us if pass it a
locked inode, so let's make fallocate drop permissions correctly.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20 09:23:19 +02:00
Nathan Chancellor
921fdc45a0 btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
commit 6d4a6b515c39f1f8763093e0f828959b2fbc2f45 upstream.

Clang's version of -Wunused-but-set-variable recently gained support for
unary operations, which reveals two unused variables:

  fs/btrfs/block-group.c:2949:6: error: variable 'num_started' set but not used [-Werror,-Wunused-but-set-variable]
          int num_started = 0;
              ^
  fs/btrfs/block-group.c:3116:6: error: variable 'num_started' set but not used [-Werror,-Wunused-but-set-variable]
          int num_started = 0;
              ^
  2 errors generated.

These variables appear to be unused from their introduction, so just
remove them to silence the warnings.

Fixes: c9dc4c6578 ("Btrfs: two stage dirty block group writeout")
Fixes: 1bbc621ef2 ("Btrfs: allow block group cache writeout outside critical section in commit")
CC: stable@vger.kernel.org # 5.4+
Link: https://github.com/ClangBuiltLinux/linux/issues/1614
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20 09:23:10 +02:00
Greg Kroah-Hartman
95f4203fc9 This is the 5.10.110 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmJQLWwACgkQONu9yGCS
 aT4R2BAAr/cGnf2/BQ6+zNPW+LlfGn75803yd+oWNL8WzjNiQGrTsQavE1jL0LXP
 45iPxvY6eOlP9oEoJGYyNYhzQfUM92Unysa/KemB/xUBsb2If0ZdWk1WB9Lnw0xq
 m65kACXovbcg4LsZGpgCv7ln1ykogo+bNMES9P6CLxwKR/DMKUeJxbRNKE/AkD5l
 DxF7IJEP+YRbKAtoLM2Xj4KdjVSfRIfs+Pf0A1t43GqAw6tt3beqmzeCwDzuzz5a
 DHpXS6PeJjTZOjz4LkuBSbyK5cKGFv1C6o7JVjWSZhDyI5E4OLdNDpNKqcjsXAN+
 wMqS1eh4gYUBXmPE44BGwkkugPyaR0/KHUebfkFZG2/H/8DfvrGqlbvsGSFNXxsV
 jH2/AV/rOxAFeM/U0c1I4Ve42MU18kdf1MRBo0Dq5xSoN9HFQhNp+HE5jpppgsvi
 FYpMqZoQzH31GIjOq7g0zLdj4NTBrkO9dh7kbpH0Xay1yBmigvD2PA4qpsL1+VMI
 v73Iq/RJVGUJFAeiYFjn9IGs9EsiKNG08v9uoKS+1m1VLrpVdgwtzo+RjJ/E51Mt
 Nk4WK94MyoivkRFKulDasv9yBWdcZCfljc91271UCKCERlyO/bmsTqhffeATGGRh
 N/7oxa71BHvxp0VYqvKD6xFUs+jFt9DQmIX7Pl1/yLpaz+sN0no=
 =31mv
 -----END PGP SIGNATURE-----

Merge 5.10.110 into android12-5.10-lts

Changes in 5.10.110
	swiotlb: fix info leak with DMA_FROM_DEVICE
	USB: serial: pl2303: add IBM device IDs
	USB: serial: simple: add Nokia phone driver
	hv: utils: add PTP_1588_CLOCK to Kconfig to fix build
	netdevice: add the case if dev is NULL
	HID: logitech-dj: add new lightspeed receiver id
	xfrm: fix tunnel model fragmentation behavior
	ARM: mstar: Select HAVE_ARM_ARCH_TIMER
	virtio_console: break out of buf poll on remove
	vdpa/mlx5: should verify CTRL_VQ feature exists for MQ
	tools/virtio: fix virtio_test execution
	ethernet: sun: Free the coherent when failing in probing
	gpio: Revert regression in sysfs-gpio (gpiolib.c)
	spi: Fix invalid sgs value
	net:mcf8390: Use platform_get_irq() to get the interrupt
	Revert "gpio: Revert regression in sysfs-gpio (gpiolib.c)"
	spi: Fix erroneous sgs value with min_t()
	Input: zinitix - do not report shadow fingers
	af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register
	net: dsa: microchip: add spi_device_id tables
	locking/lockdep: Avoid potential access of invalid memory in lock_class
	iommu/iova: Improve 32-bit free space estimate
	tpm: fix reference counting for struct tpm_chip
	virtio-blk: Use blk_validate_block_size() to validate block size
	USB: usb-storage: Fix use of bitfields for hardware data in ene_ub6250.c
	xhci: fix garbage USBSTS being logged in some cases
	xhci: fix runtime PM imbalance in USB2 resume
	xhci: make xhci_handshake timeout for xhci_reset() adjustable
	xhci: fix uninitialized string returned by xhci_decode_ctrl_ctx()
	mei: me: add Alder Lake N device id.
	mei: avoid iterator usage outside of list_for_each_entry
	coresight: Fix TRCCONFIGR.QE sysfs interface
	iio: afe: rescale: use s64 for temporary scale calculations
	iio: inkern: apply consumer scale on IIO_VAL_INT cases
	iio: inkern: apply consumer scale when no channel scale is available
	iio: inkern: make a best effort on offset calculation
	greybus: svc: fix an error handling bug in gb_svc_hello()
	clk: uniphier: Fix fixed-rate initialization
	ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
	KEYS: fix length validation in keyctl_pkey_params_get_2()
	Documentation: add link to stable release candidate tree
	Documentation: update stable tree link
	firmware: stratix10-svc: add missing callback parameter on RSU
	HID: intel-ish-hid: Use dma_alloc_coherent for firmware update
	SUNRPC: avoid race between mod_timer() and del_timer_sync()
	NFSD: prevent underflow in nfssvc_decode_writeargs()
	NFSD: prevent integer overflow on 32 bit systems
	f2fs: fix to unlock page correctly in error path of is_alive()
	f2fs: quota: fix loop condition at f2fs_quota_sync()
	f2fs: fix to do sanity check on .cp_pack_total_block_count
	remoteproc: Fix count check in rproc_coredump_write()
	pinctrl: samsung: drop pin banks references on error paths
	spi: mxic: Fix the transmit path
	mtd: rawnand: protect access to rawnand devices while in suspend
	can: ems_usb: ems_usb_start_xmit(): fix double dev_kfree_skb() in error path
	jffs2: fix use-after-free in jffs2_clear_xattr_subsystem
	jffs2: fix memory leak in jffs2_do_mount_fs
	jffs2: fix memory leak in jffs2_scan_medium
	mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node
	mm: invalidate hwpoison page cache page in fault path
	mempolicy: mbind_range() set_policy() after vma_merge()
	scsi: libsas: Fix sas_ata_qc_issue() handling of NCQ NON DATA commands
	qed: display VF trust config
	qed: validate and restrict untrusted VFs vlan promisc mode
	riscv: Fix fill_callchain return value
	riscv: Increase stack size under KASAN
	Revert "Input: clear BTN_RIGHT/MIDDLE on buttonpads"
	cifs: prevent bad output lengths in smb2_ioctl_query_info()
	cifs: fix NULL ptr dereference in smb2_ioctl_query_info()
	ALSA: cs4236: fix an incorrect NULL check on list iterator
	ALSA: hda: Avoid unsol event during RPM suspending
	ALSA: pcm: Fix potential AB/BA lock with buffer_mutex and mmap_lock
	ALSA: hda/realtek: Fix audio regression on Mi Notebook Pro 2020
	mm: madvise: skip unmapped vma holes passed to process_madvise
	mm: madvise: return correct bytes advised with process_madvise
	Revert "mm: madvise: skip unmapped vma holes passed to process_madvise"
	mm,hwpoison: unmap poisoned page before invalidation
	mm/kmemleak: reset tag when compare object pointer
	dm integrity: set journal entry unused when shrinking device
	drbd: fix potential silent data corruption
	can: isotp: sanitize CAN ID checks in isotp_bind()
	powerpc/kvm: Fix kvm_use_magic_page
	udp: call udp_encap_enable for v6 sockets when enabling encap
	arm64: signal: nofpsimd: Do not allocate fp/simd context when not available
	arm64: dts: ti: k3-am65: Fix gic-v3 compatible regs
	arm64: dts: ti: k3-j721e: Fix gic-v3 compatible regs
	arm64: dts: ti: k3-j7200: Fix gic-v3 compatible regs
	ACPI: properties: Consistently return -ENOENT if there are no more references
	coredump: Also dump first pages of non-executable ELF libraries
	ext4: fix ext4_fc_stats trace point
	ext4: fix fs corruption when tring to remove a non-empty directory with IO error
	drivers: hamradio: 6pack: fix UAF bug caused by mod_timer()
	mailbox: tegra-hsp: Flush whole channel
	block: limit request dispatch loop duration
	block: don't merge across cgroup boundaries if blkcg is enabled
	drm/edid: check basic audio support on CEA extension block
	video: fbdev: sm712fb: Fix crash in smtcfb_read()
	video: fbdev: atari: Atari 2 bpp (STe) palette bugfix
	ARM: dts: at91: sama5d2: Fix PMERRLOC resource size
	ARM: dts: exynos: fix UART3 pins configuration in Exynos5250
	ARM: dts: exynos: add missing HDMI supplies on SMDK5250
	ARM: dts: exynos: add missing HDMI supplies on SMDK5420
	mgag200 fix memmapsl configuration in GCTL6 register
	carl9170: fix missing bit-wise or operator for tx_params
	pstore: Don't use semaphores in always-atomic-context code
	thermal: int340x: Increase bitmap size
	lib/raid6/test: fix multiple definition linking error
	exec: Force single empty string when argv is empty
	crypto: rsa-pkcs1pad - only allow with rsa
	crypto: rsa-pkcs1pad - correctly get hash from source scatterlist
	crypto: rsa-pkcs1pad - restore signature length check
	crypto: rsa-pkcs1pad - fix buffer overread in pkcs1pad_verify_complete()
	bcache: fixup multiple threads crash
	DEC: Limit PMAX memory probing to R3k systems
	media: gpio-ir-tx: fix transmit with long spaces on Orange Pi PC
	media: davinci: vpif: fix unbalanced runtime PM get
	media: davinci: vpif: fix unbalanced runtime PM enable
	xtensa: fix stop_machine_cpuslocked call in patch_text
	xtensa: fix xtensa_wsr always writing 0
	brcmfmac: firmware: Allocate space for default boardrev in nvram
	brcmfmac: pcie: Release firmwares in the brcmf_pcie_setup error path
	brcmfmac: pcie: Replace brcmf_pcie_copy_mem_todev with memcpy_toio
	brcmfmac: pcie: Fix crashes due to early IRQs
	drm/i915/opregion: check port number bounds for SWSCI display power state
	drm/i915/gem: add missing boundary check in vm_access
	PCI: pciehp: Clear cmd_busy bit in polling mode
	PCI: xgene: Revert "PCI: xgene: Fix IB window setup"
	regulator: qcom_smd: fix for_each_child.cocci warnings
	selinux: check return value of sel_make_avc_files
	hwrng: cavium - Check health status while reading random data
	hwrng: cavium - HW_RANDOM_CAVIUM should depend on ARCH_THUNDER
	crypto: sun8i-ss - really disable hash on A80
	crypto: authenc - Fix sleep in atomic context in decrypt_tail
	crypto: mxs-dcp - Fix scatterlist processing
	thermal: int340x: Check for NULL after calling kmemdup()
	spi: tegra114: Add missing IRQ check in tegra_spi_probe
	arm64/mm: avoid fixmap race condition when create pud mapping
	selftests/x86: Add validity check and allow field splitting
	crypto: rockchip - ECB does not need IV
	audit: log AUDIT_TIME_* records only from rules
	EVM: fix the evm= __setup handler return value
	crypto: ccree - don't attempt 0 len DMA mappings
	spi: pxa2xx-pci: Balance reference count for PCI DMA device
	hwmon: (pmbus) Add mutex to regulator ops
	hwmon: (sch56xx-common) Replace WDOG_ACTIVE with WDOG_HW_RUNNING
	nvme: cleanup __nvme_check_ids
	block: don't delete queue kobject before its children
	PM: hibernate: fix __setup handler error handling
	PM: suspend: fix return value of __setup handler
	spi: spi-zynqmp-gqspi: Handle error for dma_set_mask
	hwrng: atmel - disable trng on failure path
	crypto: sun8i-ss - call finalize with bh disabled
	crypto: sun8i-ce - call finalize with bh disabled
	crypto: amlogic - call finalize with bh disabled
	crypto: vmx - add missing dependencies
	clocksource/drivers/timer-ti-dm: Fix regression from errata i940 fix
	clocksource/drivers/exynos_mct: Refactor resources allocation
	clocksource/drivers/exynos_mct: Handle DTS with higher number of interrupts
	clocksource/drivers/timer-microchip-pit64b: Use notrace
	clocksource/drivers/timer-of: Check return value of of_iomap in timer_of_base_init()
	ACPI: APEI: fix return value of __setup handlers
	crypto: ccp - ccp_dmaengine_unregister release dma channels
	crypto: ccree - Fix use after free in cc_cipher_exit()
	vfio: platform: simplify device removal
	amba: Make the remove callback return void
	hwrng: nomadik - Change clk_disable to clk_disable_unprepare
	hwmon: (pmbus) Add Vin unit off handling
	clocksource: acpi_pm: fix return value of __setup handler
	io_uring: terminate manual loop iterator loop correctly for non-vecs
	watch_queue: Fix NULL dereference in error cleanup
	watch_queue: Actually free the watch
	f2fs: fix to enable ATGC correctly via gc_idle sysfs interface
	sched/debug: Remove mpol_get/put and task_lock/unlock from sched_show_numa
	sched/core: Export pelt_thermal_tp
	rseq: Optimise rseq_get_rseq_cs() and clear_rseq_cs()
	rseq: Remove broken uapi field layout on 32-bit little endian
	perf/core: Fix address filter parser for multiple filters
	perf/x86/intel/pt: Fix address filter config for 32-bit kernel
	f2fs: fix missing free nid in f2fs_handle_failed_inode
	nfsd: more robust allocation failure handling in nfsd_file_cache_init
	f2fs: fix to avoid potential deadlock
	btrfs: fix unexpected error path when reflinking an inline extent
	f2fs: compress: remove unneeded read when rewrite whole cluster
	f2fs: fix compressed file start atomic write may cause data corruption
	selftests, x86: fix how check_cc.sh is being invoked
	kunit: make kunit_test_timeout compatible with comment
	media: staging: media: zoran: fix usage of vb2_dma_contig_set_max_seg_size
	media: v4l2-mem2mem: Apply DST_QUEUE_OFF_BASE on MMAP buffers across ioctls
	media: mtk-vcodec: potential dereference of null pointer
	media: bttv: fix WARNING regression on tunerless devices
	ASoC: xilinx: xlnx_formatter_pcm: Handle sysclk setting
	ASoC: generic: simple-card-utils: remove useless assignment
	media: coda: Fix missing put_device() call in coda_get_vdoa_data
	media: meson: vdec: potential dereference of null pointer
	media: hantro: Fix overfill bottom register field name
	media: aspeed: Correct value for h-total-pixels
	video: fbdev: matroxfb: set maxvram of vbG200eW to the same as vbG200 to avoid black screen
	video: fbdev: controlfb: Fix set but not used warnings
	video: fbdev: controlfb: Fix COMPILE_TEST build
	video: fbdev: smscufx: Fix null-ptr-deref in ufx_usb_probe()
	video: fbdev: atmel_lcdfb: fix an error code in atmel_lcdfb_probe()
	video: fbdev: fbcvt.c: fix printing in fb_cvt_print_name()
	firmware: qcom: scm: Remove reassignment to desc following initializer
	ARM: dts: qcom: ipq4019: fix sleep clock
	soc: qcom: rpmpd: Check for null return of devm_kcalloc
	soc: qcom: ocmem: Fix missing put_device() call in of_get_ocmem
	soc: qcom: aoss: remove spurious IRQF_ONESHOT flags
	arm64: dts: qcom: sdm845: fix microphone bias properties and values
	arm64: dts: qcom: sm8150: Correct TCS configuration for apps rsc
	firmware: ti_sci: Fix compilation failure when CONFIG_TI_SCI_PROTOCOL is not defined
	soc: ti: wkup_m3_ipc: Fix IRQ check in wkup_m3_ipc_probe
	ARM: dts: sun8i: v3s: Move the csi1 block to follow address order
	ARM: dts: imx: Add missing LVDS decoder on M53Menlo
	media: video/hdmi: handle short reads of hdmi info frame.
	media: em28xx: initialize refcount before kref_get
	media: usb: go7007: s2250-board: fix leak in probe()
	media: cedrus: H265: Fix neighbour info buffer size
	media: cedrus: h264: Fix neighbour info buffer size
	ASoC: codecs: wcd934x: fix return value of wcd934x_rx_hph_mode_put
	uaccess: fix nios2 and microblaze get_user_8()
	ASoC: rt5663: check the return value of devm_kzalloc() in rt5663_parse_dp()
	ASoC: ti: davinci-i2s: Add check for clk_enable()
	ALSA: spi: Add check for clk_enable()
	arm64: dts: ns2: Fix spi-cpol and spi-cpha property
	arm64: dts: broadcom: Fix sata nodename
	printk: fix return value of printk.devkmsg __setup handler
	ASoC: mxs-saif: Handle errors for clk_enable
	ASoC: atmel_ssc_dai: Handle errors for clk_enable
	ASoC: dwc-i2s: Handle errors for clk_enable
	ASoC: soc-compress: prevent the potentially use of null pointer
	memory: emif: Add check for setup_interrupts
	memory: emif: check the pointer temp in get_device_details()
	ALSA: firewire-lib: fix uninitialized flag for AV/C deferred transaction
	arm64: dts: rockchip: Fix SDIO regulator supply properties on rk3399-firefly
	m68k: coldfire/device.c: only build for MCF_EDMA when h/w macros are defined
	media: stk1160: If start stream fails, return buffers with VB2_BUF_STATE_QUEUED
	media: vidtv: Check for null return of vzalloc
	ASoC: atmel: Add missing of_node_put() in at91sam9g20ek_audio_probe
	ASoC: wm8350: Handle error for wm8350_register_irq
	ASoC: fsi: Add check for clk_enable
	video: fbdev: omapfb: Add missing of_node_put() in dvic_probe_of
	media: saa7134: convert list_for_each to entry variant
	media: saa7134: fix incorrect use to determine if list is empty
	ivtv: fix incorrect device_caps for ivtvfb
	ASoC: rockchip: i2s: Use devm_platform_get_and_ioremap_resource()
	ASoC: rockchip: i2s: Fix missing clk_disable_unprepare() in rockchip_i2s_probe
	ASoC: SOF: Add missing of_node_put() in imx8m_probe
	ASoC: dmaengine: do not use a NULL prepare_slave_config() callback
	ASoC: mxs: Fix error handling in mxs_sgtl5000_probe
	ASoC: fsl_spdif: Disable TX clock when stop
	ASoC: imx-es8328: Fix error return code in imx_es8328_probe()
	ASoC: msm8916-wcd-digital: Fix missing clk_disable_unprepare() in msm8916_wcd_digital_probe
	mmc: davinci_mmc: Handle error for clk_enable
	ASoC: atmel: sam9x5_wm8731: use devm_snd_soc_register_card()
	ASoC: atmel: Fix error handling in sam9x5_wm8731_driver_probe
	ASoC: msm8916-wcd-analog: Fix error handling in pm8916_wcd_analog_spmi_probe
	ASoC: codecs: wcd934x: Add missing of_node_put() in wcd934x_codec_parse_data
	ARM: configs: multi_v5_defconfig: re-enable CONFIG_V4L_PLATFORM_DRIVERS
	drm/meson: osd_afbcd: Add an exit callback to struct meson_afbcd_ops
	drm/bridge: Fix free wrong object in sii8620_init_rcp_input_dev
	drm/bridge: Add missing pm_runtime_disable() in __dw_mipi_dsi_probe
	drm/bridge: nwl-dsi: Fix PM disable depth imbalance in nwl_dsi_probe
	drm: bridge: adv7511: Fix ADV7535 HPD enablement
	ath10k: fix memory overwrite of the WoWLAN wakeup packet pattern
	drm/panfrost: Check for error num after setting mask
	libbpf: Fix possible NULL pointer dereference when destroying skeleton
	udmabuf: validate ubuf->pagecount
	Bluetooth: hci_serdev: call init_rwsem() before p->open()
	mtd: onenand: Check for error irq
	mtd: rawnand: gpmi: fix controller timings setting
	drm/edid: Don't clear formats if using deep color
	ionic: fix type complaint in ionic_dev_cmd_clean()
	drm/nouveau/acr: Fix undefined behavior in nvkm_acr_hsfw_load_bl()
	drm/amd/display: Fix a NULL pointer dereference in amdgpu_dm_connector_add_common_modes()
	drm/amd/pm: return -ENOTSUPP if there is no get_dpm_ultimate_freq function
	ath9k_htc: fix uninit value bugs
	RDMA/core: Set MR type in ib_reg_user_mr
	KVM: PPC: Fix vmx/vsx mixup in mmio emulation
	i40e: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb
	i40e: respect metadata on XSK Rx to skb
	power: reset: gemini-poweroff: Fix IRQ check in gemini_poweroff_probe
	ray_cs: Check ioremap return value
	powerpc: dts: t1040rdb: fix ports names for Seville Ethernet switch
	KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
	powerpc/perf: Don't use perf_hw_context for trace IMC PMU
	mt76: mt7915: use proper aid value in mt7915_mcu_wtbl_generic_tlv in sta mode
	mt76: mt7915: use proper aid value in mt7915_mcu_sta_basic_tlv
	mt76: mt7603: check sta_rates pointer in mt7603_sta_rate_tbl_update
	mt76: mt7615: check sta_rates pointer in mt7615_sta_rate_tbl_update
	net: dsa: mv88e6xxx: Enable port policy support on 6097
	scripts/dtc: Call pkg-config POSIXly correct
	livepatch: Fix build failure on 32 bits processors
	PCI: aardvark: Fix reading PCI_EXP_RTSTA_PME bit on emulated bridge
	drm/bridge: dw-hdmi: use safe format when first in bridge chain
	power: supply: ab8500: Fix memory leak in ab8500_fg_sysfs_init
	HID: i2c-hid: fix GET/SET_REPORT for unnumbered reports
	iommu/ipmmu-vmsa: Check for error num after setting mask
	drm/amd/pm: enable pm sysfs write for one VF mode
	drm/amd/display: Add affected crtcs to atomic state for dsc mst unplug
	IB/cma: Allow XRC INI QPs to set their local ACK timeout
	dax: make sure inodes are flushed before destroy cache
	iwlwifi: Fix -EIO error code that is never returned
	iwlwifi: mvm: Fix an error code in iwl_mvm_up()
	drm/msm/dp: populate connector of struct dp_panel
	drm/msm/dpu: add DSPP blocks teardown
	drm/msm/dpu: fix dp audio condition
	dm crypt: fix get_key_size compiler warning if !CONFIG_KEYS
	scsi: pm8001: Fix command initialization in pm80XX_send_read_log()
	scsi: pm8001: Fix command initialization in pm8001_chip_ssp_tm_req()
	scsi: pm8001: Fix payload initialization in pm80xx_set_thermal_config()
	scsi: pm8001: Fix le32 values handling in pm80xx_set_sas_protocol_timer_config()
	scsi: pm8001: Fix payload initialization in pm80xx_encrypt_update()
	scsi: pm8001: Fix le32 values handling in pm80xx_chip_ssp_io_req()
	scsi: pm8001: Fix le32 values handling in pm80xx_chip_sata_req()
	scsi: pm8001: Fix NCQ NON DATA command task initialization
	scsi: pm8001: Fix NCQ NON DATA command completion handling
	scsi: pm8001: Fix abort all task initialization
	RDMA/mlx5: Fix the flow of a miss in the allocation of a cache ODP MR
	drm/amd/display: Remove vupdate_int_entry definition
	TOMOYO: fix __setup handlers return values
	ext2: correct max file size computing
	drm/tegra: Fix reference leak in tegra_dsi_ganged_probe
	power: supply: bq24190_charger: Fix bq24190_vbus_is_enabled() wrong false return
	scsi: hisi_sas: Change permission of parameter prot_mask
	drm/bridge: cdns-dsi: Make sure to to create proper aliases for dt
	bpf, arm64: Call build_prologue() first in first JIT pass
	bpf, arm64: Feed byte-offset into bpf line info
	gpu: host1x: Fix a memory leak in 'host1x_remove()'
	libbpf: Skip forward declaration when counting duplicated type names
	powerpc/mm/numa: skip NUMA_NO_NODE onlining in parse_numa_properties()
	powerpc/Makefile: Don't pass -mcpu=powerpc64 when building 32-bit
	KVM: x86: Fix emulation in writing cr8
	KVM: x86/emulator: Defer not-present segment check in __load_segment_descriptor()
	hv_balloon: rate-limit "Unhandled message" warning
	i2c: xiic: Make bus names unique
	power: supply: wm8350-power: Handle error for wm8350_register_irq
	power: supply: wm8350-power: Add missing free in free_charger_irq
	IB/hfi1: Allow larger MTU without AIP
	PCI: Reduce warnings on possible RW1C corruption
	net: axienet: fix RX ring refill allocation failure handling
	mips: DEC: honor CONFIG_MIPS_FP_SUPPORT=n
	powerpc/sysdev: fix incorrect use to determine if list is empty
	mfd: mc13xxx: Add check for mc13xxx_irq_request
	libbpf: Unmap rings when umem deleted
	selftests/bpf: Make test_lwt_ip_encap more stable and faster
	platform/x86: huawei-wmi: check the return value of device_create_file()
	powerpc: 8xx: fix a return value error in mpc8xx_pic_init
	vxcan: enable local echo for sent CAN frames
	ath10k: Fix error handling in ath10k_setup_msa_resources
	mips: cdmm: Fix refcount leak in mips_cdmm_phys_base
	MIPS: RB532: fix return value of __setup handler
	MIPS: pgalloc: fix memory leak caused by pgd_free()
	mtd: rawnand: atmel: fix refcount issue in atmel_nand_controller_init
	RDMA/mlx5: Fix memory leak in error flow for subscribe event routine
	bpf, sockmap: Fix memleak in tcp_bpf_sendmsg while sk msg is full
	bpf, sockmap: Fix more uncharged while msg has more_data
	bpf, sockmap: Fix double uncharge the mem of sk_msg
	samples/bpf, xdpsock: Fix race when running for fix duration of time
	USB: storage: ums-realtek: fix error code in rts51x_read_mem()
	can: isotp: return -EADDRNOTAVAIL when reading from unbound socket
	can: isotp: support MSG_TRUNC flag when reading from socket
	bareudp: use ipv6_mod_enabled to check if IPv6 enabled
	selftests/bpf: Fix error reporting from sock_fields programs
	Bluetooth: call hci_le_conn_failed with hdev lock in hci_le_conn_failed
	Bluetooth: btmtksdio: Fix kernel oops in btmtksdio_interrupt
	ipv4: Fix route lookups when handling ICMP redirects and PMTU updates
	af_netlink: Fix shift out of bounds in group mask calculation
	i2c: meson: Fix wrong speed use from probe
	i2c: mux: demux-pinctrl: do not deactivate a master that is not active
	selftests/bpf/test_lirc_mode2.sh: Exit with proper code
	PCI: Avoid broken MSI on SB600 USB devices
	net: bcmgenet: Use stronger register read/writes to assure ordering
	tcp: ensure PMTU updates are processed during fastopen
	openvswitch: always update flow key after nat
	tipc: fix the timer expires after interval 100ms
	mfd: asic3: Add missing iounmap() on error asic3_mfd_probe
	mxser: fix xmit_buf leak in activate when LSR == 0xff
	pwm: lpc18xx-sct: Initialize driver data and hardware before pwmchip_add()
	fsi: aspeed: convert to devm_platform_ioremap_resource
	fsi: Aspeed: Fix a potential double free
	misc: alcor_pci: Fix an error handling path
	cpufreq: qcom-cpufreq-nvmem: fix reading of PVS Valid fuse
	soundwire: intel: fix wrong register name in intel_shim_wake
	clk: qcom: ipq8074: fix PCI-E clock oops
	iio: mma8452: Fix probe failing when an i2c_device_id is used
	staging:iio:adc:ad7280a: Fix handing of device address bit reversing.
	pinctrl: renesas: r8a77470: Reduce size for narrow VIN1 channel
	pinctrl: renesas: checker: Fix miscalculation of number of states
	clk: qcom: ipq8074: Use floor ops for SDCC1 clock
	phy: dphy: Correct lpx parameter and its derivatives(ta_{get,go,sure})
	serial: 8250_mid: Balance reference count for PCI DMA device
	serial: 8250_lpss: Balance reference count for PCI DMA device
	NFS: Use of mapping_set_error() results in spurious errors
	serial: 8250: Fix race condition in RTS-after-send handling
	iio: adc: Add check for devm_request_threaded_irq
	habanalabs: Add check for pci_enable_device
	NFS: Return valid errors from nfs2/3_decode_dirent()
	dma-debug: fix return value of __setup handlers
	clk: imx7d: Remove audio_mclk_root_clk
	clk: at91: sama7g5: fix parents of PDMCs' GCLK
	clk: qcom: clk-rcg2: Update logic to calculate D value for RCG
	clk: qcom: clk-rcg2: Update the frac table for pixel clock
	dmaengine: hisi_dma: fix MSI allocate fail when reload hisi_dma
	remoteproc: qcom: Fix missing of_node_put in adsp_alloc_memory_region
	remoteproc: qcom_wcnss: Add missing of_node_put() in wcnss_alloc_memory_region
	remoteproc: qcom_q6v5_mss: Fix some leaks in q6v5_alloc_memory_region
	nvdimm/region: Fix default alignment for small regions
	clk: actions: Terminate clk_div_table with sentinel element
	clk: loongson1: Terminate clk_div_table with sentinel element
	clk: clps711x: Terminate clk_div_table with sentinel element
	clk: tegra: tegra124-emc: Fix missing put_device() call in emc_ensure_emc_driver
	NFS: remove unneeded check in decode_devicenotify_args()
	staging: mt7621-dts: fix LEDs and pinctrl on GB-PC1 devicetree
	staging: mt7621-dts: fix formatting
	staging: mt7621-dts: fix pinctrl properties for ethernet
	staging: mt7621-dts: fix GB-PC2 devicetree
	pinctrl: mediatek: Fix missing of_node_put() in mtk_pctrl_init
	pinctrl: mediatek: paris: Fix PIN_CONFIG_BIAS_* readback
	pinctrl: mediatek: paris: Fix "argument" argument type for mtk_pinconf_get()
	pinctrl: mediatek: paris: Fix pingroup pin config state readback
	pinctrl: mediatek: paris: Skip custom extra pin config dump for virtual GPIOs
	pinctrl: nomadik: Add missing of_node_put() in nmk_pinctrl_probe
	pinctrl/rockchip: Add missing of_node_put() in rockchip_pinctrl_probe
	tty: hvc: fix return value of __setup handler
	kgdboc: fix return value of __setup handler
	serial: 8250: fix XOFF/XON sending when DMA is used
	kgdbts: fix return value of __setup handler
	firmware: google: Properly state IOMEM dependency
	driver core: dd: fix return value of __setup handler
	jfs: fix divide error in dbNextAG
	netfilter: nf_conntrack_tcp: preserve liberal flag in tcp options
	NFSv4.1: don't retry BIND_CONN_TO_SESSION on session error
	kdb: Fix the putarea helper function
	clk: qcom: gcc-msm8994: Fix gpll4 width
	clk: Initialize orphan req_rate
	xen: fix is_xen_pmu()
	net: enetc: report software timestamping via SO_TIMESTAMPING
	net: hns3: fix bug when PF set the duplicate MAC address for VFs
	net: phy: broadcom: Fix brcm_fet_config_init()
	selftests: test_vxlan_under_vrf: Fix broken test case
	qlcnic: dcb: default to returning -EOPNOTSUPP
	net/x25: Fix null-ptr-deref caused by x25_disconnect
	NFSv4/pNFS: Fix another issue with a list iterator pointing to the head
	net: dsa: bcm_sf2_cfp: fix an incorrect NULL check on list iterator
	fs: fd tables have to be multiples of BITS_PER_LONG
	lib/test: use after free in register_test_dev_kmod()
	fs: fix fd table size alignment properly
	LSM: general protection fault in legacy_parse_param
	regulator: rpi-panel: Handle I2C errors/timing to the Atmel
	gcc-plugins/stackleak: Exactly match strings instead of prefixes
	pinctrl: npcm: Fix broken references to chip->parent_device
	block, bfq: don't move oom_bfqq
	selinux: use correct type for context length
	selinux: allow FIOCLEX and FIONCLEX with policy capability
	loop: use sysfs_emit() in the sysfs xxx show()
	Fix incorrect type in assignment of ipv6 port for audit
	irqchip/qcom-pdc: Fix broken locking
	irqchip/nvic: Release nvic_base upon failure
	fs/binfmt_elf: Fix AT_PHDR for unusual ELF files
	bfq: fix use-after-free in bfq_dispatch_request
	ACPICA: Avoid walking the ACPI Namespace if it is not there
	lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3
	Revert "Revert "block, bfq: honor already-setup queue merges""
	ACPI/APEI: Limit printable size of BERT table data
	PM: core: keep irq flags in device_pm_check_callbacks()
	parisc: Fix handling off probe non-access faults
	nvme-tcp: lockdep: annotate in-kernel sockets
	spi: tegra20: Use of_device_get_match_data()
	locking/lockdep: Iterate lock_classes directly when reading lockdep files
	ext4: correct cluster len and clusters changed accounting in ext4_mb_mark_bb
	ext4: fix ext4_mb_mark_bb() with flex_bg with fast_commit
	ext4: don't BUG if someone dirty pages without asking ext4 first
	f2fs: fix to do sanity check on curseg->alloc_type
	NFSD: Fix nfsd_breaker_owns_lease() return values
	f2fs: compress: fix to print raw data size in error path of lz4 decompression
	ntfs: add sanity check on allocation size
	media: staging: media: zoran: move videodev alloc
	media: staging: media: zoran: calculate the right buffer number for zoran_reap_stat_com
	media: staging: media: zoran: fix various V4L2 compliance errors
	media: ir_toy: free before error exiting
	video: fbdev: nvidiafb: Use strscpy() to prevent buffer overflow
	video: fbdev: w100fb: Reset global state
	video: fbdev: cirrusfb: check pixclock to avoid divide by zero
	video: fbdev: omapfb: acx565akm: replace snprintf with sysfs_emit
	ARM: dts: qcom: fix gic_irq_domain_translate warnings for msm8960
	ARM: dts: bcm2837: Add the missing L1/L2 cache information
	ASoC: madera: Add dependencies on MFD
	media: atomisp_gmin_platform: Add DMI quirk to not turn AXP ELDO2 regulator off on some boards
	media: atomisp: fix dummy_ptr check to avoid duplicate active_bo
	ARM: ftrace: avoid redundant loads or clobbering IP
	ARM: dts: imx7: Use audio_mclk_post_div instead audio_mclk_root_clk
	arm64: defconfig: build imx-sdma as a module
	video: fbdev: omapfb: panel-dsi-cm: Use sysfs_emit() instead of snprintf()
	video: fbdev: omapfb: panel-tpo-td043mtea1: Use sysfs_emit() instead of snprintf()
	video: fbdev: udlfb: replace snprintf in show functions with sysfs_emit
	ARM: dts: bcm2711: Add the missing L1/L2 cache information
	ASoC: soc-core: skip zero num_dai component in searching dai name
	media: cx88-mpeg: clear interrupt status register before streaming video
	uaccess: fix type mismatch warnings from access_ok()
	lib/test_lockup: fix kernel pointer check for separate address spaces
	ARM: tegra: tamonten: Fix I2C3 pad setting
	ARM: mmp: Fix failure to remove sram device
	video: fbdev: sm712fb: Fix crash in smtcfb_write()
	media: Revert "media: em28xx: add missing em28xx_close_extension"
	media: hdpvr: initialize dev->worker at hdpvr_register_videodev
	mmc: host: Return an error when ->enable_sdio_irq() ops is missing
	media: atomisp: fix bad usage at error handling logic
	ALSA: hda/realtek: Add alc256-samsung-headphone fixup
	KVM: x86/mmu: Check for present SPTE when clearing dirty bit in TDP MMU
	powerpc/kasan: Fix early region not updated correctly
	powerpc/lib/sstep: Fix 'sthcx' instruction
	powerpc/lib/sstep: Fix build errors with newer binutils
	powerpc: Fix build errors with newer binutils
	scsi: qla2xxx: Fix stuck session in gpdb
	scsi: qla2xxx: Fix scheduling while atomic
	scsi: qla2xxx: Fix wrong FDMI data for 64G adapter
	scsi: qla2xxx: Fix warning for missing error code
	scsi: qla2xxx: Fix device reconnect in loop topology
	scsi: qla2xxx: Add devids and conditionals for 28xx
	scsi: qla2xxx: Check for firmware dump already collected
	scsi: qla2xxx: Suppress a kernel complaint in qla_create_qpair()
	scsi: qla2xxx: Fix disk failure to rediscover
	scsi: qla2xxx: Fix incorrect reporting of task management failure
	scsi: qla2xxx: Fix hang due to session stuck
	scsi: qla2xxx: Fix missed DMA unmap for NVMe ls requests
	scsi: qla2xxx: Fix N2N inconsistent PLOGI
	scsi: qla2xxx: Reduce false trigger to login
	scsi: qla2xxx: Use correct feature type field during RFF_ID processing
	platform: chrome: Split trace include file
	KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activated
	KVM: Prevent module exit until all VMs are freed
	KVM: x86: fix sending PV IPI
	KVM: SVM: fix panic on out-of-bounds guest IRQ
	ASoC: SOF: Intel: Fix NULL ptr dereference when ENOMEM
	ubifs: rename_whiteout: Fix double free for whiteout_ui->data
	ubifs: Fix deadlock in concurrent rename whiteout and inode writeback
	ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
	ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
	ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()
	ubifs: Fix to add refcount once page is set private
	ubifs: rename_whiteout: correct old_dir size computing
	wireguard: queueing: use CFI-safe ptr_ring cleanup function
	wireguard: socket: free skb in send6 when ipv6 is disabled
	wireguard: socket: ignore v6 endpoints when ipv6 is disabled
	XArray: Fix xas_create_range() when multi-order entry present
	can: mcba_usb: mcba_usb_start_xmit(): fix double dev_kfree_skb in error path
	can: mcba_usb: properly check endpoint type
	can: mcp251xfd: mcp251xfd_register_get_dev_id(): fix return of error value
	XArray: Update the LRU list in xas_split()
	rtc: check if __rtc_read_time was successful
	gfs2: Make sure FITRIM minlen is rounded up to fs block size
	net: hns3: fix software vlan talbe of vlan 0 inconsistent with hardware
	rxrpc: Fix call timer start racing with call destruction
	mailbox: imx: fix wakeup failure from freeze mode
	crypto: arm/aes-neonbs-cbc - Select generic cbc and aes
	watch_queue: Free the page array when watch_queue is dismantled
	pinctrl: pinconf-generic: Print arguments for bias-pull-*
	watchdog: rti-wdt: Add missing pm_runtime_disable() in probe function
	pinctrl: nuvoton: npcm7xx: Rename DS() macro to DSTR()
	pinctrl: nuvoton: npcm7xx: Use %zu printk format for ARRAY_SIZE()
	ASoC: mediatek: mt6358: add missing EXPORT_SYMBOLs
	ubi: Fix race condition between ctrl_cdev_ioctl and ubi_cdev_ioctl
	ARM: iop32x: offset IRQ numbers by 1
	io_uring: fix memory leak of uid in files registration
	riscv module: remove (NOLOAD)
	ACPI: CPPC: Avoid out of bounds access when parsing _CPC data
	platform/chrome: cros_ec_typec: Check for EC device
	can: isotp: restore accidentally removed MSG_PEEK feature
	proc: bootconfig: Add null pointer check
	staging: mt7621-dts: fix pinctrl-0 items to be size-1 items on ethernet
	ASoC: soc-compress: Change the check for codec_dai
	batman-adv: Check ptr for NULL before reducing its refcnt
	mm/mmap: return 1 from stack_guard_gap __setup() handler
	ARM: 9187/1: JIVE: fix return value of __setup handler
	mm/memcontrol: return 1 from cgroup.memory __setup() handler
	mm/usercopy: return 1 from hardened_usercopy __setup() handler
	bpf: Adjust BPF stack helper functions to accommodate skip > 0
	bpf: Fix comment for helper bpf_current_task_under_cgroup()
	dt-bindings: mtd: nand-controller: Fix the reg property description
	dt-bindings: mtd: nand-controller: Fix a comment in the examples
	dt-bindings: spi: mxic: The interrupt property is not mandatory
	ubi: fastmap: Return error code if memory allocation fails in add_aeb()
	ASoC: topology: Allow TLV control to be either read or write
	ARM: dts: spear1340: Update serial node properties
	ARM: dts: spear13xx: Update SPI dma properties
	um: Fix uml_mconsole stop/go
	docs: sysctl/kernel: add missing bit to panic_print
	openvswitch: Fixed nd target mask field in the flow dump.
	KVM: x86/mmu: do compare-and-exchange of gPTE via the user address
	can: m_can: m_can_tx_handler(): fix use after free of skb
	can: usb_8dev: usb_8dev_start_xmit(): fix double dev_kfree_skb() in error path
	coredump: Snapshot the vmas in do_coredump
	coredump: Remove the WARN_ON in dump_vma_snapshot
	coredump/elf: Pass coredump_params into fill_note_info
	coredump: Use the vma snapshot in fill_files_note
	arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
	PCI: xgene: Revert "PCI: xgene: Use inbound resources for setup"
	Linux 5.10.110

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I12fbe227793dd40c0582588e1700cf88cafd0ac6
2022-04-18 17:41:18 +02:00
Kaiwen Hu
a044bca8ef btrfs: prevent subvol with swapfile from being deleted
commit 60021bd754c6ca0addc6817994f20290a321d8d6 upstream.

A subvolume with an active swapfile must not be deleted otherwise it
would not be possible to deactivate it.

After the subvolume is deleted, we cannot swapoff the swapfile in this
deleted subvolume because the path is unreachable.  The swapfile is
still active and holding references, the filesystem cannot be unmounted.

The test looks like this:

  mkfs.btrfs -f $dev > /dev/null
  mount $dev $mnt

  btrfs sub create $mnt/subvol
  touch $mnt/subvol/swapfile
  chmod 600 $mnt/subvol/swapfile
  chattr +C $mnt/subvol/swapfile
  dd if=/dev/zero of=$mnt/subvol/swapfile bs=1K count=4096
  mkswap $mnt/subvol/swapfile
  swapon $mnt/subvol/swapfile

  btrfs sub delete $mnt/subvol
  swapoff $mnt/subvol/swapfile  # failed: No such file or directory
  swapoff --all

  unmount $mnt                  # target is busy.

To prevent above issue, we simply check that whether the subvolume
contains any active swapfile, and stop the deleting process.  This
behavior is like snapshot ioctl dealing with a swapfile.

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Kaiwen Hu <kevinhu@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13 21:01:08 +02:00
Ethan Lien
82ae73ac96 btrfs: fix qgroup reserve overflow the qgroup limit
commit b642b52d0b50f4d398cb4293f64992d0eed2e2ce upstream.

We use extent_changeset->bytes_changed in qgroup_reserve_data() to record
how many bytes we set for EXTENT_QGROUP_RESERVED state. Currently the
bytes_changed is set as "unsigned int", and it will overflow if we try to
fallocate a range larger than 4GiB. The result is we reserve less bytes
and eventually break the qgroup limit.

Unlike regular buffered/direct write, which we use one changeset for
each ordered extent, which can never be larger than 256M.  For
fallocate, we use one changeset for the whole range, thus it no longer
respects the 256M per extent limit, and caused the problem.

The following example test script reproduces the problem:

  $ cat qgroup-overflow.sh
  #!/bin/bash

  DEV=/dev/sdj
  MNT=/mnt/sdj

  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  # Set qgroup limit to 2GiB.
  btrfs quota enable $MNT
  btrfs qgroup limit 2G $MNT

  # Try to fallocate a 3GiB file. This should fail.
  echo
  echo "Try to fallocate a 3GiB file..."
  fallocate -l 3G $MNT/3G.file

  # Try to fallocate a 5GiB file.
  echo
  echo "Try to fallocate a 5GiB file..."
  fallocate -l 5G $MNT/5G.file

  # See we break the qgroup limit.
  echo
  sync
  btrfs qgroup show -r $MNT

  umount $MNT

When running the test:

  $ ./qgroup-overflow.sh
  (...)

  Try to fallocate a 3GiB file...
  fallocate: fallocate failed: Disk quota exceeded

  Try to fallocate a 5GiB file...

  qgroupid         rfer         excl     max_rfer
  --------         ----         ----     --------
  0/5           5.00GiB      5.00GiB      2.00GiB

Since we have no control of how bytes_changed is used, it's better to
set it to u64.

CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Ethan Lien <ethanlien@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13 21:01:08 +02:00
Filipe Manana
2c4741d1b0 btrfs: fix unexpected error path when reflinking an inline extent
[ Upstream commit 1f4613cdbe7739ce291554b316bff8e551383389 ]

When reflinking an inline extent, we assert that its file offset is 0 and
that its uncompressed length is not greater than the sector size. We then
return an error if one of those conditions is not satisfied. However we
use a return statement, which results in returning from btrfs_clone()
without freeing the path and buffer that were allocated before, as well as
not clearing the flag BTRFS_INODE_NO_DELALLOC_FLUSH for the destination
inode.

Fix that by jumping to the 'out' label instead, and also add a WARN_ON()
for each condition so that in case assertions are disabled, we get to
known which of the unexpected conditions triggered the error.

Fixes: a61e1e0df9 ("Btrfs: simplify inline extent handling when doing reflinks")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:40:04 +02:00
Greg Kroah-Hartman
0773736e48 This is the 5.10.104 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmInm+kACgkQONu9yGCS
 aT4/lw/6AoFz3oHbzlft4tCUn4UXQIut8gIiJfrXBnIqw6pa5YQPA1E7hgQBxTnG
 v9llnwDRFR7qouXq1qhoXU01vETiRqZ2ClkT/MnvLfRMQcHgtS9B61VgwnpNuNBg
 qStZORcqFi6rQHXySUgs2ObF6NZQ3BRyHY2LYqZPj0YkuHIdQo48WtQ9cy0XWZLV
 WT49LIVxkTZ6D0fs/k10qRv+M4lmeOzCqEZf4591F0sjoVuTFj3F1xMsSbu8W3xZ
 xXxE0hPbN0If3JFhnb3DqdQ20kRNSmrrV1CzYaN09jyP7KHpdIDVT8R1hSau3TFP
 3zd2fBWmOP0FPzOVkNnqetMuKspKH8p3kKW2rkTyHYcGtUFzh54Hm0QRpA3CVB3L
 JZje9HCkxWiBSl1mwypmBGp88kWOe+n3NRUOhX3yqPoT3R2n45coBV+sfSOakkxv
 K8mUw1FFbJTPjgJtMCs57zzxybInnMrAF5/7XA2MgHCr3SVvYQA7+joSPn3CO+0K
 zKO4kTdEmD9jTT+3vMDL4Z3VSmOJMVcxCHBTUrac/OBIiBz+7y9WQSc7a6aRbfdu
 k3wy7HJ98pmjYB6g73MJcXOtTwXuoTqur4QWU5MCgTw6+qglgRQHr5ILSs35ZeAV
 LO1zvAsklOWFMc/3fD8heLmKGBZ0GHcn+0Y7ZqfFKLmqOTnx7tA=
 =W+lo
 -----END PGP SIGNATURE-----

Merge 5.10.104 into android12-5.10-lts

Changes in 5.10.104
	mac80211_hwsim: report NOACK frames in tx_status
	mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work
	i2c: bcm2835: Avoid clock stretching timeouts
	ASoC: rt5668: do not block workqueue if card is unbound
	ASoC: rt5682: do not block workqueue if card is unbound
	regulator: core: fix false positive in regulator_late_cleanup()
	Input: clear BTN_RIGHT/MIDDLE on buttonpads
	KVM: arm64: vgic: Read HW interrupt pending state from the HW
	tipc: fix a bit overflow in tipc_crypto_key_rcv()
	cifs: fix double free race when mount fails in cifs_get_root()
	selftests/seccomp: Fix seccomp failure by adding missing headers
	dmaengine: shdma: Fix runtime PM imbalance on error
	i2c: cadence: allow COMPILE_TEST
	i2c: qup: allow COMPILE_TEST
	net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
	usb: gadget: don't release an existing dev->buf
	usb: gadget: clear related members when goto fail
	exfat: reuse exfat_inode_info variable instead of calling EXFAT_I()
	exfat: fix i_blocks for files truncated over 4 GiB
	tracing: Add test for user space strings when filtering on string pointers
	serial: stm32: prevent TDR register overwrite when sending x_char
	ata: pata_hpt37x: fix PCI clock detection
	drm/amdgpu: check vm ready by amdgpu_vm->evicting flag
	tracing: Add ustring operation to filtering string pointers
	ALSA: intel_hdmi: Fix reference to PCM buffer address
	riscv/efi_stub: Fix get_boot_hartid_from_fdt() return value
	riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP
	riscv: Fix config KASAN && DEBUG_VIRTUAL
	ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min
	iommu/amd: Recover from event log overflow
	drm/i915: s/JSP2/ICP2/ PCH
	xen/netfront: destroy queues before real_num_tx_queues is zeroed
	thermal: core: Fix TZ_GET_TRIP NULL pointer dereference
	ntb: intel: fix port config status offset for SPR
	mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls
	xfrm: fix MTU regression
	netfilter: fix use-after-free in __nf_register_net_hook()
	bpf, sockmap: Do not ignore orig_len parameter
	xfrm: fix the if_id check in changelink
	xfrm: enforce validity of offload input flags
	e1000e: Correct NVM checksum verification flow
	net: fix up skbs delta_truesize in UDP GRO frag_list
	netfilter: nf_queue: don't assume sk is full socket
	netfilter: nf_queue: fix possible use-after-free
	netfilter: nf_queue: handle socket prefetch
	batman-adv: Request iflink once in batadv-on-batadv check
	batman-adv: Request iflink once in batadv_get_real_netdevice
	batman-adv: Don't expect inter-netns unique iflink indices
	net: ipv6: ensure we call ipv6_mc_down() at most once
	net: dcb: flush lingering app table entries for unregistered devices
	net/smc: fix connection leak
	net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
	net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
	rcu/nocb: Fix missed nocb_timer requeue
	ice: Fix race conditions between virtchnl handling and VF ndo ops
	ice: fix concurrent reset and removal of VFs
	sched/topology: Make sched_init_numa() use a set for the deduplicating sort
	sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa()
	ia64: ensure proper NUMA distance and possible map initialization
	mac80211: fix forwarded mesh frames AC & queue selection
	net: stmmac: fix return value of __setup handler
	mac80211: treat some SAE auth steps as final
	iavf: Fix missing check for running netdev
	net: sxgbe: fix return value of __setup handler
	ibmvnic: register netdev after init of adapter
	net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
	ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()
	efivars: Respect "block" flag in efivar_entry_set_safe()
	firmware: arm_scmi: Remove space in MODULE_ALIAS name
	ASoC: cs4265: Fix the duplicated control name
	can: gs_usb: change active_channels's type from atomic_t to u8
	arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output
	igc: igc_read_phy_reg_gpy: drop premature return
	ARM: Fix kgdb breakpoint for Thumb2
	ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions
	selftests: mlxsw: tc_police_scale: Make test more robust
	pinctrl: sunxi: Use unique lockdep classes for IRQs
	igc: igc_write_phy_reg_gpy: drop premature return
	ibmvnic: free reset-work-item when flushing
	memfd: fix F_SEAL_WRITE after shmem huge page allocated
	s390/extable: fix exception table sorting
	ARM: dts: switch timer config to common devkit8000 devicetree
	ARM: dts: Use 32KiHz oscillator on devkit8000
	soc: fsl: guts: Revert commit 3c0d64e867
	soc: fsl: guts: Add a missing memory allocation failure check
	soc: fsl: qe: Check of ioremap return value
	ARM: tegra: Move panels to AUX bus
	ibmvnic: complete init_done on transport events
	net: chelsio: cxgb3: check the return value of pci_find_capability()
	iavf: Refactor iavf state machine tracking
	nl80211: Handle nla_memdup failures in handle_nan_filter
	drm/amdgpu: fix suspend/resume hang regression
	net: dcb: disable softirqs in dcbnl_flush_dev()
	Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
	Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
	Input: samsung-keypad - properly state IOMEM dependency
	HID: add mapping for KEY_DICTATE
	HID: add mapping for KEY_ALL_APPLICATIONS
	tracing/histogram: Fix sorting on old "cpu" value
	tracing: Fix return value of __setup handlers
	btrfs: fix lost prealloc extents beyond eof after full fsync
	btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
	btrfs: add missing run of delayed items after unlink during log replay
	Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6"
	hamradio: fix macro redefine warning
	Linux 5.10.104

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I24dabeba483a0b0123a4e8c10d1a568b11dfb9c8
2022-03-12 13:57:09 +01:00
Filipe Manana
292e1c88b8 btrfs: add missing run of delayed items after unlink during log replay
commit 4751dc99627e4d1465c5bfa8cb7ab31ed418eff5 upstream.

During log replay, whenever we need to check if a name (dentry) exists in
a directory we do searches on the subvolume tree for inode references or
or directory entries (BTRFS_DIR_INDEX_KEY keys, and BTRFS_DIR_ITEM_KEY
keys as well, before kernel 5.17). However when during log replay we
unlink a name, through btrfs_unlink_inode(), we may not delete inode
references and dir index keys from a subvolume tree and instead just add
the deletions to the delayed inode's delayed items, which will only be
run when we commit the transaction used for log replay. This means that
after an unlink operation during log replay, if we attempt to search for
the same name during log replay, we will not see that the name was already
deleted, since the deletion is recorded only on the delayed items.

We run delayed items after every unlink operation during log replay,
except at unlink_old_inode_refs() and at add_inode_ref(). This was due
to an overlook, as delayed items should be run after evert unlink, for
the reasons stated above.

So fix those two cases.

Fixes: 0d836392ca ("Btrfs: fix mount failure after fsync due to hard link recreation")
Fixes: 1f250e929a ("Btrfs: fix log replay failure after unlink and link combination")
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08 19:09:38 +01:00
Sidong Yang
41712c5fa5 btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
commit d4aef1e122d8bbdc15ce3bd0bc813d6b44a7d63a upstream.

The commit e804861bd4e6 ("btrfs: fix deadlock between quota disable and
qgroup rescan worker") by Kawasaki resolves deadlock between quota
disable and qgroup rescan worker. But also there is a deadlock case like
it. It's about enabling or disabling quota and creating or removing
qgroup. It can be reproduced in simple script below.

for i in {1..100}
do
    btrfs quota enable /mnt &
    btrfs qgroup create 1/0 /mnt &
    btrfs qgroup destroy 1/0 /mnt &
    btrfs quota disable /mnt &
done

Here's why the deadlock happens:

1) The quota rescan task is running.

2) Task A calls btrfs_quota_disable(), locks the qgroup_ioctl_lock
   mutex, and then calls btrfs_qgroup_wait_for_completion(), to wait for
   the quota rescan task to complete.

3) Task B calls btrfs_remove_qgroup() and it blocks when trying to lock
   the qgroup_ioctl_lock mutex, because it's being held by task A. At that
   point task B is holding a transaction handle for the current transaction.

4) The quota rescan task calls btrfs_commit_transaction(). This results
   in it waiting for all other tasks to release their handles on the
   transaction, but task B is blocked on the qgroup_ioctl_lock mutex
   while holding a handle on the transaction, and that mutex is being held
   by task A, which is waiting for the quota rescan task to complete,
   resulting in a deadlock between these 3 tasks.

To resolve this issue, the thread disabling quota should unlock
qgroup_ioctl_lock before waiting rescan completion. Move
btrfs_qgroup_wait_for_completion() after unlock of qgroup_ioctl_lock.

Fixes: e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08 19:09:38 +01:00
Filipe Manana
6e0319e770 btrfs: fix lost prealloc extents beyond eof after full fsync
commit d99478874355d3a7b9d86dfb5d7590d5b1754b1f upstream.

When doing a full fsync, if we have prealloc extents beyond (or at) eof,
and the leaves that contain them were not modified in the current
transaction, we end up not logging them. This results in losing those
extents when we replay the log after a power failure, since the inode is
truncated to the current value of the logged i_size.

Just like for the fast fsync path, we need to always log all prealloc
extents starting at or beyond i_size. The fast fsync case was fixed in
commit 471d557afe ("Btrfs: fix loss of prealloc extents past i_size
after fsync log replay") but it missed the full fsync path. The problem
exists since the very early days, when the log tree was added by
commit e02119d5a7 ("Btrfs: Add a write ahead tree log to optimize
synchronous operations").

Example reproducer:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt

  # Create our test file with many file extent items, so that they span
  # several leaves of metadata, even if the node/page size is 64K. Use
  # direct IO and not fsync/O_SYNC because it's both faster and it avoids
  # clearing the full sync flag from the inode - we want the fsync below
  # to trigger the slow full sync code path.
  $ xfs_io -f -d -c "pwrite -b 4K 0 16M" /mnt/foo

  # Now add two preallocated extents to our file without extending the
  # file's size. One right at i_size, and another further beyond, leaving
  # a gap between the two prealloc extents.
  $ xfs_io -c "falloc -k 16M 1M" /mnt/foo
  $ xfs_io -c "falloc -k 20M 1M" /mnt/foo

  # Make sure everything is durably persisted and the transaction is
  # committed. This makes all created extents to have a generation lower
  # than the generation of the transaction used by the next write and
  # fsync.
  sync

  # Now overwrite only the first extent, which will result in modifying
  # only the first leaf of metadata for our inode. Then fsync it. This
  # fsync will use the slow code path (inode full sync bit is set) because
  # it's the first fsync since the inode was created/loaded.
  $ xfs_io -c "pwrite 0 4K" -c "fsync" /mnt/foo

  # Extent list before power failure.
  $ xfs_io -c "fiemap -v" /mnt/foo
  /mnt/foo:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..7]:          2178048..2178055     8   0x0
     1: [8..16383]:      26632..43007     16376   0x0
     2: [16384..32767]:  2156544..2172927 16384   0x0
     3: [32768..34815]:  2172928..2174975  2048 0x800
     4: [34816..40959]:  hole              6144
     5: [40960..43007]:  2174976..2177023  2048 0x801

  <power fail>

  # Mount fs again, trigger log replay.
  $ mount /dev/sdc /mnt

  # Extent list after power failure and log replay.
  $ xfs_io -c "fiemap -v" /mnt/foo
  /mnt/foo:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..7]:          2178048..2178055     8   0x0
     1: [8..16383]:      26632..43007     16376   0x0
     2: [16384..32767]:  2156544..2172927 16384   0x1

  # The prealloc extents at file offsets 16M and 20M are missing.

So fix this by calling btrfs_log_prealloc_extents() when we are doing a
full fsync, so that we always log all prealloc extents beyond eof.

A test case for fstests will follow soon.

CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08 19:09:38 +01:00
Greg Kroah-Hartman
d172937367 This is the 5.10.103 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmIfSjYACgkQONu9yGCS
 aT5uuQ/9GcUx5ur2RT/8hqUmkDZU3PV2KvPbLY9BYfn5i/An/WS7KNlcXMatEd7n
 E8G+Sh3hLlb8h+3B9EGYvumssERJSxAaecxhha6NU8dSsUdpKzLvjwfy/5L+giJP
 rR6q4yhQaWqt0k7lSdohosIbTuDrr78Q9ifGPRpa2SIDEUDO4R+O/l9XqsCXXDMA
 qa5MIC3vZwU6jbwcfkS1cc/kdB5aLT1DRCXW7Ca7YYMFTba3eV8FWr5pC82cjgsS
 uIZ34yhMLQm70IDNqZRsMtj7JvpwHGAWsOTDd4HoI+4MdyyrgadSPDRRPzcytStZ
 TEllgey/6U+i6Et1wPIpTb/FOOQ3S7uvVBeSdPnDpuv7/BOD75fFg7lOoE9OGcLo
 14bmQGUc0FqOUxmtqYu/LmTOc4o/l0S1DCMhn4JquGjqCQa8R7aWVzYjHlG9wF6v
 ZI04pB+5hec8vElICFUAdaJe6cR5ttkFq4UEkmkXLeSs1RKtAlE5VZkEU9dTykNn
 IY9KYbXNwe492xVCaIZUO2pHpu07tuJ2YLqZypktQt7ndPIjRTeHt3QnFQIVMcug
 MyAtDtZaqHQ459xF9caMHnThxsei7t6YWoPGK8/04ngpZS61ORuNIxigKmK/B62H
 hmoJolC007zXYeisDgSkhFk4TaDbVpxne9cruVd50mLOZlvDUZY=
 =8KvB
 -----END PGP SIGNATURE-----

Merge 5.10.103 into android12-5.10-lts

Changes in 5.10.103
	cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug
	btrfs: tree-checker: check item_size for inode_item
	btrfs: tree-checker: check item_size for dev_item
	clk: jz4725b: fix mmc0 clock gating
	vhost/vsock: don't check owner in vhost_vsock_stop() while releasing
	parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel
	parisc/unaligned: Fix ldw() and stw() unalignment handlers
	KVM: x86/mmu: make apf token non-zero to fix bug
	drm/amdgpu: disable MMHUB PG for Picasso
	drm/i915: Correctly populate use_sagv_wm for all pipes
	sr9700: sanity check for packet length
	USB: zaurus: support another broken Zaurus
	CDC-NCM: avoid overflow in sanity checking
	netfilter: nf_tables_offload: incorrect flow offload action array size
	x86/fpu: Correct pkru/xstate inconsistency
	tee: export teedev_open() and teedev_close_context()
	optee: use driver internal tee_context for some rpc
	ping: remove pr_err from ping_lookup
	perf data: Fix double free in perf_session__delete()
	bnx2x: fix driver load from initrd
	bnxt_en: Fix active FEC reporting to ethtool
	hwmon: Handle failure to register sensor with thermal zone correctly
	bpf: Do not try bpf_msg_push_data with len 0
	selftests: bpf: Check bpf_msg_push_data return value
	bpf: Add schedule points in batch ops
	io_uring: add a schedule point in io_add_buffers()
	net: __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends
	tipc: Fix end of loop tests for list_for_each_entry()
	gso: do not skip outer ip header in case of ipip and net_failover
	openvswitch: Fix setting ipv6 fields causing hw csum failure
	drm/edid: Always set RGB444
	net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
	net/sched: act_ct: Fix flow table lookup after ct clear or switching zones
	net: ll_temac: check the return value of devm_kmalloc()
	net: Force inlining of checksum functions in net/checksum.h
	nfp: flower: Fix a potential leak in nfp_tunnel_add_shared_mac()
	netfilter: nf_tables: fix memory leak during stateful obj update
	net/smc: Use a mutex for locking "struct smc_pnettable"
	surface: surface3_power: Fix battery readings on batteries without a serial number
	udp_tunnel: Fix end of loop test in udp_tunnel_nic_unregister()
	net/mlx5: Fix possible deadlock on rule deletion
	net/mlx5: Fix wrong limitation of metadata match on ecpf
	net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
	spi: spi-zynq-qspi: Fix a NULL pointer dereference in zynq_qspi_exec_mem_op()
	regmap-irq: Update interrupt clear register for proper reset
	RDMA/rtrs-clt: Fix possible double free in error case
	RDMA/rtrs-clt: Kill wait_for_inflight_permits
	RDMA/rtrs-clt: Move free_permit from free_clt to rtrs_clt_close
	configfs: fix a race in configfs_{,un}register_subsystem()
	RDMA/ib_srp: Fix a deadlock
	tracing: Have traceon and traceoff trigger honor the instance
	iio: adc: men_z188_adc: Fix a resource leak in an error handling path
	iio: adc: ad7124: fix mask used for setting AIN_BUFP & AIN_BUFM bits
	iio: imu: st_lsm6dsx: wait for settling time in st_lsm6dsx_read_oneshot
	iio: Fix error handling for PM
	sc16is7xx: Fix for incorrect data being transmitted
	ata: pata_hpt37x: disable primary channel on HPT371
	Revert "USB: serial: ch341: add new Product ID for CH341A"
	usb: gadget: rndis: add spinlock for rndis response list
	USB: gadget: validate endpoint index for xilinx udc
	tracefs: Set the group ownership in apply_options() not parse_options()
	USB: serial: option: add support for DW5829e
	USB: serial: option: add Telit LE910R1 compositions
	usb: dwc2: drd: fix soft connect when gadget is unconfigured
	usb: dwc3: pci: Fix Bay Trail phy GPIO mappings
	usb: dwc3: gadget: Let the interrupt handler disable bottom halves.
	xhci: re-initialize the HC during resume if HCE was set
	xhci: Prevent futile URB re-submissions due to incorrect return value.
	driver core: Free DMA range map when device is released
	RDMA/cma: Do not change route.addr.src_addr outside state checks
	thermal: int340x: fix memory leak in int3400_notify()
	riscv: fix oops caused by irqsoff latency tracer
	tty: n_gsm: fix encoding of control signal octet bit DV
	tty: n_gsm: fix proper link termination after failed open
	tty: n_gsm: fix NULL pointer access due to DLCI release
	tty: n_gsm: fix wrong tty control line for flow control
	tty: n_gsm: fix deadlock in gsmtty_open()
	gpio: tegra186: Fix chip_data type confusion
	memblock: use kfree() to release kmalloced memblock regions
	Linux 5.10.103

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6b1b827ba2740b36680033a44f04d62b4e5565ab
2022-03-02 15:41:32 +01:00
Su Yue
72a5b01875 btrfs: tree-checker: check item_size for dev_item
commit ea1d1ca4025ac6c075709f549f9aa036b5b6597d upstream.

Check item size before accessing the device item to avoid out of bound
access, similar to inode_item check.

Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02 11:42:45 +01:00
Su Yue
5c967dd073 btrfs: tree-checker: check item_size for inode_item
commit 0c982944af27d131d3b74242f3528169f66950ad upstream.

while mounting the crafted image, out-of-bounds access happens:

  [350.429619] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1
  [350.429636] index 1048096 is out of range for type 'page *[16]'
  [350.429650] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4 #1
  [350.429652] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
  [350.429653] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs]
  [350.429772] Call Trace:
  [350.429774]  <TASK>
  [350.429776]  dump_stack_lvl+0x47/0x5c
  [350.429780]  ubsan_epilogue+0x5/0x50
  [350.429786]  __ubsan_handle_out_of_bounds+0x66/0x70
  [350.429791]  btrfs_get_16+0xfd/0x120 [btrfs]
  [350.429832]  check_leaf+0x754/0x1a40 [btrfs]
  [350.429874]  ? filemap_read+0x34a/0x390
  [350.429878]  ? load_balance+0x175/0xfc0
  [350.429881]  validate_extent_buffer+0x244/0x310 [btrfs]
  [350.429911]  btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs]
  [350.429935]  end_bio_extent_readpage+0x3af/0x850 [btrfs]
  [350.429969]  ? newidle_balance+0x259/0x480
  [350.429972]  end_workqueue_fn+0x29/0x40 [btrfs]
  [350.429995]  btrfs_work_helper+0x71/0x330 [btrfs]
  [350.430030]  ? __schedule+0x2fb/0xa40
  [350.430033]  process_one_work+0x1f6/0x400
  [350.430035]  ? process_one_work+0x400/0x400
  [350.430036]  worker_thread+0x2d/0x3d0
  [350.430037]  ? process_one_work+0x400/0x400
  [350.430038]  kthread+0x165/0x190
  [350.430041]  ? set_kthread_struct+0x40/0x40
  [350.430043]  ret_from_fork+0x1f/0x30
  [350.430047]  </TASK>
  [350.430077] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2

check_leaf() is checking the leaf:

  corrupt leaf: root=4 block=29396992 slot=1, bad key order, prev (16140901064495857664 1 0) current (1 204 12582912)
  leaf 29396992 items 6 free space 3565 generation 6 owner DEV_TREE
  leaf 29396992 flags 0x1(WRITTEN) backref revision 1
  fs uuid a62e00e8-e94e-4200-8217-12444de93c2e
  chunk uuid cecbd0f7-9ca0-441e-ae9f-f782f9732bd8
	  item 0 key (16140901064495857664 INODE_ITEM 0) itemoff 3955 itemsize 40
		  generation 0 transid 0 size 0 nbytes 17592186044416
		  block group 0 mode 52667 links 33 uid 0 gid 2104132511 rdev 94223634821136
		  sequence 100305 flags 0x2409000(none)
		  atime 0.0 (1970-01-01 08:00:00)
		  ctime 2973280098083405823.4294967295 (-269783007-01-01 21:37:03)
		  mtime 18446744071572723616.4026825121 (1902-04-16 12:40:00)
		  otime 9249929404488876031.4294967295 (622322949-04-16 04:25:58)
	  item 1 key (1 DEV_EXTENT 12582912) itemoff 3907 itemsize 48
		  dev extent chunk_tree 3
		  chunk_objectid 256 chunk_offset 12582912 length 8388608
		  chunk_tree_uuid cecbd0f7-9ca0-441e-ae9f-f782f9732bd8

The corrupted leaf of device tree has an inode item. The leaf passed
checksum and others checks in validate_extent_buffer until check_leaf_item().
Because of the key type BTRFS_INODE_ITEM, check_inode_item() is called even we
are in the device tree. Since the
item offset + sizeof(struct btrfs_inode_item) > eb->len, out-of-bounds access
is triggered.

The item end vs leaf boundary check has been done before
check_leaf_item(), so fix it by checking item size in check_inode_item()
before access of the inode item in extent buffer.

Other check functions except check_dev_item() in check_leaf_item()
have their item size checks.
The commit for check_dev_item() is followed.

No regression observed during running fstests.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
CC: stable@vger.kernel.org # 5.10+
CC: Wenqing Liu <wenqingliu0120@gmail.com>
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02 11:42:45 +01:00
Greg Kroah-Hartman
79553fad5c This is the 5.10.102 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmIWE/gACgkQONu9yGCS
 aT74Cw//awQY9EgXHMRNNU5+qun0ESkj918YvfAIKGDRutaOXK00YUdkqjuxEP6V
 gOso0rbXe1LiravgrbPSaYbgSvhGVPOPXHLhb7QQkiodxMZ0oaEEzST2VzM4a19k
 xfhne2BeqCCl4oUUA84NZPcCpohaLhRjZesMWR23zWNbVMEIt7uV8eeqRcC0S4YA
 r7cA4PURBMUiTlWWssBwlrEVy5JZrlcTYCvljxcpd8ws1Xs0KnmxX4gIDpQzNoFj
 Dfwg3lSFcl+ar/Xrg14qL3q3iO8zpaJ2msguZ4vq+OLiYWoTfoFcBF625eQJn6/B
 eWrzpNUqzFv5Eu31vMnv4Wm/iN8/1Yd+lacSqKwxejdmZzwX79VRqv9kD2naT/eP
 C/tkI1b93Ig7+teIjm+aqMAFDoAeR75oVK0KwKEkDIT19mbX1CSex9VnJtaSHZTM
 ON7N8NBUyIPLmhc3jencrsFNIY11W7xUedcN/+JmrZoi4Xy4Yr0mh3H4fZVs/vAH
 I6uYZgylTZiX0esCqJeiHrasETL0IStDw57+wbrjLlBrATgkMbhZQ+VTBxio/mxV
 yok4TiIB+EHaNMOgIAuT6uaZrXwDwHekR1EX1s4p2cPTz1kstp1ZIC2Gdv910xKB
 gprzPa/ocwqk71QODgwhogwntR/a4edbbYvCVHKLXQH9Pd81jkM=
 =TZTu
 -----END PGP SIGNATURE-----

Merge 5.10.102 into android12-5.10-lts

Changes in 5.10.102
	drm/nouveau/pmu/gm200-: use alternate falcon reset sequence
	mm: memcg: synchronize objcg lists with a dedicated spinlock
	rcu: Do not report strict GPs for outgoing CPUs
	fget: clarify and improve __fget_files() implementation
	fs/proc: task_mmu.c: don't read mapcount for migration entry
	can: isotp: prevent race between isotp_bind() and isotp_setsockopt()
	can: isotp: add SF_BROADCAST support for functional addressing
	scsi: lpfc: Fix mailbox command failure during driver initialization
	HID:Add support for UGTABLET WP5540
	Revert "svm: Add warning message for AVIC IPI invalid target"
	serial: parisc: GSC: fix build when IOSAPIC is not set
	parisc: Drop __init from map_pages declaration
	parisc: Fix data TLB miss in sba_unmap_sg
	parisc: Fix sglist access in ccio-dma.c
	mmc: block: fix read single on recovery logic
	mm: don't try to NUMA-migrate COW pages that have other uses
	PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology
	parisc: Add ioread64_lo_hi() and iowrite64_lo_hi()
	btrfs: send: in case of IO error log it
	platform/x86: touchscreen_dmi: Add info for the RWC NANOTE P8 AY07J 2-in-1
	platform/x86: ISST: Fix possible circular locking dependency detected
	selftests: rtc: Increase test timeout so that all tests run
	kselftest: signal all child processes
	net: ieee802154: at86rf230: Stop leaking skb's
	selftests/zram: Skip max_comp_streams interface on newer kernel
	selftests/zram01.sh: Fix compression ratio calculation
	selftests/zram: Adapt the situation that /dev/zram0 is being used
	selftests: openat2: Print also errno in failure messages
	selftests: openat2: Add missing dependency in Makefile
	selftests: openat2: Skip testcases that fail with EOPNOTSUPP
	selftests: skip mincore.check_file_mmap when fs lacks needed support
	ax25: improve the incomplete fix to avoid UAF and NPD bugs
	vfs: make freeze_super abort when sync_filesystem returns error
	quota: make dquot_quota_sync return errors from ->sync_fs
	scsi: pm8001: Fix use-after-free for aborted TMF sas_task
	scsi: pm8001: Fix use-after-free for aborted SSP/STP sas_task
	nvme: fix a possible use-after-free in controller reset during load
	nvme-tcp: fix possible use-after-free in transport error_recovery work
	nvme-rdma: fix possible use-after-free in transport error_recovery work
	drm/amdgpu: fix logic inversion in check
	x86/Xen: streamline (and fix) PV CPU enumeration
	Revert "module, async: async_synchronize_full() on module init iff async is used"
	gcc-plugins/stackleak: Use noinstr in favor of notrace
	random: wake up /dev/random writers after zap
	kbuild: lto: merge module sections
	kbuild: lto: Merge module sections if and only if CONFIG_LTO_CLANG is enabled
	iwlwifi: fix use-after-free
	drm/radeon: Fix backlight control on iMac 12,1
	drm/i915/opregion: check port number bounds for SWSCI display power state
	vsock: remove vsock from connected table when connect is interrupted by a signal
	drm/i915/gvt: Make DRM_I915_GVT depend on X86
	iwlwifi: pcie: fix locking when "HW not ready"
	iwlwifi: pcie: gen2: fix locking when "HW not ready"
	selftests: netfilter: fix exit value for nft_concat_range
	netfilter: nft_synproxy: unregister hooks on init error path
	ipv6: per-netns exclusive flowlabel checks
	net: dsa: lan9303: fix reset on probe
	net: dsa: lantiq_gswip: fix use after free in gswip_remove()
	net: ieee802154: ca8210: Fix lifs/sifs periods
	ping: fix the dif and sdif check in ping_lookup
	bonding: force carrier update when releasing slave
	drop_monitor: fix data-race in dropmon_net_event / trace_napi_poll_hit
	net_sched: add __rcu annotation to netdev->qdisc
	bonding: fix data-races around agg_select_timer
	libsubcmd: Fix use-after-free for realloc(..., 0)
	dpaa2-eth: Initialize mutex used in one step timestamping path
	perf bpf: Defer freeing string after possible strlen() on it
	selftests/exec: Add non-regular to TEST_GEN_PROGS
	ALSA: hda/realtek: Add quirk for Legion Y9000X 2019
	ALSA: hda/realtek: Fix deadlock by COEF mutex
	ALSA: hda: Fix regression on forced probe mask option
	ALSA: hda: Fix missing codec probe on Shenker Dock 15
	ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw()
	ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw_range()
	powerpc/lib/sstep: fix 'ptesync' build error
	mtd: rawnand: gpmi: don't leak PM reference in error path
	KVM: SVM: Never reject emulation due to SMAP errata for !SEV guests
	ASoC: tas2770: Insert post reset delay
	block/wbt: fix negative inflight counter when remove scsi device
	NFS: LOOKUP_DIRECTORY is also ok with symlinks
	NFS: Do not report writeback errors in nfs_getattr()
	tty: n_tty: do not look ahead for EOL character past the end of the buffer
	mtd: rawnand: qcom: Fix clock sequencing in qcom_nandc_probe()
	mtd: rawnand: brcmnand: Fixed incorrect sub-page ECC status
	Drivers: hv: vmbus: Fix memory leak in vmbus_add_channel_kobj
	KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
	KVM: x86/pmu: Don't truncate the PerfEvtSeln MSR when creating a perf event
	KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW
	NFS: Don't set NFS_INO_INVALID_XATTR if there is no xattr cache
	ARM: OMAP2+: hwmod: Add of_node_put() before break
	ARM: OMAP2+: adjust the location of put_device() call in omapdss_init_of
	phy: usb: Leave some clocks running during suspend
	irqchip/sifive-plic: Add missing thead,c900-plic match string
	netfilter: conntrack: don't refresh sctp entries in closed state
	arm64: dts: meson-gx: add ATF BL32 reserved-memory region
	arm64: dts: meson-g12: add ATF BL32 reserved-memory region
	arm64: dts: meson-g12: drop BL32 region from SEI510/SEI610
	pidfd: fix test failure due to stack overflow on some arches
	selftests: fixup build warnings in pidfd / clone3 tests
	kconfig: let 'shell' return enough output for deep path names
	lib/iov_iter: initialize "flags" in new pipe_buffer
	ata: libata-core: Disable TRIM on M88V29
	soc: aspeed: lpc-ctrl: Block error printing on probe defer cases
	xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_create
	drm/rockchip: dw_hdmi: Do not leave clock enabled in error case
	tracing: Fix tp_printk option related with tp_printk_stop_on_boot
	net: usb: qmi_wwan: Add support for Dell DW5829e
	net: macb: Align the dma and coherent dma masks
	kconfig: fix failing to generate auto.conf
	scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loop
	EDAC: Fix calculation of returned address and next offset in edac_align_ptr()
	net: sched: limit TC_ACT_REPEAT loops
	dmaengine: sh: rcar-dmac: Check for error num after setting mask
	dmaengine: stm32-dmamux: Fix PM disable depth imbalance in stm32_dmamux_probe
	dmaengine: sh: rcar-dmac: Check for error num after dma_set_max_seg_size
	i2c: qcom-cci: don't delete an unregistered adapter
	i2c: qcom-cci: don't put a device tree node before i2c_add_adapter()
	copy_process(): Move fd_install() out of sighand->siglock critical section
	i2c: brcmstb: fix support for DSL and CM variants
	lockdep: Correct lock_classes index mapping
	Linux 5.10.102

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ief12c0d77b23f9796b81a8fc3b79ac6589e81dc9
2022-02-23 12:56:37 +01:00
Dāvis Mosāns
0b17d4b51c btrfs: send: in case of IO error log it
commit 2e7be9db125a0bf940c5d65eb5c40d8700f738b5 upstream.

Currently if we get IO error while doing send then we abort without
logging information about which file caused issue.  So log it to help
with debugging.

CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Dāvis Mosāns <davispuh@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23 12:00:58 +01:00
Greg Kroah-Hartman
c3b53fcd90 This is the 5.10.99 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmICqMUACgkQONu9yGCS
 aT7v+A//WsRXEi3x+sMAWYLGQ59WRJLtwf0XCf2I6ym31uJ6y2ZxKtjCoO9HfWuo
 C0RVqH4A56ZcSy1pCalDWGv1fbZWZ/8KhpTiu7GgeqcA4PqwAmNbidZRR8qGzWvQ
 t9pRMf4B8krs8cco60Aq9w6FGXVWOz8CgJFa9ymB5b0d3lKbR16AVTFljaOk+V3X
 iApsaZliiS9U0cnTD1g7mjJq4BQR5h5w6MtVqk0uQH9sPQZXv0XE+gZrkbx+N8L/
 mRPk1pdMJgrh9T6kl84AanSNt9UQ0I/huQY5aUHi3ugUQfWaiPJTALnjsoflNntY
 YdTYi5AkAhZo/thlPBtICxmI1/NwIgH2MnuP9ZDsHThn6zC66KHTn8gCYKS/6Rlt
 LsOMUXZDcMs+d4i9WTKEKtWLsMhJewJzesFpJh4smO/7wR3WGbJhUHr5nVHlmiDy
 FSF7DjZjv9hC49xj0TJBzYK22uh8pjQP8pibXWKVlUG1k/pRLFHQPCM2KBUl//J/
 ArsERFhhayUqdoAe0Qiki6MA859wNf877QFxtCm7eCeoJEvtfgdHJPGDnBO0NfcC
 FsDYRanPhKTDq4eAqYFDUDniL2KWJGH/hQUW4o6Jo7bjkk9GuZFmkEaDygvoqDz1
 hTLZW1vtuBBYi5hGkr1t0GBh2tMCJjLDC0B0FBDZZqhVXpcxl5Q=
 =Iyyd
 -----END PGP SIGNATURE-----

Merge 5.10.99 into android12-5.10-lts

Changes in 5.10.99
	selinux: fix double free of cond_list on error paths
	audit: improve audit queue handling when "audit=1" on cmdline
	ASoC: ops: Reject out of bounds values in snd_soc_put_volsw()
	ASoC: ops: Reject out of bounds values in snd_soc_put_volsw_sx()
	ASoC: ops: Reject out of bounds values in snd_soc_put_xr_sx()
	ALSA: usb-audio: Correct quirk for VF0770
	ALSA: hda: Fix UAF of leds class devs at unbinding
	ALSA: hda: realtek: Fix race at concurrent COEF updates
	ALSA: hda/realtek: Add quirk for ASUS GU603
	ALSA: hda/realtek: Add missing fixup-model entry for Gigabyte X570 ALC1220 quirks
	ALSA: hda/realtek: Fix silent output on Gigabyte X570S Aorus Master (newer chipset)
	ALSA: hda/realtek: Fix silent output on Gigabyte X570 Aorus Xtreme after reboot from Windows
	btrfs: fix deadlock between quota disable and qgroup rescan worker
	drm/nouveau: fix off by one in BIOS boundary checking
	drm/amd/display: Force link_rate as LINK_RATE_RBR2 for 2018 15" Apple Retina panels
	nvme-fabrics: fix state check in nvmf_ctlr_matches_baseopts()
	mm/debug_vm_pgtable: remove pte entry from the page table
	mm/pgtable: define pte_index so that preprocessor could recognize it
	mm/kmemleak: avoid scanning potential huge holes
	block: bio-integrity: Advance seed correctly for larger interval sizes
	dma-buf: heaps: Fix potential spectre v1 gadget
	IB/hfi1: Fix AIP early init panic
	Revert "ASoC: mediatek: Check for error clk pointer"
	memcg: charge fs_context and legacy_fs_context
	RDMA/cma: Use correct address when leaving multicast group
	RDMA/ucma: Protect mc during concurrent multicast leaves
	IB/rdmavt: Validate remote_addr during loopback atomic tests
	RDMA/siw: Fix broken RDMA Read Fence/Resume logic.
	RDMA/mlx4: Don't continue event handler after memory allocation failure
	iommu/vt-d: Fix potential memory leak in intel_setup_irq_remapping()
	iommu/amd: Fix loop timeout issue in iommu_ga_log_enable()
	spi: bcm-qspi: check for valid cs before applying chip select
	spi: mediatek: Avoid NULL pointer crash in interrupt
	spi: meson-spicc: add IRQ check in meson_spicc_probe
	spi: uniphier: fix reference count leak in uniphier_spi_probe()
	net: ieee802154: hwsim: Ensure proper channel selection at probe time
	net: ieee802154: mcr20a: Fix lifs/sifs periods
	net: ieee802154: ca8210: Stop leaking skb's
	net: ieee802154: Return meaningful error codes from the netlink helpers
	net: macsec: Fix offload support for NETDEV_UNREGISTER event
	net: macsec: Verify that send_sci is on when setting Tx sci explicitly
	net: stmmac: dump gmac4 DMA registers correctly
	net: stmmac: ensure PTP time register reads are consistent
	drm/i915/overlay: Prevent divide by zero bugs in scaling
	ASoC: fsl: Add missing error handling in pcm030_fabric_probe
	ASoC: xilinx: xlnx_formatter_pcm: Make buffer bytes multiple of period bytes
	ASoC: cpcap: Check for NULL pointer after calling of_get_child_by_name
	ASoC: max9759: fix underflow in speaker_gain_control_put()
	pinctrl: intel: Fix a glitch when updating IRQ flags on a preconfigured line
	pinctrl: intel: fix unexpected interrupt
	pinctrl: bcm2835: Fix a few error paths
	scsi: bnx2fc: Make bnx2fc_recv_frame() mp safe
	nfsd: nfsd4_setclientid_confirm mistakenly expires confirmed client.
	gve: fix the wrong AdminQ buffer queue index check
	bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
	selftests/exec: Remove pipe from TEST_GEN_FILES
	selftests: futex: Use variable MAKE instead of make
	tools/resolve_btfids: Do not print any commands when building silently
	rtc: cmos: Evaluate century appropriate
	Revert "fbcon: Disable accelerated scrolling"
	fbcon: Add option to enable legacy hardware acceleration
	perf stat: Fix display of grouped aliased events
	perf/x86/intel/pt: Fix crash with stop filters in single-range mode
	x86/perf: Default set FREEZE_ON_SMI for all
	EDAC/altera: Fix deferred probing
	EDAC/xgene: Fix deferred probing
	ext4: prevent used blocks from being allocated during fast commit replay
	ext4: modify the logic of ext4_mb_new_blocks_simple
	ext4: fix error handling in ext4_restore_inline_data()
	ext4: fix error handling in ext4_fc_record_modified_inode()
	ext4: fix incorrect type issue during replay_del_range
	net: dsa: mt7530: make NET_DSA_MT7530 select MEDIATEK_GE_PHY
	cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
	selftests: nft_concat_range: add test for reload with no element add/del
	Linux 5.10.99

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Idc1d987b935d86d2a201e0b4a8db801c08c71b98
2022-02-09 12:14:04 +01:00
Shin'ichiro Kawasaki
32747e0143 btrfs: fix deadlock between quota disable and qgroup rescan worker
commit e804861bd4e69cc5fe1053eedcb024982dde8e48 upstream.

Quota disable ioctl starts a transaction before waiting for the qgroup
rescan worker completes. However, this wait can be infinite and results
in deadlock because of circular dependency among the quota disable
ioctl, the qgroup rescan worker and the other task with transaction such
as block group relocation task.

The deadlock happens with the steps following:

1) Task A calls ioctl to disable quota. It starts a transaction and
   waits for qgroup rescan worker completes.
2) Task B such as block group relocation task starts a transaction and
   joins to the transaction that task A started. Then task B commits to
   the transaction. In this commit, task B waits for a commit by task A.
3) Task C as the qgroup rescan worker starts its job and starts a
   transaction. In this transaction start, task C waits for completion
   of the transaction that task A started and task B committed.

This deadlock was found with fstests test case btrfs/115 and a zoned
null_blk device. The test case enables and disables quota, and the
block group reclaim was triggered during the quota disable by chance.
The deadlock was also observed by running quota enable and disable in
parallel with 'btrfs balance' command on regular null_blk devices.

An example report of the deadlock:

  [372.469894] INFO: task kworker/u16:6:103 blocked for more than 122 seconds.
  [372.479944]       Not tainted 5.16.0-rc8 #7
  [372.485067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [372.493898] task:kworker/u16:6   state:D stack:    0 pid:  103 ppid:     2 flags:0x00004000
  [372.503285] Workqueue: btrfs-qgroup-rescan btrfs_work_helper [btrfs]
  [372.510782] Call Trace:
  [372.514092]  <TASK>
  [372.521684]  __schedule+0xb56/0x4850
  [372.530104]  ? io_schedule_timeout+0x190/0x190
  [372.538842]  ? lockdep_hardirqs_on+0x7e/0x100
  [372.547092]  ? _raw_spin_unlock_irqrestore+0x3e/0x60
  [372.555591]  schedule+0xe0/0x270
  [372.561894]  btrfs_commit_transaction+0x18bb/0x2610 [btrfs]
  [372.570506]  ? btrfs_apply_pending_changes+0x50/0x50 [btrfs]
  [372.578875]  ? free_unref_page+0x3f2/0x650
  [372.585484]  ? finish_wait+0x270/0x270
  [372.591594]  ? release_extent_buffer+0x224/0x420 [btrfs]
  [372.599264]  btrfs_qgroup_rescan_worker+0xc13/0x10c0 [btrfs]
  [372.607157]  ? lock_release+0x3a9/0x6d0
  [372.613054]  ? btrfs_qgroup_account_extent+0xda0/0xda0 [btrfs]
  [372.620960]  ? do_raw_spin_lock+0x11e/0x250
  [372.627137]  ? rwlock_bug.part.0+0x90/0x90
  [372.633215]  ? lock_is_held_type+0xe4/0x140
  [372.639404]  btrfs_work_helper+0x1ae/0xa90 [btrfs]
  [372.646268]  process_one_work+0x7e9/0x1320
  [372.652321]  ? lock_release+0x6d0/0x6d0
  [372.658081]  ? pwq_dec_nr_in_flight+0x230/0x230
  [372.664513]  ? rwlock_bug.part.0+0x90/0x90
  [372.670529]  worker_thread+0x59e/0xf90
  [372.676172]  ? process_one_work+0x1320/0x1320
  [372.682440]  kthread+0x3b9/0x490
  [372.687550]  ? _raw_spin_unlock_irq+0x24/0x50
  [372.693811]  ? set_kthread_struct+0x100/0x100
  [372.700052]  ret_from_fork+0x22/0x30
  [372.705517]  </TASK>
  [372.709747] INFO: task btrfs-transacti:2347 blocked for more than 123 seconds.
  [372.729827]       Not tainted 5.16.0-rc8 #7
  [372.745907] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [372.767106] task:btrfs-transacti state:D stack:    0 pid: 2347 ppid:     2 flags:0x00004000
  [372.787776] Call Trace:
  [372.801652]  <TASK>
  [372.812961]  __schedule+0xb56/0x4850
  [372.830011]  ? io_schedule_timeout+0x190/0x190
  [372.852547]  ? lockdep_hardirqs_on+0x7e/0x100
  [372.871761]  ? _raw_spin_unlock_irqrestore+0x3e/0x60
  [372.886792]  schedule+0xe0/0x270
  [372.901685]  wait_current_trans+0x22c/0x310 [btrfs]
  [372.919743]  ? btrfs_put_transaction+0x3d0/0x3d0 [btrfs]
  [372.938923]  ? finish_wait+0x270/0x270
  [372.959085]  ? join_transaction+0xc75/0xe30 [btrfs]
  [372.977706]  start_transaction+0x938/0x10a0 [btrfs]
  [372.997168]  transaction_kthread+0x19d/0x3c0 [btrfs]
  [373.013021]  ? btrfs_cleanup_transaction.isra.0+0xfc0/0xfc0 [btrfs]
  [373.031678]  kthread+0x3b9/0x490
  [373.047420]  ? _raw_spin_unlock_irq+0x24/0x50
  [373.064645]  ? set_kthread_struct+0x100/0x100
  [373.078571]  ret_from_fork+0x22/0x30
  [373.091197]  </TASK>
  [373.105611] INFO: task btrfs:3145 blocked for more than 123 seconds.
  [373.114147]       Not tainted 5.16.0-rc8 #7
  [373.120401] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [373.130393] task:btrfs           state:D stack:    0 pid: 3145 ppid:  3141 flags:0x00004000
  [373.140998] Call Trace:
  [373.145501]  <TASK>
  [373.149654]  __schedule+0xb56/0x4850
  [373.155306]  ? io_schedule_timeout+0x190/0x190
  [373.161965]  ? lockdep_hardirqs_on+0x7e/0x100
  [373.168469]  ? _raw_spin_unlock_irqrestore+0x3e/0x60
  [373.175468]  schedule+0xe0/0x270
  [373.180814]  wait_for_commit+0x104/0x150 [btrfs]
  [373.187643]  ? test_and_set_bit+0x20/0x20 [btrfs]
  [373.194772]  ? kmem_cache_free+0x124/0x550
  [373.201191]  ? btrfs_put_transaction+0x69/0x3d0 [btrfs]
  [373.208738]  ? finish_wait+0x270/0x270
  [373.214704]  ? __btrfs_end_transaction+0x347/0x7b0 [btrfs]
  [373.222342]  btrfs_commit_transaction+0x44d/0x2610 [btrfs]
  [373.230233]  ? join_transaction+0x255/0xe30 [btrfs]
  [373.237334]  ? btrfs_record_root_in_trans+0x4d/0x170 [btrfs]
  [373.245251]  ? btrfs_apply_pending_changes+0x50/0x50 [btrfs]
  [373.253296]  relocate_block_group+0x105/0xc20 [btrfs]
  [373.260533]  ? mutex_lock_io_nested+0x1270/0x1270
  [373.267516]  ? btrfs_wait_nocow_writers+0x85/0x180 [btrfs]
  [373.275155]  ? merge_reloc_roots+0x710/0x710 [btrfs]
  [373.283602]  ? btrfs_wait_ordered_extents+0xd30/0xd30 [btrfs]
  [373.291934]  ? kmem_cache_free+0x124/0x550
  [373.298180]  btrfs_relocate_block_group+0x35c/0x930 [btrfs]
  [373.306047]  btrfs_relocate_chunk+0x85/0x210 [btrfs]
  [373.313229]  btrfs_balance+0x12f4/0x2d20 [btrfs]
  [373.320227]  ? lock_release+0x3a9/0x6d0
  [373.326206]  ? btrfs_relocate_chunk+0x210/0x210 [btrfs]
  [373.333591]  ? lock_is_held_type+0xe4/0x140
  [373.340031]  ? rcu_read_lock_sched_held+0x3f/0x70
  [373.346910]  btrfs_ioctl_balance+0x548/0x700 [btrfs]
  [373.354207]  btrfs_ioctl+0x7f2/0x71b0 [btrfs]
  [373.360774]  ? lockdep_hardirqs_on_prepare+0x410/0x410
  [373.367957]  ? lockdep_hardirqs_on_prepare+0x410/0x410
  [373.375327]  ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs]
  [373.383841]  ? find_held_lock+0x2c/0x110
  [373.389993]  ? lock_release+0x3a9/0x6d0
  [373.395828]  ? mntput_no_expire+0xf7/0xad0
  [373.402083]  ? lock_is_held_type+0xe4/0x140
  [373.408249]  ? vfs_fileattr_set+0x9f0/0x9f0
  [373.414486]  ? selinux_file_ioctl+0x349/0x4e0
  [373.420938]  ? trace_raw_output_lock+0xb4/0xe0
  [373.427442]  ? selinux_inode_getsecctx+0x80/0x80
  [373.434224]  ? lockdep_hardirqs_on+0x7e/0x100
  [373.440660]  ? force_qs_rnp+0x2a0/0x6b0
  [373.446534]  ? lock_is_held_type+0x9b/0x140
  [373.452763]  ? __blkcg_punt_bio_submit+0x1b0/0x1b0
  [373.459732]  ? security_file_ioctl+0x50/0x90
  [373.466089]  __x64_sys_ioctl+0x127/0x190
  [373.472022]  do_syscall_64+0x3b/0x90
  [373.477513]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [373.484823] RIP: 0033:0x7f8f4af7e2bb
  [373.490493] RSP: 002b:00007ffcbf936178 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [373.500197] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8f4af7e2bb
  [373.509451] RDX: 00007ffcbf936220 RSI: 00000000c4009420 RDI: 0000000000000003
  [373.518659] RBP: 00007ffcbf93774a R08: 0000000000000013 R09: 00007f8f4b02d4e0
  [373.527872] R10: 00007f8f4ae87740 R11: 0000000000000246 R12: 0000000000000001
  [373.537222] R13: 00007ffcbf936220 R14: 0000000000000000 R15: 0000000000000002
  [373.546506]  </TASK>
  [373.550878] INFO: task btrfs:3146 blocked for more than 123 seconds.
  [373.559383]       Not tainted 5.16.0-rc8 #7
  [373.565748] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [373.575748] task:btrfs           state:D stack:    0 pid: 3146 ppid:  2168 flags:0x00000000
  [373.586314] Call Trace:
  [373.590846]  <TASK>
  [373.595121]  __schedule+0xb56/0x4850
  [373.600901]  ? __lock_acquire+0x23db/0x5030
  [373.607176]  ? io_schedule_timeout+0x190/0x190
  [373.613954]  schedule+0xe0/0x270
  [373.619157]  schedule_timeout+0x168/0x220
  [373.625170]  ? usleep_range_state+0x150/0x150
  [373.631653]  ? mark_held_locks+0x9e/0xe0
  [373.637767]  ? do_raw_spin_lock+0x11e/0x250
  [373.643993]  ? lockdep_hardirqs_on_prepare+0x17b/0x410
  [373.651267]  ? _raw_spin_unlock_irq+0x24/0x50
  [373.657677]  ? lockdep_hardirqs_on+0x7e/0x100
  [373.664103]  wait_for_completion+0x163/0x250
  [373.670437]  ? bit_wait_timeout+0x160/0x160
  [373.676585]  btrfs_quota_disable+0x176/0x9a0 [btrfs]
  [373.683979]  ? btrfs_quota_enable+0x12f0/0x12f0 [btrfs]
  [373.691340]  ? down_write+0xd0/0x130
  [373.696880]  ? down_write_killable+0x150/0x150
  [373.703352]  btrfs_ioctl+0x3945/0x71b0 [btrfs]
  [373.710061]  ? find_held_lock+0x2c/0x110
  [373.716192]  ? lock_release+0x3a9/0x6d0
  [373.722047]  ? __handle_mm_fault+0x23cd/0x3050
  [373.728486]  ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs]
  [373.737032]  ? set_pte+0x6a/0x90
  [373.742271]  ? do_raw_spin_unlock+0x55/0x1f0
  [373.748506]  ? lock_is_held_type+0xe4/0x140
  [373.754792]  ? vfs_fileattr_set+0x9f0/0x9f0
  [373.761083]  ? selinux_file_ioctl+0x349/0x4e0
  [373.767521]  ? selinux_inode_getsecctx+0x80/0x80
  [373.774247]  ? __up_read+0x182/0x6e0
  [373.780026]  ? count_memcg_events.constprop.0+0x46/0x60
  [373.787281]  ? up_write+0x460/0x460
  [373.792932]  ? security_file_ioctl+0x50/0x90
  [373.799232]  __x64_sys_ioctl+0x127/0x190
  [373.805237]  do_syscall_64+0x3b/0x90
  [373.810947]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [373.818102] RIP: 0033:0x7f1383ea02bb
  [373.823847] RSP: 002b:00007fffeb4d71f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  [373.833641] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1383ea02bb
  [373.842961] RDX: 00007fffeb4d7210 RSI: 00000000c0109428 RDI: 0000000000000003
  [373.852179] RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000078
  [373.861408] R10: 00007f1383daec78 R11: 0000000000000202 R12: 00007fffeb4d874a
  [373.870647] R13: 0000000000493099 R14: 0000000000000001 R15: 0000000000000000
  [373.879838]  </TASK>
  [373.884018]
               Showing all locks held in the system:
  [373.894250] 3 locks held by kworker/4:1/58:
  [373.900356] 1 lock held by khungtaskd/63:
  [373.906333]  #0: ffffffff8945ff60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260
  [373.917307] 3 locks held by kworker/u16:6/103:
  [373.923938]  #0: ffff888127b4f138 ((wq_completion)btrfs-qgroup-rescan){+.+.}-{0:0}, at: process_one_work+0x712/0x1320
  [373.936555]  #1: ffff88810b817dd8 ((work_completion)(&work->normal_work)){+.+.}-{0:0}, at: process_one_work+0x73f/0x1320
  [373.951109]  #2: ffff888102dd4650 (sb_internal#2){.+.+}-{0:0}, at: btrfs_qgroup_rescan_worker+0x1f6/0x10c0 [btrfs]
  [373.964027] 2 locks held by less/1803:
  [373.969982]  #0: ffff88813ed56098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x80
  [373.981295]  #1: ffffc90000b3b2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x9e2/0x1060
  [373.992969] 1 lock held by btrfs-transacti/2347:
  [373.999893]  #0: ffff88813d4887a8 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0xe3/0x3c0 [btrfs]
  [374.015872] 3 locks held by btrfs/3145:
  [374.022298]  #0: ffff888102dd4460 (sb_writers#18){.+.+}-{0:0}, at: btrfs_ioctl_balance+0xc3/0x700 [btrfs]
  [374.034456]  #1: ffff88813d48a0a0 (&fs_info->reclaim_bgs_lock){+.+.}-{3:3}, at: btrfs_balance+0xfe5/0x2d20 [btrfs]
  [374.047646]  #2: ffff88813d488838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x354/0x930 [btrfs]
  [374.063295] 4 locks held by btrfs/3146:
  [374.069647]  #0: ffff888102dd4460 (sb_writers#18){.+.+}-{0:0}, at: btrfs_ioctl+0x38b1/0x71b0 [btrfs]
  [374.081601]  #1: ffff88813d488bb8 (&fs_info->subvol_sem){+.+.}-{3:3}, at: btrfs_ioctl+0x38fd/0x71b0 [btrfs]
  [374.094283]  #2: ffff888102dd4650 (sb_internal#2){.+.+}-{0:0}, at: btrfs_quota_disable+0xc8/0x9a0 [btrfs]
  [374.106885]  #3: ffff88813d489800 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_disable+0xd5/0x9a0 [btrfs]

  [374.126780] =============================================

To avoid the deadlock, wait for the qgroup rescan worker to complete
before starting the transaction for the quota disable ioctl. Clear
BTRFS_FS_QUOTA_ENABLE flag before the wait and the transaction to
request the worker to complete. On transaction start failure, set the
BTRFS_FS_QUOTA_ENABLE flag again. These BTRFS_FS_QUOTA_ENABLE flag
changes can be done safely since the function btrfs_quota_disable is not
called concurrently because of fs_info->subvol_sem.

Also check the BTRFS_FS_QUOTA_ENABLE flag in qgroup_rescan_init to avoid
another qgroup rescan worker to start after the previous qgroup worker
completed.

CC: stable@vger.kernel.org # 5.4+
Suggested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-08 18:30:35 +01:00
Greg Kroah-Hartman
0b4470b56e This is the 5.10.96 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmH5XyEACgkQONu9yGCS
 aT7z0xAAmnpQ4KYDjmIJW/QjwHMk26wAZGs+w5UbtaFokqqpMuloeNj1HJdpt9gq
 JG6A7Yz2tqXvd+4BzS+XKimqo8CXXbVx5zLhk5lkJE7NTAJN0hP4uvzuBjNFyE+V
 p35aCsBNnMJpjimp+B2c0WuUJX++/n0olBpVzAr1/kwwkz/KODyK8D++S8E+y3Fv
 gkDt3CYJ8E2XPCEQfPHd5blWav0cmyR/O3qViWPuakg1zQ2xxKVjuIAVBuwlBJfB
 zRnnsM9qy3b+h/x43aX++5vucZgDF4BZXJ63kIkkAfe2quqGC683NyjTjOfGLHSW
 bDaVO7IK69tDNznE3jjoLoQ66+soUgvAS5kVYL6sQ7DW8ObghYnD6xfw7J1npva3
 //SZgkC8ifgucLn/ERK5R5mQNmiKYeoukUYSLLB0eG2aYvtQkaChL98t7O9Js1fH
 sZOf4a9SpAnyHQKv6PMV6rsIxE0rkuAqeBVuerOjknjFLbkdPDTijk5GeER3+C+j
 0Ihud+n+HrTRrw7wZOOosPrMVoXNy2LPxphFpsrpFt5QFYu3q24QefwL/EWI1Y4W
 gC3VBqIQS/otZ2aCYlIiPZls38UoQxdWS2pgG5aUOKX7iHduzMkTOYZzq4aYOBws
 Oo+2rl7tzWirrYeIkSLvmwW3BsluQGOvul1IwGk9m1nZ1PDdUIw=
 =39jp
 -----END PGP SIGNATURE-----

Merge 5.10.96 into android12-5.10-lts

Changes in 5.10.96
	Bluetooth: refactor malicious adv data check
	media: venus: core: Drop second v4l2 device unregister
	net: sfp: ignore disabled SFP node
	net: stmmac: skip only stmmac_ptp_register when resume from suspend
	s390/module: fix loading modules with a lot of relocations
	s390/hypfs: include z/VM guests with access control group set
	bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
	scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices
	udf: Restore i_lenAlloc when inode expansion fails
	udf: Fix NULL ptr deref when converting from inline format
	efi: runtime: avoid EFIv2 runtime services on Apple x86 machines
	PM: wakeup: simplify the output logic of pm_show_wakelocks()
	tracing/histogram: Fix a potential memory leak for kstrdup()
	tracing: Don't inc err_log entry count if entry allocation fails
	ceph: properly put ceph_string reference after async create attempt
	ceph: set pool_ns in new inode layout for async creates
	fsnotify: fix fsnotify hooks in pseudo filesystems
	Revert "KVM: SVM: avoid infinite loop on NPF from bad address"
	perf/x86/intel/uncore: Fix CAS_COUNT_WRITE issue for ICX
	drm/etnaviv: relax submit size limits
	KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS
	arm64: errata: Fix exec handling in erratum 1418040 workaround
	netfilter: nft_payload: do not update layer 4 checksum when mangling fragments
	serial: 8250: of: Fix mapped region size when using reg-offset property
	serial: stm32: fix software flow control transfer
	tty: n_gsm: fix SW flow control encoding/handling
	tty: Add support for Brainboxes UC cards.
	usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge
	usb: xhci-plat: fix crash when suspend if remote wake enable
	usb: common: ulpi: Fix crash in ulpi_match()
	usb: gadget: f_sourcesink: Fix isoc transfer for USB_SPEED_SUPER_PLUS
	USB: core: Fix hang in usb_kill_urb by adding memory barriers
	usb: typec: tcpm: Do not disconnect while receiving VBUS off
	ucsi_ccg: Check DEV_INT bit only when starting CCG4
	jbd2: export jbd2_journal_[grab|put]_journal_head
	ocfs2: fix a deadlock when commit trans
	sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask
	x86/MCE/AMD: Allow thresholding interface updates after init
	powerpc/32s: Allocate one 256k IBAT instead of two consecutives 128k IBATs
	powerpc/32s: Fix kasan_init_region() for KASAN
	powerpc/32: Fix boot failure with GCC latent entropy plugin
	i40e: Increase delay to 1 s after global EMP reset
	i40e: Fix issue when maximum queues is exceeded
	i40e: Fix queues reservation for XDP
	i40e: Fix for failed to init adminq while VF reset
	i40e: fix unsigned stat widths
	usb: roles: fix include/linux/usb/role.h compile issue
	rpmsg: char: Fix race between the release of rpmsg_ctrldev and cdev
	rpmsg: char: Fix race between the release of rpmsg_eptdev and cdev
	scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put()
	ipv6_tunnel: Rate limit warning messages
	net: fix information leakage in /proc/net/ptype
	hwmon: (lm90) Mark alert as broken for MAX6646/6647/6649
	hwmon: (lm90) Mark alert as broken for MAX6680
	ping: fix the sk_bound_dev_if match in ping_lookup
	ipv4: avoid using shared IP generator for connected sockets
	hwmon: (lm90) Reduce maximum conversion rate for G781
	NFSv4: Handle case where the lookup of a directory fails
	NFSv4: nfs_atomic_open() can race when looking up a non-regular file
	net-procfs: show net devices bound packet types
	drm/msm: Fix wrong size calculation
	drm/msm/dsi: Fix missing put_device() call in dsi_get_phy
	drm/msm/dsi: invalid parameter check in msm_dsi_phy_enable
	ipv6: annotate accesses to fn->fn_sernum
	NFS: Ensure the server has an up to date ctime before hardlinking
	NFS: Ensure the server has an up to date ctime before renaming
	powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06
	netfilter: conntrack: don't increment invalid counter on NF_REPEAT
	kernel: delete repeated words in comments
	perf: Fix perf_event_read_local() time
	sched/pelt: Relax the sync of util_sum with util_avg
	net: phy: broadcom: hook up soft_reset for BCM54616S
	phylib: fix potential use-after-free
	octeontx2-pf: Forward error codes to VF
	rxrpc: Adjust retransmission backoff
	efi/libstub: arm64: Fix image check alignment at entry
	hwmon: (lm90) Mark alert as broken for MAX6654
	powerpc/perf: Fix power_pmu_disable to call clear_pmi_irq_pending only if PMI is pending
	net: ipv4: Move ip_options_fragment() out of loop
	net: ipv4: Fix the warning for dereference
	ipv4: fix ip option filtering for locally generated fragments
	ibmvnic: init ->running_cap_crqs early
	ibmvnic: don't spin in tasklet
	video: hyperv_fb: Fix validation of screen resolution
	drm/msm/hdmi: Fix missing put_device() call in msm_hdmi_get_phy
	drm/msm/dpu: invalid parameter check in dpu_setup_dspp_pcc
	yam: fix a memory leak in yam_siocdevprivate()
	net: cpsw: Properly initialise struct page_pool_params
	net: hns3: handle empty unknown interrupt for VF
	Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"
	net: bridge: vlan: fix single net device option dumping
	ipv4: raw: lock the socket in raw_bind()
	ipv4: tcp: send zero IPID in SYNACK messages
	ipv4: remove sparse error in ip_neigh_gw4()
	net: bridge: vlan: fix memory leak in __allowed_ingress
	dt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config
	usr/include/Makefile: add linux/nfc.h to the compile-test coverage
	fsnotify: invalidate dcache before IN_DELETE event
	block: Fix wrong offset in bio_truncate()
	mtd: rawnand: mpc5121: Remove unused variable in ads5121_select_chip()
	Linux 5.10.96

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie34be06fa082557e93eda246f1a9ebf9f155a138
2022-02-07 11:17:58 +01:00
Amir Goldstein
0b4e82403c fsnotify: invalidate dcache before IN_DELETE event
commit a37d9a17f099072fe4d3a9048b0321978707a918 upstream.

Apparently, there are some applications that use IN_DELETE event as an
invalidation mechanism and expect that if they try to open a file with
the name reported with the delete event, that it should not contain the
content of the deleted file.

Commit 49246466a9 ("fsnotify: move fsnotify_nameremove() hook out of
d_delete()") moved the fsnotify delete hook before d_delete() so fsnotify
will have access to a positive dentry.

This allowed a race where opening the deleted file via cached dentry
is now possible after receiving the IN_DELETE event.

To fix the regression, create a new hook fsnotify_delete() that takes
the unlinked inode as an argument and use a helper d_delete_notify() to
pin the inode, so we can pass it to fsnotify_delete() after d_delete().

Backporting hint: this regression is from v5.3. Although patch will
apply with only trivial conflicts to v5.4 and v5.10, it won't build,
because fsnotify_delete() implementation is different in each of those
versions (see fsnotify_link()).

A follow up patch will fix the fsnotify_unlink/rmdir() calls in pseudo
filesystem that do not need to call d_delete().

Link: https://lore.kernel.org/r/20220120215305.282577-1-amir73il@gmail.com
Reported-by: Ivan Delalande <colona@arista.com>
Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/
Fixes: 49246466a9 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-01 17:25:48 +01:00
Greg Kroah-Hartman
4ec3c2eea5 This is the 5.10.94 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmHya+IACgkQONu9yGCS
 aT7k/RAAqdd8bNhAiO6iDpvZbQtxq9jepx4KCkcd+gsYvSvePEnBQHaTaavfCK/7
 +taWsc2i6Hwc1Z4oVfUiU02cCsqMJJXqb0nfJaavE5cZspbTc7QeS0zv9BkzSKUZ
 +DxXWyjzeJquI7EbdU0n7inb0iwvBxmTGNIg2j1pPi81Q7XnpjmDsSvuoftRQ0AN
 DGYefowWL1VcfsZRfhEudnpxWC/DjOdB8zD34SgXxKat6/O8YG4T5pM2BOUlKtOW
 QcXFhpG6gU0XqmI3QQDvAESujOQxzC1u3AwIOHYJ/tlFYsUJXjiZIEVnMqPYGpNl
 fXS8xcNSeo6fipXWkzsc489Tteq9H+bfm8sBG0jhExRnyGckkihJDhRFZ9yBrzo/
 1PtUdUIJ4d5fUmdZp7gxucncFyIYFsyJm/5nsYmObP794oStPGKsH8llhp/PcEFF
 ua1+Gy2WW2f6BOaDVFmt+jWZRMa/3oZnFhe8/FPRsAjGOj+q/+V6bsksGDYupsrM
 x8/QQI6OVlnOZsrdpX7zkW46qLov0J0bO9cANTm7kcRaesrkikFKqiIF2uIW2OU6
 W0tZZf0Jy/gSLEljkZ3SuMHCmldWKm/KxMYSjnQ3Y34QvOLYPNAZGF78rbV3s8/0
 otGR3ra5TKCz1kxuDaE1FqZXxpPQidXbF4QUoaRIaPwA1k5NvLY=
 =ktJL
 -----END PGP SIGNATURE-----

Merge 5.10.94 into android12-5.10-lts

Changes in 5.10.94
	KVM: VMX: switch blocked_vcpu_on_cpu_lock to raw spinlock
	HID: uhid: Fix worker destroying device without any protection
	HID: wacom: Reset expected and received contact counts at the same time
	HID: wacom: Ignore the confidence flag when a touch is removed
	HID: wacom: Avoid using stale array indicies to read contact count
	f2fs: fix to do sanity check in is_alive()
	nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()
	mtd: rawnand: gpmi: Add ERR007117 protection for nfc_apply_timings
	mtd: rawnand: gpmi: Remove explicit default gpmi clock setting for i.MX6
	mtd: Fixed breaking list in __mtd_del_partition.
	mtd: rawnand: davinci: Don't calculate ECC when reading page
	mtd: rawnand: davinci: Avoid duplicated page read
	mtd: rawnand: davinci: Rewrite function description
	x86/gpu: Reserve stolen memory for first integrated Intel GPU
	tools/nolibc: x86-64: Fix startup code bug
	tools/nolibc: i386: fix initial stack alignment
	tools/nolibc: fix incorrect truncation of exit code
	rtc: cmos: take rtc_lock while reading from CMOS
	media: v4l2-ioctl.c: readbuffers depends on V4L2_CAP_READWRITE
	media: flexcop-usb: fix control-message timeouts
	media: mceusb: fix control-message timeouts
	media: em28xx: fix control-message timeouts
	media: cpia2: fix control-message timeouts
	media: s2255: fix control-message timeouts
	media: dib0700: fix undefined behavior in tuner shutdown
	media: redrat3: fix control-message timeouts
	media: pvrusb2: fix control-message timeouts
	media: stk1160: fix control-message timeouts
	media: cec-pin: fix interrupt en/disable handling
	can: softing_cs: softingcs_probe(): fix memleak on registration failure
	iio: adc: ti-adc081c: Partial revert of removal of ACPI IDs
	lkdtm: Fix content of section containing lkdtm_rodata_do_nothing()
	iommu/io-pgtable-arm-v7s: Add error handle for page table allocation failure
	gpu: host1x: Add back arm_iommu_detach_device()
	dma_fence_array: Fix PENDING_ERROR leak in dma_fence_array_signaled()
	PCI: Add function 1 DMA alias quirk for Marvell 88SE9125 SATA controller
	mm_zone: add function to check if managed dma zone exists
	dma/pool: create dma atomic pool only if dma zone has managed pages
	mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages
	shmem: fix a race between shmem_unused_huge_shrink and shmem_evict_inode
	drm/ttm: Put BO in its memory manager's lru list
	Bluetooth: L2CAP: Fix not initializing sk_peer_pid
	drm/bridge: display-connector: fix an uninitialized pointer in probe()
	drm: fix null-ptr-deref in drm_dev_init_release()
	drm/panel: kingdisplay-kd097d04: Delete panel on attach() failure
	drm/panel: innolux-p079zca: Delete panel on attach() failure
	drm/rockchip: dsi: Fix unbalanced clock on probe error
	drm/rockchip: dsi: Hold pm-runtime across bind/unbind
	drm/rockchip: dsi: Disable PLL clock on bind error
	drm/rockchip: dsi: Reconfigure hardware on resume()
	Bluetooth: cmtp: fix possible panic when cmtp_init_sockets() fails
	clk: bcm-2835: Pick the closest clock rate
	clk: bcm-2835: Remove rounding up the dividers
	drm/vc4: hdmi: Set a default HSM rate
	wcn36xx: ensure pairing of init_scan/finish_scan and start_scan/end_scan
	wcn36xx: Indicate beacon not connection loss on MISSED_BEACON_IND
	wcn36xx: Fix DMA channel enable/disable cycle
	wcn36xx: Release DMA channel descriptor allocations
	wcn36xx: Put DXE block into reset before freeing memory
	wcn36xx: populate band before determining rate on RX
	wcn36xx: fix RX BD rate mapping for 5GHz legacy rates
	ath11k: Send PPDU_STATS_CFG with proper pdev mask to firmware
	mtd: hyperbus: rpc-if: Check return value of rpcif_sw_init()
	media: videobuf2: Fix the size printk format
	media: atomisp: add missing media_device_cleanup() in atomisp_unregister_entities()
	media: atomisp: fix punit_ddr_dvfs_enable() argument for mrfld_power up case
	media: atomisp: fix inverted logic in buffers_needed()
	media: atomisp: do not use err var when checking port validity for ISP2400
	media: atomisp: fix inverted error check for ia_css_mipi_is_source_port_valid()
	media: atomisp: fix ifdefs in sh_css.c
	media: staging: media: atomisp: pci: Balance braces around conditional statements in file atomisp_cmd.c
	media: atomisp: add NULL check for asd obtained from atomisp_video_pipe
	media: atomisp: fix enum formats logic
	media: atomisp: fix uninitialized bug in gmin_get_pmic_id_and_addr()
	media: aspeed: fix mode-detect always time out at 2nd run
	media: em28xx: fix memory leak in em28xx_init_dev
	media: aspeed: Update signal status immediately to ensure sane hw state
	arm64: dts: amlogic: meson-g12: Fix GPU operating point table node name
	arm64: dts: amlogic: Fix SPI NOR flash node name for ODROID N2/N2+
	arm64: dts: meson-gxbb-wetek: fix HDMI in early boot
	arm64: dts: meson-gxbb-wetek: fix missing GPIO binding
	fs: dlm: use sk->sk_socket instead of con->sock
	fs: dlm: don't call kernel_getpeername() in error_report()
	memory: renesas-rpc-if: Return error in case devm_ioremap_resource() fails
	Bluetooth: stop proccessing malicious adv data
	ath11k: Fix ETSI regd with weather radar overlap
	ath11k: clear the keys properly via DISABLE_KEY
	ath11k: reset RSN/WPA present state for open BSS
	tee: fix put order in teedev_close_context()
	fs: dlm: fix build with CONFIG_IPV6 disabled
	drm/vboxvideo: fix a NULL vs IS_ERR() check
	arm64: dts: renesas: cat875: Add rx/tx delays
	media: dmxdev: fix UAF when dvb_register_device() fails
	crypto: qce - fix uaf on qce_ahash_register_one
	crypto: qce - fix uaf on qce_skcipher_register_one
	mtd: hyperbus: rpc-if: fix bug in rpcif_hb_remove
	ARM: dts: stm32: fix dtbs_check warning on ili9341 dts binding on stm32f429 disco
	crypto: qat - fix spelling mistake: "messge" -> "message"
	crypto: qat - remove unnecessary collision prevention step in PFVF
	crypto: qat - make pfvf send message direction agnostic
	crypto: qat - fix undetected PFVF timeout in ACK loop
	ath11k: Use host CE parameters for CE interrupts configuration
	arm64: dts: ti: k3-j721e: correct cache-sets info
	tty: serial: atmel: Check return code of dmaengine_submit()
	tty: serial: atmel: Call dma_async_issue_pending()
	mfd: atmel-flexcom: Remove #ifdef CONFIG_PM_SLEEP
	mfd: atmel-flexcom: Use .resume_noirq
	media: rcar-csi2: Correct the selection of hsfreqrange
	media: imx-pxp: Initialize the spinlock prior to using it
	media: si470x-i2c: fix possible memory leak in si470x_i2c_probe()
	media: mtk-vcodec: call v4l2_m2m_ctx_release first when file is released
	media: coda: fix CODA960 JPEG encoder buffer overflow
	media: venus: pm_helpers: Control core power domain manually
	media: venus: core, venc, vdec: Fix probe dependency error
	media: venus: core: Fix a potential NULL pointer dereference in an error handling path
	media: venus: core: Fix a resource leak in the error handling path of 'venus_probe()'
	thermal/drivers/imx: Implement runtime PM support
	netfilter: bridge: add support for pppoe filtering
	arm64: dts: qcom: msm8916: fix MMC controller aliases
	cgroup: Trace event cgroup id fields should be u64
	ACPI: EC: Rework flushing of EC work while suspended to idle
	thermal/drivers/imx8mm: Enable ADC when enabling monitor
	drm/amdgpu: Fix a NULL pointer dereference in amdgpu_connector_lcd_native_mode()
	drm/radeon/radeon_kms: Fix a NULL pointer dereference in radeon_driver_open_kms()
	arm64: dts: ti: k3-j7200: Fix the L2 cache sets
	arm64: dts: ti: k3-j721e: Fix the L2 cache sets
	arm64: dts: ti: k3-j7200: Correct the d-cache-sets info
	tty: serial: uartlite: allow 64 bit address
	serial: amba-pl011: do not request memory region twice
	floppy: Fix hang in watchdog when disk is ejected
	staging: rtl8192e: return error code from rtllib_softmac_init()
	staging: rtl8192e: rtllib_module: fix error handle case in alloc_rtllib()
	Bluetooth: btmtksdio: fix resume failure
	sched/fair: Fix detection of per-CPU kthreads waking a task
	sched/fair: Fix per-CPU kthread and wakee stacking for asym CPU capacity
	bpf: Adjust BTF log size limit.
	bpf: Disallow BPF_LOG_KERNEL log level for bpf(BPF_BTF_LOAD)
	bpf: Remove config check to enable bpf support for branch records
	arm64: lib: Annotate {clear, copy}_page() as position-independent
	arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1
	media: dib8000: Fix a memleak in dib8000_init()
	media: saa7146: mxb: Fix a NULL pointer dereference in mxb_attach()
	media: si2157: Fix "warm" tuner state detection
	wireless: iwlwifi: Fix a double free in iwl_txq_dyn_alloc_dma
	sched/rt: Try to restart rt period timer when rt runtime exceeded
	drm/msm/dp: displayPort driver need algorithm rational
	rcu/exp: Mark current CPU as exp-QS in IPI loop second pass
	mwifiex: Fix possible ABBA deadlock
	xfrm: fix a small bug in xfrm_sa_len()
	x86/uaccess: Move variable into switch case statement
	selftests: clone3: clone3: add case CLONE3_ARGS_NO_TEST
	selftests: harness: avoid false negatives if test has no ASSERTs
	crypto: stm32 - Fix last sparse warning in stm32_cryp_check_ctr_counter
	crypto: stm32/cryp - fix CTR counter carry
	crypto: stm32/cryp - fix xts and race condition in crypto_engine requests
	crypto: stm32/cryp - check early input data
	crypto: stm32/cryp - fix double pm exit
	crypto: stm32/cryp - fix lrw chaining mode
	crypto: stm32/cryp - fix bugs and crash in tests
	crypto: stm32 - Revert broken pm_runtime_resume_and_get changes
	ath11k: Fix deleting uninitialized kernel timer during fragment cache flush
	ARM: dts: gemini: NAS4220-B: fis-index-block with 128 KiB sectors
	media: dw2102: Fix use after free
	media: msi001: fix possible null-ptr-deref in msi001_probe()
	media: coda/imx-vdoa: Handle dma_set_coherent_mask error codes
	ath11k: Fix a NULL pointer dereference in ath11k_mac_op_hw_scan()
	arm64: dts: qcom: c630: Fix soundcard setup
	arm64: dts: qcom: ipq6018: Fix gpio-ranges property
	drm/msm/dpu: fix safe status debugfs file
	drm/bridge: ti-sn65dsi86: Set max register for regmap
	drm/tegra: vic: Fix DMA API misuse
	media: hantro: Fix probe func error path
	xfrm: interface with if_id 0 should return error
	xfrm: state and policy should fail if XFRMA_IF_ID 0
	ARM: 9159/1: decompressor: Avoid UNPREDICTABLE NOP encoding
	usb: ftdi-elan: fix memory leak on device disconnect
	arm64: dts: marvell: cn9130: add GPIO and SPI aliases
	arm64: dts: marvell: cn9130: enable CP0 GPIO controllers
	ARM: dts: armada-38x: Add generic compatible to UART nodes
	iwlwifi: mvm: fix 32-bit build in FTM
	iwlwifi: mvm: test roc running status bits before removing the sta
	mmc: meson-mx-sdhc: add IRQ check
	mmc: meson-mx-sdio: add IRQ check
	selinux: fix potential memleak in selinux_add_opt()
	um: fix ndelay/udelay defines
	um: virtio_uml: Fix time-travel external time propagation
	Bluetooth: L2CAP: Fix using wrong mode
	bpftool: Enable line buffering for stdout
	backlight: qcom-wled: Validate enabled string indices in DT
	backlight: qcom-wled: Pass number of elements to read to read_u32_array
	backlight: qcom-wled: Fix off-by-one maximum with default num_strings
	backlight: qcom-wled: Override default length with qcom,enabled-strings
	backlight: qcom-wled: Use cpu_to_le16 macro to perform conversion
	backlight: qcom-wled: Respect enabled-strings in set_brightness
	software node: fix wrong node passed to find nargs_prop
	Bluetooth: hci_qca: Stop IBS timer during BT OFF
	x86/boot/compressed: Move CLANG_FLAGS to beginning of KBUILD_CFLAGS
	hwmon: (mr75203) fix wrong power-up delay value
	x86/mce/inject: Avoid out-of-bounds write when setting flags
	ACPI: scan: Create platform device for BCM4752 and LNV4752 ACPI nodes
	pcmcia: rsrc_nonstatic: Fix a NULL pointer dereference in __nonstatic_find_io_region()
	pcmcia: rsrc_nonstatic: Fix a NULL pointer dereference in nonstatic_find_mem_region()
	power: reset: mt6397: Check for null res pointer
	netfilter: ipt_CLUSTERIP: fix refcount leak in clusterip_tg_check()
	bpf: Don't promote bogus looking registers after null check.
	bpf: Fix SO_RCVBUF/SO_SNDBUF handling in _bpf_setsockopt().
	netfilter: nft_set_pipapo: allocate pcpu scratch maps on clone
	ppp: ensure minimum packet size in ppp_write()
	rocker: fix a sleeping in atomic bug
	staging: greybus: audio: Check null pointer
	fsl/fman: Check for null pointer after calling devm_ioremap
	Bluetooth: hci_bcm: Check for error irq
	Bluetooth: hci_qca: Fix NULL vs IS_ERR_OR_NULL check in qca_serdev_probe
	usb: dwc3: qcom: Fix NULL vs IS_ERR checking in dwc3_qcom_probe
	HID: hid-uclogic-params: Invalid parameter check in uclogic_params_init
	HID: hid-uclogic-params: Invalid parameter check in uclogic_params_get_str_desc
	HID: hid-uclogic-params: Invalid parameter check in uclogic_params_huion_init
	HID: hid-uclogic-params: Invalid parameter check in uclogic_params_frame_init_v1_buttonpad
	debugfs: lockdown: Allow reading debugfs files that are not world readable
	net/mlx5e: Fix page DMA map/unmap attributes
	net/mlx5e: Don't block routes with nexthop objects in SW
	Revert "net/mlx5e: Block offload of outer header csum for UDP tunnels"
	net/mlx5: Set command entry semaphore up once got index free
	lib/mpi: Add the return value check of kcalloc()
	Bluetooth: L2CAP: uninitialized variables in l2cap_sock_setsockopt()
	spi: spi-meson-spifc: Add missing pm_runtime_disable() in meson_spifc_probe
	ax25: uninitialized variable in ax25_setsockopt()
	netrom: fix api breakage in nr_setsockopt()
	regmap: Call regmap_debugfs_exit() prior to _init()
	can: mcp251xfd: add missing newline to printed strings
	tpm: add request_locality before write TPM_INT_ENABLE
	tpm_tis: Fix an error handling path in 'tpm_tis_core_init()'
	can: softing: softing_startstop(): fix set but not used variable warning
	can: xilinx_can: xcan_probe(): check for error irq
	pcmcia: fix setting of kthread task states
	iwlwifi: mvm: Use div_s64 instead of do_div in iwl_mvm_ftm_rtt_smoothing()
	net: mcs7830: handle usb read errors properly
	ext4: avoid trim error on fs with small groups
	ALSA: jack: Add missing rwsem around snd_ctl_remove() calls
	ALSA: PCM: Add missing rwsem around snd_ctl_remove() calls
	ALSA: hda: Add missing rwsem around snd_ctl_remove() calls
	RDMA/bnxt_re: Scan the whole bitmap when checking if "disabling RCFW with pending cmd-bit"
	RDMA/hns: Validate the pkey index
	scsi: pm80xx: Update WARN_ON check in pm8001_mpi_build_cmd()
	clk: imx8mn: Fix imx8mn_clko1_sels
	powerpc/prom_init: Fix improper check of prom_getprop()
	ASoC: uniphier: drop selecting non-existing SND_SOC_UNIPHIER_AIO_DMA
	dt-bindings: thermal: Fix definition of cooling-maps contribution property
	powerpc/64s: Convert some cpu_setup() and cpu_restore() functions to C
	powerpc/perf: MMCR0 control for PMU registers under PMCC=00
	powerpc/perf: move perf irq/nmi handling details into traps.c
	powerpc/irq: Add helper to set regs->softe
	powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC
	powerpc/32s: Fix shift-out-of-bounds in KASAN init
	clocksource: Reduce clocksource-skew threshold
	clocksource: Avoid accidental unstable marking of clocksources
	ALSA: oss: fix compile error when OSS_DEBUG is enabled
	ALSA: usb-audio: Drop superfluous '0' in Presonus Studio 1810c's ID
	char/mwave: Adjust io port register size
	binder: fix handling of error during copy
	openrisc: Add clone3 ABI wrapper
	iommu/io-pgtable-arm: Fix table descriptor paddr formatting
	scsi: ufs: Fix race conditions related to driver data
	RDMA/qedr: Fix reporting max_{send/recv}_wr attrs
	PCI/MSI: Fix pci_irq_vector()/pci_irq_get_affinity()
	powerpc/powermac: Add additional missing lockdep_register_key()
	RDMA/core: Let ib_find_gid() continue search even after empty entry
	RDMA/cma: Let cma_resolve_ib_dev() continue search even after empty entry
	ASoC: rt5663: Handle device_property_read_u32_array error codes
	of: unittest: fix warning on PowerPC frame size warning
	of: unittest: 64 bit dma address test requires arch support
	clk: stm32: Fix ltdc's clock turn off by clk_disable_unused() after system enter shell
	mips: add SYS_HAS_CPU_MIPS64_R5 config for MIPS Release 5 support
	mips: fix Kconfig reference to PHYS_ADDR_T_64BIT
	dmaengine: pxa/mmp: stop referencing config->slave_id
	iommu/amd: Remove iommu_init_ga()
	iommu/amd: Restore GA log/tail pointer on host resume
	ASoC: Intel: catpt: Test dmaengine_submit() result before moving on
	iommu/iova: Fix race between FQ timeout and teardown
	scsi: block: pm: Always set request queue runtime active in blk_post_runtime_resume()
	phy: uniphier-usb3ss: fix unintended writing zeros to PHY register
	ASoC: mediatek: Check for error clk pointer
	ASoC: samsung: idma: Check of ioremap return value
	misc: lattice-ecp3-config: Fix task hung when firmware load failed
	counter: stm32-lptimer-cnt: remove iio counter abi
	arm64: tegra: Fix Tegra194 HDA {clock,reset}-names ordering
	arm64: tegra: Remove non existent Tegra194 reset
	mips: lantiq: add support for clk_set_parent()
	mips: bcm63xx: add support for clk_set_parent()
	powerpc/xive: Add missing null check after calling kmalloc
	ASoC: fsl_mqs: fix MODULE_ALIAS
	RDMA/cxgb4: Set queue pair state when being queried
	ASoC: fsl_asrc: refine the check of available clock divider
	clk: bm1880: remove kfrees on static allocations
	of: base: Fix phandle argument length mismatch error message
	ARM: dts: omap3-n900: Fix lp5523 for multi color
	Bluetooth: Fix debugfs entry leak in hci_register_dev()
	fs: dlm: filter user dlm messages for kernel locks
	drm/lima: fix warning when CONFIG_DEBUG_SG=y & CONFIG_DMA_API_DEBUG=y
	selftests/bpf: Fix bpf_object leak in skb_ctx selftest
	ar5523: Fix null-ptr-deref with unexpected WDCMSG_TARGET_START reply
	drm/bridge: dw-hdmi: handle ELD when DRM_BRIDGE_ATTACH_NO_CONNECTOR
	drm/nouveau/pmu/gm200-: avoid touching PMU outside of DEVINIT/PREOS/ACR
	media: atomisp: fix try_fmt logic
	media: atomisp: set per-device's default mode
	media: atomisp-ov2680: Fix ov2680_set_fmt() clobbering the exposure
	ARM: shmobile: rcar-gen2: Add missing of_node_put()
	batman-adv: allow netlink usage in unprivileged containers
	media: atomisp: handle errors at sh_css_create_isp_params()
	ath11k: Fix crash caused by uninitialized TX ring
	usb: gadget: f_fs: Use stream_open() for endpoint files
	drm: panel-orientation-quirks: Add quirk for the Lenovo Yoga Book X91F/L
	HID: apple: Do not reset quirks when the Fn key is not found
	media: b2c2: Add missing check in flexcop_pci_isr:
	EDAC/synopsys: Use the quirk for version instead of ddr version
	ARM: imx: rename DEBUG_IMX21_IMX27_UART to DEBUG_IMX27_UART
	drm/amd/display: check top_pipe_to_program pointer
	drm/amdgpu/display: set vblank_disable_immediate for DC
	soc: ti: pruss: fix referenced node in error message
	mlxsw: pci: Add shutdown method in PCI driver
	drm/bridge: megachips: Ensure both bridges are probed before registration
	tty: serial: imx: disable UCR4_OREN in .stop_rx() instead of .shutdown()
	gpiolib: acpi: Do not set the IRQ type if the IRQ is already in use
	HSI: core: Fix return freed object in hsi_new_client
	crypto: jitter - consider 32 LSB for APT
	mwifiex: Fix skb_over_panic in mwifiex_usb_recv()
	rsi: Fix use-after-free in rsi_rx_done_handler()
	rsi: Fix out-of-bounds read in rsi_read_pkt()
	ath11k: Avoid NULL ptr access during mgmt tx cleanup
	media: venus: avoid calling core_clk_setrate() concurrently during concurrent video sessions
	ACPI / x86: Drop PWM2 device on Lenovo Yoga Book from always present table
	ACPI: Change acpi_device_always_present() into acpi_device_override_status()
	ACPI / x86: Allow specifying acpi_device_override_status() quirks by path
	ACPI / x86: Add not-present quirk for the PCI0.SDHB.BRC1 device on the GPD win
	arm64: dts: ti: j7200-main: Fix 'dtbs_check' serdes_ln_ctrl node
	usb: uhci: add aspeed ast2600 uhci support
	floppy: Add max size check for user space request
	x86/mm: Flush global TLB when switching to trampoline page-table
	drm: rcar-du: Fix CRTC timings when CMM is used
	media: uvcvideo: Increase UVC_CTRL_CONTROL_TIMEOUT to 5 seconds.
	media: rcar-vin: Update format alignment constraints
	media: saa7146: hexium_orion: Fix a NULL pointer dereference in hexium_attach()
	media: m920x: don't use stack on USB reads
	thunderbolt: Runtime PM activate both ends of the device link
	iwlwifi: mvm: synchronize with FW after multicast commands
	iwlwifi: mvm: avoid clearing a just saved session protection id
	ath11k: avoid deadlock by change ieee80211_queue_work for regd_update_work
	ath10k: Fix tx hanging
	net-sysfs: update the queue counts in the unregistration path
	net: phy: prefer 1000baseT over 1000baseKX
	gpio: aspeed: Convert aspeed_gpio.lock to raw_spinlock
	selftests/ftrace: make kprobe profile testcase description unique
	ath11k: Avoid false DEADLOCK warning reported by lockdep
	x86/mce: Allow instrumentation during task work queueing
	x86/mce: Mark mce_panic() noinstr
	x86/mce: Mark mce_end() noinstr
	x86/mce: Mark mce_read_aux() noinstr
	net: bonding: debug: avoid printing debug logs when bond is not notifying peers
	bpf: Do not WARN in bpf_warn_invalid_xdp_action()
	HID: quirks: Allow inverting the absolute X/Y values
	media: igorplugusb: receiver overflow should be reported
	media: saa7146: hexium_gemini: Fix a NULL pointer dereference in hexium_attach()
	mmc: core: Fixup storing of OCR for MMC_QUIRK_NONSTD_SDIO
	audit: ensure userspace is penalized the same as the kernel when under pressure
	arm64: dts: ls1028a-qds: move rtc node to the correct i2c bus
	arm64: tegra: Adjust length of CCPLEX cluster MMIO region
	PM: runtime: Add safety net to supplier device release
	cpufreq: Fix initialization of min and max frequency QoS requests
	usb: hub: Add delay for SuperSpeed hub resume to let links transit to U0
	ath9k: Fix out-of-bound memcpy in ath9k_hif_usb_rx_stream
	rtw88: 8822c: update rx settings to prevent potential hw deadlock
	PM: AVS: qcom-cpr: Use div64_ul instead of do_div
	iwlwifi: fix leaks/bad data after failed firmware load
	iwlwifi: remove module loading failure message
	iwlwifi: mvm: Fix calculation of frame length
	iwlwifi: pcie: make sure prph_info is set when treating wakeup IRQ
	um: registers: Rename function names to avoid conflicts and build problems
	ath11k: Fix napi related hang
	Bluetooth: vhci: Set HCI_QUIRK_VALID_LE_STATES
	xfrm: rate limit SA mapping change message to user space
	drm/etnaviv: consider completed fence seqno in hang check
	jffs2: GC deadlock reading a page that is used in jffs2_write_begin()
	ACPICA: actypes.h: Expand the ACPI_ACCESS_ definitions
	ACPICA: Utilities: Avoid deleting the same object twice in a row
	ACPICA: Executer: Fix the REFCLASS_REFOF case in acpi_ex_opcode_1A_0T_1R()
	ACPICA: Fix wrong interpretation of PCC address
	ACPICA: Hardware: Do not flush CPU cache when entering S4 and S5
	drm/amdgpu: fixup bad vram size on gmc v8
	amdgpu/pm: Make sysfs pm attributes as read-only for VFs
	ACPI: battery: Add the ThinkPad "Not Charging" quirk
	btrfs: remove BUG_ON() in find_parent_nodes()
	btrfs: remove BUG_ON(!eie) in find_parent_nodes
	net: mdio: Demote probed message to debug print
	mac80211: allow non-standard VHT MCS-10/11
	dm btree: add a defensive bounds check to insert_at()
	dm space map common: add bounds check to sm_ll_lookup_bitmap()
	mlxsw: pci: Avoid flow control for EMAD packets
	net: phy: marvell: configure RGMII delays for 88E1118
	net: gemini: allow any RGMII interface mode
	regulator: qcom_smd: Align probe function with rpmh-regulator
	serial: pl010: Drop CR register reset on set_termios
	serial: core: Keep mctrl register state and cached copy in sync
	random: do not throw away excess input to crng_fast_load
	parisc: Avoid calling faulthandler_disabled() twice
	x86/kbuild: Enable CONFIG_KALLSYMS_ALL=y in the defconfigs
	powerpc/6xx: add missing of_node_put
	powerpc/powernv: add missing of_node_put
	powerpc/cell: add missing of_node_put
	powerpc/btext: add missing of_node_put
	powerpc/watchdog: Fix missed watchdog reset due to memory ordering race
	i2c: i801: Don't silently correct invalid transfer size
	powerpc/smp: Move setup_profiling_timer() under CONFIG_PROFILING
	i2c: mpc: Correct I2C reset procedure
	clk: meson: gxbb: Fix the SDM_EN bit for MPLL0 on GXBB
	powerpc/powermac: Add missing lockdep_register_key()
	KVM: PPC: Book3S: Suppress warnings when allocating too big memory slots
	KVM: PPC: Book3S: Suppress failed alloc warning in H_COPY_TOFROM_GUEST
	w1: Misuse of get_user()/put_user() reported by sparse
	nvmem: core: set size for sysfs bin file
	dm: fix alloc_dax error handling in alloc_dev
	scsi: lpfc: Trigger SLI4 firmware dump before doing driver cleanup
	ALSA: seq: Set upper limit of processed events
	MIPS: Loongson64: Use three arguments for slti
	powerpc/40x: Map 32Mbytes of memory at startup
	selftests/powerpc/spectre_v2: Return skip code when miss_percent is high
	powerpc: handle kdump appropriately with crash_kexec_post_notifiers option
	powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic
	udf: Fix error handling in udf_new_inode()
	MIPS: OCTEON: add put_device() after of_find_device_by_node()
	irqchip/gic-v4: Disable redistributors' view of the VPE table at boot time
	i2c: designware-pci: Fix to change data types of hcnt and lcnt parameters
	MIPS: Octeon: Fix build errors using clang
	scsi: sr: Don't use GFP_DMA
	ASoC: mediatek: mt8173: fix device_node leak
	ASoC: mediatek: mt8183: fix device_node leak
	phy: mediatek: Fix missing check in mtk_mipi_tx_probe
	rpmsg: core: Clean up resources on announce_create failure.
	crypto: omap-aes - Fix broken pm_runtime_and_get() usage
	crypto: stm32/crc32 - Fix kernel BUG triggered in probe()
	crypto: caam - replace this_cpu_ptr with raw_cpu_ptr
	ubifs: Error path in ubifs_remount_rw() seems to wrongly free write buffers
	tpm: fix NPE on probe for missing device
	spi: uniphier: Fix a bug that doesn't point to private data correctly
	xen/gntdev: fix unmap notification order
	fuse: Pass correct lend value to filemap_write_and_wait_range()
	serial: Fix incorrect rs485 polarity on uart open
	cputime, cpuacct: Include guest time in user time in cpuacct.stat
	tracing/kprobes: 'nmissed' not showed correctly for kretprobe
	iwlwifi: mvm: Increase the scan timeout guard to 30 seconds
	s390/mm: fix 2KB pgtable release race
	device property: Fix fwnode_graph_devcon_match() fwnode leak
	drm/etnaviv: limit submit sizes
	drm/nouveau/kms/nv04: use vzalloc for nv04_display
	drm/bridge: analogix_dp: Make PSR-exit block less
	parisc: Fix lpa and lpa_user defines
	powerpc/64s/radix: Fix huge vmap false positive
	PCI: xgene: Fix IB window setup
	PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors
	PCI: pci-bridge-emul: Make expansion ROM Base Address register read-only
	PCI: pci-bridge-emul: Properly mark reserved PCIe bits in PCI config space
	PCI: pci-bridge-emul: Fix definitions of reserved bits
	PCI: pci-bridge-emul: Correctly set PCIe capabilities
	PCI: pci-bridge-emul: Set PCI_STATUS_CAP_LIST for PCIe device
	xfrm: fix policy lookup for ipv6 gre packets
	btrfs: fix deadlock between quota enable and other quota operations
	btrfs: check the root node for uptodate before returning it
	btrfs: respect the max size in the header when activating swap file
	ext4: make sure to reset inode lockdep class when quota enabling fails
	ext4: make sure quota gets properly shutdown on error
	ext4: fix a possible ABBA deadlock due to busy PA
	ext4: initialize err_blk before calling __ext4_get_inode_loc
	ext4: fix fast commit may miss tracking range for FALLOC_FL_ZERO_RANGE
	ext4: set csum seed in tmp inode while migrating to extents
	ext4: Fix BUG_ON in ext4_bread when write quota data
	ext4: use ext4_ext_remove_space() for fast commit replay delete range
	ext4: fast commit may miss tracking unwritten range during ftruncate
	ext4: destroy ext4_fc_dentry_cachep kmemcache on module removal
	ext4: fix null-ptr-deref in '__ext4_journal_ensure_credits'
	ext4: don't use the orphan list when migrating an inode
	drm/radeon: fix error handling in radeon_driver_open_kms
	of: base: Improve argument length mismatch error
	firmware: Update Kconfig help text for Google firmware
	can: mcp251xfd: mcp251xfd_tef_obj_read(): fix typo in error message
	media: rcar-csi2: Optimize the selection PHTW register
	drm/vc4: hdmi: Make sure the device is powered with CEC
	media: correct MEDIA_TEST_SUPPORT help text
	Documentation: dmaengine: Correctly describe dmatest with channel unset
	Documentation: ACPI: Fix data node reference documentation
	Documentation: refer to config RANDOMIZE_BASE for kernel address-space randomization
	Documentation: fix firewire.rst ABI file path error
	Bluetooth: hci_sync: Fix not setting adv set duration
	scsi: core: Show SCMD_LAST in text form
	dmaengine: uniphier-xdmac: Fix type of address variables
	RDMA/hns: Modify the mapping attribute of doorbell to device
	RDMA/rxe: Fix a typo in opcode name
	dmaengine: stm32-mdma: fix STM32_MDMA_CTBR_TSEL_MASK
	Revert "net/mlx5: Add retry mechanism to the command entry index allocation"
	powerpc/cell: Fix clang -Wimplicit-fallthrough warning
	powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses
	block: Fix fsync always failed if once failed
	bpftool: Remove inclusion of utilities.mak from Makefiles
	xdp: check prog type before updating BPF link
	perf evsel: Override attr->sample_period for non-libpfm4 events
	ipv4: update fib_info_cnt under spinlock protection
	ipv4: avoid quadratic behavior in netns dismantle
	net/fsl: xgmac_mdio: Add workaround for erratum A-009885
	net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module
	parisc: pdc_stable: Fix memory leak in pdcs_register_pathentries
	f2fs: compress: fix potential deadlock of compress file
	f2fs: fix to reserve space for IO align feature
	af_unix: annote lockless accesses to unix_tot_inflight & gc_in_progress
	clk: Emit a stern warning with writable debugfs enabled
	clk: si5341: Fix clock HW provider cleanup
	net/smc: Fix hung_task when removing SMC-R devices
	net: axienet: increase reset timeout
	net: axienet: Wait for PhyRstCmplt after core reset
	net: axienet: reset core on initialization prior to MDIO access
	net: axienet: add missing memory barriers
	net: axienet: limit minimum TX ring size
	net: axienet: Fix TX ring slot available check
	net: axienet: fix number of TX ring slots for available check
	net: axienet: fix for TX busy handling
	net: axienet: increase default TX ring size to 128
	HID: vivaldi: fix handling devices not using numbered reports
	rtc: pxa: fix null pointer dereference
	vdpa/mlx5: Fix wrong configuration of virtio_version_1_0
	virtio_ring: mark ring unused on error
	taskstats: Cleanup the use of task->exit_code
	inet: frags: annotate races around fqdir->dead and fqdir->high_thresh
	netns: add schedule point in ops_exit_list()
	xfrm: Don't accidentally set RTO_ONLINK in decode_session4()
	gre: Don't accidentally set RTO_ONLINK in gre_fill_metadata_dst()
	libcxgb: Don't accidentally set RTO_ONLINK in cxgb_find_route()
	perf script: Fix hex dump character output
	dmaengine: at_xdmac: Don't start transactions at tx_submit level
	dmaengine: at_xdmac: Start transfer for cyclic channels in issue_pending
	dmaengine: at_xdmac: Print debug message after realeasing the lock
	dmaengine: at_xdmac: Fix concurrency over xfers_list
	dmaengine: at_xdmac: Fix lld view setting
	dmaengine: at_xdmac: Fix at_xdmac_lld struct definition
	perf probe: Fix ppc64 'perf probe add events failed' case
	devlink: Remove misleading internal_flags from health reporter dump
	arm64: dts: qcom: msm8996: drop not documented adreno properties
	net: bonding: fix bond_xmit_broadcast return value error bug
	net_sched: restore "mpu xxx" handling
	bcmgenet: add WOL IRQ check
	net: ethernet: mtk_eth_soc: fix error checking in mtk_mac_config()
	net: sfp: fix high power modules without diagnostic monitoring
	net: mscc: ocelot: fix using match before it is set
	dt-bindings: display: meson-dw-hdmi: add missing sound-name-prefix property
	dt-bindings: display: meson-vpu: Add missing amlogic,canvas property
	dt-bindings: watchdog: Require samsung,syscon-phandle for Exynos7
	scripts/dtc: dtx_diff: remove broken example from help text
	lib82596: Fix IRQ check in sni_82596_probe
	mm/hmm.c: allow VM_MIXEDMAP to work with hmm_range_fault
	lib/test_meminit: destroy cache in kmem_cache_alloc_bulk() test
	mtd: nand: bbt: Fix corner case in bad block table handling
	ath10k: Fix the MTU size on QCA9377 SDIO
	scripts: sphinx-pre-install: add required ctex dependency
	scripts: sphinx-pre-install: Fix ctex support on Debian
	Linux 5.10.94

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I857f2417c899508815a1ba13d1285fd400a1f133
2022-01-27 11:49:22 +01:00
Filipe Manana
f8c3ec2e21 btrfs: respect the max size in the header when activating swap file
commit c2f822635df873c510bda6fb7fd1b10b7c31be2d upstream.

If we extended the size of a swapfile after its header was created (by the
mkswap utility) and then try to activate it, we will map the entire file
when activating the swap file, instead of limiting to the max size defined
in the swap file's header.

Currently test case generic/643 from fstests fails because we do not
respect that size limit defined in the swap file's header.

So fix this by not mapping file ranges beyond the max size defined in the
swap header.

This is the same type of bug that iomap used to have, and was fixed in
commit 36ca7943ac18ae ("mm/swap: consider max pages in
iomap_swapfile_add_extent").

Fixes: ed46ff3d42 ("Btrfs: support swap files")
CC: stable@vger.kernel.org # 5.4+
Reviewed-and-tested-by: Josef Bacik <josef@toxicpanda.com
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27 10:54:27 +01:00
Josef Bacik
e7764bccae btrfs: check the root node for uptodate before returning it
commit 120de408e4b97504a2d9b5ca534b383de2c73d49 upstream.

Now that we clear the extent buffer uptodate if we fail to write it out
we need to check to see if our root node is uptodate before we search
down it.  Otherwise we could return stale data (or potentially corrupt
data that was caught by the write verification step) and think that the
path is OK to search down.

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27 10:54:27 +01:00
Filipe Manana
09e0ef287e btrfs: fix deadlock between quota enable and other quota operations
commit 232796df8c1437c41d308d161007f0715bac0a54 upstream.

When enabling quotas, we attempt to commit a transaction while holding the
mutex fs_info->qgroup_ioctl_lock. This can result on a deadlock with other
quota operations such as:

- qgroup creation and deletion, ioctl BTRFS_IOC_QGROUP_CREATE;

- adding and removing qgroup relations, ioctl BTRFS_IOC_QGROUP_ASSIGN.

This is because these operations join a transaction and after that they
attempt to lock the mutex fs_info->qgroup_ioctl_lock. Acquiring that mutex
after joining or starting a transaction is a pattern followed everywhere
in qgroups, so the quota enablement operation is the one at fault here,
and should not commit a transaction while holding that mutex.

Fix this by making the transaction commit while not holding the mutex.
We are safe from two concurrent tasks trying to enable quotas because
we are serialized by the rw semaphore fs_info->subvol_sem at
btrfs_ioctl_quota_ctl(), which is the only call site for enabling
quotas.

When this deadlock happens, it produces a trace like the following:

  INFO: task syz-executor:25604 blocked for more than 143 seconds.
  Not tainted 5.15.0-rc6 #4
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:syz-executor state:D stack:24800 pid:25604 ppid: 24873 flags:0x00004004
  Call Trace:
  context_switch kernel/sched/core.c:4940 [inline]
  __schedule+0xcd9/0x2530 kernel/sched/core.c:6287
  schedule+0xd3/0x270 kernel/sched/core.c:6366
  btrfs_commit_transaction+0x994/0x2e90 fs/btrfs/transaction.c:2201
  btrfs_quota_enable+0x95c/0x1790 fs/btrfs/qgroup.c:1120
  btrfs_ioctl_quota_ctl fs/btrfs/ioctl.c:4229 [inline]
  btrfs_ioctl+0x637e/0x7b70 fs/btrfs/ioctl.c:5010
  vfs_ioctl fs/ioctl.c:51 [inline]
  __do_sys_ioctl fs/ioctl.c:874 [inline]
  __se_sys_ioctl fs/ioctl.c:860 [inline]
  __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7f86920b2c4d
  RSP: 002b:00007f868f61ac58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007f86921d90a0 RCX: 00007f86920b2c4d
  RDX: 0000000020005e40 RSI: 00000000c0109428 RDI: 0000000000000008
  RBP: 00007f869212bd80 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f86921d90a0
  R13: 00007fff6d233e4f R14: 00007fff6d233ff0 R15: 00007f868f61adc0
  INFO: task syz-executor:25628 blocked for more than 143 seconds.
  Not tainted 5.15.0-rc6 #4
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:syz-executor state:D stack:29080 pid:25628 ppid: 24873 flags:0x00004004
  Call Trace:
  context_switch kernel/sched/core.c:4940 [inline]
  __schedule+0xcd9/0x2530 kernel/sched/core.c:6287
  schedule+0xd3/0x270 kernel/sched/core.c:6366
  schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6425
  __mutex_lock_common kernel/locking/mutex.c:669 [inline]
  __mutex_lock+0xc96/0x1680 kernel/locking/mutex.c:729
  btrfs_remove_qgroup+0xb7/0x7d0 fs/btrfs/qgroup.c:1548
  btrfs_ioctl_qgroup_create fs/btrfs/ioctl.c:4333 [inline]
  btrfs_ioctl+0x683c/0x7b70 fs/btrfs/ioctl.c:5014
  vfs_ioctl fs/ioctl.c:51 [inline]
  __do_sys_ioctl fs/ioctl.c:874 [inline]
  __se_sys_ioctl fs/ioctl.c:860 [inline]
  __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported-by: Hao Sun <sunhao.th@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CACkBjsZQF19bQ1C6=yetF3BvL10OSORpFUcWXTP6HErshDB4dQ@mail.gmail.com/
Fixes: 340f1aa27f ("btrfs: qgroups: Move transaction management inside btrfs_quota_enable/disable")
CC: stable@vger.kernel.org # 4.19
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27 10:54:27 +01:00
Josef Bacik
6b22c9824d btrfs: remove BUG_ON(!eie) in find_parent_nodes
[ Upstream commit 9f05c09d6baef789726346397438cca4ec43c3ee ]

If we're looking for leafs that point to a data extent we want to record
the extent items that point at our bytenr.  At this point we have the
reference and we know for a fact that this leaf should have a reference
to our bytenr.  However if there's some sort of corruption we may not
find any references to our leaf, and thus could end up with eie == NULL.
Replace this BUG_ON() with an ASSERT() and then return -EUCLEAN for the
mortals.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27 10:54:19 +01:00
Josef Bacik
623c65bc73 btrfs: remove BUG_ON() in find_parent_nodes()
[ Upstream commit fcba0120edf88328524a4878d1d6f4ad39f2ec81 ]

We search for an extent entry with .offset = -1, which shouldn't be a
thing, but corruption happens.  Add an ASSERT() for the developers,
return -EUCLEAN for mortals.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27 10:54:19 +01:00
Greg Kroah-Hartman
f45f895af5 Merge branch 'android12-5.10' into android12-5.10-lts
Sync up with android12-5.10 for the following commits:

2c152aa329 UPSTREAM: f2fs: reduce the scope of setting fsck tag when de->name_len is zero
c29dd368ef ANDROID: GKI: Update symbols to abi_gki_aarch64_oplus
428d0bb762 ANDROID: Add initial ASUS symbol list
87a74496ed ANDROID: configfs: add proper module namespace marking
b7a6c15a6f ANDROID: Configure out the macros in android_kabi and android_vendor
bdc772adbd ANDROID: kernel: fix debug_kinfo set twice crash issue
d483eed85f ANDROID: GKI: set vfs-only exports into their own namespace
27fc5a7c69 UPSTREAM: net/packet: rx_owner_map depends on pg_vec
f70ea63f3b ANDROID: GKI: Update symbols to symbol list
a593acdae8 FROMLIST: module.h: allow #define strings to work with MODULE_IMPORT_NS
eb171b4cbe FROMLIST: export: fix string handling of namespace in EXPORT_SYMBOL_NS
05c23b7a50 ANDROID: vendor_hooks: Add hooks for binder
e99926fdfa ANDROID: mm/oom_kill: allow process_mrelease reclaim memory in parallel with exit_mmap
f4f2c619d5 FROMLIST: mm/oom_kill: allow process_mrelease to run under mmap_lock protection
2452622293 FROMLIST: mm: protect free_pgtables with mmap_lock write lock in exit_mmap
fd7af95538 UPSTREAM: mm/oom_kill.c: prevent a race between process_mrelease and exit_mmap
fe50dcab7a UPSTREAM: mm: wire up syscall process_mrelease
7fc3ac4968 UPSTREAM: mm: introduce process_mrelease system call
ac44888155 Revert "FROMGIT: mm: improve mprotect(R|W) efficiency on pages referenced once"
3a624c9ccd ANDROID: fips140: add show_invalid_inputs command to fips140_lab_util
a481d43521 ANDROID: fips140: refactor and rename fips140_lab_test
d4b5ca56b5 ANDROID: GKI: add lenovo symbol list
47874cc690 ANDROID: abi_gki_aarch64_qcom: Add rproc_set_firmware
c41767a8ec UPSTREAM: remoteproc: Add a rproc_set_firmware() API
28d62c68d1 FROMGIT: iommu/io-pgtable-arm-v7s: Add error handle for page table allocation failure
99ad261273 UPSTREAM: sctp: add param size validation for SCTP_PARAM_SET_PRIMARY
282a4de8f0 UPSTREAM: sctp: validate chunk size in __rcv_asconf_lookup
fef7dba3a7 UPSTREAM: bpf: Fix integer overflow in prealloc_elems_and_freelist()
893425f545 ANDROID: GKI: Update symbol list
cef0df2218 ANDROID: ABI: update allowed list for galaxy
a7ab784f60 ANDROID: vendor_hooks: Add hooks for futex
84fc3abca0 ANDROID: dma-contiguous: Add tracehook to allow subpage allocations in dma_alloc_contiguous
d94655c43e ANDROID: Update the ABI xml and symbol list
414c32d38e UPSTREAM: ALSA: memalloc: Align buffer allocations in page size
75617df5b3 ANDROID: Fix mmu_notifier_trylock definition for !CONFIG_MMU_NOTIFIER config
7531e63661 FROMGIT: USB: gadget: bRequestType is a bitfield, not a enum
70c9301d9c ANDROID: qcom: Add flush_delayed_fput to ABI
5d8520b557 ANDROID: fix ABI breakage caused by mm_struct->mmu_notifier_lock addition
a4d26b9a4b ANDROID: fix ABI breakage caused by percpu_rw_semaphore changes
6971350406 ANDROID: fix mmu_notifier race caused by not taking mmap_lock during SPF
2fc2c66b9c ANDROID: percpu-rwsem: enable percpu_sem destruction in atomic context
f3f87608d8 FROMLIST: virtio_mmio: pm: Add notification handlers for restore and freeze
9180348b91 FROMLIST: virtio: do not reset stateful devices on resume
392cb940f6 FROMGIT: f2fs: avoid EINVAL by SBI_NEED_FSCK when pinning a file
ddd9e01504 UPSTREAM: mm, slub: fix incorrect memcg slab count for bulk free
82ac5b0b1d UPSTREAM: mm, slub: fix potential use-after-free in slab_debugfs_fops
e07a663f5d UPSTREAM: mm, slub: fix potential memoryleak in kmem_cache_open()
cd02f347ab UPSTREAM: mm, slub: fix mismatch between reconstructed freelist depth and cnt
6b6725f77d UPSTREAM: mm, slub: fix two bugs in slab_debug_trace_open()
791f85d16d UPSTREAM: mm, slub: allocate private object map for debugfs listings
1260b47d4f FROMGIT: dma-buf: remove restriction of IOCTL:DMA_BUF_SET_NAME
e80be54e4b UPSTREAM: usb: dwc3: core: balance phy init and exit
89137e0047 UPSTREAM: xhci: Fix failure to give back some cached cancelled URBs.
f37b6d79f8 ANDROID: mm/memory_hotplug: Don't special case memory_block_size_bytes
8b7ffd60a5 UPSTREAM: usb: gadget: uvc: fix multiple opens
ae22ebebbb UPSTREAM: aio: fix use-after-free due to missing POLLFREE handling
b9c8788830 UPSTREAM: aio: keep poll requests on waitqueue until completed
f965176884 UPSTREAM: signalfd: use wake_up_pollfree()
49744a390d UPSTREAM: binder: use wake_up_pollfree()
e50fe1de2f UPSTREAM: wait: add wake_up_pollfree()
53afb231f5 UPSTREAM: USB: gadget: zero allocate endpoint 0 buffers
593309a377 BACKPORT: scsi: ufs: Improve SCSI abort handling further
21949c429a FROMGIT: scsi: ufs: Introduce ufshcd_release_scsi_cmd()
d600bdedac FROMGIT: scsi: ufs: Remove the 'update_scaling' local variable
5f9614157c UPSTREAM: scsi: ufs: core: Fix another task management completion race
76760a995c BACKPORT: scsi: ufs: core: Fix task management completion timeout race
dab2a8a288 ANDROID: qcom: Add android_rvh_do_ptrauth_fault to ABI
b4604acd52 UPSTREAM: USB: gadget: detect too-big endpoint 0 requests
2d6a43c036 ANDROID: ABI: Add symbols used by frame buffer driver
183905923f UPSTREAM: xhci: Add bus number to some debug messages
5b15c955a6 UPSTREAM: xhci: Add additional dynamic debug to follow URBs in cancel and error cases.
f4cbe34956 UPSTREAM: Revert "USB: xhci: fix U1/U2 handling for hardware with XHCI_INTEL_HOST quirk set"
c23b0e7c47 UPSTREAM: xhci: Fix failure to give back some cached cancelled URBs.
7320fb1abd UPSTREAM: HID: check for valid USB device for many HID drivers
e98c96b8b8 UPSTREAM: HID: wacom: fix problems when device is not a valid USB device
5a72ef56c8 UPSTREAM: HID: bigbenff: prevent null pointer dereference
7b8a19b917 UPSTREAM: HID: add USB_HID dependancy on some USB HID drivers
8219b106a3 UPSTREAM: HID: add USB_HID dependancy to hid-chicony
a4909c90b7 UPSTREAM: HID: add USB_HID dependancy to hid-prodikeys
ddea17081f UPSTREAM: HID: add hid_is_usb() function to make it simpler for USB detection
81b6ea435e FROMGIT: clk: Don't parent clks until the parent is fully registered
78ea29e570 UPSTREAM: mm/gup: remove the vma allocation from gup_longterm_locked()
709fde7c61 BACKPORT: usb: typec: tcpm: Wait in SNK_DEBOUNCED until disconnect
ea4a92c6af ANDROID: ABI: Add iio_write_channel_raw symbol
f803f248dd ANDROID: GKI: Update symbols to symbol list
672d51b2a7 UPSTREAM: ASoC: dapm: use component prefix when checking widget names
686cd3b2d8 ANDROID: ABI: Add symbols used by clocksource driver
80c1cef2d1 ANDROID: GKI: Export clocksource_mmio_init
84881c68b1 ANDROID: GKI: Export sched_clock_register
9e5446d7de FROMGIT: f2fs: show number of pending discard commands
28de741861 ANDROID: workqueue: export symbol of the function wq_worker_comm()
71f00d64c1 ANDROID: GKI: Update symbols to symbol list
05aa93d251 ANDROID: vendor_hooks: Add hooks for binder proc transaction

Also update the .xml file for the following new symbols that are now
being tracked:

Leaf changes summary: 165 artifacts changed
Changed leaf types summary: 0 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 154 Added functions
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 11 Added variables

154 Added functions:

  [A] 'function void __bforget(buffer_head*)'
  [A] 'function ssize_t __blockdev_direct_IO(kiocb*, inode*, block_device*, iov_iter*, get_block_t*, dio_iodone_t*, dio_submit_t*, int)'
  [A] 'function buffer_head* __bread_gfp(block_device*, sector_t, unsigned int, gfp_t)'
  [A] 'function void __breadahead(block_device*, sector_t, unsigned int)'
  [A] 'function void __brelse(buffer_head*)'
  [A] 'function void __cancel_dirty_page(page*)'
  [A] 'function void __cleancache_invalidate_inode(address_space*)'
  [A] 'function void __filemap_set_wb_err(address_space*, int)'
  [A] 'function void __insert_inode_hash(inode*, unsigned long int)'
  [A] 'function void __mark_inode_dirty(inode*, int)'
  [A] 'function void __pagevec_release(pagevec*)'
  [A] 'function void __remove_inode_hash(inode*)'
  [A] 'function int __set_page_dirty_buffers(page*)'
  [A] 'function int __set_page_dirty_nobuffers(page*)'
  [A] 'function int __test_set_page_writeback(page*, bool)'
  [A] 'function int __traceiter_android_vh_binder_free_proc(void*, binder_proc*)'
  [A] 'function int __traceiter_android_vh_binder_has_work_ilocked(void*, binder_thread*, bool, int*)'
  [A] 'function int __traceiter_android_vh_binder_looper_state_registered(void*, binder_thread*, binder_proc*)'
  [A] 'function int __traceiter_android_vh_binder_proc_transaction_end(void*, task_struct*, task_struct*, task_struct*, unsigned int, bool, bool)'
  [A] 'function int __traceiter_android_vh_binder_read_done(void*, binder_proc*, binder_thread*)'
  [A] 'function int __traceiter_android_vh_binder_thread_read(void*, list_head**, binder_proc*, binder_thread*)'
  [A] 'function int __traceiter_android_vh_binder_thread_release(void*, binder_proc*, binder_thread*)'
  [A] 'function int __traceiter_android_vh_futex_sleep_start(void*, task_struct*)'
  [A] 'function int __traceiter_block_bio_remap(void*, request_queue*, bio*, dev_t, sector_t)'
  [A] 'function int add_to_page_cache_locked(page*, address_space*, unsigned long int, gfp_t)'
  [A] 'function bio* bio_split(bio*, int, gfp_t, bio_set*)'
  [A] 'function wait_queue_head* bit_waitqueue(void*, int)'
  [A] 'function blk_plug_cb* blk_check_plugged(blk_plug_cb_fn, void*, int)'
  [A] 'function void blk_queue_max_write_same_sectors(request_queue*, unsigned int)'
  [A] 'function int blkdev_issue_discard(block_device*, sector_t, sector_t, gfp_t, unsigned long int)'
  [A] 'function void block_invalidatepage(page*, unsigned int, unsigned int)'
  [A] 'function int block_is_partially_uptodate(page*, unsigned long int, unsigned long int)'
  [A] 'function int buffer_migrate_page(address_space*, page*, page*, migrate_mode)'
  [A] 'function bool capable_wrt_inode_uidgid(const inode*, int)'
  [A] 'function void clean_bdev_aliases(block_device*, sector_t, sector_t)'
  [A] 'function void clear_inode(inode*)'
  [A] 'function int clear_page_dirty_for_io(page*)'
  [A] 'function int clk_set_duty_cycle(clk*, unsigned int, unsigned int)'
  [A] 'function int clocksource_mmio_init(void*, const char*, unsigned long int, int, unsigned int, typedef u64 (clocksource*)*)'
  [A] 'function u64 clocksource_mmio_readl_up(clocksource*)'
  [A] 'function void create_empty_buffers(page*, unsigned long int, unsigned long int)'
  [A] 'function int current_umask()'
  [A] 'function dentry* d_add_ci(dentry*, inode*, qstr*)'
  [A] 'function void d_instantiate(dentry*, inode*)'
  [A] 'function dentry* d_obtain_alias(inode*)'
  [A] 'function dentry* d_splice_alias(inode*, dentry*)'
  [A] 'function void delete_from_page_cache(page*)'
  [A] 'function void disk_stack_limits(gendisk*, block_device*, sector_t)'
  [A] 'function void drop_nlink(inode*)'
  [A] 'function void end_buffer_write_sync(buffer_head*, int)'
  [A] 'function void end_page_writeback(page*)'
  [A] 'function errseq_t errseq_set(errseq_t*, int)'
  [A] 'function int fb_get_options(const char*, char**)'
  [A] 'function int fiemap_fill_next_extent(fiemap_extent_info*, u64, u64, u64, u32)'
  [A] 'function int fiemap_prep(inode*, fiemap_extent_info*, u64, u64*, u32)'
  [A] 'function int file_remove_privs(file*)'
  [A] 'function int file_update_time(file*)'
  [A] 'function int file_write_and_wait_range(file*, loff_t, loff_t)'
  [A] 'function vm_fault_t filemap_fault(vm_fault*)'
  [A] 'function int filemap_fdatawait_range(address_space*, loff_t, loff_t)'
  [A] 'function int filemap_fdatawrite(address_space*)'
  [A] 'function int filemap_flush(address_space*)'
  [A] 'function int filemap_write_and_wait_range(address_space*, loff_t, loff_t)'
  [A] 'function file* filp_open(const char*, int, umode_t)'
  [A] 'function void flush_delayed_fput()'
  [A] 'function int freq_qos_remove_notifier(freq_constraints*, freq_qos_req_type, notifier_block*)'
  [A] 'function int generic_error_remove_page(address_space*, page*)'
  [A] 'function ssize_t generic_file_direct_write(kiocb*, iov_iter*)'
  [A] 'function int generic_file_mmap(file*, vm_area_struct*)'
  [A] 'function int generic_file_open(inode*, file*)'
  [A] 'function ssize_t generic_file_splice_read(file*, loff_t*, pipe_inode_info*, size_t, unsigned int)'
  [A] 'function void generic_fillattr(inode*, kstat*)'
  [A] 'function ssize_t generic_read_dir(file*, char*, size_t, loff_t*)'
  [A] 'function page* grab_cache_page_write_begin(address_space*, unsigned long int, unsigned int)'
  [A] 'function inode* iget5_locked(super_block*, unsigned long int, int (inode*, void*)*, int (inode*, void*)*, void*)'
  [A] 'function inode* igrab(inode*)'
  [A] 'function void ihold(inode*)'
  [A] 'function int iio_write_channel_raw(iio_channel*, int)'
  [A] 'function inode* ilookup5(super_block*, unsigned long int, int (inode*, void*)*, void*)'
  [A] 'function int in_group_p(kgid_t)'
  [A] 'function void inc_nlink(inode*)'
  [A] 'function void init_special_inode(inode*, umode_t, dev_t)'
  [A] 'function void inode_dio_wait(inode*)'
  [A] 'function void inode_init_once(inode*)'
  [A] 'function void inode_init_owner(inode*, const inode*, umode_t)'
  [A] 'function int inode_newsize_ok(const inode*, loff_t)'
  [A] 'function void inode_set_flags(inode*, unsigned int, unsigned int)'
  [A] 'function void io_schedule()'
  [A] 'function void iov_iter_advance(iov_iter*, size_t)'
  [A] 'function unsigned long int iov_iter_alignment(const iov_iter*)'
  [A] 'function size_t iov_iter_copy_from_user_atomic(page*, iov_iter*, unsigned long int, size_t)'
  [A] 'function int iov_iter_fault_in_readable(iov_iter*, size_t)'
  [A] 'function ssize_t iov_iter_get_pages(iov_iter*, page**, size_t, unsigned int, size_t*)'
  [A] 'function size_t iov_iter_single_seg_count(const iov_iter*)'
  [A] 'function bool is_bad_inode(inode*)'
  [A] 'function ssize_t iter_file_splice_write(pipe_inode_info*, file*, loff_t*, size_t, unsigned int)'
  [A] 'function ino_t iunique(super_block*, ino_t)'
  [A] 'function void kill_block_super(super_block*)'
  [A] 'function void ll_rw_block(int, int, int, buffer_head**)'
  [A] 'function nls_table* load_nls(char*)'
  [A] 'function nls_table* load_nls_default()'
  [A] 'function void lru_cache_add(page*)'
  [A] 'function void make_bad_inode(inode*)'
  [A] 'function void mark_buffer_async_write(buffer_head*)'
  [A] 'function void mark_buffer_dirty(buffer_head*)'
  [A] 'function void mark_buffer_write_io_error(buffer_head*)'
  [A] 'function void mark_page_accessed(page*)'
  [A] 'function void mnt_drop_write_file(file*)'
  [A] 'function int mnt_want_write_file(file*)'
  [A] 'function dentry* mount_bdev(file_system_type*, int, const char*, void*, int (super_block*, void*, int)*)'
  [A] 'function void mpage_readahead(readahead_control*, get_block_t*)'
  [A] 'function int mpage_readpage(page*, get_block_t*)'
  [A] 'function int notify_change(dentry*, iattr*, inode**)'
  [A] 'function unsigned long int page_cache_next_miss(address_space*, unsigned long int, unsigned long int)'
  [A] 'function unsigned long int page_cache_prev_miss(address_space*, unsigned long int, unsigned long int)'
  [A] 'function bool page_mapped(page*)'
  [A] 'function int page_mkclean(page*)'
  [A] 'function void page_zero_new_buffers(page*, unsigned int, unsigned int)'
  [A] 'function page* pagecache_get_page(address_space*, unsigned long int, int, gfp_t)'
  [A] 'function unsigned int pagevec_lookup_range(pagevec*, address_space*, unsigned long int*, unsigned long int)'
  [A] 'function unsigned int pagevec_lookup_range_tag(pagevec*, address_space*, unsigned long int*, unsigned long int, xa_mark_t)'
  [A] 'function void put_pages_list(list_head*)'
  [A] 'function gfp_t readahead_gfp_mask(address_space*)'
  [A] 'function int redirty_page_for_writepage(writeback_control*, page*)'
  [A] 'function int rproc_set_firmware(rproc*, const char*)'
  [A] 'function int sb_min_blocksize(super_block*, int)'
  [A] 'function int sb_set_blocksize(super_block*, int)'
  [A] 'function void sched_clock_register(typedef u64 ()*, int, unsigned long int)'
  [A] 'function int security_inode_init_security(inode*, inode*, const qstr*, const initxattrs, void*)'
  [A] 'function void set_nlink(inode*, unsigned int)'
  [A] 'function int setattr_prepare(dentry*, iattr*)'
  [A] 'function blk_qc_t submit_bio_noacct(bio*)'
  [A] 'function int sync_dirty_buffer(buffer_head*)'
  [A] 'function int sync_filesystem(super_block*)'
  [A] 'function int sync_inode_metadata(inode*, int)'
  [A] 'function void tag_pages_for_writeback(address_space*, unsigned long int, unsigned long int)'
  [A] 'function timespec64 timestamp_truncate(timespec64, inode*)'
  [A] 'function void touch_atime(const path*)'
  [A] 'function void truncate_inode_pages(address_space*, loff_t)'
  [A] 'function void truncate_inode_pages_final(address_space*)'
  [A] 'function void truncate_pagecache(inode*, loff_t)'
  [A] 'function void truncate_setsize(inode*, loff_t)'
  [A] 'function int try_to_release_page(page*, gfp_t)'
  [A] 'function void try_to_writeback_inodes_sb(super_block*, wb_reason)'
  [A] 'function void unload_nls(nls_table*)'
  [A] 'function void unlock_buffer(buffer_head*)'
  [A] 'function void unlock_new_inode(inode*)'
  [A] 'function void usbnet_cdc_unbind(usbnet*, usb_interface*)'
  [A] 'function int usbnet_generic_cdc_bind(usbnet*, usb_interface*)'
  [A] 'function void wait_on_page_bit(page*, int)'
  [A] 'function int wake_bit_function(wait_queue_entry*, unsigned int, int, void*)'
  [A] 'function void wq_worker_comm(char*, size_t, task_struct*)'
  [A] 'function int write_inode_now(inode*, int)'
  [A] 'function int write_one_page(page*)'

11 Added variables:

  [A] 'tracepoint __tracepoint_android_rvh_do_ptrauth_fault'
  [A] 'tracepoint __tracepoint_android_vh_binder_free_proc'
  [A] 'tracepoint __tracepoint_android_vh_binder_has_work_ilocked'
  [A] 'tracepoint __tracepoint_android_vh_binder_looper_state_registered'
  [A] 'tracepoint __tracepoint_android_vh_binder_proc_transaction_end'
  [A] 'tracepoint __tracepoint_android_vh_binder_read_done'
  [A] 'tracepoint __tracepoint_android_vh_binder_thread_read'
  [A] 'tracepoint __tracepoint_android_vh_binder_thread_release'
  [A] 'tracepoint __tracepoint_android_vh_futex_sleep_start'
  [A] 'tracepoint __tracepoint_android_vh_subpage_dma_contig_alloc'
  [A] 'tracepoint __tracepoint_block_bio_remap'

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I064b6ff0420cee2d64e17814f43fbff8e2d9b019

Change-Id: Ia880c70c912f2b801a770feb1fdc4f4eb390d34d
2022-01-17 18:47:02 +01:00
Greg Kroah-Hartman
d483eed85f ANDROID: GKI: set vfs-only exports into their own namespace
We have namespaces, so use them for all vfs-exported namespaces so that
filesystems can use them, but not anything else.

Some in-kernel drivers that do direct filesystem accesses (because they
serve up files) are also allowed access to these symbols to keep 'make
allmodconfig' builds working properly, but it is not needed for Android
kernel images.

Bug: 157965270
Bug: 210074446
Cc: Matthias Maennich <maennich@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iaf6140baf3a18a516ab2d5c3966235c42f3f70de
2022-01-11 09:30:47 +01:00
Greg Kroah-Hartman
ba13eb1927 This is the 5.10.88 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmHC4kgACgkQONu9yGCS
 aT4l7w/8Da5EworPQ45NW5oEnNDU+tVjVft6yLyEjNhqe7/dXgden9MIhXl/rd38
 vIWHWxdR3g+h2aEBuQSFhkCYLgaiO822DWqx8XkK9F1EJWbypycb6IP/XVy7QqbX
 TyUakCg1hN0TcZ9qhkugRMqAmzt5X4K+R6LtvlfQT2GAlEchSzyAl0tvS594Oxi7
 tMo6SQfAdD+IWwg72t80odMANcoX7GIcv2Kn4UA+c8eDEdiD5MZQTbz6Xi28hKiE
 zN+GQ4wFgt0/+sWiKEGPW7hcs2hnIU/dEwmgim+mgQyMSx+d1EFNNPsOaXyuLAaE
 MLhhgyOn6ZYfAI6+2A3xDw6hnvHWT1xCDxfgfDhz24sHSQY5K96pFAB3DXsKm0nC
 LtpHZopqJ9T0t1JkPpHCqoXXmEZmvHUlYVWwvQAwDBbqEkJ/i/w9QTlSDDP1+K6r
 CLhGWo9titrpcpRLnbcrweXgjuiI13Rz5mlgKCafE/jEtR5jzZ2neBvTepzuGVrY
 OW6jVmOXmVQEl9O73uWUvTizyUq+nl78R1lr9Xva0SpX/OXKKfO3q2w36tTQNyf+
 /g7+sgTJcLax1eZh0n17xiKzQYzwn5FMYPhzUF6fc0Rrz1CUCt1nCjq28hMqG32v
 dOFYqU/CrfFMKfR0YiKI7tpSba8QNP8f0Ig+KzDE2dKA+Lty1kw=
 =fpH+
 -----END PGP SIGNATURE-----

Merge 5.10.88 into android12-5.10-lts

Changes in 5.10.88
	KVM: selftests: Make sure kvm_create_max_vcpus test won't hit RLIMIT_NOFILE
	KVM: downgrade two BUG_ONs to WARN_ON_ONCE
	mac80211: fix regression in SSN handling of addba tx
	mac80211: mark TX-during-stop for TX in in_reconfig
	mac80211: send ADDBA requests using the tid/queue of the aggregation session
	mac80211: validate extended element ID is present
	firmware: arm_scpi: Fix string overflow in SCPI genpd driver
	bpf: Fix signed bounds propagation after mov32
	bpf: Make 32->64 bounds propagation slightly more robust
	bpf, selftests: Add test case trying to taint map value pointer
	virtio_ring: Fix querying of maximum DMA mapping size for virtio device
	vdpa: check that offsets are within bounds
	recordmcount.pl: look for jgnop instruction as well as bcrl on s390
	dm btree remove: fix use after free in rebalance_children()
	audit: improve robustness of the audit queue handling
	arm64: dts: imx8m: correct assigned clocks for FEC
	arm64: dts: imx8mp-evk: Improve the Ethernet PHY description
	arm64: dts: rockchip: remove mmc-hs400-enhanced-strobe from rk3399-khadas-edge
	arm64: dts: rockchip: fix rk3308-roc-cc vcc-sd supply
	arm64: dts: rockchip: fix rk3399-leez-p710 vcc3v3-lan supply
	arm64: dts: rockchip: fix audio-supply for Rock Pi 4
	mac80211: track only QoS data frames for admission control
	tee: amdtee: fix an IS_ERR() vs NULL bug
	ceph: fix duplicate increment of opened_inodes metric
	ceph: initialize pathlen variable in reconnect_caps_cb
	ARM: socfpga: dts: fix qspi node compatible
	clk: Don't parent clks until the parent is fully registered
	soc: imx: Register SoC device only on i.MX boards
	virtio/vsock: fix the transport to work with VMADDR_CID_ANY
	selftests: net: Correct ping6 expected rc from 2 to 1
	s390/kexec_file: fix error handling when applying relocations
	sch_cake: do not call cake_destroy() from cake_init()
	inet_diag: fix kernel-infoleak for UDP sockets
	net: hns3: fix use-after-free bug in hclgevf_send_mbx_msg
	selftests: Add duplicate config only for MD5 VRF tests
	selftests: Fix raw socket bind tests with VRF
	selftests: Fix IPv6 address bind tests
	dmaengine: st_fdma: fix MODULE_ALIAS
	net/sched: sch_ets: don't remove idle classes from the round-robin list
	selftest/net/forwarding: declare NETIFS p9 p10
	drm/ast: potential dereference of null pointer
	mac80211: agg-tx: don't schedule_and_wake_txq() under sta->lock
	mac80211: fix lookup when adding AddBA extension element
	flow_offload: return EOPNOTSUPP for the unsupported mpls action type
	rds: memory leak in __rds_conn_create()
	drm/amd/pm: fix a potential gpu_metrics_table memory leak
	mptcp: clear 'kern' flag from fallback sockets
	soc/tegra: fuse: Fix bitwise vs. logical OR warning
	igb: Fix removal of unicast MAC filters of VFs
	igbvf: fix double free in `igbvf_probe`
	igc: Fix typo in i225 LTR functions
	ixgbe: Document how to enable NBASE-T support
	ixgbe: set X550 MDIO speed before talking to PHY
	netdevsim: Zero-initialize memory for new map's value in function nsim_bpf_map_alloc
	net/packet: rx_owner_map depends on pg_vec
	sfc_ef100: potential dereference of null pointer
	net: Fix double 0x prefix print in SKB dump
	net/smc: Prevent smc_release() from long blocking
	net: systemport: Add global locking for descriptor lifecycle
	sit: do not call ipip6_dev_free() from sit_init_net()
	bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
	powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n
	USB: gadget: bRequestType is a bitfield, not a enum
	Revert "usb: early: convert to readl_poll_timeout_atomic()"
	KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA32_PERF_CAPABILITIES
	tty: n_hdlc: make n_hdlc_tty_wakeup() asynchronous
	USB: NO_LPM quirk Lenovo USB-C to Ethernet Adapher(RTL8153-04)
	usb: dwc2: fix STM ID/VBUS detection startup delay in dwc2_driver_probe
	PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error
	PCI/MSI: Mask MSI-X vectors only on success
	usb: xhci: Extend support for runtime power management for AMD's Yellow carp.
	USB: serial: cp210x: fix CP2105 GPIO registration
	USB: serial: option: add Telit FN990 compositions
	btrfs: fix memory leak in __add_inode_ref()
	btrfs: fix double free of anon_dev after failure to create subvolume
	zonefs: add MODULE_ALIAS_FS
	iocost: Fix divide-by-zero on donation from low hweight cgroup
	serial: 8250_fintek: Fix garbled text for console
	timekeeping: Really make sure wall_to_monotonic isn't positive
	libata: if T_LENGTH is zero, dma direction should be DMA_NONE
	drm/amdgpu: correct register access for RLC_JUMP_TABLE_RESTORE
	Input: touchscreen - avoid bitwise vs logical OR warning
	ARM: dts: imx6ull-pinfunc: Fix CSI_DATA07__ESAI_TX0 pad name
	xsk: Do not sleep in poll() when need_wakeup set
	media: mxl111sf: change mutex_init() location
	fuse: annotate lock in fuse_reverse_inval_entry()
	ovl: fix warning in ovl_create_real()
	scsi: scsi_debug: Don't call kcalloc() if size arg is zero
	scsi: scsi_debug: Fix type in min_t to avoid stack OOB
	scsi: scsi_debug: Sanity check block descriptor length in resp_mode_select()
	rcu: Mark accesses to rcu_state.n_force_qs
	bus: ti-sysc: Fix variable set but not used warning for reinit_modules
	Revert "xsk: Do not sleep in poll() when need_wakeup set"
	xen/blkfront: harden blkfront against event channel storms
	xen/netfront: harden netfront against event channel storms
	xen/console: harden hvc_xen against event channel storms
	xen/netback: fix rx queue stall detection
	xen/netback: don't queue unlimited number of packages
	Linux 5.10.88

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie9143ca88b59cd27f4b2101e83e25017a5565c6a
2021-12-22 10:12:32 +01:00
Filipe Manana
1c414ff63b btrfs: fix double free of anon_dev after failure to create subvolume
commit 33fab972497ae66822c0b6846d4f9382938575b6 upstream.

When creating a subvolume, at create_subvol(), we allocate an anonymous
device and later call btrfs_get_new_fs_root(), which in turn just calls
btrfs_get_root_ref(). There we call btrfs_init_fs_root() which assigns
the anonymous device to the root, but if after that call there's an error,
when we jump to 'fail' label, we call btrfs_put_root(), which frees the
anonymous device and then returns an error that is propagated back to
create_subvol(). Than create_subvol() frees the anonymous device again.

When this happens, if the anonymous device was not reallocated after
the first time it was freed with btrfs_put_root(), we get a kernel
message like the following:

  (...)
  [13950.282466] BTRFS: error (device dm-0) in create_subvol:663: errno=-5 IO failure
  [13950.283027] ida_free called for id=65 which is not allocated.
  [13950.285974] BTRFS info (device dm-0): forced readonly
  (...)

If the anonymous device gets reallocated by another btrfs filesystem
or any other kernel subsystem, then bad things can happen.

So fix this by setting the root's anonymous device to 0 at
btrfs_get_root_ref(), before we call btrfs_put_root(), if an error
happened.

Fixes: 2dfb1e43f5 ("btrfs: preallocate anon block device at first phase of snapshot creation")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22 09:30:57 +01:00
Jianglei Nie
005d9292b5 btrfs: fix memory leak in __add_inode_ref()
commit f35838a6930296fc1988764cfa54cb3f705c0665 upstream.

Line 1169 (#3) allocates a memory chunk for victim_name by kmalloc(),
but  when the function returns in line 1184 (#4) victim_name allocated
by line 1169 (#3) is not freed, which will lead to a memory leak.
There is a similar snippet of code in this function as allocating a memory
chunk for victim_name in line 1104 (#1) as well as releasing the memory
in line 1116 (#2).

We should kfree() victim_name when the return value of backref_in_log()
is less than zero and before the function returns in line 1184 (#4).

1057 static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
1058 				  struct btrfs_root *root,
1059 				  struct btrfs_path *path,
1060 				  struct btrfs_root *log_root,
1061 				  struct btrfs_inode *dir,
1062 				  struct btrfs_inode *inode,
1063 				  u64 inode_objectid, u64 parent_objectid,
1064 				  u64 ref_index, char *name, int namelen,
1065 				  int *search_done)
1066 {

1104 	victim_name = kmalloc(victim_name_len, GFP_NOFS);
	// #1: kmalloc (victim_name-1)
1105 	if (!victim_name)
1106 		return -ENOMEM;

1112	ret = backref_in_log(log_root, &search_key,
1113			parent_objectid, victim_name,
1114			victim_name_len);
1115	if (ret < 0) {
1116		kfree(victim_name); // #2: kfree (victim_name-1)
1117		return ret;
1118	} else if (!ret) {

1169 	victim_name = kmalloc(victim_name_len, GFP_NOFS);
	// #3: kmalloc (victim_name-2)
1170 	if (!victim_name)
1171 		return -ENOMEM;

1180 	ret = backref_in_log(log_root, &search_key,
1181 			parent_objectid, victim_name,
1182 			victim_name_len);
1183 	if (ret < 0) {
1184 		return ret; // #4: missing kfree (victim_name-2)
1185 	} else if (!ret) {

1241 	return 0;
1242 }

Fixes: d3316c8233 ("btrfs: Properly handle backref_in_log retval")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22 09:30:57 +01:00
Greg Kroah-Hartman
afc997898e This is the 5.10.85 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmG4cugACgkQONu9yGCS
 aT4+uxAAiLJvOIA6ElsmMq2c3pNu9UDPh58j1FlmhggAxr7baIvR1UuEZURTSLW9
 pnu+r9bHkhJGBOpANfPJAQZqv+JtCi3crMhw0mwHJ0Mls3TNSmclzB1+jGM4w93E
 cT+5hoDeQqwZwpKYvWI7u9UGEE0BXluVTRCvmuncaJ8wGbxnDgV0AEXe32XDlFxB
 kSLAXO1FFn8Z1yMg9BMVURU9IAszdwCIhqbcNWnPunOPowdHDWBJdF7eSZUqmJfq
 TMyFgm6c6ENbBrwYt+9qf+1oS/D9r6TEjwaFoRJ4ApWSAD4iKV7U6dA0rxm9mrMl
 nSAxNXuSXXYszoYjrxPPRGhCY/URahs1Vmju1WK//4vz3brxk5N88T+2nN7PMYKn
 bIHGTl9SadlrHvi/OBqOvbsNMHX+ln/V/y7ct0fsxXeNBHYXdXCtimRDvtsj9kmp
 HO9OLXVsF6DUM22ODW28Vxt7HGN3XQzs0y2jwfzdMV81p3oEqP9wWQITwG2LVVAE
 WdkvQ/3ugdkR9F15Vp0cjkbJQLN4UbYUJW5K7SZTW013TgsYyIHQrP5qj9xvfwNt
 KdILXVMH2JqPXAkRqmFqIeNIUX2oevWwpgV/SwqIj3T5ytmXPluZsaMlGg5xZ401
 Xuhp0LPU6rl+wcCt1bInZvn8nrTVDADpao7xTbSyoel5TAunVrk=
 =u1Am
 -----END PGP SIGNATURE-----

Merge 5.10.85 into android12-5.10-lts

Changes in 5.10.85
	usb: gadget: uvc: fix multiple opens
	gcc-plugins: simplify GCC plugin-dev capability test
	gcc-plugins: fix gcc 11 indigestion with plugins...
	HID: quirks: Add quirk for the Microsoft Surface 3 type-cover
	HID: google: add eel USB id
	HID: add hid_is_usb() function to make it simpler for USB detection
	HID: add USB_HID dependancy to hid-prodikeys
	HID: add USB_HID dependancy to hid-chicony
	HID: add USB_HID dependancy on some USB HID drivers
	HID: bigbenff: prevent null pointer dereference
	HID: wacom: fix problems when device is not a valid USB device
	HID: check for valid USB device for many HID drivers
	nft_set_pipapo: Fix bucket load in AVX2 lookup routine for six 8-bit groups
	IB/hfi1: Insure use of smp_processor_id() is preempt disabled
	IB/hfi1: Fix early init panic
	IB/hfi1: Fix leak of rcvhdrtail_dummy_kvaddr
	can: kvaser_usb: get CAN clock frequency from device
	can: kvaser_pciefd: kvaser_pciefd_rx_error_frame(): increase correct stats->{rx,tx}_errors counter
	can: sja1000: fix use after free in ems_pcmcia_add_card()
	x86/sme: Explicitly map new EFI memmap table as encrypted
	drm/amd/amdkfd: adjust dummy functions' placement
	drm/amdkfd: separate kfd_iommu_resume from kfd_resume
	drm/amdgpu: add amdgpu_amdkfd_resume_iommu
	drm/amdgpu: move iommu_resume before ip init/resume
	drm/amdgpu: init iommu after amdkfd device init
	drm/amdkfd: fix boot failure when iommu is disabled in Picasso.
	nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done
	selftests: netfilter: add a vrf+conntrack testcase
	vrf: don't run conntrack on vrf with !dflt qdisc
	bpf, x86: Fix "no previous prototype" warning
	bpf: Fix the off-by-two error in range markings
	ice: ignore dropped packets during init
	bonding: make tx_rebalance_counter an atomic
	nfp: Fix memory leak in nfp_cpp_area_cache_add()
	seg6: fix the iif in the IPv6 socket control block
	udp: using datalen to cap max gso segments
	netfilter: conntrack: annotate data-races around ct->timeout
	iavf: restore MSI state on reset
	iavf: Fix reporting when setting descriptor count
	IB/hfi1: Correct guard on eager buffer deallocation
	devlink: fix netns refcount leak in devlink_nl_cmd_reload()
	net/sched: fq_pie: prevent dismantle issue
	KVM: x86: Wait for IPIs to be delivered when handling Hyper-V TLB flush hypercall
	mm: bdi: initialize bdi_min_ratio when bdi is unregistered
	ALSA: ctl: Fix copy of updated id with element read/write
	ALSA: hda/realtek - Add headset Mic support for Lenovo ALC897 platform
	ALSA: hda/realtek: Fix quirk for TongFang PHxTxX1
	ALSA: pcm: oss: Fix negative period/buffer sizes
	ALSA: pcm: oss: Limit the period size to 16MB
	ALSA: pcm: oss: Handle missing errors in snd_pcm_oss_change_params*()
	scsi: qla2xxx: Format log strings only if needed
	btrfs: clear extent buffer uptodate when we fail to write it
	btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handling
	md: fix update super 1.0 on rdev size change
	nfsd: fix use-after-free due to delegation race
	nfsd: Fix nsfd startup race (again)
	tracefs: Have new files inherit the ownership of their parent
	mmc: renesas_sdhi: initialize variable properly when tuning
	clk: qcom: regmap-mux: fix parent clock lookup
	drm/syncobj: Deal with signalled fences in drm_syncobj_find_fence.
	can: pch_can: pch_can_rx_normal: fix use after free
	can: m_can: Disable and ignore ELO interrupt
	libata: add horkage for ASMedia 1092
	wait: add wake_up_pollfree()
	binder: use wake_up_pollfree()
	signalfd: use wake_up_pollfree()
	aio: keep poll requests on waitqueue until completed
	aio: fix use-after-free due to missing POLLFREE handling
	net: mvpp2: fix XDP rx queues registering
	tracefs: Set all files to the same group ownership as the mount option
	block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2)
	scsi: pm80xx: Do not call scsi_remove_host() in pm8001_alloc()
	scsi: scsi_debug: Fix buffer size of REPORT ZONES command
	qede: validate non LSO skb length
	PM: runtime: Fix pm_runtime_active() kerneldoc comment
	ASoC: rt5682: Fix crash due to out of scope stack vars
	ASoC: qdsp6: q6routing: Fix return value from msm_routing_put_audio_mixer
	ASoC: codecs: wsa881x: fix return values from kcontrol put
	ASoC: codecs: wcd934x: handle channel mappping list correctly
	ASoC: codecs: wcd934x: return correct value from mixer put
	RDMA/hns: Do not halt commands during reset until later
	RDMA/hns: Do not destroy QP resources in the hw resetting phase
	clk: imx: use module_platform_driver
	i40e: Fix failed opcode appearing if handling messages from VF
	i40e: Fix pre-set max number of queues for VF
	mtd: rawnand: fsmc: Take instruction delay into account
	mtd: rawnand: fsmc: Fix timing computation
	i40e: Fix NULL pointer dereference in i40e_dbg_dump_desc
	Revert "PCI: aardvark: Fix support for PCI_ROM_ADDRESS1 on emulated bridge"
	perf tools: Fix SMT detection fast read path
	Documentation/locking/locktypes: Update migrate_disable() bits.
	dt-bindings: net: Reintroduce PHY no lane swap binding
	tools build: Remove needless libpython-version feature check that breaks test-all fast path
	net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero
	net: altera: set a couple error code in probe()
	net: fec: only clear interrupt of handling queue in fec_enet_rx_queue()
	net, neigh: clear whole pneigh_entry at alloc time
	net/qla3xxx: fix an error code in ql_adapter_up()
	selftests/fib_tests: Rework fib_rp_filter_test()
	USB: gadget: detect too-big endpoint 0 requests
	USB: gadget: zero allocate endpoint 0 buffers
	usb: core: config: fix validation of wMaxPacketValue entries
	xhci: Remove CONFIG_USB_DEFAULT_PERSIST to prevent xHCI from runtime suspending
	usb: core: config: using bit mask instead of individual bits
	xhci: avoid race between disable slot command and host runtime suspend
	iio: gyro: adxrs290: fix data signedness
	iio: trigger: Fix reference counting
	iio: trigger: stm32-timer: fix MODULE_ALIAS
	iio: stk3310: Don't return error code in interrupt handler
	iio: mma8452: Fix trigger reference couting
	iio: ltr501: Don't return error code in trigger handler
	iio: kxsd9: Don't return error code in trigger handler
	iio: itg3200: Call iio_trigger_notify_done() on error
	iio: dln2-adc: Fix lockdep complaint
	iio: dln2: Check return value of devm_iio_trigger_register()
	iio: at91-sama5d2: Fix incorrect sign extension
	iio: adc: stm32: fix a current leak by resetting pcsel before disabling vdda
	iio: adc: axp20x_adc: fix charging current reporting on AXP22x
	iio: ad7768-1: Call iio_trigger_notify_done() on error
	iio: accel: kxcjk-1013: Fix possible memory leak in probe and remove
	csky: fix typo of fpu config macro
	irqchip/aspeed-scu: Replace update_bits with write_bits.
	irqchip/armada-370-xp: Fix return value of armada_370_xp_msi_alloc()
	irqchip/armada-370-xp: Fix support for Multi-MSI interrupts
	irqchip/irq-gic-v3-its.c: Force synchronisation when issuing INVALL
	irqchip: nvic: Fix offset for Interrupt Priority Offsets
	misc: fastrpc: fix improper packet size calculation
	bpf: Add selftests to cover packet access corner cases
	kbuild: simplify GCC_PLUGINS enablement in dummy-tools/gcc
	doc: gcc-plugins: update gcc-plugins.rst
	MAINTAINERS: adjust GCC PLUGINS after gcc-plugin.sh removal
	Documentation/Kbuild: Remove references to gcc-plugin.sh
	Linux 5.10.85

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I351da1b00f30a370b021125180a48b1c1ecb97ce
2021-12-14 15:11:46 +01:00
Qu Wenruo
caf9b352dc btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handling
commit 8289ed9f93bef2762f9184e136d994734b16d997 upstream.

I hit the BUG_ON() with generic/475 test case, and to my surprise, all
callers of btrfs_del_root_ref() are already aborting transaction, thus
there is not need for such BUG_ON(), just go to @out label and caller
will properly handle the error.

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14 11:32:38 +01:00
Josef Bacik
41b3cc57d6 btrfs: clear extent buffer uptodate when we fail to write it
commit c2e39305299f0118298c2201f6d6cc7d3485f29e upstream.

I got dmesg errors on generic/281 on our overnight fstests.  Looking at
the history this happens occasionally, with errors like this

  WARNING: CPU: 0 PID: 673217 at fs/btrfs/extent_io.c:6848 assert_eb_page_uptodate+0x3f/0x50
  CPU: 0 PID: 673217 Comm: kworker/u4:13 Tainted: G        W         5.16.0-rc2+ #469
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  Workqueue: btrfs-cache btrfs_work_helper
  RIP: 0010:assert_eb_page_uptodate+0x3f/0x50
  RSP: 0018:ffffae598230bc60 EFLAGS: 00010246
  RAX: 0017ffffc0002112 RBX: ffffebaec4100900 RCX: 0000000000001000
  RDX: ffffebaec45733c7 RSI: ffffebaec4100900 RDI: ffff9fd98919f340
  RBP: 0000000000000d56 R08: ffff9fd98e300000 R09: 0000000000000000
  R10: 0001207370a91c50 R11: 0000000000000000 R12: 00000000000007b0
  R13: ffff9fd98919f340 R14: 0000000001500000 R15: 0000000001cb0000
  FS:  0000000000000000(0000) GS:ffff9fd9fbc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f549fcf8940 CR3: 0000000114908004 CR4: 0000000000370ef0
  Call Trace:

   extent_buffer_test_bit+0x3f/0x70
   free_space_test_bit+0xa6/0xc0
   load_free_space_tree+0x1d6/0x430
   caching_thread+0x454/0x630
   ? rcu_read_lock_sched_held+0x12/0x60
   ? rcu_read_lock_sched_held+0x12/0x60
   ? rcu_read_lock_sched_held+0x12/0x60
   ? lock_release+0x1f0/0x2d0
   btrfs_work_helper+0xf2/0x3e0
   ? lock_release+0x1f0/0x2d0
   ? finish_task_switch.isra.0+0xf9/0x3a0
   process_one_work+0x270/0x5a0
   worker_thread+0x55/0x3c0
   ? process_one_work+0x5a0/0x5a0
   kthread+0x174/0x1a0
   ? set_kthread_struct+0x40/0x40
   ret_from_fork+0x1f/0x30

This happens because we're trying to read from a extent buffer page that
is !PageUptodate.  This happens because we will clear the page uptodate
when we have an IO error, but we don't clear the extent buffer uptodate.
If we do a read later and find this extent buffer we'll think its valid
and not return an error, and then trip over this warning.

Fix this by also clearing uptodate on the extent buffer when this
happens, so that we get an error when we do a btrfs_search_slot() and
find this block later.

CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14 11:32:38 +01:00
Greg Kroah-Hartman
1b71a028a2 This is the 5.10.84 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmGwZtwACgkQONu9yGCS
 aT7dQhAAjnlFKkXb+omHKQNkSHbD0ynEkxwtQfNFt1kcWcpJy5Df9xyNXQBohnqr
 Y0KUowpVF8gkXOelbdMrK5P6k28SpT2k+UMnUtZLNR6qMNlOY371BDasfi/dqWWR
 1JdLtQe6JVvwxo+6INRqEO27Ocyc1PbLZSo7i3Ik2+7mIRjN7+k1apFG0HOLEHIP
 3oMWDgnyQp3gTBvTFG0Vrd4f9AwrHq4JoVrhruNLqIYajlQ8dPPjuJ9alTifRddD
 eWY10Z21jAFib4WHgy6wXBVv3L5Np19liYMzv02o5pzFV1nLJCnKDA79jV7a2i2H
 lVmVpcWG0Yagyu8MW0hmOewqPXpAJH/C8g75mXeja546vCnvccNx1OXNOR4ux5Es
 IpEFAV+DnjSYgu88Cw6kF8j/B9x1n90sgywCWbRwAMJ1zX9/tvvLWSe8HpfZ2jvo
 Iuw6XDTL84DDuHY4yiK2fofxZvXp+Hk+c0Betu6GoQvoGaDRD8IWIceDWgiqy+V7
 fOrLitl8lbk1yjD7bDZMpEIgzQaaxJu6d+YWzy+PibZxQzOKHPC5gqEmajJd7ZWm
 OJ48SrNxyfjRZP/3NBgXOxje3lz3WkCdiPQrSQOQxoe+kdW5ZFuXDapWSO4dZfSe
 6XPOD/d+KVLNDQby3WnVB2MMlufHFnCs4wPgb13jfyiEbxifp+A=
 =D40k
 -----END PGP SIGNATURE-----

Merge 5.10.84 into android12-5.10-lts

Changes in 5.10.84
	NFSv42: Fix pagecache invalidation after COPY/CLONE
	can: j1939: j1939_tp_cmd_recv(): check the dst address of TP.CM_BAM
	ovl: simplify file splice
	ovl: fix deadlock in splice write
	gfs2: release iopen glock early in evict
	gfs2: Fix length of holes reported at end-of-file
	powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for persistent memory"
	drm/sun4i: fix unmet dependency on RESET_CONTROLLER for PHY_SUN6I_MIPI_DPHY
	mac80211: do not access the IV when it was stripped
	net/smc: Transfer remaining wait queue entries during fallback
	atlantic: Fix OOB read and write in hw_atl_utils_fw_rpc_wait
	net: return correct error code
	platform/x86: thinkpad_acpi: Add support for dual fan control
	platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep
	s390/setup: avoid using memblock_enforce_memory_limit
	btrfs: check-integrity: fix a warning on write caching disabled disk
	thermal: core: Reset previous low and high trip during thermal zone init
	scsi: iscsi: Unblock session then wake up error handler
	drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
	drm/amd/amdgpu: fix potential memleak
	ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile
	ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port()
	ipv6: check return value of ipv6_skip_exthdr
	net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound
	net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()
	perf inject: Fix ARM SPE handling
	perf hist: Fix memory leak of a perf_hpp_fmt
	perf report: Fix memory leaks around perf_tip()
	net/smc: Avoid warning of possible recursive locking
	ACPI: Add stubs for wakeup handler functions
	vrf: Reset IPCB/IP6CB when processing outbound pkts in vrf dev xmit
	kprobes: Limit max data_size of the kretprobe instances
	rt2x00: do not mark device gone on EPROTO errors during start
	ipmi: Move remove_work to dedicated workqueue
	cpufreq: Fix get_cpu_device() failure in add_cpu_dev_symlink()
	s390/pci: move pseudo-MMIO to prevent MIO overlap
	fget: check that the fd still exists after getting a ref to it
	sata_fsl: fix UAF in sata_fsl_port_stop when rmmod sata_fsl
	sata_fsl: fix warning in remove_proc_entry when rmmod sata_fsl
	ipv6: fix memory leak in fib6_rule_suppress
	drm/amd/display: Allow DSC on supported MST branch devices
	KVM: Disallow user memslot with size that exceeds "unsigned long"
	KVM: nVMX: Flush current VPID (L1 vs. L2) for KVM_REQ_TLB_FLUSH_GUEST
	KVM: x86: Use a stable condition around all VT-d PI paths
	KVM: arm64: Avoid setting the upper 32 bits of TCR_EL2 and CPTR_EL2 to 1
	KVM: X86: Use vcpu->arch.walk_mmu for kvm_mmu_invlpg()
	tracing/histograms: String compares should not care about signed values
	wireguard: selftests: increase default dmesg log size
	wireguard: allowedips: add missing __rcu annotation to satisfy sparse
	wireguard: selftests: actually test for routing loops
	wireguard: selftests: rename DEBUG_PI_LIST to DEBUG_PLIST
	wireguard: device: reset peer src endpoint when netns exits
	wireguard: receive: use ring buffer for incoming handshakes
	wireguard: receive: drop handshakes if queue lock is contended
	wireguard: ratelimiter: use kvcalloc() instead of kvzalloc()
	i2c: stm32f7: flush TX FIFO upon transfer errors
	i2c: stm32f7: recover the bus on access timeout
	i2c: stm32f7: stop dma transfer in case of NACK
	i2c: cbus-gpio: set atomic transfer callback
	natsemi: xtensa: fix section mismatch warnings
	tcp: fix page frag corruption on page fault
	net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings()
	net: mpls: Fix notifications when deleting a device
	siphash: use _unaligned version by default
	arm64: ftrace: add missing BTIs
	net/mlx4_en: Fix an use-after-free bug in mlx4_en_try_alloc_resources()
	selftests: net: Correct case name
	mt76: mt7915: fix NULL pointer dereference in mt7915_get_phy_mode
	ASoC: tegra: Fix wrong value type in ADMAIF
	ASoC: tegra: Fix wrong value type in I2S
	ASoC: tegra: Fix wrong value type in DMIC
	ASoC: tegra: Fix wrong value type in DSPK
	ASoC: tegra: Fix kcontrol put callback in ADMAIF
	ASoC: tegra: Fix kcontrol put callback in I2S
	ASoC: tegra: Fix kcontrol put callback in DMIC
	ASoC: tegra: Fix kcontrol put callback in DSPK
	ASoC: tegra: Fix kcontrol put callback in AHUB
	rxrpc: Fix rxrpc_peer leak in rxrpc_look_up_bundle()
	rxrpc: Fix rxrpc_local leak in rxrpc_lookup_peer()
	ALSA: intel-dsp-config: add quirk for CML devices based on ES8336 codec
	net: usb: lan78xx: lan78xx_phy_init(): use PHY_POLL instead of "0" if no IRQ is available
	net: marvell: mvpp2: Fix the computation of shared CPUs
	dpaa2-eth: destroy workqueue at the end of remove function
	net: annotate data-races on txq->xmit_lock_owner
	ipv4: convert fib_num_tclassid_users to atomic_t
	net/smc: fix wrong list_del in smc_lgr_cleanup_early
	net/rds: correct socket tunable error in rds_tcp_tune()
	net/smc: Keep smc_close_final rc during active close
	drm/msm/a6xx: Allocate enough space for GMU registers
	drm/msm: Do hw_init() before capturing GPU state
	atlantic: Increase delay for fw transactions
	atlatnic: enable Nbase-t speeds with base-t
	atlantic: Fix to display FW bundle version instead of FW mac version.
	atlantic: Add missing DIDs and fix 115c.
	Remove Half duplex mode speed capabilities.
	atlantic: Fix statistics logic for production hardware
	atlantic: Remove warn trace message.
	KVM: x86/pmu: Fix reserved bits for AMD PerfEvtSeln register
	KVM: VMX: Set failure code in prepare_vmcs02()
	x86/sev: Fix SEV-ES INS/OUTS instructions for word, dword, and qword
	x86/entry: Use the correct fence macro after swapgs in kernel CR3
	x86/xen: Add xenpv_restore_regs_and_return_to_usermode()
	sched/uclamp: Fix rq->uclamp_max not set on first enqueue
	x86/pv: Switch SWAPGS to ALTERNATIVE
	x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()
	parisc: Fix KBUILD_IMAGE for self-extracting kernel
	parisc: Fix "make install" on newer debian releases
	vgacon: Propagate console boot parameters before calling `vc_resize'
	xhci: Fix commad ring abort, write all 64 bits to CRCR register.
	USB: NO_LPM quirk Lenovo Powered USB-C Travel Hub
	usb: typec: tcpm: Wait in SNK_DEBOUNCED until disconnect
	x86/tsc: Add a timer to make sure TSC_adjust is always checked
	x86/tsc: Disable clocksource watchdog for TSC on qualified platorms
	x86/64/mm: Map all kernel memory into trampoline_pgd
	tty: serial: msm_serial: Deactivate RX DMA for polling support
	serial: pl011: Add ACPI SBSA UART match id
	serial: tegra: Change lower tolerance baud rate limit for tegra20 and tegra30
	serial: core: fix transmit-buffer reset and memleak
	serial: 8250_pci: Fix ACCES entries in pci_serial_quirks array
	serial: 8250_pci: rewrite pericom_do_set_divisor()
	serial: 8250: Fix RTS modem control while in rs485 mode
	iwlwifi: mvm: retry init flow if failed
	parisc: Mark cr16 CPU clocksource unstable on all SMP machines
	net/tls: Fix authentication failure in CCM mode
	ipmi: msghandler: Make symbol 'remove_work_wq' static
	Linux 5.10.84

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iad592da28c6425dea7dca35b229d14c44edb412d
2021-12-08 09:41:05 +01:00
Wang Yugui
8e4d2ac434 btrfs: check-integrity: fix a warning on write caching disabled disk
[ Upstream commit a91cf0ffbc244792e0b3ecf7d0fddb2f344b461f ]

When a disk has write caching disabled, we skip submission of a bio with
flush and sync requests before writing the superblock, since it's not
needed. However when the integrity checker is enabled, this results in
reports that there are metadata blocks referred by a superblock that
were not properly flushed. So don't skip the bio submission only when
the integrity checker is enabled for the sake of simplicity, since this
is a debug tool and not meant for use in non-debug builds.

fstests/btrfs/220 trigger a check-integrity warning like the following
when CONFIG_BTRFS_FS_CHECK_INTEGRITY=y and the disk with WCE=0.

  btrfs: attempt to write superblock which references block M @5242880 (sdb2/5242880/0) which is not flushed out of disk's write cache (block flush_gen=1, dev->flush_gen=0)!
  ------------[ cut here ]------------
  WARNING: CPU: 28 PID: 843680 at fs/btrfs/check-integrity.c:2196 btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs]
  CPU: 28 PID: 843680 Comm: umount Not tainted 5.15.0-0.rc5.39.el8.x86_64 #1
  Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
  RIP: 0010:btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs]
  RSP: 0018:ffffb642afb47940 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
  RDX: 00000000ffffffff RSI: ffff8b722fc97d00 RDI: ffff8b722fc97d00
  RBP: ffff8b5601c00000 R08: 0000000000000000 R09: c0000000ffff7fff
  R10: 0000000000000001 R11: ffffb642afb476f8 R12: ffffffffffffffff
  R13: ffffb642afb47974 R14: ffff8b5499254c00 R15: 0000000000000003
  FS:  00007f00a06d4080(0000) GS:ffff8b722fc80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fff5cff5ff0 CR3: 00000001c0c2a006 CR4: 00000000001706e0
  Call Trace:
   btrfsic_process_written_block+0x2f7/0x850 [btrfs]
   __btrfsic_submit_bio.part.19+0x310/0x330 [btrfs]
   ? bio_associate_blkg_from_css+0xa4/0x2c0
   btrfsic_submit_bio+0x18/0x30 [btrfs]
   write_dev_supers+0x81/0x2a0 [btrfs]
   ? find_get_pages_range_tag+0x219/0x280
   ? pagevec_lookup_range_tag+0x24/0x30
   ? __filemap_fdatawait_range+0x6d/0xf0
   ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
   ? find_first_extent_bit+0x9b/0x160 [btrfs]
   ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
   write_all_supers+0x1b3/0xa70 [btrfs]
   ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
   btrfs_commit_transaction+0x59d/0xac0 [btrfs]
   close_ctree+0x11d/0x339 [btrfs]
   generic_shutdown_super+0x71/0x110
   kill_anon_super+0x14/0x30
   btrfs_kill_super+0x12/0x20 [btrfs]
   deactivate_locked_super+0x31/0x70
   cleanup_mnt+0xb8/0x140
   task_work_run+0x6d/0xb0
   exit_to_user_mode_prepare+0x1f0/0x200
   syscall_exit_to_user_mode+0x12/0x30
   do_syscall_64+0x46/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7f009f711dfb
  RSP: 002b:00007fff5cff7928 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  RAX: 0000000000000000 RBX: 000055b68c6c9970 RCX: 00007f009f711dfb
  RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b68c6c9b50
  RBP: 0000000000000000 R08: 000055b68c6ca900 R09: 00007f009f795580
  R10: 0000000000000000 R11: 0000000000000246 R12: 000055b68c6c9b50
  R13: 00007f00a04bf184 R14: 0000000000000000 R15: 00000000ffffffff
  ---[ end trace 2c4b82abcef9eec4 ]---
  S-65536(sdb2/65536/1)
   -->
  M-1064960(sdb2/1064960/1)

Reviewed-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-08 09:03:19 +01:00
Greg Kroah-Hartman
8d21bcc704 This is the 5.10.82 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmGgq10ACgkQONu9yGCS
 aT6KDA/+N9ysKF4cH2zdUMhDAkjCKB3YqTEsJxSfGBkJu2wuncAEEtrKy9jxC+lv
 fz6BE1tduit/IQIhGTCXJlfe9NIxwU87f2v5JlHnYeXg4dz72c+Ei236l7ZvkSNE
 ii8/ikHGvbbhKv+BTgcRg7jVUMMy6eEpS6iJwMNLB/sHROjZXPogFoiYjbO+Jzc5
 0jTciMZv6r4yNhrdHBjhWHe6ZhB94H//Jy8MVYk37NGc5EbJmrMN83GM5ceSmhOZ
 PgxxyrVTv+SdGm0XViyK+94HXWGQHLXQF+Nsu3YEZfnNI+HNSPQKTqBLPM1hV7Ak
 h+IYW6VHPmcBmQzEdSA67uMKJayKtEwpkqO6aLRcj/NIThRiZoznbrtZOoGSXaU1
 0MzQRPum76GA5/SVGgtB8FrE6kcFm74eq82mXvUD+rgCp0HTbIpYQK9ZKSmbFOkv
 fYjcpWHZ8PEmffMbtIlVKSffVxcUILoNuQwnr21NGiRUrd54DhNPgVahmCKnvUTb
 847bGU/wQJPIF/2SO1rdpaA9MrPqZ/9sMEX3nSdx7xS8D+h2wfJqAJkLq0KBYt7R
 sbsXbfqbri893VHBo2YUqby3+7x3uNr118SjyiA8zpHHJpTBrVVImxSnW1z626HT
 KNJU4MSulLs+settJKAw1PHGRIGuW5TGSGF94p5LcsZDM2uIabU=
 =Wg4d
 -----END PGP SIGNATURE-----

Merge 5.10.82 into android12-5.10-lts

Changes in 5.10.82
	arm64: zynqmp: Do not duplicate flash partition label property
	arm64: zynqmp: Fix serial compatible string
	ARM: dts: sunxi: Fix OPPs node name
	arm64: dts: allwinner: h5: Fix GPU thermal zone node name
	arm64: dts: allwinner: a100: Fix thermal zone node name
	staging: wfx: ensure IRQ is ready before enabling it
	ARM: dts: NSP: Fix mpcore, mmc node names
	scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq()
	arm64: dts: rockchip: Disable CDN DP on Pinebook Pro
	arm64: dts: hisilicon: fix arm,sp805 compatible string
	RDMA/bnxt_re: Check if the vlan is valid before reporting
	bus: ti-sysc: Add quirk handling for reinit on context lost
	bus: ti-sysc: Use context lost quirk for otg
	usb: musb: tusb6010: check return value after calling platform_get_resource()
	usb: typec: tipd: Remove WARN_ON in tps6598x_block_read
	ARM: dts: ux500: Skomer regulator fixes
	staging: rtl8723bs: remove possible deadlock when disconnect (v2)
	ARM: BCM53016: Specify switch ports for Meraki MR32
	arm64: dts: qcom: msm8998: Fix CPU/L2 idle state latency and residency
	arm64: dts: qcom: ipq6018: Fix qcom,controlled-remotely property
	arm64: dts: freescale: fix arm,sp805 compatible string
	ASoC: SOF: Intel: hda-dai: fix potential locking issue
	clk: imx: imx6ul: Move csi_sel mux to correct base register
	ASoC: nau8824: Add DMI quirk mechanism for active-high jack-detect
	scsi: advansys: Fix kernel pointer leak
	ALSA: intel-dsp-config: add quirk for APL/GLK/TGL devices based on ES8336 codec
	firmware_loader: fix pre-allocated buf built-in firmware use
	ARM: dts: omap: fix gpmc,mux-add-data type
	usb: host: ohci-tmio: check return value after calling platform_get_resource()
	ARM: dts: ls1021a: move thermal-zones node out of soc/
	ARM: dts: ls1021a-tsn: use generic "jedec,spi-nor" compatible for flash
	ALSA: ISA: not for M68K
	tty: tty_buffer: Fix the softlockup issue in flush_to_ldisc
	MIPS: sni: Fix the build
	scsi: scsi_debug: Fix out-of-bound read in resp_readcap16()
	scsi: scsi_debug: Fix out-of-bound read in resp_report_tgtpgs()
	scsi: target: Fix ordered tag handling
	scsi: target: Fix alua_tg_pt_gps_count tracking
	iio: imu: st_lsm6dsx: Avoid potential array overflow in st_lsm6dsx_set_odr()
	powerpc/5200: dts: fix memory node unit name
	ARM: dts: qcom: fix memory and mdio nodes naming for RB3011
	ALSA: gus: fix null pointer dereference on pointer block
	powerpc/dcr: Use cmplwi instead of 3-argument cmpli
	powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST
	sh: check return code of request_irq
	maple: fix wrong return value of maple_bus_init().
	f2fs: fix up f2fs_lookup tracepoints
	f2fs: fix to use WHINT_MODE
	sh: fix kconfig unmet dependency warning for FRAME_POINTER
	sh: math-emu: drop unused functions
	sh: define __BIG_ENDIAN for math-emu
	f2fs: compress: disallow disabling compress on non-empty compressed file
	f2fs: fix incorrect return value in f2fs_sanity_check_ckpt()
	clk: ingenic: Fix bugs with divided dividers
	clk/ast2600: Fix soc revision for AHB
	clk: qcom: gcc-msm8996: Drop (again) gcc_aggre1_pnoc_ahb_clk
	mips: BCM63XX: ensure that CPU_SUPPORTS_32BIT_KERNEL is set
	sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
	perf/x86/vlbr: Add c->flags to vlbr event constraints
	blkcg: Remove extra blkcg_bio_issue_init
	tracing/histogram: Do not copy the fixed-size char array field over the field size
	perf bpf: Avoid memory leak from perf_env__insert_btf()
	perf bench futex: Fix memory leak of perf_cpu_map__new()
	perf tests: Remove bash construct from record+zstd_comp_decomp.sh
	drm/nouveau: hdmigv100.c: fix corrupted HDMI Vendor InfoFrame
	net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy.
	net-zerocopy: Refactor skb frag fast-forward op.
	tcp: Fix uninitialized access in skb frags array for Rx 0cp.
	tracing: Add length protection to histogram string copies
	net: ipa: disable HOLB drop when updating timer
	net: bnx2x: fix variable dereferenced before check
	bnxt_en: reject indirect blk offload when hw-tc-offload is off
	tipc: only accept encrypted MSG_CRYPTO msgs
	net: reduce indentation level in sk_clone_lock()
	sock: fix /proc/net/sockstat underflow in sk_clone_lock()
	net/smc: Make sure the link_id is unique
	iavf: Fix return of set the new channel count
	iavf: check for null in iavf_fix_features
	iavf: free q_vectors before queues in iavf_disable_vf
	iavf: Fix failure to exit out from last all-multicast mode
	iavf: prevent accidental free of filter structure
	iavf: validate pointers
	iavf: Fix for the false positive ASQ/ARQ errors while issuing VF reset
	iavf: Fix for setting queues to 0
	MIPS: generic/yamon-dt: fix uninitialized variable error
	mips: bcm63xx: add support for clk_get_parent()
	mips: lantiq: add support for clk_get_parent()
	platform/x86: hp_accel: Fix an error handling path in 'lis3lv02d_probe()'
	net/mlx5e: nullify cq->dbg pointer in mlx5_debug_cq_remove()
	net/mlx5: Lag, update tracker when state change event received
	net/mlx5: E-Switch, Change mode lock from mutex to rw semaphore
	net/mlx5: E-Switch, return error if encap isn't supported
	scsi: core: sysfs: Fix hang when device state is set via sysfs
	net: sched: act_mirred: drop dst for the direction from egress to ingress
	net: dpaa2-eth: fix use-after-free in dpaa2_eth_remove
	net: virtio_net_hdr_to_skb: count transport header in UFO
	i40e: Fix correct max_pkt_size on VF RX queue
	i40e: Fix NULL ptr dereference on VSI filter sync
	i40e: Fix changing previously set num_queue_pairs for PFs
	i40e: Fix ping is lost after configuring ADq on VF
	i40e: Fix warning message and call stack during rmmod i40e driver
	i40e: Fix creation of first queue by omitting it if is not power of two
	i40e: Fix display error code in dmesg
	NFC: reorganize the functions in nci_request
	NFC: reorder the logic in nfc_{un,}register_device
	net: nfc: nci: Change the NCI close sequence
	NFC: add NCI_UNREG flag to eliminate the race
	e100: fix device suspend/resume
	KVM: PPC: Book3S HV: Use GLOBAL_TOC for kvmppc_h_set_dabr/xdabr()
	pinctrl: qcom: sdm845: Enable dual edge errata
	perf/x86/intel/uncore: Fix filter_tid mask for CHA events on Skylake Server
	perf/x86/intel/uncore: Fix IIO event constraints for Skylake Server
	s390/kexec: fix return code handling
	net: stmmac: dwmac-rk: Fix ethernet on rk3399 based devices
	arm64: vdso32: suppress error message for 'make mrproper'
	tun: fix bonding active backup with arp monitoring
	hexagon: export raw I/O routines for modules
	hexagon: clean up timer-regs.h
	tipc: check for null after calling kmemdup
	ipc: WARN if trying to remove ipc object which is absent
	mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag
	x86/hyperv: Fix NULL deref in set_hv_tscchange_cb() if Hyper-V setup fails
	powerpc/8xx: Fix pinned TLBs with CONFIG_STRICT_KERNEL_RWX
	scsi: qla2xxx: Fix mailbox direction flags in qla2xxx_get_adapter_id()
	s390/kexec: fix memory leak of ipl report buffer
	block: Check ADMIN before NICE for IOPRIO_CLASS_RT
	KVM: nVMX: don't use vcpu->arch.efer when checking host state on nested state load
	udf: Fix crash after seekdir
	net: stmmac: socfpga: add runtime suspend/resume callback for stratix10 platform
	btrfs: fix memory ordering between normal and ordered work functions
	parisc/sticon: fix reverse colors
	cfg80211: call cfg80211_stop_ap when switch from P2P_GO type
	drm/amd/display: Update swizzle mode enums
	drm/udl: fix control-message timeout
	drm/nouveau: Add a dedicated mutex for the clients list
	drm/nouveau: use drm_dev_unplug() during device removal
	drm/nouveau: clean up all clients on device removal
	drm/i915/dp: Ensure sink rate values are always valid
	drm/amdgpu: fix set scaling mode Full/Full aspect/Center not works on vga and dvi connectors
	scsi: ufs: core: Fix task management completion
	scsi: ufs: core: Fix task management completion timeout race
	hugetlbfs: flush TLBs correctly after huge_pmd_unshare
	RDMA/netlink: Add __maybe_unused to static inline in C file
	selinux: fix NULL-pointer dereference when hashtab allocation fails
	ASoC: DAPM: Cover regression by kctl change notification fix
	usb: max-3421: Use driver data instead of maintaining a list of bound devices
	ice: Delete always true check of PF pointer
	fs: export an inode_update_time helper
	btrfs: update device path inode time instead of bd_inode
	x86/Kconfig: Fix an unused variable error in dell-smm-hwmon
	ALSA: hda: hdac_ext_stream: fix potential locking issues
	ALSA: hda: hdac_stream: fix potential locking issue in snd_hdac_stream_assign()
	Revert "perf: Rework perf_event_exit_event()"
	Linux 5.10.82

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I56e067875dafc27c2e86fc3b8c47abb3296c6a18
2021-11-26 15:37:44 +01:00
Josef Bacik
2ec78af152 btrfs: update device path inode time instead of bd_inode
commit 54fde91f52f515e0b1514f0f0fa146e87a672227 upstream.

Christoph pointed out that I'm updating bdev->bd_inode for the device
time when we remove block devices from a btrfs file system, however this
isn't actually exposed to anything.  The inode we want to update is the
one that's associated with the path to the device, usually on devtmpfs,
so that blkid notices the difference.

We still don't want to do the blkdev_open, so use kern_path() to get the
path to the given device and do the update time on that inode.

Fixes: 8f96a5bfa150 ("btrfs: update the bdev time directly when closing")
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-26 10:39:22 +01:00
Nikolay Borisov
6adbc07ebc btrfs: fix memory ordering between normal and ordered work functions
commit 45da9c1767ac31857df572f0a909fbe88fd5a7e9 upstream.

Ordered work functions aren't guaranteed to be handled by the same thread
which executed the normal work functions. The only way execution between
normal/ordered functions is synchronized is via the WORK_DONE_BIT,
unfortunately the used bitops don't guarantee any ordering whatsoever.

This manifested as seemingly inexplicable crashes on ARM64, where
async_chunk::inode is seen as non-null in async_cow_submit which causes
submit_compressed_extents to be called and crash occurs because
async_chunk::inode suddenly became NULL. The call trace was similar to:

    pc : submit_compressed_extents+0x38/0x3d0
    lr : async_cow_submit+0x50/0xd0
    sp : ffff800015d4bc20

    <registers omitted for brevity>

    Call trace:
     submit_compressed_extents+0x38/0x3d0
     async_cow_submit+0x50/0xd0
     run_ordered_work+0xc8/0x280
     btrfs_work_helper+0x98/0x250
     process_one_work+0x1f0/0x4ac
     worker_thread+0x188/0x504
     kthread+0x110/0x114
     ret_from_fork+0x10/0x18

Fix this by adding respective barrier calls which ensure that all
accesses preceding setting of WORK_DONE_BIT are strictly ordered before
setting the flag. At the same time add a read barrier after reading of
WORK_DONE_BIT in run_ordered_work which ensures all subsequent loads
would be strictly ordered after reading the bit. This in turn ensures
are all accesses before WORK_DONE_BIT are going to be strictly ordered
before any access that can occur in ordered_func.

Reported-by: Chris Murphy <lists@colorremedies.com>
Fixes: 08a9ff3264 ("btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue")
CC: stable@vger.kernel.org # 4.4+
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2011928
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: Chris Murphy <chris@colorremedies.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-26 10:39:20 +01:00
Greg Kroah-Hartman
c553d9a246 This is the 5.10.80 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmGWT+QACgkQONu9yGCS
 aT5mYw//ZXKzugaeJjuIaFqr7tcM7x8EefbKd2H4oMr8SW3IFElJIbNPJGMJAG/C
 tLZVWZvIum7QzZoxTL+JCCKpDzBERNTo4e5u7UwzAdVqiEX69YkNU0FBOzb4qXJ7
 gOZMBhy4UMIKdKD12CSXXf7ZspocsNXfzdmulRQ7CQcPoPrIMKpc4vuagN1Fy/Dz
 JgXYvRUAkLxtFHoQ/TeXvR4Gv9+w2ToMdb02mI48QBO+YYrFaGt+Rza2eHTv75H+
 Lydz37Nv1Pk32tA1q2jWxCzz16+Kzn+AviKiCfQK0Fb9IqnJksUIWLHSiODlVIcf
 kQHejanPn/p1BnBl8puPk1KFtDW45p2GwYhXG7hjGh08DGlR7QLHBS5Aa3xPYfdd
 uOy4ctygSVTx5nLjPH5vr3OE0wk/TuSSf/eyk2fmcUCspwAgBOnSYSmnJOem7LTK
 VqIgXFdCRplsqN415D35ddruP2BLCKqBu4KjwJ1LGIwgsx/Pmz4hlc5YcpLm8uRg
 XMqGTdcieQFOGmZJjJ2q3ecaCjfb0nmTrOylP5b55/74TFwFo042YR1ua0fEtpD4
 euoHLfYv3BY1dCp34TOUFGX0l+J1kAtf//vfD/JgJx/nX+ksdFBHhYwdbSi2oQG/
 9CceXYJ5duEnG+JmDOWJvcZ3T49K5XaIDNfY2zGpcSu1VZKubWg=
 =tQ0m
 -----END PGP SIGNATURE-----

Merge 5.10.80 into android12-5.10-lts

Changes in 5.10.80
	xhci: Fix USB 3.1 enumeration issues by increasing roothub power-on-good delay
	usb: xhci: Enable runtime-pm by default on AMD Yellow Carp platform
	binder: use euid from cred instead of using task
	binder: use cred instead of task for selinux checks
	binder: use cred instead of task for getsecid
	Input: iforce - fix control-message timeout
	Input: elantench - fix misreporting trackpoint coordinates
	Input: i8042 - Add quirk for Fujitsu Lifebook T725
	libata: fix read log timeout value
	ocfs2: fix data corruption on truncate
	scsi: core: Remove command size deduction from scsi_setup_scsi_cmnd()
	scsi: qla2xxx: Fix kernel crash when accessing port_speed sysfs file
	scsi: qla2xxx: Fix use after free in eh_abort path
	mmc: mtk-sd: Add wait dma stop done flow
	mmc: dw_mmc: Dont wait for DRTO on Write RSP error
	exfat: fix incorrect loading of i_blocks for large files
	parisc: Fix set_fixmap() on PA1.x CPUs
	parisc: Fix ptrace check on syscall return
	tpm: Check for integer overflow in tpm2_map_response_body()
	firmware/psci: fix application of sizeof to pointer
	crypto: s5p-sss - Add error handling in s5p_aes_probe()
	media: rkvdec: Do not override sizeimage for output format
	media: ite-cir: IR receiver stop working after receive overflow
	media: rkvdec: Support dynamic resolution changes
	media: ir-kbd-i2c: improve responsiveness of hauppauge zilog receivers
	media: v4l2-ioctl: Fix check_ext_ctrls
	ALSA: hda/realtek: Fix mic mute LED for the HP Spectre x360 14
	ALSA: hda/realtek: Add a quirk for HP OMEN 15 mute LED
	ALSA: hda/realtek: Add quirk for Clevo PC70HS
	ALSA: hda/realtek: Headset fixup for Clevo NH77HJQ
	ALSA: hda/realtek: Add a quirk for Acer Spin SP513-54N
	ALSA: hda/realtek: Add quirk for ASUS UX550VE
	ALSA: hda/realtek: Add quirk for HP EliteBook 840 G7 mute LED
	ALSA: ua101: fix division by zero at probe
	ALSA: 6fire: fix control and bulk message timeouts
	ALSA: line6: fix control and interrupt message timeouts
	ALSA: usb-audio: Line6 HX-Stomp XL USB_ID for 48k-fixed quirk
	ALSA: usb-audio: Add registration quirk for JBL Quantum 400
	ALSA: hda: Free card instance properly at probe errors
	ALSA: synth: missing check for possible NULL after the call to kstrdup
	ALSA: timer: Fix use-after-free problem
	ALSA: timer: Unconditionally unlink slave instances, too
	ext4: fix lazy initialization next schedule time computation in more granular unit
	ext4: ensure enough credits in ext4_ext_shift_path_extents
	ext4: refresh the ext4_ext_path struct after dropping i_data_sem.
	fuse: fix page stealing
	x86/sme: Use #define USE_EARLY_PGTABLE_L5 in mem_encrypt_identity.c
	x86/cpu: Fix migration safety with X86_BUG_NULL_SEL
	x86/irq: Ensure PI wakeup handler is unregistered before module unload
	ASoC: soc-core: fix null-ptr-deref in snd_soc_del_component_unlocked()
	ALSA: hda/realtek: Fixes HP Spectre x360 15-eb1xxx speakers
	cavium: Return negative value when pci_alloc_irq_vectors() fails
	scsi: qla2xxx: Return -ENOMEM if kzalloc() fails
	scsi: qla2xxx: Fix unmap of already freed sgl
	mISDN: Fix return values of the probe function
	cavium: Fix return values of the probe function
	sfc: Export fibre-specific supported link modes
	sfc: Don't use netif_info before net_device setup
	hyperv/vmbus: include linux/bitops.h
	ARM: dts: sun7i: A20-olinuxino-lime2: Fix ethernet phy-mode
	reset: socfpga: add empty driver allowing consumers to probe
	mmc: winbond: don't build on M68K
	drm: panel-orientation-quirks: Add quirk for Aya Neo 2021
	fcnal-test: kill hanging ping/nettest binaries on cleanup
	bpf: Define bpf_jit_alloc_exec_limit for arm64 JIT
	bpf: Prevent increasing bpf_jit_limit above max
	gpio: mlxbf2.c: Add check for bgpio_init failure
	xen/netfront: stop tx queues during live migration
	nvmet-tcp: fix a memory leak when releasing a queue
	spi: spl022: fix Microwire full duplex mode
	net: multicast: calculate csum of looped-back and forwarded packets
	watchdog: Fix OMAP watchdog early handling
	drm: panel-orientation-quirks: Add quirk for GPD Win3
	block: schedule queue restart after BLK_STS_ZONE_RESOURCE
	nvmet-tcp: fix header digest verification
	r8169: Add device 10ec:8162 to driver r8169
	vmxnet3: do not stop tx queues after netif_device_detach()
	nfp: bpf: relax prog rejection for mtu check through max_pkt_offset
	net/smc: Fix smc_link->llc_testlink_time overflow
	net/smc: Correct spelling mistake to TCPF_SYN_RECV
	rds: stop using dmapool
	btrfs: clear MISSING device status bit in btrfs_close_one_device
	btrfs: fix lost error handling when replaying directory deletes
	btrfs: call btrfs_check_rw_degradable only if there is a missing device
	KVM: VMX: Unregister posted interrupt wakeup handler on hardware unsetup
	ia64: kprobes: Fix to pass correct trampoline address to the handler
	selinux: fix race condition when computing ocontext SIDs
	hwmon: (pmbus/lm25066) Add offset coefficients
	regulator: s5m8767: do not use reset value as DVS voltage if GPIO DVS is disabled
	regulator: dt-bindings: samsung,s5m8767: correct s5m8767,pmic-buck-default-dvs-idx property
	EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/Haswell
	mwifiex: fix division by zero in fw download path
	ath6kl: fix division by zero in send path
	ath6kl: fix control-message timeout
	ath10k: fix control-message timeout
	ath10k: fix division by zero in send path
	PCI: Mark Atheros QCA6174 to avoid bus reset
	rtl8187: fix control-message timeouts
	evm: mark evm_fixmode as __ro_after_init
	ifb: Depend on netfilter alternatively to tc
	wcn36xx: Fix HT40 capability for 2Ghz band
	wcn36xx: Fix tx_status mechanism
	wcn36xx: Fix (QoS) null data frame bitrate/modulation
	PM: sleep: Do not let "syscore" devices runtime-suspend during system transitions
	mwifiex: Read a PCI register after writing the TX ring write pointer
	mwifiex: Try waking the firmware until we get an interrupt
	libata: fix checking of DMA state
	wcn36xx: handle connection loss indication
	rsi: fix occasional initialisation failure with BT coex
	rsi: fix key enabled check causing unwanted encryption for vap_id > 0
	rsi: fix rate mask set leading to P2P failure
	rsi: Fix module dev_oper_mode parameter description
	perf/x86/intel/uncore: Support extra IMC channel on Ice Lake server
	perf/x86/intel/uncore: Fix Intel ICX IIO event constraints
	RDMA/qedr: Fix NULL deref for query_qp on the GSI QP
	signal: Remove the bogus sigkill_pending in ptrace_stop
	memory: renesas-rpc-if: Correct QSPI data transfer in Manual mode
	signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
	soc: fsl: dpio: replace smp_processor_id with raw_smp_processor_id
	soc: fsl: dpio: use the combined functions to protect critical zone
	mtd: rawnand: socrates: Keep the driver compatible with on-die ECC engines
	power: supply: max17042_battery: Prevent int underflow in set_soc_threshold
	power: supply: max17042_battery: use VFSOC for capacity when no rsns
	KVM: arm64: Extract ESR_ELx.EC only
	KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use
	can: j1939: j1939_tp_cmd_recv(): ignore abort message in the BAM transport
	can: j1939: j1939_can_recv(): ignore messages with invalid source address
	powerpc/85xx: Fix oops when mpc85xx_smp_guts_ids node cannot be found
	ring-buffer: Protect ring_buffer_reset() from reentrancy
	serial: core: Fix initializing and restoring termios speed
	ifb: fix building without CONFIG_NET_CLS_ACT
	ALSA: mixer: oss: Fix racy access to slots
	ALSA: mixer: fix deadlock in snd_mixer_oss_set_volume
	xen/balloon: add late_initcall_sync() for initial ballooning done
	ovl: fix use after free in struct ovl_aio_req
	PCI: pci-bridge-emul: Fix emulation of W1C bits
	PCI: cadence: Add cdns_plat_pcie_probe() missing return
	PCI: aardvark: Do not clear status bits of masked interrupts
	PCI: aardvark: Fix checking for link up via LTSSM state
	PCI: aardvark: Do not unmask unused interrupts
	PCI: aardvark: Fix reporting Data Link Layer Link Active
	PCI: aardvark: Fix configuring Reference clock
	PCI: aardvark: Fix return value of MSI domain .alloc() method
	PCI: aardvark: Read all 16-bits from PCIE_MSI_PAYLOAD_REG
	PCI: aardvark: Fix support for bus mastering and PCI_COMMAND on emulated bridge
	PCI: aardvark: Fix support for PCI_BRIDGE_CTL_BUS_RESET on emulated bridge
	PCI: aardvark: Set PCI Bridge Class Code to PCI Bridge
	PCI: aardvark: Fix support for PCI_ROM_ADDRESS1 on emulated bridge
	quota: check block number when reading the block in quota file
	quota: correct error number in free_dqentry()
	pinctrl: core: fix possible memory leak in pinctrl_enable()
	coresight: cti: Correct the parameter for pm_runtime_put
	iio: dac: ad5446: Fix ad5622_write() return value
	iio: ad5770r: make devicetree property reading consistent
	USB: serial: keyspan: fix memleak on probe errors
	serial: 8250: fix racy uartclk update
	most: fix control-message timeouts
	USB: iowarrior: fix control-message timeouts
	USB: chipidea: fix interrupt deadlock
	power: supply: max17042_battery: Clear status bits in interrupt handler
	dma-buf: WARN on dmabuf release with pending attachments
	drm: panel-orientation-quirks: Update the Lenovo Ideapad D330 quirk (v2)
	drm: panel-orientation-quirks: Add quirk for KD Kurio Smart C15200 2-in-1
	drm: panel-orientation-quirks: Add quirk for the Samsung Galaxy Book 10.6
	Bluetooth: sco: Fix lock_sock() blockage by memcpy_from_msg()
	Bluetooth: fix use-after-free error in lock_sock_nested()
	drm/panel-orientation-quirks: add Valve Steam Deck
	rcutorture: Avoid problematic critical section nesting on PREEMPT_RT
	platform/x86: wmi: do not fail if disabling fails
	MIPS: lantiq: dma: add small delay after reset
	MIPS: lantiq: dma: reset correct number of channel
	locking/lockdep: Avoid RCU-induced noinstr fail
	net: sched: update default qdisc visibility after Tx queue cnt changes
	rcu-tasks: Move RTGS_WAIT_CBS to beginning of rcu_tasks_kthread() loop
	smackfs: Fix use-after-free in netlbl_catmap_walk()
	ath11k: Align bss_chan_info structure with firmware
	x86: Increase exception stack sizes
	mwifiex: Run SET_BSS_MODE when changing from P2P to STATION vif-type
	mwifiex: Properly initialize private structure on interface type changes
	fscrypt: allow 256-bit master keys with AES-256-XTS
	drm/amdgpu: Fix MMIO access page fault
	ath11k: Avoid reg rules update during firmware recovery
	ath11k: add handler for scan event WMI_SCAN_EVENT_DEQUEUED
	ath11k: Change DMA_FROM_DEVICE to DMA_TO_DEVICE when map reinjected packets
	ath10k: high latency fixes for beacon buffer
	media: mt9p031: Fix corrupted frame after restarting stream
	media: netup_unidvb: handle interrupt properly according to the firmware
	media: atomisp: Fix error handling in probe
	media: stm32: Potential NULL pointer dereference in dcmi_irq_thread()
	media: uvcvideo: Set capability in s_param
	media: uvcvideo: Return -EIO for control errors
	media: uvcvideo: Set unique vdev name based in type
	media: s5p-mfc: fix possible null-pointer dereference in s5p_mfc_probe()
	media: s5p-mfc: Add checking to s5p_mfc_probe().
	media: imx: set a media_device bus_info string
	media: mceusb: return without resubmitting URB in case of -EPROTO error.
	ia64: don't do IA64_CMPXCHG_DEBUG without CONFIG_PRINTK
	rtw88: fix RX clock gate setting while fifo dump
	brcmfmac: Add DMI nvram filename quirk for Cyberbook T116 tablet
	media: rcar-csi2: Add checking to rcsi2_start_receiver()
	ipmi: Disable some operations during a panic
	fs/proc/uptime.c: Fix idle time reporting in /proc/uptime
	ACPICA: Avoid evaluating methods too early during system resume
	media: ipu3-imgu: imgu_fmt: Handle properly try
	media: ipu3-imgu: VIDIOC_QUERYCAP: Fix bus_info
	media: usb: dvd-usb: fix uninit-value bug in dibusb_read_eeprom_byte()
	net-sysfs: try not to restart the syscall if it will fail eventually
	tracefs: Have tracefs directories not set OTH permission bits by default
	ath: dfs_pattern_detector: Fix possible null-pointer dereference in channel_detector_create()
	mmc: moxart: Fix reference count leaks in moxart_probe
	iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value
	ACPI: battery: Accept charges over the design capacity as full
	drm/amdkfd: fix resume error when iommu disabled in Picasso
	net: phy: micrel: make *-skew-ps check more lenient
	leaking_addresses: Always print a trailing newline
	drm/msm: prevent NULL dereference in msm_gpu_crashstate_capture()
	block: bump max plugged deferred size from 16 to 32
	md: update superblock after changing rdev flags in state_store
	memstick: r592: Fix a UAF bug when removing the driver
	lib/xz: Avoid overlapping memcpy() with invalid input with in-place decompression
	lib/xz: Validate the value before assigning it to an enum variable
	workqueue: make sysfs of unbound kworker cpumask more clever
	tracing/cfi: Fix cmp_entries_* functions signature mismatch
	mt76: mt7915: fix an off-by-one bound check
	mwl8k: Fix use-after-free in mwl8k_fw_state_machine()
	block: remove inaccurate requeue check
	media: allegro: ignore interrupt if mailbox is not initialized
	nvmet: fix use-after-free when a port is removed
	nvmet-rdma: fix use-after-free when a port is removed
	nvmet-tcp: fix use-after-free when a port is removed
	nvme: drop scan_lock and always kick requeue list when removing namespaces
	PM: hibernate: Get block device exclusively in swsusp_check()
	selftests: kvm: fix mismatched fclose() after popen()
	selftests/bpf: Fix perf_buffer test on system with offline cpus
	iwlwifi: mvm: disable RX-diversity in powersave
	smackfs: use __GFP_NOFAIL for smk_cipso_doi()
	ARM: clang: Do not rely on lr register for stacktrace
	gre/sit: Don't generate link-local addr if addr_gen_mode is IN6_ADDR_GEN_MODE_NONE
	gfs2: Cancel remote delete work asynchronously
	gfs2: Fix glock_hash_walk bugs
	ARM: 9136/1: ARMv7-M uses BE-8, not BE-32
	vrf: run conntrack only in context of lower/physdev for locally generated packets
	net: annotate data-race in neigh_output()
	ACPI: AC: Quirk GK45 to skip reading _PSR
	btrfs: reflink: initialize return value to 0 in btrfs_extent_same()
	btrfs: do not take the uuid_mutex in btrfs_rm_device
	spi: bcm-qspi: Fix missing clk_disable_unprepare() on error in bcm_qspi_probe()
	wcn36xx: Correct band/freq reporting on RX
	x86/hyperv: Protect set_hv_tscchange_cb() against getting preempted
	drm/amd/display: dcn20_resource_construct reduce scope of FPU enabled
	selftests/core: fix conflicting types compile error for close_range()
	parisc: fix warning in flush_tlb_all
	task_stack: Fix end_of_stack() for architectures with upwards-growing stack
	erofs: don't trigger WARN() when decompression fails
	parisc/unwind: fix unwinder when CONFIG_64BIT is enabled
	parisc/kgdb: add kgdb_roundup() to make kgdb work with idle polling
	netfilter: conntrack: set on IPS_ASSURED if flows enters internal stream state
	selftests/bpf: Fix strobemeta selftest regression
	Bluetooth: fix init and cleanup of sco_conn.timeout_work
	rcu: Fix existing exp request check in sync_sched_exp_online_cleanup()
	MIPS: lantiq: dma: fix burst length for DEU
	objtool: Add xen_start_kernel() to noreturn list
	x86/xen: Mark cpu_bringup_and_idle() as dead_end_function
	objtool: Fix static_call list generation
	drm/v3d: fix wait for TMU write combiner flush
	virtio-gpu: fix possible memory allocation failure
	lockdep: Let lock_is_held_type() detect recursive read as read
	net: net_namespace: Fix undefined member in key_remove_domain()
	cgroup: Make rebind_subsystems() disable v2 controllers all at once
	wcn36xx: Fix Antenna Diversity Switching
	wilc1000: fix possible memory leak in cfg_scan_result()
	Bluetooth: btmtkuart: fix a memleak in mtk_hci_wmt_sync
	crypto: caam - disable pkc for non-E SoCs
	rxrpc: Fix _usecs_to_jiffies() by using usecs_to_jiffies()
	net: dsa: rtl8366rb: Fix off-by-one bug
	ath11k: fix some sleeping in atomic bugs
	ath11k: Avoid race during regd updates
	ath11k: fix packet drops due to incorrect 6 GHz freq value in rx status
	ath11k: Fix memory leak in ath11k_qmi_driver_event_work
	ath10k: Fix missing frame timestamp for beacon/probe-resp
	ath10k: sdio: Add missing BH locking around napi_schdule()
	drm/ttm: stop calling tt_swapin in vm_access
	arm64: mm: update max_pfn after memory hotplug
	drm/amdgpu: fix warning for overflow check
	media: em28xx: add missing em28xx_close_extension
	media: cxd2880-spi: Fix a null pointer dereference on error handling path
	media: dvb-usb: fix ununit-value in az6027_rc_query
	media: v4l2-ioctl: S_CTRL output the right value
	media: TDA1997x: handle short reads of hdmi info frame.
	media: mtk-vpu: Fix a resource leak in the error handling path of 'mtk_vpu_probe()'
	media: radio-wl1273: Avoid card name truncation
	media: si470x: Avoid card name truncation
	media: tm6000: Avoid card name truncation
	media: cx23885: Fix snd_card_free call on null card pointer
	kprobes: Do not use local variable when creating debugfs file
	crypto: ecc - fix CRYPTO_DEFAULT_RNG dependency
	cpuidle: Fix kobject memory leaks in error paths
	media: em28xx: Don't use ops->suspend if it is NULL
	ath9k: Fix potential interrupt storm on queue reset
	PM: EM: Fix inefficient states detection
	EDAC/amd64: Handle three rank interleaving mode
	rcu: Always inline rcu_dynticks_task*_{enter,exit}()
	netfilter: nft_dynset: relax superfluous check on set updates
	media: dvb-frontends: mn88443x: Handle errors of clk_prepare_enable()
	crypto: qat - detect PFVF collision after ACK
	crypto: qat - disregard spurious PFVF interrupts
	hwrng: mtk - Force runtime pm ops for sleep ops
	b43legacy: fix a lower bounds test
	b43: fix a lower bounds test
	gve: Recover from queue stall due to missed IRQ
	mmc: sdhci-omap: Fix NULL pointer exception if regulator is not configured
	mmc: sdhci-omap: Fix context restore
	memstick: avoid out-of-range warning
	memstick: jmb38x_ms: use appropriate free function in jmb38x_ms_alloc_host()
	net, neigh: Fix NTF_EXT_LEARNED in combination with NTF_USE
	hwmon: Fix possible memleak in __hwmon_device_register()
	hwmon: (pmbus/lm25066) Let compiler determine outer dimension of lm25066_coeff
	ath10k: fix max antenna gain unit
	kernel/sched: Fix sched_fork() access an invalid sched_task_group
	tcp: switch orphan_count to bare per-cpu counters
	drm/msm: potential error pointer dereference in init()
	drm/msm: uninitialized variable in msm_gem_import()
	net: stream: don't purge sk_error_queue in sk_stream_kill_queues()
	media: ir_toy: assignment to be16 should be of correct type
	mmc: mxs-mmc: disable regulator on error and in the remove function
	platform/x86: thinkpad_acpi: Fix bitwise vs. logical warning
	mt76: mt7615: fix endianness warning in mt7615_mac_write_txwi
	mt76: mt76x02: fix endianness warnings in mt76x02_mac.c
	mt76: mt7915: fix possible infinite loop release semaphore
	mt76: mt7915: fix sta_rec_wtbl tag len
	mt76: mt7915: fix muar_idx in mt7915_mcu_alloc_sta_req()
	rsi: stop thread firstly in rsi_91x_init() error handling
	mwifiex: Send DELBA requests according to spec
	net: enetc: unmap DMA in enetc_send_cmd()
	phy: micrel: ksz8041nl: do not use power down mode
	nvme-rdma: fix error code in nvme_rdma_setup_ctrl
	PM: hibernate: fix sparse warnings
	clocksource/drivers/timer-ti-dm: Select TIMER_OF
	x86/sev: Fix stack type check in vc_switch_off_ist()
	drm/msm: Fix potential NULL dereference in DPU SSPP
	smackfs: use netlbl_cfg_cipsov4_del() for deleting cipso_v4_doi
	KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c
	KVM: selftests: Fix nested SVM tests when built with clang
	bpftool: Avoid leaking the JSON writer prepared for program metadata
	libbpf: Fix BTF data layout checks and allow empty BTF
	libbpf: Allow loading empty BTFs
	libbpf: Fix overflow in BTF sanity checks
	libbpf: Fix BTF header parsing checks
	s390/gmap: don't unconditionally call pte_unmap_unlock() in __gmap_zap()
	KVM: s390: pv: avoid double free of sida page
	KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm
	irq: mips: avoid nested irq_enter()
	tpm: fix Atmel TPM crash caused by too frequent queries
	tpm_tis_spi: Add missing SPI ID
	libbpf: Fix endianness detection in BPF_CORE_READ_BITFIELD_PROBED()
	tcp: don't free a FIN sk_buff in tcp_remove_empty_skb()
	spi: spi-rpc-if: Check return value of rpcif_sw_init()
	samples/kretprobes: Fix return value if register_kretprobe() failed
	KVM: s390: Fix handle_sske page fault handling
	libertas_tf: Fix possible memory leak in probe and disconnect
	libertas: Fix possible memory leak in probe and disconnect
	wcn36xx: add proper DMA memory barriers in rx path
	wcn36xx: Fix discarded frames due to wrong sequence number
	drm/amdgpu/gmc6: fix DMA mask from 44 to 40 bits
	selftests: bpf: Convert sk_lookup ctx access tests to PROG_TEST_RUN
	selftests/bpf: Fix fd cleanup in sk_lookup test
	net: amd-xgbe: Toggle PLL settings during rate change
	net: phylink: avoid mvneta warning when setting pause parameters
	crypto: pcrypt - Delay write to padata->info
	selftests/bpf: Fix fclose/pclose mismatch in test_progs
	udp6: allow SO_MARK ctrl msg to affect routing
	ibmvnic: don't stop queue in xmit
	ibmvnic: Process crqs after enabling interrupts
	cgroup: Fix rootcg cpu.stat guest double counting
	bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off.
	bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit.
	of: unittest: fix EXPECT text for gpio hog errors
	iio: st_sensors: Call st_sensors_power_enable() from bus drivers
	iio: st_sensors: disable regulators after device unregistration
	RDMA/rxe: Fix wrong port_cap_flags
	ARM: dts: BCM5301X: Fix memory nodes names
	clk: mvebu: ap-cpu-clk: Fix a memory leak in error handling paths
	ARM: s3c: irq-s3c24xx: Fix return value check for s3c24xx_init_intc()
	arm64: dts: rockchip: Fix GPU register width for RK3328
	ARM: dts: qcom: msm8974: Add xo_board reference clock to DSI0 PHY
	RDMA/bnxt_re: Fix query SRQ failure
	arm64: dts: ti: k3-j721e-main: Fix "max-virtual-functions" in PCIe EP nodes
	arm64: dts: ti: k3-j721e-main: Fix "bus-range" upto 256 bus number for PCIe
	arm64: dts: meson-g12a: Fix the pwm regulator supply properties
	arm64: dts: meson-g12b: Fix the pwm regulator supply properties
	bus: ti-sysc: Fix timekeeping_suspended warning on resume
	ARM: dts: at91: tse850: the emac<->phy interface is rmii
	scsi: dc395: Fix error case unwinding
	MIPS: loongson64: make CPU_LOONGSON64 depends on MIPS_FP_SUPPORT
	JFS: fix memleak in jfs_mount
	arm64: dts: qcom: msm8916: Fix Secondary MI2S bit clock
	arm64: dts: renesas: beacon: Fix Ethernet PHY mode
	arm64: dts: qcom: pm8916: Remove wrong reg-names for rtc@6000
	ALSA: hda: Reduce udelay() at SKL+ position reporting
	ALSA: hda: Release controller display power during shutdown/reboot
	ALSA: hda: Fix hang during shutdown due to link reset
	ALSA: hda: Use position buffer for SKL+ again
	soundwire: debugfs: use controller id and link_id for debugfs
	scsi: pm80xx: Fix misleading log statement in pm8001_mpi_get_nvmd_resp()
	driver core: Fix possible memory leak in device_link_add()
	arm: dts: omap3-gta04a4: accelerometer irq fix
	ASoC: SOF: topology: do not power down primary core during topology removal
	soc/tegra: Fix an error handling path in tegra_powergate_power_up()
	memory: fsl_ifc: fix leak of irq and nand_irq in fsl_ifc_ctrl_probe
	clk: at91: check pmc node status before registering syscore ops
	video: fbdev: chipsfb: use memset_io() instead of memset()
	powerpc: Refactor is_kvm_guest() declaration to new header
	powerpc: Rename is_kvm_guest() to check_kvm_guest()
	powerpc: Reintroduce is_kvm_guest() as a fast-path check
	powerpc: Fix is_kvm_guest() / kvm_para_available()
	powerpc: fix unbalanced node refcount in check_kvm_guest()
	serial: 8250_dw: Drop wrong use of ACPI_PTR()
	usb: gadget: hid: fix error code in do_config()
	power: supply: rt5033_battery: Change voltage values to µV
	power: supply: max17040: fix null-ptr-deref in max17040_probe()
	scsi: csiostor: Uninitialized data in csio_ln_vnp_read_cbfn()
	RDMA/mlx4: Return missed an error if device doesn't support steering
	usb: musb: select GENERIC_PHY instead of depending on it
	staging: most: dim2: do not double-register the same device
	staging: ks7010: select CRYPTO_HASH/CRYPTO_MICHAEL_MIC
	pinctrl: renesas: checker: Fix off-by-one bug in drive register check
	ARM: dts: stm32: Reduce DHCOR SPI NOR frequency to 50 MHz
	ARM: dts: stm32: fix SAI sub nodes register range
	ARM: dts: stm32: fix AV96 board SAI2 pin muxing on stm32mp15
	ASoC: cs42l42: Correct some register default values
	ASoC: cs42l42: Defer probe if request_threaded_irq() returns EPROBE_DEFER
	soc: qcom: rpmhpd: Provide some missing struct member descriptions
	soc: qcom: rpmhpd: Make power_on actually enable the domain
	usb: typec: STUSB160X should select REGMAP_I2C
	iio: adis: do not disabe IRQs in 'adis_init()'
	scsi: ufs: Refactor ufshcd_setup_clocks() to remove skip_ref_clk
	scsi: ufs: ufshcd-pltfrm: Fix memory leak due to probe defer
	serial: imx: fix detach/attach of serial console
	usb: dwc2: drd: fix dwc2_force_mode call in dwc2_ovr_init
	usb: dwc2: drd: fix dwc2_drd_role_sw_set when clock could be disabled
	usb: dwc2: drd: reset current session before setting the new one
	firmware: qcom_scm: Fix error retval in __qcom_scm_is_call_available()
	soc: qcom: apr: Add of_node_put() before return
	pinctrl: equilibrium: Fix function addition in multiple groups
	phy: qcom-qusb2: Fix a memory leak on probe
	phy: ti: gmii-sel: check of_get_address() for failure
	phy: qcom-snps: Correct the FSEL_MASK
	serial: xilinx_uartps: Fix race condition causing stuck TX
	clk: at91: sam9x60-pll: use DIV_ROUND_CLOSEST_ULL
	HID: u2fzero: clarify error check and length calculations
	HID: u2fzero: properly handle timeouts in usb_submit_urb
	powerpc/44x/fsp2: add missing of_node_put
	ASoC: cs42l42: Disable regulators if probe fails
	ASoC: cs42l42: Use device_property API instead of of_property
	ASoC: cs42l42: Correct configuring of switch inversion from ts-inv
	virtio_ring: check desc == NULL when using indirect with packed
	mips: cm: Convert to bitfield API to fix out-of-bounds access
	power: supply: bq27xxx: Fix kernel crash on IRQ handler register error
	apparmor: fix error check
	rpmsg: Fix rpmsg_create_ept return when RPMSG config is not defined
	nfsd: don't alloc under spinlock in rpc_parse_scope_id
	i2c: mediatek: fixing the incorrect register offset
	NFS: Fix dentry verifier races
	pnfs/flexfiles: Fix misplaced barrier in nfs4_ff_layout_prepare_ds
	drm/plane-helper: fix uninitialized variable reference
	PCI: aardvark: Don't spam about PIO Response Status
	PCI: aardvark: Fix preserving PCI_EXP_RTCTL_CRSSVE flag on emulated bridge
	opp: Fix return in _opp_add_static_v2()
	NFS: Fix deadlocks in nfs_scan_commit_list()
	fs: orangefs: fix error return code of orangefs_revalidate_lookup()
	mtd: spi-nor: hisi-sfc: Remove excessive clk_disable_unprepare()
	PCI: uniphier: Serialize INTx masking/unmasking and fix the bit operation
	mtd: core: don't remove debugfs directory if device is in use
	remoteproc: Fix a memory leak in an error handling path in 'rproc_handle_vdev()'
	rtc: rv3032: fix error handling in rv3032_clkout_set_rate()
	dmaengine: at_xdmac: fix AT_XDMAC_CC_PERID() macro
	NFS: Fix up commit deadlocks
	NFS: Fix an Oops in pnfs_mark_request_commit()
	Fix user namespace leak
	auxdisplay: img-ascii-lcd: Fix lock-up when displaying empty string
	auxdisplay: ht16k33: Connect backlight to fbdev
	auxdisplay: ht16k33: Fix frame buffer device blanking
	soc: fsl: dpaa2-console: free buffer before returning from dpaa2_console_read
	netfilter: nfnetlink_queue: fix OOB when mac header was cleared
	dmaengine: dmaengine_desc_callback_valid(): Check for `callback_result`
	signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
	m68k: set a default value for MEMORY_RESERVE
	watchdog: f71808e_wdt: fix inaccurate report in WDIOC_GETTIMEOUT
	ar7: fix kernel builds for compiler test
	scsi: qla2xxx: Changes to support FCP2 Target
	scsi: qla2xxx: Relogin during fabric disturbance
	scsi: qla2xxx: Fix gnl list corruption
	scsi: qla2xxx: Turn off target reset during issue_lip
	NFSv4: Fix a regression in nfs_set_open_stateid_locked()
	i2c: xlr: Fix a resource leak in the error handling path of 'xlr_i2c_probe()'
	xen-pciback: Fix return in pm_ctrl_init()
	net: davinci_emac: Fix interrupt pacing disable
	ethtool: fix ethtool msg len calculation for pause stats
	openrisc: fix SMP tlb flush NULL pointer dereference
	net: vlan: fix a UAF in vlan_dev_real_dev()
	ice: Fix replacing VF hardware MAC to existing MAC filter
	ice: Fix not stopping Tx queues for VFs
	ACPI: PMIC: Fix intel_pmic_regs_handler() read accesses
	drm/nouveau/svm: Fix refcount leak bug and missing check against null bug
	net: phy: fix duplex out of sync problem while changing settings
	bonding: Fix a use-after-free problem when bond_sysfs_slave_add() failed
	mfd: core: Add missing of_node_put for loop iteration
	can: mcp251xfd: mcp251xfd_chip_start(): fix error handling for mcp251xfd_chip_rx_int_enable()
	mm/zsmalloc.c: close race window between zs_pool_dec_isolated() and zs_unregister_migration()
	zram: off by one in read_block_state()
	perf bpf: Add missing free to bpf_event__print_bpf_prog_info()
	llc: fix out-of-bound array index in llc_sk_dev_hash()
	nfc: pn533: Fix double free when pn533_fill_fragment_skbs() fails
	arm64: pgtable: make __pte_to_phys/__phys_to_pte_val inline functions
	bpf, sockmap: Remove unhash handler for BPF sockmap usage
	bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding
	gve: Fix off by one in gve_tx_timeout()
	seq_file: fix passing wrong private data
	net/sched: sch_taprio: fix undefined behavior in ktime_mono_to_any
	net: hns3: fix kernel crash when unload VF while it is being reset
	net: hns3: allow configure ETS bandwidth of all TCs
	net: stmmac: allow a tc-taprio base-time of zero
	vsock: prevent unnecessary refcnt inc for nonblocking connect
	net/smc: fix sk_refcnt underflow on linkdown and fallback
	cxgb4: fix eeprom len when diagnostics not implemented
	selftests/net: udpgso_bench_rx: fix port argument
	ARM: 9155/1: fix early early_iounmap()
	ARM: 9156/1: drop cc-option fallbacks for architecture selection
	parisc: Fix backtrace to always include init funtion names
	MIPS: Fix assembly error from MIPSr2 code used within MIPS_ISA_ARCH_LEVEL
	x86/mce: Add errata workaround for Skylake SKX37
	posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()
	irqchip/sifive-plic: Fixup EOI failed when masked
	f2fs: should use GFP_NOFS for directory inodes
	net, neigh: Enable state migration between NUD_PERMANENT and NTF_USE
	9p/net: fix missing error check in p9_check_errors
	memcg: prohibit unconditional exceeding the limit of dying tasks
	powerpc/lib: Add helper to check if offset is within conditional branch range
	powerpc/bpf: Validate branch ranges
	powerpc/security: Add a helper to query stf_barrier type
	powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
	mm, oom: pagefault_out_of_memory: don't force global OOM for dying tasks
	mm, oom: do not trigger out_of_memory from the #PF
	mfd: dln2: Add cell for initializing DLN2 ADC
	video: backlight: Drop maximum brightness override for brightness zero
	s390/cio: check the subchannel validity for dev_busid
	s390/tape: fix timer initialization in tape_std_assign()
	s390/ap: Fix hanging ioctl caused by orphaned replies
	s390/cio: make ccw_device_dma_* more robust
	mtd: rawnand: ams-delta: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: xway: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: mpc5121: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: gpio: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: pasemi: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: orion: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: plat_nand: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: au1550nd: Keep the driver compatible with on-die ECC engines
	powerpc/powernv/prd: Unregister OPAL_MSG_PRD2 notifier during module unload
	powerpc/85xx: fix timebase sync issue when CONFIG_HOTPLUG_CPU=n
	drm/sun4i: Fix macros in sun8i_csc.h
	PCI: Add PCI_EXP_DEVCTL_PAYLOAD_* macros
	PCI: aardvark: Fix PCIe Max Payload Size setting
	SUNRPC: Partial revert of commit 6f9f17287e
	ath10k: fix invalid dma_addr_t token assignment
	mmc: moxart: Fix null pointer dereference on pointer host
	selftests/bpf: Fix also no-alu32 strobemeta selftest
	arch/cc: Introduce a function to check for confidential computing features
	x86/sev: Add an x86 version of cc_platform_has()
	x86/sev: Make the #VC exception stacks part of the default stacks storage
	soc/tegra: pmc: Fix imbalanced clock disabling in error code path
	Linux 5.10.80

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I21c750863965fbf584251fa2de3c941ae5922d3f
2021-11-19 11:50:41 +01:00
Josef Bacik
b917f9b946 btrfs: do not take the uuid_mutex in btrfs_rm_device
[ Upstream commit 8ef9dc0f14ba6124c62547a4fdc59b163d8b864e ]

We got the following lockdep splat while running fstests (specifically
btrfs/003 and btrfs/020 in a row) with the new rc.  This was uncovered
by 87579e9b7d8d ("loop: use worker per cgroup instead of kworker") which
converted loop to using workqueues, which comes with lockdep
annotations that don't exist with kworkers.  The lockdep splat is as
follows:

  WARNING: possible circular locking dependency detected
  5.14.0-rc2-custom+ #34 Not tainted
  ------------------------------------------------------
  losetup/156417 is trying to acquire lock:
  ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x84/0x600

  but task is already holding lock:
  ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #5 (&lo->lo_mutex){+.+.}-{3:3}:
	 __mutex_lock+0xba/0x7c0
	 lo_open+0x28/0x60 [loop]
	 blkdev_get_whole+0x28/0xf0
	 blkdev_get_by_dev.part.0+0x168/0x3c0
	 blkdev_open+0xd2/0xe0
	 do_dentry_open+0x163/0x3a0
	 path_openat+0x74d/0xa40
	 do_filp_open+0x9c/0x140
	 do_sys_openat2+0xb1/0x170
	 __x64_sys_openat+0x54/0x90
	 do_syscall_64+0x3b/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xae

  -> #4 (&disk->open_mutex){+.+.}-{3:3}:
	 __mutex_lock+0xba/0x7c0
	 blkdev_get_by_dev.part.0+0xd1/0x3c0
	 blkdev_get_by_path+0xc0/0xd0
	 btrfs_scan_one_device+0x52/0x1f0 [btrfs]
	 btrfs_control_ioctl+0xac/0x170 [btrfs]
	 __x64_sys_ioctl+0x83/0xb0
	 do_syscall_64+0x3b/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xae

  -> #3 (uuid_mutex){+.+.}-{3:3}:
	 __mutex_lock+0xba/0x7c0
	 btrfs_rm_device+0x48/0x6a0 [btrfs]
	 btrfs_ioctl+0x2d1c/0x3110 [btrfs]
	 __x64_sys_ioctl+0x83/0xb0
	 do_syscall_64+0x3b/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xae

  -> #2 (sb_writers#11){.+.+}-{0:0}:
	 lo_write_bvec+0x112/0x290 [loop]
	 loop_process_work+0x25f/0xcb0 [loop]
	 process_one_work+0x28f/0x5d0
	 worker_thread+0x55/0x3c0
	 kthread+0x140/0x170
	 ret_from_fork+0x22/0x30

  -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
	 process_one_work+0x266/0x5d0
	 worker_thread+0x55/0x3c0
	 kthread+0x140/0x170
	 ret_from_fork+0x22/0x30

  -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
	 __lock_acquire+0x1130/0x1dc0
	 lock_acquire+0xf5/0x320
	 flush_workqueue+0xae/0x600
	 drain_workqueue+0xa0/0x110
	 destroy_workqueue+0x36/0x250
	 __loop_clr_fd+0x9a/0x650 [loop]
	 lo_ioctl+0x29d/0x780 [loop]
	 block_ioctl+0x3f/0x50
	 __x64_sys_ioctl+0x83/0xb0
	 do_syscall_64+0x3b/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xae

  other info that might help us debug this:
  Chain exists of:
    (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
   Possible unsafe locking scenario:
	 CPU0                    CPU1
	 ----                    ----
    lock(&lo->lo_mutex);
				 lock(&disk->open_mutex);
				 lock(&lo->lo_mutex);
    lock((wq_completion)loop0);

   *** DEADLOCK ***
  1 lock held by losetup/156417:
   #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]

  stack backtrace:
  CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ #34
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   dump_stack_lvl+0x57/0x72
   check_noncircular+0x10a/0x120
   __lock_acquire+0x1130/0x1dc0
   lock_acquire+0xf5/0x320
   ? flush_workqueue+0x84/0x600
   flush_workqueue+0xae/0x600
   ? flush_workqueue+0x84/0x600
   drain_workqueue+0xa0/0x110
   destroy_workqueue+0x36/0x250
   __loop_clr_fd+0x9a/0x650 [loop]
   lo_ioctl+0x29d/0x780 [loop]
   ? __lock_acquire+0x3a0/0x1dc0
   ? update_dl_rq_load_avg+0x152/0x360
   ? lock_is_held_type+0xa5/0x120
   ? find_held_lock.constprop.0+0x2b/0x80
   block_ioctl+0x3f/0x50
   __x64_sys_ioctl+0x83/0xb0
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7f645884de6b

Usually the uuid_mutex exists to protect the fs_devices that map
together all of the devices that match a specific uuid.  In rm_device
we're messing with the uuid of a device, so it makes sense to protect
that here.

However in doing that it pulls in a whole host of lockdep dependencies,
as we call mnt_may_write() on the sb before we grab the uuid_mutex, thus
we end up with the dependency chain under the uuid_mutex being added
under the normal sb write dependency chain, which causes problems with
loop devices.

We don't need the uuid mutex here however.  If we call
btrfs_scan_one_device() before we scratch the super block we will find
the fs_devices and not find the device itself and return EBUSY because
the fs_devices is open.  If we call it after the scratch happens it will
not appear to be a valid btrfs file system.

We do not need to worry about other fs_devices modifying operations here
because we're protected by the exclusive operations locking.

So drop the uuid_mutex here in order to fix the lockdep splat.

A more detailed explanation from the discussion:

We are worried about rm and scan racing with each other, before this
change we'll zero the device out under the UUID mutex so when scan does
run it'll make sure that it can go through the whole device scan thing
without rm messing with us.

We aren't worried if the scratch happens first, because the result is we
don't think this is a btrfs device and we bail out.

The only case we are concerned with is we scratch _after_ scan is able
to read the superblock and gets a seemingly valid super block, so lets
consider this case.

Scan will call device_list_add() with the device we're removing.  We'll
call find_fsid_with_metadata_uuid() and get our fs_devices for this
UUID.  At this point we lock the fs_devices->device_list_mutex.  This is
what protects us in this case, but we have two cases here.

1. We aren't to the device removal part of the RM.  We found our device,
   and device name matches our path, we go down and we set total_devices
   to our super number of devices, which doesn't affect anything because
   we haven't done the remove yet.

2. We are past the device removal part, which is protected by the
   device_list_mutex.  Scan doesn't find the device, it goes down and
   does the

   if (fs_devices->opened)
	   return -EBUSY;

   check and we bail out.

Nothing about this situation is ideal, but the lockdep splat is real,
and the fix is safe, tho admittedly a bit scary looking.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ copy more from the discussion ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 14:04:00 +01:00
Sidong Yang
428bb3d71e btrfs: reflink: initialize return value to 0 in btrfs_extent_same()
[ Upstream commit 44bee215f72f13874c0e734a0712c2e3264c0108 ]

Fix a warning reported by smatch that ret could be returned without
initialized.  The dedupe operations are supposed to to return 0 for a 0
length range but the caller does not pass olen == 0. To keep this
behaviour and also fix the warning initialize ret to 0.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 14:04:00 +01:00
Anand Jain
b4a4c9dc44 btrfs: call btrfs_check_rw_degradable only if there is a missing device
commit 5c78a5e7aa835c4f08a7c90fe02d19f95a776f29 upstream.

In open_ctree() in btrfs_check_rw_degradable() [1], we check each block
group individually if at least the minimum number of devices is available
for that profile. If all the devices are available, then we don't have to
check degradable.

[1]
open_ctree()
::
3559 if (!sb_rdonly(sb) && !btrfs_check_rw_degradable(fs_info, NULL)) {

Also before calling btrfs_check_rw_degradable() in open_ctee() at the
line number shown below [2] we call btrfs_read_chunk_tree() and down to
add_missing_dev() to record number of missing devices.

[2]
open_ctree()
::
3454         ret = btrfs_read_chunk_tree(fs_info);

btrfs_read_chunk_tree()
  read_one_chunk() / read_one_dev()
    add_missing_dev()

So, check if there is any missing device before btrfs_check_rw_degradable()
in open_ctree().

Also, with this the mount command could save ~16ms.[3] in the most
common case, that is no device is missing.

[3]
 1) * 16934.96 us | btrfs_check_rw_degradable [btrfs]();

CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18 14:03:44 +01:00
Filipe Manana
b406439afe btrfs: fix lost error handling when replaying directory deletes
commit 10adb1152d957a4d570ad630f93a88bb961616c1 upstream.

At replay_dir_deletes(), if find_dir_range() returns an error we break out
of the main while loop and then assign a value of 0 (success) to the 'ret'
variable, resulting in completely ignoring that an error happened. Fix
that by jumping to the 'out' label when find_dir_range() returns an error
(negative value).

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18 14:03:44 +01:00
Li Zhang
8992aab294 btrfs: clear MISSING device status bit in btrfs_close_one_device
commit 5d03dbebba2594d2e6fbf3b5dd9060c5a835de3b upstream.

Reported bug: https://github.com/kdave/btrfs-progs/issues/389

There's a problem with scrub reporting aborted status but returning
error code 0, on a filesystem with missing and readded device.

Roughly these steps:

- mkfs -d raid1 dev1 dev2
- fill with data
- unmount
- make dev1 disappear
- mount -o degraded
- copy more data
- make dev1 appear again

Running scrub afterwards reports that the command was aborted, but the
system log message says the exit code was 0.

It seems that the cause of the error is decrementing
fs_devices->missing_devices but not clearing device->dev_state.  Every
time we umount filesystem, it would call close_ctree, And it would
eventually involve btrfs_close_one_device to close the device, but it
only decrements fs_devices->missing_devices but does not clear the
device BTRFS_DEV_STATE_MISSING bit. Worse, this bug will cause Integer
Overflow, because every time umount, fs_devices->missing_devices will
decrease. If fs_devices->missing_devices value hit 0, it would overflow.

With added debugging:

   loop1: detected capacity change from 0 to 20971520
   BTRFS: device fsid 56ad51f1-5523-463b-8547-c19486c51ebb devid 1 transid 21 /dev/loop1 scanned by systemd-udevd (2311)
   loop2: detected capacity change from 0 to 20971520
   BTRFS: device fsid 56ad51f1-5523-463b-8547-c19486c51ebb devid 2 transid 17 /dev/loop2 scanned by systemd-udevd (2313)
   BTRFS info (device loop1): flagging fs with big metadata feature
   BTRFS info (device loop1): allowing degraded mounts
   BTRFS info (device loop1): using free space tree
   BTRFS info (device loop1): has skinny extents
   BTRFS info (device loop1):  before clear_missing.00000000f706684d /dev/loop1 0
   BTRFS warning (device loop1): devid 2 uuid 6635ac31-56dd-4852-873b-c60f5e2d53d2 is missing
   BTRFS info (device loop1):  before clear_missing.0000000000000000 /dev/loop2 1
   BTRFS info (device loop1): flagging fs with big metadata feature
   BTRFS info (device loop1): allowing degraded mounts
   BTRFS info (device loop1): using free space tree
   BTRFS info (device loop1): has skinny extents
   BTRFS info (device loop1):  before clear_missing.00000000f706684d /dev/loop1 0
   BTRFS warning (device loop1): devid 2 uuid 6635ac31-56dd-4852-873b-c60f5e2d53d2 is missing
   BTRFS info (device loop1):  before clear_missing.0000000000000000 /dev/loop2 0
   BTRFS info (device loop1): flagging fs with big metadata feature
   BTRFS info (device loop1): allowing degraded mounts
   BTRFS info (device loop1): using free space tree
   BTRFS info (device loop1): has skinny extents
   BTRFS info (device loop1):  before clear_missing.00000000f706684d /dev/loop1 18446744073709551615
   BTRFS warning (device loop1): devid 2 uuid 6635ac31-56dd-4852-873b-c60f5e2d53d2 is missing
   BTRFS info (device loop1):  before clear_missing.0000000000000000 /dev/loop2 18446744073709551615

If fs_devices->missing_devices is 0, next time it would be 18446744073709551615

After apply this patch, the fs_devices->missing_devices seems to be
right:

  $ truncate -s 10g test1
  $ truncate -s 10g test2
  $ losetup /dev/loop1 test1
  $ losetup /dev/loop2 test2
  $ mkfs.btrfs -draid1 -mraid1 /dev/loop1 /dev/loop2 -f
  $ losetup -d /dev/loop2
  $ mount -o degraded /dev/loop1 /mnt/1
  $ umount /mnt/1
  $ mount -o degraded /dev/loop1 /mnt/1
  $ umount /mnt/1
  $ mount -o degraded /dev/loop1 /mnt/1
  $ umount /mnt/1
  $ dmesg

   loop1: detected capacity change from 0 to 20971520
   loop2: detected capacity change from 0 to 20971520
   BTRFS: device fsid 15aa1203-98d3-4a66-bcae-ca82f629c2cd devid 1 transid 5 /dev/loop1 scanned by mkfs.btrfs (1863)
   BTRFS: device fsid 15aa1203-98d3-4a66-bcae-ca82f629c2cd devid 2 transid 5 /dev/loop2 scanned by mkfs.btrfs (1863)
   BTRFS info (device loop1): flagging fs with big metadata feature
   BTRFS info (device loop1): allowing degraded mounts
   BTRFS info (device loop1): disk space caching is enabled
   BTRFS info (device loop1): has skinny extents
   BTRFS info (device loop1):  before clear_missing.00000000975bd577 /dev/loop1 0
   BTRFS warning (device loop1): devid 2 uuid 8b333791-0b3f-4f57-b449-1c1ab6b51f38 is missing
   BTRFS info (device loop1):  before clear_missing.0000000000000000 /dev/loop2 1
   BTRFS info (device loop1): checking UUID tree
   BTRFS info (device loop1): flagging fs with big metadata feature
   BTRFS info (device loop1): allowing degraded mounts
   BTRFS info (device loop1): disk space caching is enabled
   BTRFS info (device loop1): has skinny extents
   BTRFS info (device loop1):  before clear_missing.00000000975bd577 /dev/loop1 0
   BTRFS warning (device loop1): devid 2 uuid 8b333791-0b3f-4f57-b449-1c1ab6b51f38 is missing
   BTRFS info (device loop1):  before clear_missing.0000000000000000 /dev/loop2 1
   BTRFS info (device loop1): flagging fs with big metadata feature
   BTRFS info (device loop1): allowing degraded mounts
   BTRFS info (device loop1): disk space caching is enabled
   BTRFS info (device loop1): has skinny extents
   BTRFS info (device loop1):  before clear_missing.00000000975bd577 /dev/loop1 0
   BTRFS warning (device loop1): devid 2 uuid 8b333791-0b3f-4f57-b449-1c1ab6b51f38 is missing
   BTRFS info (device loop1):  before clear_missing.0000000000000000 /dev/loop2 1

CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Li Zhang <zhanglikernel@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18 14:03:44 +01:00
Greg Kroah-Hartman
4944ec82eb This is the 5.10.76 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmF5BlMACgkQONu9yGCS
 aT4OIQ/+PY9YFEKTRlW/TDDqnc3KxpTsSDgpyEHzDwW/zqzTu9ja36suqz0SGu+S
 SyY9tp3g7A+Ty7m7Xd6UKFcrfb6+fWN3+YqglXqgQ/VH1rGOhZiVfOZ+yBsF1D3O
 hQZRhnxYn1IJRpvKdTiw57RvjwuZO82QlXpd2SCt0crfclN/HeTvUpTxq6bllE+H
 X5TANiN0NVJgVBQvoztXQS7I6oXcAeGCoXsQ4m7S28apKBgax2EXHewFIOrjq3ub
 SVFeD1oWcHjGPcln/kv8HNGAZXZivsARl+ag8BBWmtC59dZjH/niFEcq1qVmwjBj
 d3YcBUP1nfsY2PhyXjxaUQ7gdEfFieL8wM9mYBmGyWempVWSfGAPe/td0IR7NsR9
 hwmHs4VNFLhtzEZW8OSpCV1f7dNo1WZZshCrRotSZ9QEZP/TZ84oXFFLl72C0tUI
 4Jfx5Ll3xqBjBQdDthWb99Rg0ZBQVPox/jnZMBCxk3NXtLhREmjwP4Sx8Wf635xE
 kfCGb0HkgjrQiQX0A+AyPwFAOTTrTe3WDZ5aYB4r2FwFzhySQw5ol8SmOEsZt2om
 aAEPjmut8HoM+Ch2Q9NGZghZGyePiqyxNTh9dL7G3U+Wss2xpBzpm1loCpm8OHr3
 KxwqhQCdKNxfVuPjK49N6QDAgmuhqtOx6S1z0WcMUuaZ7EZkac8=
 =zASi
 -----END PGP SIGNATURE-----

Merge 5.10.76 into android12-5.10-lts

Changes in 5.10.76
	parisc: math-emu: Fix fall-through warnings
	xhci: add quirk for host controllers that don't update endpoint DCS
	io_uring: fix splice_fd_in checks backport typo
	arm: dts: vexpress-v2p-ca9: Fix the SMB unit-address
	ARM: dts: at91: sama5d2_som1_ek: disable ISC node by default
	block: decode QUEUE_FLAG_HCTX_ACTIVE in debugfs output
	xen/x86: prevent PVH type from getting clobbered
	drm/amdgpu/display: fix dependencies for DRM_AMD_DC_SI
	xtensa: xtfpga: use CONFIG_USE_OF instead of CONFIG_OF
	xtensa: xtfpga: Try software restart before simulating CPU reset
	NFSD: Keep existing listeners on portlist error
	netfilter: xt_IDLETIMER: fix panic that occurs when timer_type has garbage value
	dma-debug: fix sg checks in debug_dma_map_sg()
	ASoC: wm8960: Fix clock configuration on slave mode
	ice: fix getting UDP tunnel entry
	netfilter: ip6t_rt: fix rt0_hdr parsing in rt_mt6
	netfilter: ipvs: make global sysctl readonly in non-init netns
	lan78xx: select CRC32
	tcp: md5: Fix overlap between vrf and non-vrf keys
	ipv6: When forwarding count rx stats on the orig netdev
	net: dsa: lantiq_gswip: fix register definition
	NIOS2: irqflags: rename a redefined register name
	powerpc/smp: do not decrement idle task preempt count in CPU offline
	net: hns3: reset DWRR of unused tc to zero
	net: hns3: add limit ets dwrr bandwidth cannot be 0
	net: hns3: schedule the polling again when allocation fails
	net: hns3: fix vf reset workqueue cannot exit
	net: hns3: disable sriov before unload hclge layer
	net: stmmac: Fix E2E delay mechanism
	e1000e: Fix packet loss on Tiger Lake and later
	ice: Add missing E810 device ids
	drm/panel: ilitek-ili9881c: Fix sync for Feixin K101-IM2BYL02 panel
	net: enetc: fix ethtool counter name for PM0_TERR
	can: rcar_can: fix suspend/resume
	can: peak_usb: pcan_usb_fd_decode_status(): fix back to ERROR_ACTIVE state notification
	can: peak_pci: peak_pci_remove(): fix UAF
	can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX path
	can: isotp: isotp_sendmsg(): add result check for wait_event_interruptible()
	can: j1939: j1939_tp_rxtimer(): fix errant alert in j1939_tp_rxtimer
	can: j1939: j1939_netdev_start(): fix UAF for rx_kref of j1939_priv
	can: j1939: j1939_xtp_rx_dat_one(): cancel session if receive TP.DT with error length
	can: j1939: j1939_xtp_rx_rts_session_new(): abort TP less than 9 bytes
	ceph: skip existing superblocks that are blocklisted or shut down when mounting
	ceph: fix handling of "meta" errors
	ocfs2: fix data corruption after conversion from inline format
	ocfs2: mount fails with buffer overflow in strlen
	userfaultfd: fix a race between writeprotect and exit_mmap()
	elfcore: correct reference to CONFIG_UML
	vfs: check fd has read access in kernel_read_file_from_fd()
	ALSA: usb-audio: Provide quirk for Sennheiser GSP670 Headset
	ALSA: hda/realtek: Add quirk for Clevo PC50HS
	ASoC: DAPM: Fix missing kctl change notifications
	audit: fix possible null-pointer dereference in audit_filter_rules
	net: dsa: mt7530: correct ds->num_ports
	powerpc64/idle: Fix SP offsets when saving GPRs
	KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()
	KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest
	powerpc/idle: Don't corrupt back chain when going idle
	mm, slub: fix mismatch between reconstructed freelist depth and cnt
	mm, slub: fix potential memoryleak in kmem_cache_open()
	mm, slub: fix incorrect memcg slab count for bulk free
	KVM: nVMX: promptly process interrupts delivered while in guest mode
	nfc: nci: fix the UAF of rf_conn_info object
	isdn: cpai: check ctr->cnr to avoid array index out of bound
	netfilter: Kconfig: use 'default y' instead of 'm' for bool config option
	selftests: netfilter: remove stray bash debug line
	net: bridge: mcast: use multicast_membership_interval for IGMPv3
	drm: mxsfb: Fix NULL pointer dereference crash on unload
	net: hns3: fix the max tx size according to user manual
	gcc-plugins/structleak: add makefile var for disabling structleak
	ALSA: hda: intel: Allow repeatedly probing on codec configuration errors
	btrfs: deal with errors when checking if a dir entry exists during log replay
	net: stmmac: add support for dwmac 3.40a
	ARM: dts: spear3xx: Fix gmac node
	isdn: mISDN: Fix sleeping function called from invalid context
	platform/x86: intel_scu_ipc: Update timeout value in comment
	ALSA: hda: avoid write to STATESTS if controller is in reset
	libperf tests: Fix test_stat_cpu
	perf/x86/msr: Add Sapphire Rapids CPU support
	Input: snvs_pwrkey - add clk handling
	scsi: iscsi: Fix set_param() handling
	scsi: qla2xxx: Fix a memory leak in an error path of qla2x00_process_els()
	sched/scs: Reset the shadow stack when idle_task_exit
	net: hns3: fix for miscalculation of rx unused desc
	scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
	can: isotp: isotp_sendmsg(): fix TX buffer concurrent access in isotp_sendmsg()
	s390/pci: fix zpci_zdev_put() on reserve
	bpf, test, cgroup: Use sk_{alloc,free} for test cases
	net: mdiobus: Fix memory leak in __mdiobus_register
	tracing: Have all levels of checks prevent recursion
	e1000e: Separate TGP board type from SPT
	selftests: bpf: fix backported ASSERT_FALSE
	ARM: 9122/1: select HAVE_FUTEX_CMPXCHG
	pinctrl: stm32: use valid pin identifier in stm32_pinctrl_resume()
	Linux 5.10.76

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia2eae7445f275464721daabb414beadf1e244c56
2021-10-27 10:43:17 +02:00
Filipe Manana
f9d16a4284 btrfs: deal with errors when checking if a dir entry exists during log replay
[ Upstream commit 77a5b9e3d14cbce49ceed2766b2003c034c066dc ]

Currently inode_in_dir() ignores errors returned from
btrfs_lookup_dir_index_item() and from btrfs_lookup_dir_item(), treating
any errors as if the directory entry does not exists in the fs/subvolume
tree, which is obviously not correct, as we can get errors such as -EIO
when reading extent buffers while searching the fs/subvolume's tree.

Fix that by making inode_in_dir() return the errors and making its only
caller, add_inode_ref(), deal with returned errors as well.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-27 09:56:54 +02:00
Greg Kroah-Hartman
221975092a This is the 5.10.75 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmFv5S8ACgkQONu9yGCS
 aT7IjQ/8CFAU/7DKwb6LMl8OOzui8yStdO2Dta2P0XfsfLVBcyFp0AZg9ODKRnzi
 6JbUtmDxak/hobVPGEl8zaF8438+vXX85evqj0Ufq8fV8HFiHHmrC6CETz4fHY18
 GJ6IaWkMx8dhIW5cmgVyPdBZ3dEOSyuJd74tq8GajyAYN7kdHTNxtWx8platQ2N7
 q8hcfAmwBEhaZIKUU6mBLZodIFg1+guqPliXrNEnO+1YTTcBSMaG3plYH7C+7x/r
 vSimoSGDj2o2P0eFOpGrl9mXcXuXuvdCjeonp/O881Kjkbib3gYeTq32WcKa5e2R
 cQJKDEk5DCzbXxm+FuFdczId36n0c36pb2E6XCxPbq9+S+U0T3uUoNxSU8Qjgpt1
 0B8z5QhEahzEJkvM0h4Ph5NVCZ8nIzWxnCAC6KbjMerQQ/4JtrAjY7BSPcN9TCfH
 EzfjpUyEjBey5gP48/a7nRHXlX968XkN2qpEeG77OilSHNp1V7ZSkHkq9rUOBFJl
 MF3EM9kHwtbmDl9X1fdd6qWSItFBZsAuZZl17wMsZhacSZooNlsL1dEz+wmQkUeT
 1k0rn1ZJ0EECrFgbXfPHHwYBrEx/F2QF0iCiDBq4RwWlFgwche8UqJDFNn97alz2
 Gyy80hf62YaDYnQgRLPOQjlwFnfAXth0c0biTETmIWTbkwof2fw=
 =mAv1
 -----END PGP SIGNATURE-----

Merge 5.10.75 into android12-5.10-lts

Changes in 5.10.75
	ALSA: usb-audio: Add quirk for VF0770
	ALSA: pcm: Workaround for a wrong offset in SYNC_PTR compat ioctl
	ALSA: seq: Fix a potential UAF by wrong private_free call order
	ALSA: hda/realtek: Enable 4-speaker output for Dell Precision 5560 laptop
	ALSA: hda - Enable headphone mic on Dell Latitude laptops with ALC3254
	ALSA: hda/realtek: Complete partial device name to avoid ambiguity
	ALSA: hda/realtek: Add quirk for Clevo X170KM-G
	ALSA: hda/realtek - ALC236 headset MIC recording issue
	ALSA: hda/realtek: Add quirk for TongFang PHxTxX1
	ALSA: hda/realtek: Fix for quirk to enable speaker output on the Lenovo 13s Gen2
	ALSA: hda/realtek: Fix the mic type detection issue for ASUS G551JW
	nds32/ftrace: Fix Error: invalid operands (*UND* and *UND* sections) for `^'
	s390: fix strrchr() implementation
	clk: socfpga: agilex: fix duplicate s2f_user0_clk
	csky: don't let sigreturn play with priveleged bits of status register
	csky: Fixup regs.sr broken in ptrace
	arm64/hugetlb: fix CMA gigantic page order for non-4K PAGE_SIZE
	drm/msm: Avoid potential overflow in timeout_to_jiffies()
	btrfs: unlock newly allocated extent buffer after error
	btrfs: deal with errors when replaying dir entry during log replay
	btrfs: deal with errors when adding inode reference during log replay
	btrfs: check for error when looking up inode during dir entry replay
	btrfs: update refs for any root except tree log roots
	btrfs: fix abort logic in btrfs_replace_file_extents
	x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state() fails
	mei: me: add Ice Lake-N device id.
	USB: xhci: dbc: fix tty registration race
	xhci: guard accesses to ep_state in xhci_endpoint_reset()
	xhci: Fix command ring pointer corruption while aborting a command
	xhci: Enable trust tx length quirk for Fresco FL11 USB controller
	cb710: avoid NULL pointer subtraction
	efi/cper: use stack buffer for error record decoding
	efi: Change down_interruptible() in virt_efi_reset_system() to down_trylock()
	usb: musb: dsps: Fix the probe error path
	Input: xpad - add support for another USB ID of Nacon GC-100
	USB: serial: qcserial: add EM9191 QDL support
	USB: serial: option: add Quectel EC200S-CN module support
	USB: serial: option: add Telit LE910Cx composition 0x1204
	USB: serial: option: add prod. id for Quectel EG91
	misc: fastrpc: Add missing lock before accessing find_vma()
	virtio: write back F_VERSION_1 before validate
	EDAC/armada-xp: Fix output of uncorrectable error counter
	nvmem: Fix shift-out-of-bound (UBSAN) with byte size cells
	x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT automatically
	powerpc/xive: Discard disabled interrupts in get_irqchip_state()
	iio: adc: aspeed: set driver data when adc probe.
	drivers: bus: simple-pm-bus: Add support for probing simple bus only devices
	driver core: Reject pointless SYNC_STATE_ONLY device links
	iio: adc: ad7192: Add IRQ flag
	iio: adc: ad7780: Fix IRQ flag
	iio: adc: ad7793: Fix IRQ flag
	iio: adc128s052: Fix the error handling path of 'adc128_probe()'
	iio: adc: max1027: Fix wrong shift with 12-bit devices
	iio: mtk-auxadc: fix case IIO_CHAN_INFO_PROCESSED
	iio: light: opt3001: Fixed timeout error when 0 lux
	iio: adc: max1027: Fix the number of max1X31 channels
	iio: ssp_sensors: add more range checking in ssp_parse_dataframe()
	iio: ssp_sensors: fix error code in ssp_print_mcu_debug()
	iio: dac: ti-dac5571: fix an error code in probe()
	tee: optee: Fix missing devices unregister during optee_remove
	ARM: dts: bcm2711-rpi-4-b: Fix usb's unit address
	ARM: dts: bcm2711: fix MDIO #address- and #size-cells
	ARM: dts: bcm2711-rpi-4-b: fix sd_io_1v8_reg regulator states
	ARM: dts: bcm2711-rpi-4-b: Fix pcie0's unit address formatting
	nvme-pci: Fix abort command id
	sctp: account stream padding length for reconf chunk
	gpio: pca953x: Improve bias setting
	net: arc: select CRC32
	net: korina: select CRC32
	net/mlx5e: Fix memory leak in mlx5_core_destroy_cq() error path
	net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp
	net: stmmac: fix get_hw_feature() on old hardware
	net: dsa: microchip: Added the condition for scheduling ksz_mib_read_work
	net: encx24j600: check error in devm_regmap_init_encx24j600
	ethernet: s2io: fix setting mac address during resume
	vhost-vdpa: Fix the wrong input in config_cb
	nfc: fix error handling of nfc_proto_register()
	NFC: digital: fix possible memory leak in digital_tg_listen_mdaa()
	NFC: digital: fix possible memory leak in digital_in_send_sdd_req()
	pata_legacy: fix a couple uninitialized variable bugs
	ata: ahci_platform: fix null-ptr-deref in ahci_platform_enable_regulators()
	mlxsw: thermal: Fix out-of-bounds memory accesses
	platform/mellanox: mlxreg-io: Fix argument base in kstrtou32() call
	platform/mellanox: mlxreg-io: Fix read access of n-bytes size attributes
	spi: bcm-qspi: clear MSPI spifie interrupt during probe
	drm/panel: olimex-lcd-olinuxino: select CRC32
	drm/edid: In connector_bad_edid() cap num_of_ext by num_blocks read
	drm/msm: Fix null pointer dereference on pointer edp
	drm/msm/mdp5: fix cursor-related warnings
	drm/msm/a6xx: Track current ctx by seqno
	drm/msm/dsi: Fix an error code in msm_dsi_modeset_init()
	drm/msm/dsi: fix off by one in dsi_bus_clk_enable error handling
	acpi/arm64: fix next_platform_timer() section mismatch error
	platform/x86: intel_scu_ipc: Fix busy loop expiry time
	mqprio: Correct stats in mqprio_dump_class_stats().
	qed: Fix missing error code in qed_slowpath_start()
	r8152: select CRC32 and CRYPTO/CRYPTO_HASH/CRYPTO_SHA256
	nfp: flow_offload: move flow_indr_dev_register from app init to app start
	net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb
	ionic: don't remove netdev->dev_addr when syncing uc list
	net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's
	Linux 5.10.75

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0c71baf6e376f0983e4a34d950c2bba7e226b564
2021-10-20 11:53:41 +02:00
Josef Bacik
0e32a2b85c btrfs: fix abort logic in btrfs_replace_file_extents
commit 4afb912f439c4bc4e6a4f3e7547f2e69e354108f upstream.

Error injection testing uncovered a case where we'd end up with a
corrupt file system with a missing extent in the middle of a file.  This
occurs because the if statement to decide if we should abort is wrong.

The only way we would abort in this case is if we got a ret !=
-EOPNOTSUPP and we called from the file clone code.  However the
prealloc code uses this path too.  Instead we need to abort if there is
an error, and the only error we _don't_ abort on is -EOPNOTSUPP and only
if we came from the clone file code.

CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-20 11:44:59 +02:00
Josef Bacik
52924879ed btrfs: update refs for any root except tree log roots
commit d175209be04d7d263fa1a54cde7608c706c9d0d7 upstream.

I hit a stuck relocation on btrfs/061 during my overnight testing.  This
turned out to be because we had left over extent entries in our extent
root for a data reloc inode that no longer existed.  This happened
because in btrfs_drop_extents() we only update refs if we have SHAREABLE
set or we are the tree_root.  This regression was introduced by
aeb935a455 ("btrfs: don't set SHAREABLE flag for data reloc tree")
where we stopped setting SHAREABLE for the data reloc tree.

The problem here is we actually do want to update extent references for
data extents in the data reloc tree, in fact we only don't want to
update extent references if the file extents are in the log tree.
Update this check to only skip updating references in the case of the
log tree.

This is relatively rare, because you have to be running scrub at the
same time, which is what btrfs/061 does.  The data reloc inode has its
extents pre-allocated, and then we copy the extent into the
pre-allocated chunks.  We theoretically should never be calling
btrfs_drop_extents() on a data reloc inode.  The exception of course is
with scrub, if our pre-allocated extent falls inside of the block group
we are scrubbing, then the block group will be marked read only and we
will be forced to cow that extent.  This means we will call
btrfs_drop_extents() on that range when we COW that file extent.

This isn't really problematic if we do this, the data reloc inode
requires that our extent lengths match exactly with the extent we are
copying, thankfully we validate the extent is correct with
get_new_location(), so if we happen to COW only part of the extent we
won't link it in when we do the relocation, so we are safe from any
other shenanigans that arise because of this interaction with scrub.

Fixes: aeb935a455 ("btrfs: don't set SHAREABLE flag for data reloc tree")
CC: stable@vger.kernel.org # 5.8+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-20 11:44:59 +02:00
Filipe Manana
352349aa49 btrfs: check for error when looking up inode during dir entry replay
commit cfd312695b71df04c3a2597859ff12c470d1e2e4 upstream.

At replay_one_name(), we are treating any error from btrfs_lookup_inode()
as if the inode does not exists. Fix this by checking for an error and
returning it to the caller.

CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-20 11:44:59 +02:00
Filipe Manana
4ed68471bc btrfs: deal with errors when adding inode reference during log replay
commit 52db77791fe24538c8aa2a183248399715f6b380 upstream.

At __inode_add_ref(), we treating any error returned from
btrfs_lookup_dir_item() or from btrfs_lookup_dir_index_item() as meaning
that there is no existing directory entry in the fs/subvolume tree.
This is not correct since we can get errors such as, for example, -EIO
when reading extent buffers while searching the fs/subvolume's btree.

So fix that and return the error to the caller when it is not -ENOENT.

CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-20 11:44:59 +02:00
Filipe Manana
95d3aba5fe btrfs: deal with errors when replaying dir entry during log replay
commit e15ac6413745e3def00e663de00aea5a717311c1 upstream.

At replay_one_one(), we are treating any error returned from
btrfs_lookup_dir_item() or from btrfs_lookup_dir_index_item() as meaning
that there is no existing directory entry in the fs/subvolume tree.
This is not correct since we can get errors such as, for example, -EIO
when reading extent buffers while searching the fs/subvolume's btree.

So fix that and return the error to the caller when it is not -ENOENT.

CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-20 11:44:59 +02:00
Qu Wenruo
206868a5b6 btrfs: unlock newly allocated extent buffer after error
commit 19ea40dddf1833db868533958ca066f368862211 upstream.

[BUG]
There is a bug report that injected ENOMEM error could leave a tree
block locked while we return to user-space:

  BTRFS info (device loop0): enabling ssd optimizations
  FAULT_INJECTION: forcing a failure.
  name failslab, interval 1, probability 0, space 0, times 0
  CPU: 0 PID: 7579 Comm: syz-executor Not tainted 5.15.0-rc1 #16
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
  rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
  Call Trace:
   __dump_stack lib/dump_stack.c:88 [inline]
   dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106
   fail_dump lib/fault-inject.c:52 [inline]
   should_fail+0x13c/0x160 lib/fault-inject.c:146
   should_failslab+0x5/0x10 mm/slab_common.c:1328
   slab_pre_alloc_hook.constprop.99+0x4e/0xc0 mm/slab.h:494
   slab_alloc_node mm/slub.c:3120 [inline]
   slab_alloc mm/slub.c:3214 [inline]
   kmem_cache_alloc+0x44/0x280 mm/slub.c:3219
   btrfs_alloc_delayed_extent_op fs/btrfs/delayed-ref.h:299 [inline]
   btrfs_alloc_tree_block+0x38c/0x670 fs/btrfs/extent-tree.c:4833
   __btrfs_cow_block+0x16f/0x7d0 fs/btrfs/ctree.c:415
   btrfs_cow_block+0x12a/0x300 fs/btrfs/ctree.c:570
   btrfs_search_slot+0x6b0/0xee0 fs/btrfs/ctree.c:1768
   btrfs_insert_empty_items+0x80/0xf0 fs/btrfs/ctree.c:3905
   btrfs_new_inode+0x311/0xa60 fs/btrfs/inode.c:6530
   btrfs_create+0x12b/0x270 fs/btrfs/inode.c:6783
   lookup_open+0x660/0x780 fs/namei.c:3282
   open_last_lookups fs/namei.c:3352 [inline]
   path_openat+0x465/0xe20 fs/namei.c:3557
   do_filp_open+0xe3/0x170 fs/namei.c:3588
   do_sys_openat2+0x357/0x4a0 fs/open.c:1200
   do_sys_open+0x87/0xd0 fs/open.c:1216
   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
   do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x46ae99
  Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48
  89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
  01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007f46711b9c48 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
  RAX: ffffffffffffffda RBX: 000000000078c0a0 RCX: 000000000046ae99
  RDX: 0000000000000000 RSI: 00000000000000a1 RDI: 0000000020005800
  RBP: 00007f46711b9c80 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000017
  R13: 0000000000000000 R14: 000000000078c0a0 R15: 00007ffc129da6e0

  ================================================
  WARNING: lock held when returning to user space!
  5.15.0-rc1 #16 Not tainted
  ------------------------------------------------
  syz-executor/7579 is leaving the kernel with locks still held!
  1 lock held by syz-executor/7579:
   #0: ffff888104b73da8 (btrfs-tree-01/1){+.+.}-{3:3}, at:
  __btrfs_tree_lock+0x2e/0x1a0 fs/btrfs/locking.c:112

[CAUSE]
In btrfs_alloc_tree_block(), after btrfs_init_new_buffer(), the new
extent buffer @buf is locked, but if later operations like adding
delayed tree ref fail, we just free @buf without unlocking it,
resulting above warning.

[FIX]
Unlock @buf in out_free_buf: label.

Reported-by: Hao Sun <sunhao.th@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CACkBjsZ9O6Zr0KK1yGn=1rQi6Crh1yeCRdTSBxx9R99L4xdn-Q@mail.gmail.com/
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-20 11:44:59 +02:00
Greg Kroah-Hartman
d306ef529c This is the 5.10.72 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmFhkHIACgkQONu9yGCS
 aT71tg/9H5raC+vzSNbv+IqXb7NDvNBTAKIYqZGfqlVzkPO53UJ5lCsjNkA8nV2J
 zeYsF20Cmc5labconbe2i4AkBegGvPhYYOvB5hN/8kOdy/4cFDtYKCh5MtK4LjJp
 kx/7u5hvJThbF3k5l3aNJtgP9osMH9oVAegtcDiaRXBLcpbW+h3Y/u05mcFWkyRC
 ixiQDZBeuiwLfLr8kiNpQj8SlO+/yZKBp6hnPQ8I3uxA2V/rJQKpiHUj0jIa/XDA
 KzeAoRuKSao1AIttgolNBVGPscoqDAHgJplfDkbqK0XZpZG7cQP3tcKA7LbD26ge
 QGehpekyyeMTsUELB4msY02CxUIdmhGiRdRvhXiDjsbcrOr1fDmYV2kbdZIGIvec
 7BQuqUXlcSVfLDDFe6rbqPgcY8lIoer0jKe1ocGlaPwOGNRbQreFR6ZxhteW/MQk
 fv5cfye1GpB83xcvvSpTu58IxxiP/SvhlqApkBMIE4Q5l+yixud8YEUpZmWE/Ppk
 uxE2BEFZ+qKhUw1mmK3YHoM1BEw/QvL2rKuxg9Vv90UJHBHatwsVOM4EKeNfrCe7
 uiIrReDwiNJgdFv8E178uFZuzkxJEnrs6TMzeNTRAtyFBE17VpIXCxm2sJLPr4U0
 Ixfj+3A5tboBI90zOFLJeVmgzBDKB5JN2B1ADxBjxXgqJ09qHTs=
 =t0Zc
 -----END PGP SIGNATURE-----

Merge 5.10.72 into android12-5.10-lts

Changes in 5.10.72
	spi: rockchip: handle zero length transfers without timing out
	platform/x86: touchscreen_dmi: Add info for the Chuwi HiBook (CWI514) tablet
	platform/x86: touchscreen_dmi: Update info for the Chuwi Hi10 Plus (CWI527) tablet
	nfsd: back channel stuck in SEQ4_STATUS_CB_PATH_DOWN
	btrfs: replace BUG_ON() in btrfs_csum_one_bio() with proper error handling
	btrfs: fix mount failure due to past and transient device flush error
	net: mdio: introduce a shutdown method to mdio device drivers
	xen-netback: correct success/error reporting for the SKB-with-fraglist case
	sparc64: fix pci_iounmap() when CONFIG_PCI is not set
	ext2: fix sleeping in atomic bugs on error
	scsi: sd: Free scsi_disk device via put_device()
	usb: testusb: Fix for showing the connection speed
	usb: dwc2: check return value after calling platform_get_resource()
	habanalabs/gaudi: fix LBW RR configuration
	selftests: be sure to make khdr before other targets
	selftests:kvm: fix get_warnings_count() ignoring fscanf() return warn
	nvme-fc: update hardware queues before using them
	nvme-fc: avoid race between time out and tear down
	thermal/drivers/tsens: Fix wrong check for tzd in irq handlers
	scsi: ses: Retry failed Send/Receive Diagnostic commands
	irqchip/gic: Work around broken Renesas integration
	smb3: correct smb3 ACL security descriptor
	tools/vm/page-types: remove dependency on opt_file for idle page tracking
	selftests: KVM: Align SMCCC call with the spec in steal_time
	KVM: do not shrink halt_poll_ns below grow_start
	kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[]
	KVM: x86: nSVM: restore int_vector in svm_clear_vintr
	perf/x86: Reset destroy callback on event init failure
	libata: Add ATA_HORKAGE_NO_NCQ_ON_ATI for Samsung 860 and 870 SSD.
	Linux 5.10.72

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I9358a2f80cce03b61fd003405e1c78433644015c
2021-10-09 15:04:00 +02:00
Filipe Manana
63c89930d4 btrfs: fix mount failure due to past and transient device flush error
[ Upstream commit 6b225baababf1e3d41a4250e802cbd193e1343fb ]

When we get an error flushing one device, during a super block commit, we
record the error in the device structure, in the field 'last_flush_error'.
This is used to later check if we should error out the super block commit,
depending on whether the number of flush errors is greater than or equals
to the maximum tolerated device failures for a raid profile.

However if we get a transient device flush error, unmount the filesystem
and later try to mount it, we can fail the mount because we treat that
past error as critical and consider the device is missing. Even if it's
very likely that the error will happen again, as it's probably due to a
hardware related problem, there may be cases where the error might not
happen again. One example is during testing, and a test case like the
new generic/648 from fstests always triggers this. The test cases
generic/019 and generic/475 also trigger this scenario, but very
sporadically.

When this happens we get an error like this:

  $ mount /dev/sdc /mnt
  mount: /mnt wrong fs type, bad option, bad superblock on /dev/sdc, missing codepage or helper program, or other error.

  $ dmesg
  (...)
  [12918.886926] BTRFS warning (device sdc): chunk 13631488 missing 1 devices, max tolerance is 0 for writable mount
  [12918.888293] BTRFS warning (device sdc): writable mount is not allowed due to too many missing devices
  [12918.890853] BTRFS error (device sdc): open_ctree failed

The failure happens because when btrfs_check_rw_degradable() is called at
mount time, or at remount from RO to RW time, is sees a non zero value in
a device's ->last_flush_error attribute, and therefore considers that the
device is 'missing'.

Fix this by setting a device's ->last_flush_error to zero when we close a
device, making sure the error is not seen on the next mount attempt. We
only need to track flush errors during the current mount, so that we never
commit a super block if such errors happened.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-09 14:40:56 +02:00
Qu Wenruo
50628b06e6 btrfs: replace BUG_ON() in btrfs_csum_one_bio() with proper error handling
[ Upstream commit bbc9a6eb5eec03dcafee266b19f56295e3b2aa8f ]

There is a BUG_ON() in btrfs_csum_one_bio() to catch code logic error.
It has indeed caught several bugs during subpage development.
But the BUG_ON() itself will bring down the whole system which is
an overkill.

Replace it with a WARN() and exit gracefully, so that it won't crash the
whole system while we can still catch the code logic error.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-09 14:40:56 +02:00
Greg Kroah-Hartman
dcf0824c26 Revert "treewide: Change list_sort to use const pointers"
This reverts commit 55e6f8b3c0 which is
commit 4f0f586bf0c898233d8f316f471a21db2abd522d upstream.

This commit is already in this branch, but in a different fashion, as
CFI is included here.  By having this version, there is a crc error that
is due to the use of typedefs.  Reverting this commit changes nothing
and fixes the CRC issue.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I97849a104acbc88599481f6c5c9d024570ec5c87
2021-10-04 11:07:40 +02:00
Greg Kroah-Hartman
d69751309b This is the 5.10.70 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmFVcUcACgkQONu9yGCS
 aT4/Mw//b3IUn6Vy0r8Jc6MsU16U+UY0Rb6o8X6J5V7PXMI2RuHIf6+AXm4CDLPZ
 jpsgaPB3nSYUz63+b699kB6IZiUTbij8r0O/Yjy1p2/Z6HoDgSOX8WvU25kTO697
 MWxZT25Nj8sZzigPuXw1zy1ioZCdeGlRGXrDAoeZt8OL8TMd78eSLISYNQYv38L6
 Sg3TbtumEwjfZe3FeyzPA82Qc1jlsZ2ViKJ+E/BC74TJ9DBS5K+uMUzDwDyJEIaB
 MwswdjvQIbK5cN+uux6Ok3v4/6/bIKeouYkpLnQvnNtIrn8hk8FXO6OamU6XwTGl
 oI26Hu5mjL2WecHvpQJCcn6h8L0w/dMfQPg2b/m1gJ5l58NJobFS3Uy1bMaGlJic
 L1K2ZFPHQd+CR9Lvz/umiXqaBgL2K4QKKi28TrWxMgKatrMeip3Lo8krxNuxm0/Z
 VpJIsOajWkgf3n5HuQ/zfFGl+YUcjtBUqxO+WR3ocTLlN3kcG6ZjEMxHPK8VYmIr
 Yp4s+WyU7uRlGhSy6UpWI78AHcijx5WKS5n25ZI56VJRi38Qxgb3Q+EZ6vlpJuvh
 yTCgvjwi4FzLWXeYRR/RXpwzvwS8t5TKJT355ufjqZaAtQk/vE27deFdQs6B7Hqy
 17KvN8UjycbWKUXX/zM1CcU6ikXgj/h+q3+kAe99kldpEphjpMs=
 =vyz1
 -----END PGP SIGNATURE-----

Merge 5.10.70 into android12-5.10-lts

Changes in 5.10.70
	PCI: aardvark: Increase polling delay to 1.5s while waiting for PIO response
	ocfs2: drop acl cache for directories too
	mm: fix uninitialized use in overcommit_policy_handler
	usb: gadget: r8a66597: fix a loop in set_feature()
	usb: dwc2: gadget: Fix ISOC flow for BDMA and Slave
	usb: dwc2: gadget: Fix ISOC transfer complete handling for DDMA
	usb: musb: tusb6010: uninitialized data in tusb_fifo_write_unaligned()
	cifs: fix incorrect check for null pointer in header_assemble
	xen/x86: fix PV trap handling on secondary processors
	usb-storage: Add quirk for ScanLogic SL11R-IDE older than 2.6c
	USB: serial: cp210x: add ID for GW Instek GDM-834x Digital Multimeter
	USB: cdc-acm: fix minor-number release
	Revert "USB: bcma: Add a check for devm_gpiod_get"
	binder: make sure fd closes complete
	staging: greybus: uart: fix tty use after free
	Re-enable UAS for LaCie Rugged USB3-FW with fk quirk
	usb: dwc3: core: balance phy init and exit
	usb: core: hcd: Add support for deferring roothub registration
	USB: serial: mos7840: remove duplicated 0xac24 device ID
	USB: serial: option: add Telit LN920 compositions
	USB: serial: option: remove duplicate USB device ID
	USB: serial: option: add device id for Foxconn T99W265
	mcb: fix error handling in mcb_alloc_bus()
	erofs: fix up erofs_lookup tracepoint
	btrfs: prevent __btrfs_dump_space_info() to underflow its free space
	xhci: Set HCD flag to defer primary roothub registration
	serial: 8250: 8250_omap: Fix RX_LVL register offset
	serial: mvebu-uart: fix driver's tx_empty callback
	scsi: sd_zbc: Ensure buffer size is aligned to SECTOR_SIZE
	drm/amd/pm: Update intermediate power state for SI
	net: hso: fix muxed tty registration
	comedi: Fix memory leak in compat_insnlist()
	afs: Fix incorrect triggering of sillyrename on 3rd-party invalidation
	afs: Fix updating of i_blocks on file/dir extension
	platform/x86/intel: punit_ipc: Drop wrong use of ACPI_PTR()
	enetc: Fix illegal access when reading affinity_hint
	enetc: Fix uninitialized struct dim_sample field usage
	bnxt_en: Fix TX timeout when TX ring size is set to the smallest
	net: hns3: fix change RSS 'hfunc' ineffective issue
	net: hns3: check queue id range before using
	net/smc: add missing error check in smc_clc_prfx_set()
	net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work
	net: dsa: don't allocate the slave_mii_bus using devres
	net: dsa: realtek: register the MDIO bus under devres
	kselftest/arm64: signal: Add SVE to the set of features we can check for
	kselftest/arm64: signal: Skip tests if required features are missing
	s390/qeth: fix NULL deref in qeth_clear_working_pool_list()
	gpio: uniphier: Fix void functions to remove return value
	qed: rdma - don't wait for resources under hw error recovery flow
	net/mlx4_en: Don't allow aRFS for encapsulated packets
	atlantic: Fix issue in the pm resume flow.
	scsi: iscsi: Adjust iface sysfs attr detection
	scsi: target: Fix the pgr/alua_support_store functions
	tty: synclink_gt, drop unneeded forward declarations
	tty: synclink_gt: rename a conflicting function name
	fpga: machxo2-spi: Return an error on failure
	fpga: machxo2-spi: Fix missing error code in machxo2_write_complete()
	nvme-tcp: fix incorrect h2cdata pdu offset accounting
	treewide: Change list_sort to use const pointers
	nvme: keep ctrl->namespaces ordered
	thermal/core: Potential buffer overflow in thermal_build_list_of_policies()
	cifs: fix a sign extension bug
	scsi: qla2xxx: Restore initiator in dual mode
	scsi: lpfc: Use correct scnprintf() limit
	irqchip/goldfish-pic: Select GENERIC_IRQ_CHIP to fix build
	irqchip/gic-v3-its: Fix potential VPE leak on error
	md: fix a lock order reversal in md_alloc
	x86/asm: Add a missing __iomem annotation in enqcmds()
	x86/asm: Fix SETZ size enqcmds() build failure
	io_uring: put provided buffer meta data under memcg accounting
	blktrace: Fix uaf in blk_trace access after removing by sysfs
	net: phylink: Update SFP selected interface on advertising changes
	net: macb: fix use after free on rmmod
	net: stmmac: allow CSR clock of 300MHz
	blk-mq: avoid to iterate over stale request
	m68k: Double cast io functions to unsigned long
	ipv6: delay fib6_sernum increase in fib6_add
	cpufreq: intel_pstate: Override parameters if HWP forced by BIOS
	bpf: Add oversize check before call kvcalloc()
	xen/balloon: use a kernel thread instead a workqueue
	nvme-multipath: fix ANA state updates when a namespace is not present
	nvme-rdma: destroy cm id before destroy qp to avoid use after free
	sparc32: page align size in arch_dma_alloc
	amd/display: downgrade validation failure log level
	block: check if a profile is actually registered in blk_integrity_unregister
	block: flush the integrity workqueue in blk_integrity_unregister
	blk-cgroup: fix UAF by grabbing blkcg lock before destroying blkg pd
	compiler.h: Introduce absolute_pointer macro
	net: i825xx: Use absolute_pointer for memcpy from fixed memory location
	sparc: avoid stringop-overread errors
	qnx4: avoid stringop-overread errors
	parisc: Use absolute_pointer() to define PAGE0
	arm64: Mark __stack_chk_guard as __ro_after_init
	alpha: Declare virt_to_phys and virt_to_bus parameter as pointer to volatile
	net: 6pack: Fix tx timeout and slot time
	spi: Fix tegra20 build with CONFIG_PM=n
	EDAC/synopsys: Fix wrong value type assignment for edac_mode
	EDAC/dmc520: Assign the proper type to dimm->edac_mode
	thermal/drivers/int340x: Do not set a wrong tcc offset on resume
	USB: serial: cp210x: fix dropped characters with CP2102
	xen/balloon: fix balloon kthread freezing
	qnx4: work around gcc false positive warning bug
	Linux 5.10.70

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0be3ab08ab5dd724a79c5c5ff8e49c18d2666193
2021-10-01 11:20:43 +02:00
Greg Kroah-Hartman
33740c9227 This is the 5.10.69 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmFQYu4ACgkQONu9yGCS
 aT6vFQ//VgOxH4RGBJ8cVUgCR7t2XiShhs2xqPJRaYe2nnl318aNvuTJFOXKa5gg
 gD6jhdnInBpO1iD4An/WJLZ+EO8+CDEYezsMBV7SaORR2NlOxprxig/VZKmlBHSq
 b2h2itA4yfa0BtPbFF2SzA6V7PtKygquCFloUQtJbgwiinNlVx4oS+8jDPgd9R9S
 z0AffSTnxBLXh6rc+Hn7CsFap6Ob9CiX/ZMJPxK7c5cn+/aGweL1DYVxWiBmYpC9
 ynzw6blR2wjvhLkKEc3y/pZqxG1XmF9yG0kYkIK2Oajw7hfjVLIT2O6edugmmufU
 LbsOfL7BOJrmdd1M8B1pZcGt8qaCfD0RS5LiSxf/aNLOklpS5CpeiMdRqiPEqvqN
 PZV4MUjG1ra59DMBHTr3OmugRwZBBc156dahccDEecVE0yYqYeHmYfsYoo06EqTA
 kpOGGRAjA46xmQLqPbyxBtdaIryiHHM2E72zhKgK9JA9EtOSrOCYXc6ltsBzPB69
 9LIvrfevvrbsQGXKZ4Zw5DcYaA0VGVutO3ixEiPH4TJGco9rFW4NeHQK8wk1NE0D
 ilFUcf0MRaI1XkjjlgEua3oC8o6rKDd8GhooukWx/ItWMytw+5JUMpYHBa9aA2yU
 ev31m/YEscrveY4Nml2sDhzH3R+vRrLdTOYh7RwIb3UvqkYLb4Y=
 =d1XJ
 -----END PGP SIGNATURE-----

Merge 5.10.69 into android12-5.10-lts

Changes in 5.10.69
	PCI: pci-bridge-emul: Add PCIe Root Capabilities Register
	PCI: aardvark: Fix reporting CRS value
	console: consume APC, DM, DCS
	s390/pci_mmio: fully validate the VMA before calling follow_pte()
	ARM: Qualify enabling of swiotlb_init()
	ARM: 9077/1: PLT: Move struct plt_entries definition to header
	ARM: 9078/1: Add warn suppress parameter to arm_gen_branch_link()
	ARM: 9079/1: ftrace: Add MODULE_PLTS support
	ARM: 9098/1: ftrace: MODULE_PLT: Fix build problem without DYNAMIC_FTRACE
	Revert "net/mlx5: Register to devlink ingress VLAN filter trap"
	sctp: validate chunk size in __rcv_asconf_lookup
	sctp: add param size validation for SCTP_PARAM_SET_PRIMARY
	staging: rtl8192u: Fix bitwise vs logical operator in TranslateRxSignalStuff819xUsb()
	coredump: fix memleak in dump_vma_snapshot()
	um: virtio_uml: fix memory leak on init failures
	dmaengine: acpi: Avoid comparison GSI with Linux vIRQ
	perf test: Fix bpf test sample mismatch reporting
	tools lib: Adopt memchr_inv() from kernel
	perf tools: Allow build-id with trailing zeros
	thermal/drivers/exynos: Fix an error code in exynos_tmu_probe()
	9p/trans_virtio: Remove sysfs file on probe failure
	prctl: allow to setup brk for et_dyn executables
	nilfs2: use refcount_dec_and_lock() to fix potential UAF
	profiling: fix shift-out-of-bounds bugs
	PM: sleep: core: Avoid setting power.must_resume to false
	pwm: lpc32xx: Don't modify HW state in .probe() after the PWM chip was registered
	pwm: mxs: Don't modify HW state in .probe() after the PWM chip was registered
	dmaengine: idxd: fix wq slot allocation index check
	platform/chrome: sensorhub: Add trace events for sample
	platform/chrome: cros_ec_trace: Fix format warnings
	ceph: allow ceph_put_mds_session to take NULL or ERR_PTR
	ceph: cancel delayed work instead of flushing on mdsc teardown
	Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
	tools/bootconfig: Fix tracing_on option checking in ftrace2bconf.sh
	thermal/core: Fix thermal_cooling_device_register() prototype
	drm/amdgpu: Disable PCIE_DPM on Intel RKL Platform
	drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()
	dma-buf: DMABUF_MOVE_NOTIFY should depend on DMA_SHARED_BUFFER
	parisc: Move pci_dev_is_behind_card_dino to where it is used
	iommu/amd: Relocate GAMSup check to early_enable_iommus
	dmaengine: idxd: depends on !UML
	dmaengine: sprd: Add missing MODULE_DEVICE_TABLE
	dmaengine: ioat: depends on !UML
	dmaengine: xilinx_dma: Set DMA mask for coherent APIs
	ceph: request Fw caps before updating the mtime in ceph_write_iter
	ceph: remove the capsnaps when removing caps
	ceph: lockdep annotations for try_nonblocking_invalidate
	btrfs: update the bdev time directly when closing
	btrfs: fix lockdep warning while mounting sprout fs
	nilfs2: fix memory leak in nilfs_sysfs_create_device_group
	nilfs2: fix NULL pointer in nilfs_##name##_attr_release
	nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
	nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
	nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
	nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
	habanalabs: add validity check for event ID received from F/W
	pwm: img: Don't modify HW state in .remove() callback
	pwm: rockchip: Don't modify HW state in .remove() callback
	pwm: stm32-lp: Don't modify HW state in .remove() callback
	blk-throttle: fix UAF by deleteing timer in blk_throtl_exit()
	blk-mq: allow 4x BLK_MAX_REQUEST_COUNT at blk_plug for multiple_queues
	rtc: rx8010: select REGMAP_I2C
	sched/idle: Make the idle timer expire in hard interrupt context
	drm/nouveau/nvkm: Replace -ENOSYS with -ENODEV
	Linux 5.10.69

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I982349e3a65b83e92e9b808154bf8c84d094f1d6
2021-09-30 18:36:17 +02:00
Greg Kroah-Hartman
beafee90ec This is the 5.10.68 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmFLB58ACgkQONu9yGCS
 aT7uAhAAraX1qVdfkq3g4w9jaURkiR/Z1LbPqjMswIojApmcXV3e0mUtEWxBBEJT
 o/uId9KUr/OrfAN++DO+9iLmPIjZHW+49I+CeHcDS95PdeWSKxZ3HBPUqK8uX8tU
 QdPjh2PVL7Kkzbgi65RWeTOERHLlEj6qo21xu4W9QuwmZZojEB8xVP9BB/U6p84Q
 KYPX+zyGUo9NgsaVTwOXxZzyT8JgcfEUKg0F4nHeNJxEh106dN2XgZpq+GvB7Hq7
 koDy/dg2I4hS++Ds/Fjz9wQrgcvw3WSo3pUZzyTS2zfrcefLjqDVWzSY/1Ttd4b9
 B7Lw7WiEgbX75EFXX8RgCrmNSsNW8pnFyR2URoOfFD6ckJNj/XCPVV+tfiSfAnH5
 vlOQOicjtr/yFeOfhre8U4pTBWXk9BYscJyzNp/wScaExHXXkI+HYi92cbbTWKCU
 /ig1RmIqTATdFAXjukHUqt6QzI1iqPtTQCGd99AhaBGq0Hb8OK2HponzBOpQvAHb
 xaEMSL9YsJhoAux+n+R95FQKCk2KrjgX8Bczyuj2OAL5jeST10fWrYe6DflSta5K
 9fNWmyjegpQEcmtDidQ7HH81Fy793S/34R8FQ4y1zPEi1A0yH//FO2lA8dS4Rdvo
 ho7l7W+Hd/Ut67P0b7OFz2znw0T4OqMF6Il30q88pOfcis2TfNs=
 =2XgB
 -----END PGP SIGNATURE-----

Merge 5.10.68 into android12-5.10-lts

Changes in 5.10.68
	drm/bridge: lt9611: Fix handling of 4k panels
	btrfs: fix upper limit for max_inline for page size 64K
	io_uring: ensure symmetry in handling iter types in loop_rw_iter()
	xen: reset legacy rtc flag for PV domU
	bnx2x: Fix enabling network interfaces without VFs
	arm64/sve: Use correct size when reinitialising SVE state
	PM: base: power: don't try to use non-existing RTC for storing data
	PCI: Add AMD GPU multi-function power dependencies
	drm/amd/amdgpu: Increase HWIP_MAX_INSTANCE to 10
	drm/etnaviv: return context from etnaviv_iommu_context_get
	drm/etnaviv: put submit prev MMU context when it exists
	drm/etnaviv: stop abusing mmu_context as FE running marker
	drm/etnaviv: keep MMU context across runtime suspend/resume
	drm/etnaviv: exec and MMU state is lost when resetting the GPU
	drm/etnaviv: fix MMU context leak on GPU reset
	drm/etnaviv: reference MMU context when setting up hardware state
	drm/etnaviv: add missing MMU context put when reaping MMU mapping
	s390/sclp: fix Secure-IPL facility detection
	x86/pat: Pass valid address to sanitize_phys()
	x86/mm: Fix kern_addr_valid() to cope with existing but not present entries
	tipc: fix an use-after-free issue in tipc_recvmsg
	ethtool: Fix rxnfc copy to user buffer overflow
	net/{mlx5|nfp|bnxt}: Remove unnecessary RTNL lock assert
	net-caif: avoid user-triggerable WARN_ON(1)
	ptp: dp83640: don't define PAGE0
	dccp: don't duplicate ccid when cloning dccp sock
	net/l2tp: Fix reference count leak in l2tp_udp_recv_core
	r6040: Restore MDIO clock frequency after MAC reset
	tipc: increase timeout in tipc_sk_enqueue()
	drm/rockchip: cdn-dp-core: Make cdn_dp_core_resume __maybe_unused
	perf machine: Initialize srcline string member in add_location struct
	net/mlx5: FWTrace, cancel work on alloc pd error flow
	net/mlx5: Fix potential sleeping in atomic context
	nvme-tcp: fix io_work priority inversion
	events: Reuse value read using READ_ONCE instead of re-reading it
	net: ipa: initialize all filter table slots
	gen_compile_commands: fix missing 'sys' package
	vhost_net: fix OoB on sendmsg() failure.
	net/af_unix: fix a data-race in unix_dgram_poll
	net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup
	x86/uaccess: Fix 32-bit __get_user_asm_u64() when CC_HAS_ASM_GOTO_OUTPUT=y
	tcp: fix tp->undo_retrans accounting in tcp_sacktag_one()
	selftest: net: fix typo in altname test
	qed: Handle management FW error
	udp_tunnel: Fix udp_tunnel_nic work-queue type
	dt-bindings: arm: Fix Toradex compatible typo
	ibmvnic: check failover_pending in login response
	KVM: PPC: Book3S HV: Tolerate treclaim. in fake-suspend mode changing registers
	bnxt_en: make bnxt_free_skbs() safe to call after bnxt_free_mem()
	net: hns3: pad the short tunnel frame before sending to hardware
	net: hns3: change affinity_mask to numa node range
	net: hns3: disable mac in flr process
	net: hns3: fix the timing issue of VF clearing interrupt sources
	mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range()
	dt-bindings: mtd: gpmc: Fix the ECC bytes vs. OOB bytes equation
	mfd: db8500-prcmu: Adjust map to reality
	PCI: Add ACS quirks for NXP LX2xx0 and LX2xx2 platforms
	fuse: fix use after free in fuse_read_interrupt()
	PCI: tegra194: Fix handling BME_CHGED event
	PCI: tegra194: Fix MSI-X programming
	PCI: tegra: Fix OF node reference leak
	mfd: Don't use irq_create_mapping() to resolve a mapping
	PCI: rcar: Fix runtime PM imbalance in rcar_pcie_ep_probe()
	tracing/probes: Reject events which have the same name of existing one
	PCI: cadence: Use bitfield for *quirk_retrain_flag* instead of bool
	PCI: cadence: Add quirk flag to set minimum delay in LTSSM Detect.Quiet state
	PCI: j721e: Add PCIe support for J7200
	PCI: j721e: Add PCIe support for AM64
	PCI: Add ACS quirks for Cavium multi-function devices
	watchdog: Start watchdog in watchdog_set_last_hw_keepalive only if appropriate
	octeontx2-af: Add additional register check to rvu_poll_reg()
	Set fc_nlinfo in nh_create_ipv4, nh_create_ipv6
	net: usb: cdc_mbim: avoid altsetting toggling for Telit LN920
	block, bfq: honor already-setup queue merges
	PCI: ibmphp: Fix double unmap of io_mem
	ethtool: Fix an error code in cxgb2.c
	NTB: Fix an error code in ntb_msit_probe()
	NTB: perf: Fix an error code in perf_setup_inbuf()
	s390/bpf: Fix optimizing out zero-extensions
	s390/bpf: Fix 64-bit subtraction of the -0x80000000 constant
	s390/bpf: Fix branch shortening during codegen pass
	mfd: axp20x: Update AXP288 volatile ranges
	backlight: ktd253: Stabilize backlight
	PCI: of: Don't fail devm_pci_alloc_host_bridge() on missing 'ranges'
	PCI: iproc: Fix BCMA probe resource handling
	netfilter: Fix fall-through warnings for Clang
	netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex
	KVM: arm64: Restrict IPA size to maximum 48 bits on 4K and 16K page size
	PCI: Fix pci_dev_str_match_path() alloc while atomic bug
	mfd: tqmx86: Clear GPIO IRQ resource when no IRQ is set
	tracing/boot: Fix a hist trigger dependency for boot time tracing
	mtd: mtdconcat: Judge callback existence based on the master
	mtd: mtdconcat: Check _read, _write callbacks existence before assignment
	KVM: arm64: Fix read-side race on updates to vcpu reset state
	KVM: arm64: Handle PSCI resets before userspace touches vCPU state
	PCI: Sync __pci_register_driver() stub for CONFIG_PCI=n
	mtd: rawnand: cafe: Fix a resource leak in the error handling path of 'cafe_nand_probe()'
	ARC: export clear_user_page() for modules
	perf unwind: Do not overwrite FEATURE_CHECK_LDFLAGS-libunwind-{x86,aarch64}
	perf bench inject-buildid: Handle writen() errors
	gpio: mpc8xxx: Fix a resources leak in the error handling path of 'mpc8xxx_probe()'
	gpio: mpc8xxx: Use 'devm_gpiochip_add_data()' to simplify the code and avoid a leak
	net: dsa: tag_rtl4_a: Fix egress tags
	selftests: mptcp: clean tmp files in simult_flows
	net: hso: add failure handler for add_net_device
	net: dsa: b53: Fix calculating number of switch ports
	net: dsa: b53: Set correct number of ports in the DSA struct
	netfilter: socket: icmp6: fix use-after-scope
	fq_codel: reject silly quantum parameters
	qlcnic: Remove redundant unlock in qlcnic_pinit_from_rom
	ip_gre: validate csum_start only on pull
	net: dsa: b53: Fix IMP port setup on BCM5301x
	bnxt_en: fix stored FW_PSID version masks
	bnxt_en: Fix asic.rev in devlink dev info command
	bnxt_en: log firmware debug notifications
	bnxt_en: Consolidate firmware reset event logging.
	bnxt_en: Convert to use netif_level() helpers.
	bnxt_en: Improve logging of error recovery settings information.
	bnxt_en: Fix possible unintended driver initiated error recovery
	mfd: lpc_sch: Partially revert "Add support for Intel Quark X1000"
	mfd: lpc_sch: Rename GPIOBASE to prevent build error
	net: renesas: sh_eth: Fix freeing wrong tx descriptor
	x86/mce: Avoid infinite loop for copy from user recovery
	bnxt_en: Fix error recovery regression
	net: dsa: bcm_sf2: Fix array overrun in bcm_sf2_num_active_ports()
	Linux 5.10.68

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I542f48f8de516dcabce91d3d399583483aba0da7
2021-09-30 18:35:35 +02:00
Greg Kroah-Hartman
08ed4cb090 Merge 5.10.67 into android12-5.10-lts
Changes in 5.10.67
	rtc: tps65910: Correct driver module alias
	io_uring: limit fixed table size by RLIMIT_NOFILE
	io_uring: place fixed tables under memcg limits
	io_uring: add ->splice_fd_in checks
	io_uring: fail links of cancelled timeouts
	io-wq: fix wakeup race when adding new work
	btrfs: wake up async_delalloc_pages waiters after submit
	btrfs: reset replace target device to allocation state on close
	blk-zoned: allow zone management send operations without CAP_SYS_ADMIN
	blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN
	PCI/MSI: Skip masking MSI-X on Xen PV
	powerpc/perf/hv-gpci: Fix counter value parsing
	xen: fix setting of max_pfn in shared_info
	9p/xen: Fix end of loop tests for list_for_each_entry
	ceph: fix dereference of null pointer cf
	selftests/ftrace: Fix requirement check of README file
	tools/thermal/tmon: Add cross compiling support
	clk: socfpga: agilex: fix the parents of the psi_ref_clk
	clk: socfpga: agilex: fix up s2f_user0_clk representation
	clk: socfpga: agilex: add the bypass register for s2f_usr0 clock
	pinctrl: stmfx: Fix hazardous u8[] to unsigned long cast
	pinctrl: ingenic: Fix incorrect pull up/down info
	soc: qcom: aoss: Fix the out of bound usage of cooling_devs
	soc: aspeed: lpc-ctrl: Fix boundary check for mmap
	soc: aspeed: p2a-ctrl: Fix boundary check for mmap
	arm64: mm: Fix TLBI vs ASID rollover
	arm64: head: avoid over-mapping in map_memory
	iio: ltc2983: fix device probe
	wcn36xx: Ensure finish scan is not requested before start scan
	crypto: public_key: fix overflow during implicit conversion
	block: bfq: fix bfq_set_next_ioprio_data()
	power: supply: max17042: handle fails of reading status register
	dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc()
	crypto: ccp - shutdown SEV firmware on kexec
	VMCI: fix NULL pointer dereference when unmapping queue pair
	media: uvc: don't do DMA on stack
	media: rc-loopback: return number of emitters rather than error
	s390/qdio: fix roll-back after timeout on ESTABLISH ccw
	s390/qdio: cancel the ESTABLISH ccw after timeout
	Revert "dmaengine: imx-sdma: refine to load context only once"
	dmaengine: imx-sdma: remove duplicated sdma_load_context
	libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs
	ARM: 9105/1: atags_to_fdt: don't warn about stack size
	f2fs: fix to do sanity check for sb/cp fields correctly
	PCI/portdrv: Enable Bandwidth Notification only if port supports it
	PCI: Restrict ASMedia ASM1062 SATA Max Payload Size Supported
	PCI: Return ~0 data on pciconfig_read() CAP_SYS_ADMIN failure
	PCI: xilinx-nwl: Enable the clock through CCF
	PCI: aardvark: Configure PCIe resources from 'ranges' DT property
	PCI: Export pci_pio_to_address() for module use
	PCI: aardvark: Fix checking for PIO status
	PCI: aardvark: Fix masking and unmasking legacy INTx interrupts
	HID: input: do not report stylus battery state as "full"
	f2fs: quota: fix potential deadlock
	pinctrl: remove empty lines in pinctrl subsystem
	pinctrl: armada-37xx: Correct PWM pins definitions
	scsi: bsg: Remove support for SCSI_IOCTL_SEND_COMMAND
	clk: rockchip: drop GRF dependency for rk3328/rk3036 pll types
	IB/hfi1: Adjust pkey entry in index 0
	RDMA/iwcm: Release resources if iw_cm module initialization fails
	docs: Fix infiniband uverbs minor number
	scsi: BusLogic: Use %X for u32 sized integer rather than %lX
	pinctrl: samsung: Fix pinctrl bank pin count
	vfio: Use config not menuconfig for VFIO_NOIOMMU
	scsi: ufs: Fix memory corruption by ufshcd_read_desc_param()
	cpuidle: pseries: Fixup CEDE0 latency only for POWER10 onwards
	powerpc/stacktrace: Include linux/delay.h
	RDMA/efa: Remove double QP type assignment
	RDMA/mlx5: Delete not-available udata check
	cpuidle: pseries: Mark pseries_idle_proble() as __init
	f2fs: reduce the scope of setting fsck tag when de->name_len is zero
	openrisc: don't printk() unconditionally
	dma-debug: fix debugfs initialization order
	NFSv4/pNFS: Fix a layoutget livelock loop
	NFSv4/pNFS: Always allow update of a zero valued layout barrier
	NFSv4/pnfs: The layout barrier indicate a minimal value for the seqid
	SUNRPC: Fix potential memory corruption
	SUNRPC/xprtrdma: Fix reconnection locking
	SUNRPC query transport's source port
	sunrpc: Fix return value of get_srcport()
	scsi: fdomain: Fix error return code in fdomain_probe()
	pinctrl: single: Fix error return code in pcs_parse_bits_in_pinctrl_entry()
	powerpc/numa: Consider the max NUMA node for migratable LPAR
	scsi: smartpqi: Fix an error code in pqi_get_raid_map()
	scsi: qedi: Fix error codes in qedi_alloc_global_queues()
	scsi: qedf: Fix error codes in qedf_alloc_global_queues()
	powerpc/config: Renable MTD_PHYSMAP_OF
	iommu/vt-d: Update the virtual command related registers
	HID: i2c-hid: Fix Elan touchpad regression
	clk: imx8m: fix clock tree update of TF-A managed clocks
	KVM: PPC: Book3S HV: Fix copy_tofrom_guest routines
	scsi: ufs: ufs-exynos: Fix static checker warning
	KVM: PPC: Book3S HV Nested: Reflect guest PMU in-use to L0 when guest SPRs are live
	platform/x86: dell-smbios-wmi: Add missing kfree in error-exit from run_smbios_call
	powerpc/smp: Update cpu_core_map on all PowerPc systems
	RDMA/hns: Fix QP's resp incomplete assignment
	fscache: Fix cookie key hashing
	clk: at91: clk-generated: Limit the requested rate to our range
	KVM: PPC: Fix clearing never mapped TCEs in realmode
	soc: mediatek: cmdq: add address shift in jump
	f2fs: fix to account missing .skipped_gc_rwsem
	f2fs: fix unexpected ENOENT comes from f2fs_map_blocks()
	f2fs: fix to unmap pages from userspace process in punch_hole()
	f2fs: deallocate compressed pages when error happens
	f2fs: should put a page beyond EOF when preparing a write
	MIPS: Malta: fix alignment of the devicetree buffer
	kbuild: Fix 'no symbols' warning when CONFIG_TRIM_UNUSD_KSYMS=y
	userfaultfd: prevent concurrent API initialization
	drm/vc4: hdmi: Set HD_CTL_WHOLSMP and HD_CTL_CHALIGN_SET
	drm/amdgpu: Fix amdgpu_ras_eeprom_init()
	ASoC: atmel: ATMEL drivers don't need HAS_DMA
	media: dib8000: rewrite the init prbs logic
	libbpf: Fix reuse of pinned map on older kernel
	x86/hyperv: fix for unwanted manipulation of sched_clock when TSC marked unstable
	crypto: mxs-dcp - Use sg_mapping_iter to copy data
	PCI: Use pci_update_current_state() in pci_enable_device_flags()
	tipc: keep the skb in rcv queue until the whole data is read
	net: phy: Fix data type in DP83822 dp8382x_disable_wol()
	iio: dac: ad5624r: Fix incorrect handling of an optional regulator.
	iavf: do not override the adapter state in the watchdog task
	iavf: fix locking of critical sections
	ARM: dts: qcom: apq8064: correct clock names
	video: fbdev: kyro: fix a DoS bug by restricting user input
	netlink: Deal with ESRCH error in nlmsg_notify()
	Smack: Fix wrong semantics in smk_access_entry()
	drm: avoid blocking in drm_clients_info's rcu section
	drm: serialize drm_file.master with a new spinlock
	drm: protect drm_master pointers in drm_lease.c
	rcu: Fix macro name CONFIG_TASKS_RCU_TRACE
	igc: Check if num of q_vectors is smaller than max before array access
	usb: host: fotg210: fix the endpoint's transactional opportunities calculation
	usb: host: fotg210: fix the actual_length of an iso packet
	usb: gadget: u_ether: fix a potential null pointer dereference
	USB: EHCI: ehci-mv: improve error handling in mv_ehci_enable()
	usb: gadget: composite: Allow bMaxPower=0 if self-powered
	staging: board: Fix uninitialized spinlock when attaching genpd
	tty: serial: jsm: hold port lock when reporting modem line changes
	bus: fsl-mc: fix mmio base address for child DPRCs
	selftests: firmware: Fix ignored return val of asprintf() warn
	drm/amd/display: Fix timer_per_pixel unit error
	media: hantro: vp8: Move noisy WARN_ON to vpu_debug
	media: platform: stm32: unprepare clocks at handling errors in probe
	media: atomisp: Fix runtime PM imbalance in atomisp_pci_probe
	media: atomisp: pci: fix error return code in atomisp_pci_probe()
	nfp: fix return statement in nfp_net_parse_meta()
	ethtool: improve compat ioctl handling
	drm/amdgpu: Fix a printing message
	drm/amd/amdgpu: Update debugfs link_settings output link_rate field in hex
	bpf/tests: Fix copy-and-paste error in double word test
	bpf/tests: Do not PASS tests without actually testing the result
	drm/bridge: nwl-dsi: Avoid potential multiplication overflow on 32-bit
	arm64: dts: allwinner: h6: tanix-tx6: Fix regulator node names
	video: fbdev: asiliantfb: Error out if 'pixclock' equals zero
	video: fbdev: kyro: Error out if 'pixclock' equals zero
	video: fbdev: riva: Error out if 'pixclock' equals zero
	ipv4: ip_output.c: Fix out-of-bounds warning in ip_copy_addrs()
	flow_dissector: Fix out-of-bounds warnings
	s390/jump_label: print real address in a case of a jump label bug
	s390: make PCI mio support a machine flag
	serial: 8250: Define RX trigger levels for OxSemi 950 devices
	xtensa: ISS: don't panic in rs_init
	hvsi: don't panic on tty_register_driver failure
	serial: 8250_pci: make setup_port() parameters explicitly unsigned
	staging: ks7010: Fix the initialization of the 'sleep_status' structure
	samples: bpf: Fix tracex7 error raised on the missing argument
	libbpf: Fix race when pinning maps in parallel
	ata: sata_dwc_460ex: No need to call phy_exit() befre phy_init()
	Bluetooth: skip invalid hci_sync_conn_complete_evt
	workqueue: Fix possible memory leaks in wq_numa_init()
	ARM: dts: stm32: Set {bitclock,frame}-master phandles on DHCOM SoM
	ARM: dts: stm32: Set {bitclock,frame}-master phandles on ST DKx
	ARM: dts: stm32: Update AV96 adv7513 node per dtbs_check
	bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler()
	ARM: dts: at91: use the right property for shutdown controller
	arm64: tegra: Fix Tegra194 PCIe EP compatible string
	ASoC: Intel: bytcr_rt5640: Move "Platform Clock" routes to the maps for the matching in-/output
	ASoC: Intel: update sof_pcm512x quirks
	media: imx258: Rectify mismatch of VTS value
	media: imx258: Limit the max analogue gain to 480
	media: v4l2-dv-timings.c: fix wrong condition in two for-loops
	media: TDA1997x: fix tda1997x_query_dv_timings() return value
	media: tegra-cec: Handle errors of clk_prepare_enable()
	gfs2: Fix glock recursion in freeze_go_xmote_bh
	arm64: dts: qcom: sdm630: Rewrite memory map
	arm64: dts: qcom: sdm630: Fix TLMM node and pinctrl configuration
	serial: 8250_omap: Handle optional overrun-throttle-ms property
	ARM: dts: imx53-ppd: Fix ACHC entry
	arm64: dts: qcom: ipq8074: fix pci node reg property
	arm64: dts: qcom: sdm660: use reg value for memory node
	arm64: dts: qcom: ipq6018: drop '0x' from unit address
	arm64: dts: qcom: sdm630: don't use underscore in node name
	arm64: dts: qcom: msm8994: don't use underscore in node name
	arm64: dts: qcom: msm8996: don't use underscore in node name
	arm64: dts: qcom: sm8250: Fix epss_l3 unit address
	nvmem: qfprom: Fix up qfprom_disable_fuse_blowing() ordering
	net: ethernet: stmmac: Do not use unreachable() in ipq806x_gmac_probe()
	drm/msm: mdp4: drop vblank get/put from prepare/complete_commit
	drm/msm/dsi: Fix DSI and DSI PHY regulator config from SDM660
	drm: xlnx: zynqmp_dpsub: Call pm_runtime_get_sync before setting pixel clock
	drm: xlnx: zynqmp: release reset to DP controller before accessing DP registers
	thunderbolt: Fix port linking by checking all adapters
	drm/amd/display: fix missing writeback disablement if plane is removed
	drm/amd/display: fix incorrect CM/TF programming sequence in dwb
	selftests/bpf: Fix xdp_tx.c prog section name
	drm/vmwgfx: fix potential UAF in vmwgfx_surface.c
	Bluetooth: schedule SCO timeouts with delayed_work
	Bluetooth: avoid circular locks in sco_sock_connect
	drm/msm/dp: return correct edid checksum after corrupted edid checksum read
	net/mlx5: Fix variable type to match 64bit
	gpu: drm: amd: amdgpu: amdgpu_i2c: fix possible uninitialized-variable access in amdgpu_i2c_router_select_ddc_port()
	drm/display: fix possible null-pointer dereference in dcn10_set_clock()
	mac80211: Fix monitor MTU limit so that A-MSDUs get through
	ARM: tegra: acer-a500: Remove bogus USB VBUS regulators
	ARM: tegra: tamonten: Fix UART pad setting
	arm64: tegra: Fix compatible string for Tegra132 CPUs
	arm64: dts: ls1046a: fix eeprom entries
	nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data
	nvme: code command_id with a genctr for use-after-free validation
	Bluetooth: Fix handling of LE Enhanced Connection Complete
	opp: Don't print an error if required-opps is missing
	serial: sh-sci: fix break handling for sysrq
	iomap: pass writeback errors to the mapping
	tcp: enable data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
	rpc: fix gss_svc_init cleanup on failure
	selftests/bpf: Fix flaky send_signal test
	hwmon: (pmbus/ibm-cffps) Fix write bits for LED control
	staging: rts5208: Fix get_ms_information() heap buffer size
	net: Fix offloading indirect devices dependency on qdisc order creation
	kselftest/arm64: mte: Fix misleading output when skipping tests
	kselftest/arm64: pac: Fix skipping of tests on systems without PAC
	gfs2: Don't call dlm after protocol is unmounted
	usb: chipidea: host: fix port index underflow and UBSAN complains
	lockd: lockd server-side shouldn't set fl_ops
	drm/exynos: Always initialize mapping in exynos_drm_register_dma()
	rtl8xxxu: Fix the handling of TX A-MPDU aggregation
	rtw88: use read_poll_timeout instead of fixed sleep
	rtw88: wow: build wow function only if CONFIG_PM is on
	rtw88: wow: fix size access error of probe request
	octeontx2-pf: Fix NIX1_RX interface backpressure
	m68knommu: only set CONFIG_ISA_DMA_API for ColdFire sub-arch
	btrfs: tree-log: check btrfs_lookup_data_extent return value
	soundwire: intel: fix potential race condition during power down
	ASoC: Intel: Skylake: Fix module configuration for KPB and MIXER
	ASoC: Intel: Skylake: Fix passing loadable flag for module
	of: Don't allow __of_attached_node_sysfs() without CONFIG_SYSFS
	mmc: sdhci-of-arasan: Modified SD default speed to 19MHz for ZynqMP
	mmc: sdhci-of-arasan: Check return value of non-void funtions
	mmc: rtsx_pci: Fix long reads when clock is prescaled
	selftests/bpf: Enlarge select() timeout for test_maps
	mmc: core: Return correct emmc response in case of ioctl error
	cifs: fix wrong release in sess_alloc_buffer() failed path
	Revert "USB: xhci: fix U1/U2 handling for hardware with XHCI_INTEL_HOST quirk set"
	usb: musb: musb_dsps: request_irq() after initializing musb
	usbip: give back URBs for unsent unlink requests during cleanup
	usbip:vhci_hcd USB port can get stuck in the disabled state
	ASoC: rockchip: i2s: Fix regmap_ops hang
	ASoC: rockchip: i2s: Fixup config for DAIFMT_DSP_A/B
	drm/amdkfd: Account for SH/SE count when setting up cu masks.
	nfsd: fix crash on LOCKT on reexported NFSv3
	iwlwifi: pcie: free RBs during configure
	iwlwifi: mvm: fix a memory leak in iwl_mvm_mac_ctxt_beacon_changed
	iwlwifi: mvm: avoid static queue number aliasing
	iwlwifi: mvm: fix access to BSS elements
	iwlwifi: fw: correctly limit to monitor dump
	iwlwifi: mvm: Fix scan channel flags settings
	net/mlx5: DR, fix a potential use-after-free bug
	net/mlx5: DR, Enable QP retransmission
	parport: remove non-zero check on count
	selftests/bpf: Fix potential unreleased lock
	wcn36xx: Fix missing frame timestamp for beacon/probe-resp
	ath9k: fix OOB read ar9300_eeprom_restore_internal
	ath9k: fix sleeping in atomic context
	net: fix NULL pointer reference in cipso_v4_doi_free
	fix array-index-out-of-bounds in taprio_change
	net: w5100: check return value after calling platform_get_resource()
	net: hns3: clean up a type mismatch warning
	fs/io_uring Don't use the return value from import_iovec().
	io_uring: remove duplicated io_size from rw
	parisc: fix crash with signals and alloca
	ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup()
	scsi: BusLogic: Fix missing pr_cont() use
	scsi: qla2xxx: Changes to support kdump kernel
	scsi: qla2xxx: Sync queue idx with queue_pair_map idx
	cpufreq: powernv: Fix init_chip_info initialization in numa=off
	s390/pv: fix the forcing of the swiotlb
	hugetlb: fix hugetlb cgroup refcounting during vma split
	mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled
	mm/hugetlb: initialize hugetlb_usage in mm_init
	mm,vmscan: fix divide by zero in get_scan_count
	memcg: enable accounting for pids in nested pid namespaces
	libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbind
	platform/chrome: cros_ec_proto: Send command again when timeout occurs
	lib/test_stackinit: Fix static initializer test
	net: dsa: lantiq_gswip: fix maximum frame length
	drm/mgag200: Select clock in PLL update functions
	drm/msi/mdp4: populate priv->kms in mdp4_kms_init
	drm/dp_mst: Fix return code on sideband message failure
	drm/panfrost: Make sure MMU context lifetime is not bound to panfrost_priv
	drm/amdgpu: Fix BUG_ON assert
	drm/amd/display: Update number of DCN3 clock states
	drm/amd/display: Update bounding box states (v2)
	drm/panfrost: Simplify lock_region calculation
	drm/panfrost: Use u64 for size in lock_region
	drm/panfrost: Clamp lock region to Bifrost minimum
	fanotify: limit number of event merge attempts
	Linux 5.10.67

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic8df59518265d0cdf724e93e8922cde48fc85ce9
2021-09-30 12:21:03 +02:00
Sami Tolvanen
55e6f8b3c0 treewide: Change list_sort to use const pointers
[ Upstream commit 4f0f586bf0c898233d8f316f471a21db2abd522d ]

list_sort() internally casts the comparison function passed to it
to a different type with constant struct list_head pointers, and
uses this pointer to call the functions, which trips indirect call
Control-Flow Integrity (CFI) checking.

Instead of removing the consts, this change defines the
list_cmp_func_t type and changes the comparison function types of
all list_sort() callers to use const pointers, thus avoiding type
mismatches.

Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30 10:11:04 +02:00
Qu Wenruo
80af86c122 btrfs: prevent __btrfs_dump_space_info() to underflow its free space
commit 0619b7901473c380abc05d45cf9c70bee0707db3 upstream.

It's not uncommon where __btrfs_dump_space_info() gets called
under over-commit situations.

In that case free space would underflow as total allocated space is not
enough to handle all the over-committed space.

Such underflow values can sometimes cause confusion for users enabled
enospc_debug mount option, and takes some seconds for developers to
convert the underflow value to signed result.

Just output the free space as s64 to avoid such problem.

Reported-by: Eli V <eliventer@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAJtFHUSy4zgyhf-4d9T+KdJp9w=UgzC2A0V=VtmaeEpcGgm1-Q@mail.gmail.com/
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-30 10:11:00 +02:00
Anand Jain
aa1af89a66 btrfs: fix lockdep warning while mounting sprout fs
[ Upstream commit c124706900c20dee70f921bb3a90492431561a0a ]

Following test case reproduces lockdep warning.

  Test case:

  $ mkfs.btrfs -f <dev1>
  $ btrfstune -S 1 <dev1>
  $ mount <dev1> <mnt>
  $ btrfs device add <dev2> <mnt> -f
  $ umount <mnt>
  $ mount <dev2> <mnt>
  $ umount <mnt>

The warning claims a possible ABBA deadlock between the threads
initiated by [#1] btrfs device add and [#0] the mount.

  [ 540.743122] WARNING: possible circular locking dependency detected
  [ 540.743129] 5.11.0-rc7+ #5 Not tainted
  [ 540.743135] ------------------------------------------------------
  [ 540.743142] mount/2515 is trying to acquire lock:
  [ 540.743149] ffffa0c5544c2ce0 (&fs_devs->device_list_mutex){+.+.}-{4:4}, at: clone_fs_devices+0x6d/0x210 [btrfs]
  [ 540.743458] but task is already holding lock:
  [ 540.743461] ffffa0c54a7932b8 (btrfs-chunk-00){++++}-{4:4}, at: __btrfs_tree_read_lock+0x32/0x200 [btrfs]
  [ 540.743541] which lock already depends on the new lock.
  [ 540.743543] the existing dependency chain (in reverse order) is:

  [ 540.743546] -> #1 (btrfs-chunk-00){++++}-{4:4}:
  [ 540.743566] down_read_nested+0x48/0x2b0
  [ 540.743585] __btrfs_tree_read_lock+0x32/0x200 [btrfs]
  [ 540.743650] btrfs_read_lock_root_node+0x70/0x200 [btrfs]
  [ 540.743733] btrfs_search_slot+0x6c6/0xe00 [btrfs]
  [ 540.743785] btrfs_update_device+0x83/0x260 [btrfs]
  [ 540.743849] btrfs_finish_chunk_alloc+0x13f/0x660 [btrfs] <--- device_list_mutex
  [ 540.743911] btrfs_create_pending_block_groups+0x18d/0x3f0 [btrfs]
  [ 540.743982] btrfs_commit_transaction+0x86/0x1260 [btrfs]
  [ 540.744037] btrfs_init_new_device+0x1600/0x1dd0 [btrfs]
  [ 540.744101] btrfs_ioctl+0x1c77/0x24c0 [btrfs]
  [ 540.744166] __x64_sys_ioctl+0xe4/0x140
  [ 540.744170] do_syscall_64+0x4b/0x80
  [ 540.744174] entry_SYSCALL_64_after_hwframe+0x44/0xa9

  [ 540.744180] -> #0 (&fs_devs->device_list_mutex){+.+.}-{4:4}:
  [ 540.744184] __lock_acquire+0x155f/0x2360
  [ 540.744188] lock_acquire+0x10b/0x5c0
  [ 540.744190] __mutex_lock+0xb1/0xf80
  [ 540.744193] mutex_lock_nested+0x27/0x30
  [ 540.744196] clone_fs_devices+0x6d/0x210 [btrfs]
  [ 540.744270] btrfs_read_chunk_tree+0x3c7/0xbb0 [btrfs]
  [ 540.744336] open_ctree+0xf6e/0x2074 [btrfs]
  [ 540.744406] btrfs_mount_root.cold.72+0x16/0x127 [btrfs]
  [ 540.744472] legacy_get_tree+0x38/0x90
  [ 540.744475] vfs_get_tree+0x30/0x140
  [ 540.744478] fc_mount+0x16/0x60
  [ 540.744482] vfs_kern_mount+0x91/0x100
  [ 540.744484] btrfs_mount+0x1e6/0x670 [btrfs]
  [ 540.744536] legacy_get_tree+0x38/0x90
  [ 540.744537] vfs_get_tree+0x30/0x140
  [ 540.744539] path_mount+0x8d8/0x1070
  [ 540.744541] do_mount+0x8d/0xc0
  [ 540.744543] __x64_sys_mount+0x125/0x160
  [ 540.744545] do_syscall_64+0x4b/0x80
  [ 540.744547] entry_SYSCALL_64_after_hwframe+0x44/0xa9

  [ 540.744551] other info that might help us debug this:
  [ 540.744552] Possible unsafe locking scenario:

  [ 540.744553] CPU0 				CPU1
  [ 540.744554] ---- 				----
  [ 540.744555] lock(btrfs-chunk-00);
  [ 540.744557] 					lock(&fs_devs->device_list_mutex);
  [ 540.744560] 					lock(btrfs-chunk-00);
  [ 540.744562] lock(&fs_devs->device_list_mutex);
  [ 540.744564]
   *** DEADLOCK ***

  [ 540.744565] 3 locks held by mount/2515:
  [ 540.744567] #0: ffffa0c56bf7a0e0 (&type->s_umount_key#42/1){+.+.}-{4:4}, at: alloc_super.isra.16+0xdf/0x450
  [ 540.744574] #1: ffffffffc05a9628 (uuid_mutex){+.+.}-{4:4}, at: btrfs_read_chunk_tree+0x63/0xbb0 [btrfs]
  [ 540.744640] #2: ffffa0c54a7932b8 (btrfs-chunk-00){++++}-{4:4}, at: __btrfs_tree_read_lock+0x32/0x200 [btrfs]
  [ 540.744708]
   stack backtrace:
  [ 540.744712] CPU: 2 PID: 2515 Comm: mount Not tainted 5.11.0-rc7+ #5

But the device_list_mutex in clone_fs_devices() is redundant, as
explained below.  Two threads [1]  and [2] (below) could lead to
clone_fs_device().

  [1]
  open_ctree <== mount sprout fs
   btrfs_read_chunk_tree()
    mutex_lock(&uuid_mutex) <== global lock
    read_one_dev()
     open_seed_devices()
      clone_fs_devices() <== seed fs_devices
       mutex_lock(&orig->device_list_mutex) <== seed fs_devices

  [2]
  btrfs_init_new_device() <== sprouting
   mutex_lock(&uuid_mutex); <== global lock
   btrfs_prepare_sprout()
     lockdep_assert_held(&uuid_mutex)
     clone_fs_devices(seed_fs_device) <== seed fs_devices

Both of these threads hold uuid_mutex which is sufficient to protect
getting the seed device(s) freed while we are trying to clone it for
sprouting [2] or mounting a sprout [1] (as above). A mounted seed device
can not free/write/replace because it is read-only. An unmounted seed
device can be freed by btrfs_free_stale_devices(), but it needs
uuid_mutex.  So this patch removes the unnecessary device_list_mutex in
clone_fs_devices().  And adds a lockdep_assert_held(&uuid_mutex) in
clone_fs_devices().

Reported-by: Su Yue <l@damenly.su>
Tested-by: Su Yue <l@damenly.su>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-26 14:09:00 +02:00
Josef Bacik
c43803c1aa btrfs: update the bdev time directly when closing
[ Upstream commit 8f96a5bfa1503e0a5f3c78d51e993a1794d4aff1 ]

We update the ctime/mtime of a block device when we remove it so that
blkid knows the device changed.  However we do this by re-opening the
block device and calling filp_update_time.  This is more correct because
it'll call the inode->i_op->update_time if it exists, but the block dev
inodes do not do this.  Instead call generic_update_time() on the
bd_inode in order to avoid the blkdev_open path and get rid of the
following lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc2+ #406 Not tainted
------------------------------------------------------
losetup/11596 is trying to acquire lock:
ffff939640d2f538 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0

but task is already holding lock:
ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&lo->lo_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       lo_open+0x28/0x60 [loop]
       blkdev_get_whole+0x25/0xf0
       blkdev_get_by_dev.part.0+0x168/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x161/0x390
       path_openat+0x3cc/0xa20
       do_filp_open+0x96/0x120
       do_sys_openat2+0x7b/0x130
       __x64_sys_openat+0x46/0x70
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (&disk->open_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       blkdev_get_by_dev.part.0+0x56/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x161/0x390
       path_openat+0x3cc/0xa20
       do_filp_open+0x96/0x120
       file_open_name+0xc7/0x170
       filp_open+0x2c/0x50
       btrfs_scratch_superblocks.part.0+0x10f/0x170
       btrfs_rm_device.cold+0xe8/0xed
       btrfs_ioctl+0x2a31/0x2e70
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#12){.+.+}-{0:0}:
       lo_write_bvec+0xc2/0x240 [loop]
       loop_process_work+0x238/0xd00 [loop]
       process_one_work+0x26b/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
       process_one_work+0x245/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #0 ((wq_completion)loop0){+.+.}-{0:0}:
       __lock_acquire+0x10ea/0x1d90
       lock_acquire+0xb5/0x2b0
       flush_workqueue+0x91/0x5e0
       drain_workqueue+0xa0/0x110
       destroy_workqueue+0x36/0x250
       __loop_clr_fd+0x9a/0x660 [loop]
       block_ioctl+0x3f/0x50
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
  (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&lo->lo_mutex);
                               lock(&disk->open_mutex);
                               lock(&lo->lo_mutex);
  lock((wq_completion)loop0);

 *** DEADLOCK ***

1 lock held by losetup/11596:
 #0: ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

stack backtrace:
CPU: 1 PID: 11596 Comm: losetup Not tainted 5.14.0-rc2+ #406
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack_lvl+0x57/0x72
 check_noncircular+0xcf/0xf0
 ? stack_trace_save+0x3b/0x50
 __lock_acquire+0x10ea/0x1d90
 lock_acquire+0xb5/0x2b0
 ? flush_workqueue+0x67/0x5e0
 ? lockdep_init_map_type+0x47/0x220
 flush_workqueue+0x91/0x5e0
 ? flush_workqueue+0x67/0x5e0
 ? verify_cpu+0xf0/0x100
 drain_workqueue+0xa0/0x110
 destroy_workqueue+0x36/0x250
 __loop_clr_fd+0x9a/0x660 [loop]
 ? blkdev_ioctl+0x8d/0x2a0
 block_ioctl+0x3f/0x50
 __x64_sys_ioctl+0x80/0xb0
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-26 14:09:00 +02:00
Anand Jain
88f3d951e2 btrfs: fix upper limit for max_inline for page size 64K
commit 6f93e834fa7c5faa0372e46828b4b2a966ac61d7 upstream.

The mount option max_inline ranges from 0 to the sectorsize (which is
now equal to page size). But we parse the mount options too early and
before the actual sectorsize is read from the superblock. So the upper
limit of max_inline is unaware of the actual sectorsize and is limited
by the temporary sectorsize 4096, even on a system where the default
sectorsize is 64K.

Fix this by reading the superblock sectorsize before the mount option
parse.

Reported-by: Alexander Tsvetkov <alexander.tsvetkov@oracle.com>
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22 12:27:54 +02:00
Marcos Paulo de Souza
b225eeaf3a btrfs: tree-log: check btrfs_lookup_data_extent return value
[ Upstream commit 3736127a3aa805602b7a2ad60ec9cfce68065fbb ]

Function btrfs_lookup_data_extent calls btrfs_search_slot to verify if
the EXTENT_ITEM exists in the extent tree. btrfs_search_slot can return
values bellow zero if an error happened.

Function replay_one_extent currently checks if the search found
something (0 returned) and increments the reference, and if not, it
seems to evaluate as 'not found'.

Fix the condition by checking if the value was bellow zero and return
early.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18 13:40:31 +02:00
Desmond Cheong Zhi Xi
c1b249e02a btrfs: reset replace target device to allocation state on close
commit 0d977e0eba234e01a60bdde27314dc21374201b3 upstream.

This crash was observed with a failed assertion on device close:

  BTRFS: Transaction aborted (error -28)
  WARNING: CPU: 1 PID: 3902 at fs/btrfs/extent-tree.c:2150 btrfs_run_delayed_refs+0x1d2/0x1e0 [btrfs]
  Modules linked in: btrfs blake2b_generic libcrc32c crc32c_intel xor zstd_decompress zstd_compress xxhash lzo_compress lzo_decompress raid6_pq loop
  CPU: 1 PID: 3902 Comm: kworker/u8:4 Not tainted 5.14.0-rc5-default+ #1532
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
  Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
  RIP: 0010:btrfs_run_delayed_refs+0x1d2/0x1e0 [btrfs]
  RSP: 0018:ffffb7a5452d7d80 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffffffffabee13c4 RDI: 00000000ffffffff
  RBP: ffff97834176a378 R08: 0000000000000001 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff97835195d388
  R13: 0000000005b08000 R14: ffff978385484000 R15: 000000000000016c
  FS:  0000000000000000(0000) GS:ffff9783bd800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000056190d003fe8 CR3: 000000002a81e005 CR4: 0000000000170ea0
  Call Trace:
   flush_space+0x197/0x2f0 [btrfs]
   btrfs_async_reclaim_metadata_space+0x139/0x300 [btrfs]
   process_one_work+0x262/0x5e0
   worker_thread+0x4c/0x320
   ? process_one_work+0x5e0/0x5e0
   kthread+0x144/0x170
   ? set_kthread_struct+0x40/0x40
   ret_from_fork+0x1f/0x30
  irq event stamp: 19334989
  hardirqs last  enabled at (19334997): [<ffffffffab0e0c87>] console_unlock+0x2b7/0x400
  hardirqs last disabled at (19335006): [<ffffffffab0e0d0d>] console_unlock+0x33d/0x400
  softirqs last  enabled at (19334900): [<ffffffffaba0030d>] __do_softirq+0x30d/0x574
  softirqs last disabled at (19334893): [<ffffffffab0721ec>] irq_exit_rcu+0x12c/0x140
  ---[ end trace 45939e308e0dd3c7 ]---
  BTRFS: error (device vdd) in btrfs_run_delayed_refs:2150: errno=-28 No space left
  BTRFS info (device vdd): forced readonly
  BTRFS warning (device vdd): failed setting block group ro: -30
  BTRFS info (device vdd): suspending dev_replace for unmount
  assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.h:3431!
  invalid opcode: 0000 [#1] PREEMPT SMP
  CPU: 1 PID: 3982 Comm: umount Tainted: G        W         5.14.0-rc5-default+ #1532
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
  RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
  RSP: 0018:ffffb7a5454c7db8 EFLAGS: 00010246
  RAX: 0000000000000068 RBX: ffff978364b91c00 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffffffffabee13c4 RDI: 00000000ffffffff
  RBP: ffff9783523a4c00 R08: 0000000000000001 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff9783523a4d18
  R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000003
  FS:  00007f61c8f42800(0000) GS:ffff9783bd800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000056190cffa810 CR3: 0000000030b96002 CR4: 0000000000170ea0
  Call Trace:
   btrfs_close_one_device.cold+0x11/0x55 [btrfs]
   close_fs_devices+0x44/0xb0 [btrfs]
   btrfs_close_devices+0x48/0x160 [btrfs]
   generic_shutdown_super+0x69/0x100
   kill_anon_super+0x14/0x30
   btrfs_kill_super+0x12/0x20 [btrfs]
   deactivate_locked_super+0x2c/0xa0
   cleanup_mnt+0x144/0x1b0
   task_work_run+0x59/0xa0
   exit_to_user_mode_loop+0xe7/0xf0
   exit_to_user_mode_prepare+0xaf/0xf0
   syscall_exit_to_user_mode+0x19/0x50
   do_syscall_64+0x4a/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae

This happens when close_ctree is called while a dev_replace hasn't
completed. In close_ctree, we suspend the dev_replace, but keep the
replace target around so that we can resume the dev_replace procedure
when we mount the root again. This is the call trace:

  close_ctree():
    btrfs_dev_replace_suspend_for_unmount();
    btrfs_close_devices():
      btrfs_close_fs_devices():
        btrfs_close_one_device():
          ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
                 &device->dev_state));

However, since the replace target sticks around, there is a device
with BTRFS_DEV_STATE_REPLACE_TGT set on close, and we fail the
assertion in btrfs_close_one_device.

To fix this, if we come across the replace target device when
closing, we should properly reset it back to allocation state. This
fix also ensures that if a non-target device has a corrupted state and
has the BTRFS_DEV_STATE_REPLACE_TGT bit set, the assertion will still
catch the error.

Reported-by: David Sterba <dsterba@suse.com>
Fixes: b2a616676839 ("btrfs: fix rw device counting in __btrfs_free_extra_devids")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-18 13:40:06 +02:00
Josef Bacik
0901af53da btrfs: wake up async_delalloc_pages waiters after submit
commit ac98141d140444fe93e26471d3074c603b70e2ca upstream.

We use the async_delalloc_pages mechanism to make sure that we've
completed our async work before trying to continue our delalloc
flushing.  The reason for this is we need to see any ordered extents
that were created by our delalloc flushing.  However we're waking up
before we do the submit work, which is before we create the ordered
extents.  This is a pretty wide race window where we could potentially
think there are no ordered extents and thus exit shrink_delalloc
prematurely.  Fix this by waking us up after we've done the work to
create ordered extents.

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-18 13:40:06 +02:00
Greg Kroah-Hartman
674d2ac211 This is the 5.10.62 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmEx2IsACgkQONu9yGCS
 aT6bihAA1pqL2x40IIk7nmu3XW/lNJxt4WLRjT7kGehDYE5gNz5VTQDaccb/+PiM
 18XjLN0RcSkhiai8410A/SNoHBvPSMWrNvf093NpFcGm0WcvsZVyNUfRgdXfs1N8
 FM1DAgA5eVowL+CthZyJIIRX6VllLAg1b3Z61frNO/t1nOUsS26enoxMSeaziqjz
 BuvajftaRGtJBMA3iNB1iKR3EEHRqfuWMT4swfQGDshdHlzXbdDOs0KV4bTYRo+f
 76n1bvftiMH4vRONIR3T+ZeJWeaL6IfW01v7qYx+06hBa6TtkwDc+Ofw5zxWFtj1
 7abou5U46gHIp4Ce8ANJzmnvd7iOssFk7+6y8Z7lZgUQCdwHFLP70VX9Jd3bRlsK
 UIwG0qiIG0jdLgUvjgritDWIcxuSniWIbecLr1xfHIzJAIvVr6bjRNyOz3gXgaa/
 h16oHZRhprC3+4baUQPjPiSb54z+y0Xrx8x9ZSvgHe4gQmctjzoKMVr9Fy7NSpQv
 MVVNi9GvqRI+o+Tu7/N3yxShdXvuRvOOFgeOcyvA7BigrxMb+LvOsPcISFIBva6a
 6+vKuRY79TMwEbDJkURnSNqkVyvMuqKhH5OGiKVHkQrp3AkltpDyDuap+6i2oar4
 ba20HaVstcSdCfx8iCmiy/l0g16vOyNC8cVVdB4WoX1fU6aiv9Y=
 =g+cC
 -----END PGP SIGNATURE-----

Merge 5.10.62 into android12-5.10-lts

Changes in 5.10.62
	net: qrtr: fix another OOB Read in qrtr_endpoint_post
	bpf: Fix ringbuf helper function compatibility
	bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper
	ASoC: rt5682: Adjust headset volume button threshold
	ASoC: component: Remove misplaced prefix handling in pin control functions
	ARC: Fix CONFIG_STACKDEPOT
	netfilter: conntrack: collect all entries in one cycle
	once: Fix panic when module unload
	blk-iocost: fix lockdep warning on blkcg->lock
	ovl: fix uninitialized pointer read in ovl_lookup_real_one()
	net: mscc: Fix non-GPL export of regmap APIs
	can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
	ceph: correctly handle releasing an embedded cap flush
	riscv: Ensure the value of FP registers in the core dump file is up to date
	Revert "btrfs: compression: don't try to compress if we don't have enough pages"
	drm/amdgpu: Cancel delayed work when GFXOFF is disabled
	Revert "USB: serial: ch341: fix character loss at high transfer rates"
	USB: serial: option: add new VID/PID to support Fibocom FG150
	usb: renesas-xhci: Prefer firmware loading on unknown ROM state
	usb: dwc3: gadget: Fix dwc3_calc_trbs_left()
	usb: dwc3: gadget: Stop EP0 transfers during pullup disable
	scsi: core: Fix hang of freezing queue between blocking and running device
	RDMA/bnxt_re: Add missing spin lock initialization
	IB/hfi1: Fix possible null-pointer dereference in _extend_sdma_tx_descs()
	RDMA/bnxt_re: Remove unpaired rtnl unlock in bnxt_re_dev_init()
	ice: do not abort devlink info if board identifier can't be found
	net: usb: pegasus: fixes of set_register(s) return value evaluation;
	igc: fix page fault when thunderbolt is unplugged
	igc: Use num_tx_queues when iterating over tx_ring queue
	e1000e: Fix the max snoop/no-snoop latency for 10M
	e1000e: Do not take care about recovery NVM checksum
	RDMA/efa: Free IRQ vectors on error flow
	ip_gre: add validation for csum_start
	xgene-v2: Fix a resource leak in the error handling path of 'xge_probe()'
	net: marvell: fix MVNETA_TX_IN_PRGRS bit number
	ucounts: Increase ucounts reference counter before the security hook
	net/sched: ets: fix crash when flipping from 'strict' to 'quantum'
	ipv6: use siphash in rt6_exception_hash()
	ipv4: use siphash instead of Jenkins in fnhe_hashfun()
	cxgb4: dont touch blocked freelist bitmap after free
	rtnetlink: Return correct error on changing device netns
	net: hns3: clear hardware resource when loading driver
	net: hns3: add waiting time before cmdq memory is released
	net: hns3: fix duplicate node in VLAN list
	net: hns3: fix get wrong pfc_en when query PFC configuration
	Revert "mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711"
	net: stmmac: add mutex lock to protect est parameters
	net: stmmac: fix kernel panic due to NULL pointer dereference of plat->est
	drm/i915: Fix syncmap memory leak
	usb: gadget: u_audio: fix race condition on endpoint stop
	dt-bindings: sifive-l2-cache: Fix 'select' matching
	perf/x86/intel/uncore: Fix integer overflow on 23 bit left shift of a u32
	clk: renesas: rcar-usb2-clock-sel: Fix kernel NULL pointer dereference
	iwlwifi: pnvm: accept multiple HW-type TLVs
	opp: remove WARN when no valid OPPs remain
	cpufreq: blocklist Qualcomm sm8150 in cpufreq-dt-platdev
	virtio: Improve vq->broken access to avoid any compiler optimization
	virtio_pci: Support surprise removal of virtio pci device
	virtio_vdpa: reject invalid vq indices
	vringh: Use wiov->used to check for read/write desc order
	tools/virtio: fix build
	qed: qed ll2 race condition fixes
	qed: Fix null-pointer dereference in qed_rdma_create_qp()
	Revert "drm/amd/pm: fix workload mismatch on vega10"
	drm/amd/pm: change the workload type for some cards
	blk-mq: don't grab rq's refcount in blk_mq_check_expired()
	drm: Copy drm_wait_vblank to user before returning
	drm/nouveau/disp: power down unused DP links during init
	drm/nouveau/kms/nv50: workaround EFI GOP window channel format differences
	net/rds: dma_map_sg is entitled to merge entries
	btrfs: fix race between marking inode needs to be logged and log syncing
	pipe: avoid unnecessary EPOLLET wakeups under normal loads
	pipe: do FASYNC notifications for every pipe IO, not just state changes
	mtd: spinand: Fix incorrect parameters for on-die ECC
	tipc: call tipc_wait_for_connect only when dlen is not 0
	vt_kdsetmode: extend console locking
	Bluetooth: btusb: check conditions before enabling USB ALT 3 for WBS
	riscv: Fixup wrong ftrace remove cflag
	riscv: Fixup patch_text panic in ftrace
	perf env: Fix memory leak of bpf_prog_info_linear member
	perf symbol-elf: Fix memory leak by freeing sdt_note.args
	perf record: Fix memory leak in vDSO found using ASAN
	perf tools: Fix arm64 build error with gcc-11
	perf annotate: Fix jump parsing for C++ code.
	powerpc/perf: Invoke per-CPU variable access with disabled interrupts
	srcu: Provide internal interface to start a Tree SRCU grace period
	srcu: Provide polling interfaces for Tree SRCU grace periods
	srcu: Provide internal interface to start a Tiny SRCU grace period
	srcu: Make Tiny SRCU use multi-bit grace-period counter
	srcu: Provide polling interfaces for Tiny SRCU grace periods
	tracepoint: Use rcu get state and cond sync for static call updates
	usb: typec: ucsi: acpi: Always decode connector change information
	usb: typec: ucsi: Work around PPM losing change information
	usb: typec: ucsi: Clear pending after acking connector change
	net: dsa: mt7530: fix VLAN traffic leaks again
	lkdtm: Enable DOUBLE_FAULT on all architectures
	arm64: dts: qcom: msm8994-angler: Fix gpio-reserved-ranges 85-88
	btrfs: fix NULL pointer dereference when deleting device by invalid id
	kthread: Fix PF_KTHREAD vs to_kthread() race
	Revert "floppy: reintroduce O_NDELAY fix"
	Revert "parisc: Add assembly implementations for memset, strlen, strcpy, strncpy and strcat"
	net: don't unconditionally copy_from_user a struct ifreq for socket ioctls
	audit: move put_tree() to avoid trim_trees refcount underflow and UAF
	bpf: Fix potentially incorrect results with bpf_get_local_storage()
	Linux 5.10.62

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5a9bf4b2c254ae21a10f838494cae1c3fa016be3
2021-09-03 10:51:56 +02:00
Qu Wenruo
c43add24df btrfs: fix NULL pointer dereference when deleting device by invalid id
commit e4571b8c5e9ffa1e85c0c671995bd4dcc5c75091 upstream.

[BUG]
It's easy to trigger NULL pointer dereference, just by removing a
non-existing device id:

 # mkfs.btrfs -f -m single -d single /dev/test/scratch1 \
				     /dev/test/scratch2
 # mount /dev/test/scratch1 /mnt/btrfs
 # btrfs device remove 3 /mnt/btrfs

Then we have the following kernel NULL pointer dereference:

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP NOPTI
 CPU: 9 PID: 649 Comm: btrfs Not tainted 5.14.0-rc3-custom+ #35
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 RIP: 0010:btrfs_rm_device+0x4de/0x6b0 [btrfs]
  btrfs_ioctl+0x18bb/0x3190 [btrfs]
  ? lock_is_held_type+0xa5/0x120
  ? find_held_lock.constprop.0+0x2b/0x80
  ? do_user_addr_fault+0x201/0x6a0
  ? lock_release+0xd2/0x2d0
  ? __x64_sys_ioctl+0x83/0xb0
  __x64_sys_ioctl+0x83/0xb0
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae

[CAUSE]
Commit a27a94c2b0 ("btrfs: Make btrfs_find_device_by_devspec return
btrfs_device directly") moves the "missing" device path check into
btrfs_rm_device().

But btrfs_rm_device() itself can have case where it only receives
@devid, with NULL as @device_path.

In that case, calling strcmp() on NULL will trigger the NULL pointer
dereference.

Before that commit, we handle the "missing" case inside
btrfs_find_device_by_devspec(), which will not check @device_path at all
if @devid is provided, thus no way to trigger the bug.

[FIX]
Before calling strcmp(), also make sure @device_path is not NULL.

Fixes: a27a94c2b0 ("btrfs: Make btrfs_find_device_by_devspec return btrfs_device directly")
CC: stable@vger.kernel.org # 5.4+
Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-03 10:09:30 +02:00
Filipe Manana
d845f89d59 btrfs: fix race between marking inode needs to be logged and log syncing
commit bc0939fcfab0d7efb2ed12896b1af3d819954a14 upstream.

We have a race between marking that an inode needs to be logged, either
at btrfs_set_inode_last_trans() or at btrfs_page_mkwrite(), and between
btrfs_sync_log(). The following steps describe how the race happens.

1) We are at transaction N;

2) Inode I was previously fsynced in the current transaction so it has:

    inode->logged_trans set to N;

3) The inode's root currently has:

   root->log_transid set to 1
   root->last_log_commit set to 0

   Which means only one log transaction was committed to far, log
   transaction 0. When a log tree is created we set ->log_transid and
   ->last_log_commit of its parent root to 0 (at btrfs_add_log_tree());

4) One more range of pages is dirtied in inode I;

5) Some task A starts an fsync against some other inode J (same root), and
   so it joins log transaction 1.

   Before task A calls btrfs_sync_log()...

6) Task B starts an fsync against inode I, which currently has the full
   sync flag set, so it starts delalloc and waits for the ordered extent
   to complete before calling btrfs_inode_in_log() at btrfs_sync_file();

7) During ordered extent completion we have btrfs_update_inode() called
   against inode I, which in turn calls btrfs_set_inode_last_trans(),
   which does the following:

     spin_lock(&inode->lock);
     inode->last_trans = trans->transaction->transid;
     inode->last_sub_trans = inode->root->log_transid;
     inode->last_log_commit = inode->root->last_log_commit;
     spin_unlock(&inode->lock);

   So ->last_trans is set to N and ->last_sub_trans set to 1.
   But before setting ->last_log_commit...

8) Task A is at btrfs_sync_log():

   - it increments root->log_transid to 2
   - starts writeback for all log tree extent buffers
   - waits for the writeback to complete
   - writes the super blocks
   - updates root->last_log_commit to 1

   It's a lot of slow steps between updating root->log_transid and
   root->last_log_commit;

9) The task doing the ordered extent completion, currently at
   btrfs_set_inode_last_trans(), then finally runs:

     inode->last_log_commit = inode->root->last_log_commit;
     spin_unlock(&inode->lock);

   Which results in inode->last_log_commit being set to 1.
   The ordered extent completes;

10) Task B is resumed, and it calls btrfs_inode_in_log() which returns
    true because we have all the following conditions met:

    inode->logged_trans == N which matches fs_info->generation &&
    inode->last_subtrans (1) <= inode->last_log_commit (1) &&
    inode->last_subtrans (1) <= root->last_log_commit (1) &&
    list inode->extent_tree.modified_extents is empty

    And as a consequence we return without logging the inode, so the
    existing logged version of the inode does not point to the extent
    that was written after the previous fsync.

It should be impossible in practice for one task be able to do so much
progress in btrfs_sync_log() while another task is at
btrfs_set_inode_last_trans() right after it reads root->log_transid and
before it reads root->last_log_commit. Even if kernel preemption is enabled
we know the task at btrfs_set_inode_last_trans() can not be preempted
because it is holding the inode's spinlock.

However there is another place where we do the same without holding the
spinlock, which is in the memory mapped write path at:

  vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
  {
     (...)
     BTRFS_I(inode)->last_trans = fs_info->generation;
     BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid;
     BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
     (...)

So with preemption happening after setting ->last_sub_trans and before
setting ->last_log_commit, it is less of a stretch to have another task
do enough progress at btrfs_sync_log() such that the task doing the memory
mapped write ends up with ->last_sub_trans and ->last_log_commit set to
the same value. It is still a big stretch to get there, as the task doing
btrfs_sync_log() has to start writeback, wait for its completion and write
the super blocks.

So fix this in two different ways:

1) For btrfs_set_inode_last_trans(), simply set ->last_log_commit to the
   value of ->last_sub_trans minus 1;

2) For btrfs_page_mkwrite() only set the inode's ->last_sub_trans, just
   like we do for buffered and direct writes at btrfs_file_write_iter(),
   which is all we need to make sure multiple writes and fsyncs to an
   inode in the same transaction never result in an fsync missing that
   the inode changed and needs to be logged. Turn this into a helper
   function and use it both at btrfs_page_mkwrite() and at
   btrfs_file_write_iter() - this also fixes the problem that at
   btrfs_page_mkwrite() we were setting those fields without the
   protection of the inode's spinlock.

This is an extremely unlikely race to happen in practice.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-03 10:09:28 +02:00
Qu Wenruo
3134292a8e Revert "btrfs: compression: don't try to compress if we don't have enough pages"
commit 4e9655763b82a91e4c341835bb504a2b1590f984 upstream.

This reverts commit f2165627319ffd33a6217275e5690b1ab5c45763.

[BUG]
It's no longer possible to create compressed inline extent after commit
f2165627319f ("btrfs: compression: don't try to compress if we don't
have enough pages").

[CAUSE]
For compression code, there are several possible reasons we have a range
that needs to be compressed while it's no more than one page.

- Compressed inline write
  The data is always smaller than one sector and the test lacks the
  condition to properly recognize a non-inline extent.

- Compressed subpage write
  For the incoming subpage compressed write support, we require page
  alignment of the delalloc range.
  And for 64K page size, we can compress just one page into smaller
  sectors.

For those reasons, the requirement for the data to be more than one page
is not correct, and is already causing regression for compressed inline
data writeback.  The idea of skipping one page to avoid wasting CPU time
could be revisited in the future.

[FIX]
Fix it by reverting the offending commit.

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Link: https://lore.kernel.org/linux-btrfs/afa2742.c084f5d6.17b6b08dffc@tnonline.net
Fixes: f2165627319f ("btrfs: compression: don't try to compress if we don't have enough pages")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-03 10:09:22 +02:00
Greg Kroah-Hartman
a6777a7cee Linux 5.10.61
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAmEnjxwACgkQ3qZv95d3
 LNxlaw//VC9rmejl9JwdZQaz3NfhNv2UwamLShlliq2aBmy8d+fGTAMXkCW7UK/M
 gS27oweL6UV6wT8BKgK8LoKJ7L1rI1KDAePpxMCSZw8Mrd8jre0FMVgkkQz/2mFQ
 w3aMmVitCX7RIeShoxP/8fmWnGRkugDx7SZ2csoqRP9Txtih7DAamiun2ttTzzra
 4dxFRtyzw1LpJBSv/pMJ3/FusPkrbwKzSv2wYIgWmfhIWjPcvmBc0ufJmGUK7Gxo
 MJGaUWk9RJ3eNSUPUP3pJlKdzeHdOhOQnydiaUA60pWcDoAyQj7qM06af5hVLsCX
 0Z9r97bzWOF+LNuEiNIGbodJ74IgFv0VTgjlRdZfeLC+5yLyIo7fXAmAA2Od0fiH
 04Ak9+n+FTkl5avUufLrEwHljAAOgcbtJX7W4F/XPW+1P1tZxG9H4ZI+uQN7ZyAB
 fXGo3O3p5J14fI/m8Zr4mXDIiq34OPxSHLx89YryubriO8kdNv69+tciLn2m5a5W
 yctpYgVQnRJt44dg5I/aIzCSOW5+FviRY8slAopdXBIAwUZAZZgW9IVGAh2uRiLS
 tCtbFa3cXUV1JoWsgUdae8BDoOp0dm69yeYrP1f5eVCLe+FHa9P/Z3qxpBLhpPEh
 QOTI14tg3wu3vwFxAA2rsFmSInkh231ryjgGq788BnoHHad3DnI=
 =Rtb5
 -----END PGP SIGNATURE-----

Merge 5.10.61 into android12-5.10-lts

Changes in 5.10.61
	ath: Use safer key clearing with key cache entries
	ath9k: Clear key cache explicitly on disabling hardware
	ath: Export ath_hw_keysetmac()
	ath: Modify ath_key_delete() to not need full key entry
	ath9k: Postpone key cache entry deletion for TXQ frames reference it
	mtd: cfi_cmdset_0002: fix crash when erasing/writing AMD cards
	media: zr364xx: propagate errors from zr364xx_start_readpipe()
	media: zr364xx: fix memory leaks in probe()
	media: drivers/media/usb: fix memory leak in zr364xx_probe
	KVM: x86: Factor out x86 instruction emulation with decoding
	KVM: X86: Fix warning caused by stale emulation context
	USB: core: Avoid WARNings for 0-length descriptor requests
	USB: core: Fix incorrect pipe calculation in do_proc_control()
	dmaengine: xilinx_dma: Fix read-after-free bug when terminating transfers
	dmaengine: usb-dmac: Fix PM reference leak in usb_dmac_probe()
	spi: spi-mux: Add module info needed for autoloading
	net: xfrm: Fix end of loop tests for list_for_each_entry
	ARM: dts: am43x-epos-evm: Reduce i2c0 bus speed for tps65218
	dmaengine: of-dma: router_xlate to return -EPROBE_DEFER if controller is not yet available
	scsi: pm80xx: Fix TMF task completion race condition
	scsi: megaraid_mm: Fix end of loop tests for list_for_each_entry()
	scsi: scsi_dh_rdac: Avoid crash during rdac_bus_attach()
	scsi: core: Avoid printing an error if target_alloc() returns -ENXIO
	scsi: core: Fix capacity set to zero after offlinining device
	drm/amdgpu: fix the doorbell missing when in CGPG issue for renoir.
	qede: fix crash in rmmod qede while automatic debug collection
	ARM: dts: nomadik: Fix up interrupt controller node names
	net: usb: pegasus: Check the return value of get_geristers() and friends;
	net: usb: lan78xx: don't modify phy_device state concurrently
	drm/amd/display: Fix Dynamic bpp issue with 8K30 with Navi 1X
	drm/amd/display: workaround for hard hang on HPD on native DP
	Bluetooth: hidp: use correct wait queue when removing ctrl_wait
	arm64: dts: qcom: c630: fix correct powerdown pin for WSA881x
	arm64: dts: qcom: msm8992-bullhead: Remove PSCI
	iommu: Check if group is NULL before remove device
	cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
	dccp: add do-while-0 stubs for dccp_pr_debug macros
	virtio: Protect vqs list access
	vhost-vdpa: Fix integer overflow in vhost_vdpa_process_iotlb_update()
	bus: ti-sysc: Fix error handling for sysc_check_active_timer()
	vhost: Fix the calculation in vhost_overflow()
	vdpa/mlx5: Avoid destroying MR on empty iotlb
	soc / drm: mediatek: Move DDP component defines into mtk-mmsys.h
	drm/mediatek: Fix aal size config
	drm/mediatek: Add AAL output size configuration
	bpf: Clear zext_dst of dead insns
	bnxt: don't lock the tx queue from napi poll
	bnxt: disable napi before canceling DIM
	bnxt: make sure xmit_more + errors does not miss doorbells
	bnxt: count Tx drops
	net: 6pack: fix slab-out-of-bounds in decode_data
	ptp_pch: Restore dependency on PCI
	bnxt_en: Disable aRFS if running on 212 firmware
	bnxt_en: Add missing DMA memory barriers
	vrf: Reset skb conntrack connection on VRF rcv
	virtio-net: support XDP when not more queues
	virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO
	net: qlcnic: add missed unlock in qlcnic_83xx_flash_read32
	ixgbe, xsk: clean up the resources in ixgbe_xsk_pool_enable error path
	sch_cake: fix srchost/dsthost hashing mode
	net: mdio-mux: Don't ignore memory allocation errors
	net: mdio-mux: Handle -EPROBE_DEFER correctly
	ovs: clear skb->tstamp in forwarding path
	iommu/vt-d: Consolidate duplicate cache invaliation code
	iommu/vt-d: Fix incomplete cache flush in intel_pasid_tear_down_entry()
	r8152: fix writing USB_BP2_EN
	i40e: Fix ATR queue selection
	iavf: Fix ping is lost after untrusted VF had tried to change MAC
	Revert "flow_offload: action should not be NULL when it is referenced"
	mmc: dw_mmc: Fix hang on data CRC error
	mmc: mmci: stm32: Check when the voltage switch procedure should be done
	mmc: sdhci-msm: Update the software timeout value for sdhc
	clk: imx6q: fix uart earlycon unwork
	clk: qcom: gdsc: Ensure regulator init state matches GDSC state
	ALSA: hda - fix the 'Capture Switch' value change notifications
	tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name
	slimbus: messaging: start transaction ids from 1 instead of zero
	slimbus: messaging: check for valid transaction id
	slimbus: ngd: reset dma setup during runtime pm
	ipack: tpci200: fix many double free issues in tpci200_pci_probe
	ipack: tpci200: fix memory leak in the tpci200_register
	ALSA: hda/realtek: Enable 4-speaker output for Dell XPS 15 9510 laptop
	mmc: sdhci-iproc: Cap min clock frequency on BCM2711
	mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711
	btrfs: prevent rename2 from exchanging a subvol with a directory from different parents
	ALSA: hda/via: Apply runtime PM workaround for ASUS B23E
	s390/pci: fix use after free of zpci_dev
	PCI: Increase D3 delay for AMD Renoir/Cezanne XHCI
	ALSA: hda/realtek: Limit mic boost on HP ProBook 445 G8
	ASoC: intel: atom: Fix breakage for PCM buffer address setup
	mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
	fs: warn about impending deprecation of mandatory locks
	io_uring: fix xa_alloc_cycle() error return value check
	io_uring: only assign io_uring_enter() SQPOLL error in actual error case
	Linux 5.10.61

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5b6e2a66b03d1cb01c8310b83dcc2a119c1bd6b3
2021-08-27 20:51:37 +02:00
NeilBrown
67fece6289 btrfs: prevent rename2 from exchanging a subvol with a directory from different parents
[ Upstream commit 3f79f6f6247c83f448c8026c3ee16d4636ef8d4f ]

Cross-rename lacks a check when that would prevent exchanging a
directory and subvolume from different parent subvolume. This causes
data inconsistencies and is caught before commit by tree-checker,
turning the filesystem to read-only.

Calling the renameat2 with RENAME_EXCHANGE flags like

  renameat2(AT_FDCWD, namesrc, AT_FDCWD, namedest, (1 << 1))

on two paths:

  namesrc = dir1/subvol1/dir2
 namedest = subvol2/subvol3

will cause key order problem with following write time tree-checker
report:

  [1194842.307890] BTRFS critical (device loop1): corrupt leaf: root=5 block=27574272 slot=10 ino=258, invalid previous key objectid, have 257 expect 258
  [1194842.322221] BTRFS info (device loop1): leaf 27574272 gen 8 total ptrs 11 free space 15444 owner 5
  [1194842.331562] BTRFS info (device loop1): refs 2 lock_owner 0 current 26561
  [1194842.338772]        item 0 key (256 1 0) itemoff 16123 itemsize 160
  [1194842.338793]                inode generation 3 size 16 mode 40755
  [1194842.338801]        item 1 key (256 12 256) itemoff 16111 itemsize 12
  [1194842.338809]        item 2 key (256 84 2248503653) itemoff 16077 itemsize 34
  [1194842.338817]                dir oid 258 type 2
  [1194842.338823]        item 3 key (256 84 2363071922) itemoff 16043 itemsize 34
  [1194842.338830]                dir oid 257 type 2
  [1194842.338836]        item 4 key (256 96 2) itemoff 16009 itemsize 34
  [1194842.338843]        item 5 key (256 96 3) itemoff 15975 itemsize 34
  [1194842.338852]        item 6 key (257 1 0) itemoff 15815 itemsize 160
  [1194842.338863]                inode generation 6 size 8 mode 40755
  [1194842.338869]        item 7 key (257 12 256) itemoff 15801 itemsize 14
  [1194842.338876]        item 8 key (257 84 2505409169) itemoff 15767 itemsize 34
  [1194842.338883]                dir oid 256 type 2
  [1194842.338888]        item 9 key (257 96 2) itemoff 15733 itemsize 34
  [1194842.338895]        item 10 key (258 12 256) itemoff 15719 itemsize 14
  [1194842.339163] BTRFS error (device loop1): block=27574272 write time tree block corruption detected
  [1194842.339245] ------------[ cut here ]------------
  [1194842.443422] WARNING: CPU: 6 PID: 26561 at fs/btrfs/disk-io.c:449 csum_one_extent_buffer+0xed/0x100 [btrfs]
  [1194842.511863] CPU: 6 PID: 26561 Comm: kworker/u17:2 Not tainted 5.14.0-rc3-git+ #793
  [1194842.511870] Hardware name: empty empty/S3993, BIOS PAQEX0-3 02/24/2008
  [1194842.511876] Workqueue: btrfs-worker-high btrfs_work_helper [btrfs]
  [1194842.511976] RIP: 0010:csum_one_extent_buffer+0xed/0x100 [btrfs]
  [1194842.512068] RSP: 0018:ffffa2c284d77da0 EFLAGS: 00010282
  [1194842.512074] RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffff928867bd9978
  [1194842.512078] RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff928867bd9970
  [1194842.512081] RBP: ffff92876b958000 R08: 0000000000000001 R09: 00000000000c0003
  [1194842.512085] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
  [1194842.512088] R13: ffff92875f989f98 R14: 0000000000000000 R15: 0000000000000000
  [1194842.512092] FS:  0000000000000000(0000) GS:ffff928867a00000(0000) knlGS:0000000000000000
  [1194842.512095] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [1194842.512099] CR2: 000055f5384da1f0 CR3: 0000000102fe4000 CR4: 00000000000006e0
  [1194842.512103] Call Trace:
  [1194842.512128]  ? run_one_async_free+0x10/0x10 [btrfs]
  [1194842.631729]  btree_csum_one_bio+0x1ac/0x1d0 [btrfs]
  [1194842.631837]  run_one_async_start+0x18/0x30 [btrfs]
  [1194842.631938]  btrfs_work_helper+0xd5/0x1d0 [btrfs]
  [1194842.647482]  process_one_work+0x262/0x5e0
  [1194842.647520]  worker_thread+0x4c/0x320
  [1194842.655935]  ? process_one_work+0x5e0/0x5e0
  [1194842.655946]  kthread+0x135/0x160
  [1194842.655953]  ? set_kthread_struct+0x40/0x40
  [1194842.655965]  ret_from_fork+0x1f/0x30
  [1194842.672465] irq event stamp: 1729
  [1194842.672469] hardirqs last  enabled at (1735): [<ffffffffbd1104f5>] console_trylock_spinning+0x185/0x1a0
  [1194842.672477] hardirqs last disabled at (1740): [<ffffffffbd1104cc>] console_trylock_spinning+0x15c/0x1a0
  [1194842.672482] softirqs last  enabled at (1666): [<ffffffffbdc002e1>] __do_softirq+0x2e1/0x50a
  [1194842.672491] softirqs last disabled at (1651): [<ffffffffbd08aab7>] __irq_exit_rcu+0xa7/0xd0

The corrupted data will not be written, and filesystem can be unmounted
and mounted again (all changes since the last commit will be lost).

Add the missing check for new_ino so that all non-subvolumes must reside
under the same parent subvolume. There's an exception allowing to
exchange two subvolumes from any parents as the directory representing a
subvolume is only a logical link and does not have any other structures
related to the parent subvolume, unlike files, directories etc, that
are always in the inode namespace of the parent subvolume.

Fixes: cdd1fedf82 ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT")
CC: stable@vger.kernel.org # 4.7+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-26 08:35:56 -04:00
Greg Kroah-Hartman
a15695131a This is the 5.10.57 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmEPgjkACgkQONu9yGCS
 aT4rgg//V+2igeM1O6aFCZODXLiagKe4k52bWC8N28KfjgDTo8znJSvg9wMlQ1vb
 YbUIOFqiTTM3JncZB3V260JRiT+abclyxVXlsXOkUpgZJPaBHGSINm5047Uo/Wwf
 /xLDPMmT1Yn/o8VUfRArm1+Q5wGCnb7a++2LAozElOAiCifk4S915940KDKQDs5f
 PNewMcAhSsPd8lLg+/AqFfACFFmd0CSsLQoF67Cq0fngOEd/2F0tgQkT40+7KBEM
 rd9BSHWFFG3uYE/DnkchmxqY0sJ84hdDdNf1Y3UddIiyXPRLOAenUBeBzbHhAVpb
 LelR/F5SX4BW+z2+25dj4kD4fKxwWjdESGiWse1GEijLWXRKLTTbgCfMphQOz/km
 5qf7wJ7EAfMXmIwHCtF86wHGr1ax37j9qfGSsi+TucLdqafQVTDuSRRH+0gLXCqX
 0ST4kAB/N0EjkAYrYNJSNvKqEFFW7YlWtlk9AO+hHfP52XaeOKzW7z/tgqwiGZOZ
 pfJfQyqxLpfIeVGzkANJaMwIdqcwOhge3FWScscBpuoxhIkbcdByvTYCr3s7wSzv
 jmqja1ofSXoe/7sGNrmRg6VcgnfBgbaOHWuPKvWw/tQlML6bHBflt2Pq1RkyoAsm
 LEF+IuZRfLCV/PoqatzA0PT/W9RDniQ0JEjwooOakDX5C5a91Dw=
 =X3WM
 -----END PGP SIGNATURE-----

Merge 5.10.57 into android12-5.10-lts

Changes in 5.10.57
	drm/i915: Revert "drm/i915/gem: Asynchronous cmdparser"
	Revert "drm/i915: Propagate errors on awaiting already signaled fences"
	btrfs: fix race causing unnecessary inode logging during link and rename
	btrfs: fix lost inode on log replay after mix of fsync, rename and inode eviction
	regulator: rtmv20: Fix wrong mask for strobe-polarity-high
	regulator: rt5033: Fix n_voltages settings for BUCK and LDO
	spi: stm32h7: fix full duplex irq handler handling
	ASoC: tlv320aic31xx: fix reversed bclk/wclk master bits
	r8152: Fix potential PM refcount imbalance
	qed: fix possible unpaired spin_{un}lock_bh in _qed_mcp_cmd_and_union()
	ASoC: rt5682: Fix the issue of garbled recording after powerd_dbus_suspend
	net: Fix zero-copy head len calculation.
	ASoC: ti: j721e-evm: Fix unbalanced domain activity tracking during startup
	ASoC: ti: j721e-evm: Check for not initialized parent_clk_id
	efi/mokvar: Reserve the table only if it is in boot services data
	nvme: fix nvme_setup_command metadata trace event
	drm/amd/display: Fix comparison error in dcn21 DML
	drm/amd/display: Fix max vstartup calculation for modes with borders
	ACPI: fix NULL pointer dereference
	Revert "Bluetooth: Shutdown controller after workqueues are flushed or cancelled"
	firmware: arm_scmi: Ensure drivers provide a probe function
	firmware: arm_scmi: Add delayed response status check
	Revert "watchdog: iTCO_wdt: Account for rebooting on second timeout"
	selftests/bpf: Add a test for ptr_to_map_value on stack for helper access
	selftest/bpf: Adjust expected verifier errors
	bpf, selftests: Adjust few selftest result_unpriv outcomes
	bpf: Update selftests to reflect new error states
	bpf, selftests: Adjust few selftest outcomes wrt unreachable code
	selftest/bpf: Verifier tests for var-off access
	spi: mediatek: Fix fifo transfer
	Linux 5.10.57

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic9f65afb3ff8d01e17fa27dff20c3592910280f2
2021-08-08 09:36:33 +02:00
Filipe Manana
9e55b9278c btrfs: fix lost inode on log replay after mix of fsync, rename and inode eviction
[ Upstream commit ecc64fab7d49c678e70bd4c35fe64d2ab3e3d212 ]

When checking if we need to log the new name of a renamed inode, we are
checking if the inode and its parent inode have been logged before, and if
not we don't log the new name. The check however is buggy, as it directly
compares the logged_trans field of the inodes versus the ID of the current
transaction. The problem is that logged_trans is a transient field, only
stored in memory and never persisted in the inode item, so if an inode
was logged before, evicted and reloaded, its logged_trans field is set to
a value of 0, meaning the check will return false and the new name of the
renamed inode is not logged. If the old parent directory was previously
fsynced and we deleted the logged directory entries corresponding to the
old name, we end up with a log that when replayed will delete the renamed
inode.

The following example triggers the problem:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt

  $ mkdir /mnt/A
  $ mkdir /mnt/B
  $ echo -n "hello world" > /mnt/A/foo

  $ sync

  # Add some new file to A and fsync directory A.
  $ touch /mnt/A/bar
  $ xfs_io -c "fsync" /mnt/A

  # Now trigger inode eviction. We are only interested in triggering
  # eviction for the inode of directory A.
  $ echo 2 > /proc/sys/vm/drop_caches

  # Move foo from directory A to directory B.
  # This deletes the directory entries for foo in A from the log, and
  # does not add the new name for foo in directory B to the log, because
  # logged_trans of A is 0, which is less than the current transaction ID.
  $ mv /mnt/A/foo /mnt/B/foo

  # Now make an fsync to anything except A, B or any file inside them,
  # like for example create a file at the root directory and fsync this
  # new file. This syncs the log that contains all the changes done by
  # previous rename operation.
  $ touch /mnt/baz
  $ xfs_io -c "fsync" /mnt/baz

  <power fail>

  # Mount the filesystem and replay the log.
  $ mount /dev/sdc /mnt

  # Check the filesystem content.
  $ ls -1R /mnt
  /mnt/:
  A
  B
  baz

  /mnt/A:
  bar

  /mnt/B:
  $

  # File foo is gone, it's neither in A/ nor in B/.

Fix this by using the inode_logged() helper at btrfs_log_new_name(), which
safely checks if an inode was logged before in the current transaction.

A test case for fstests will follow soon.

CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-08 09:05:22 +02:00
Filipe Manana
e2419c5709 btrfs: fix race causing unnecessary inode logging during link and rename
[ Upstream commit de53d892e5c51dfa0a158e812575a75a6c991f39 ]

When we are doing a rename or a link operation for an inode that was logged
in the previous transaction and that transaction is still committing, we
have a time window where we incorrectly consider that the inode was logged
previously in the current transaction and therefore decide to log it to
update it in the log. The following steps give an example on how this
happens during a link operation:

1) Inode X is logged in transaction 1000, so its logged_trans field is set
   to 1000;

2) Task A starts to commit transaction 1000;

3) The state of transaction 1000 is changed to TRANS_STATE_UNBLOCKED;

4) Task B starts a link operation for inode X, and as a consequence it
   starts transaction 1001;

5) Task A is still committing transaction 1000, therefore the value stored
   at fs_info->last_trans_committed is still 999;

6) Task B calls btrfs_log_new_name(), it reads a value of 999 from
   fs_info->last_trans_committed and because the logged_trans field of
   inode X has a value of 1000, the function does not return immediately,
   instead it proceeds to logging the inode, which should not happen
   because the inode was logged in the previous transaction (1000) and
   not in the current one (1001).

This is not a functional problem, just wasted time and space logging an
inode that does not need to be logged, contributing to higher latency
for link and rename operations.

So fix this by comparing the inodes' logged_trans field with the
generation of the current transaction instead of comparing with the value
stored in fs_info->last_trans_committed.

This case is often hit when running dbench for a long enough duration, as
it does lots of rename operations.

This patch belongs to a patch set that is comprised of the following
patches:

  btrfs: fix race causing unnecessary inode logging during link and rename
  btrfs: fix race that results in logging old extents during a fast fsync
  btrfs: fix race that causes unnecessary logging of ancestor inodes
  btrfs: fix race that makes inode logging fallback to transaction commit
  btrfs: fix race leading to unnecessary transaction commit when logging inode
  btrfs: do not block inode logging for so long during transaction commit

Performance results are mentioned in the change log of the last patch.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-08 09:05:22 +02:00
Greg Kroah-Hartman
8b444656fa This is the 5.10.56 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmEKcCUACgkQONu9yGCS
 aT7sMw/7BNJDmX9w+p1lgTIJJzSuz8C/eNgbeZgK7CE4DovO+WL9oEm53vqYcDDo
 j5REnrRhxcBYxwG/GXl1Oniv1wHqw0SplV+5G2NH1RMy23eSFGCw+8G+YOEJnU3P
 94hJuEs/43Py7eZV/VtyO2UMdDRnGI6MlNvu18YjnRJcdqIIl2gln1G8wbyySYVb
 wR1rudvtiEdrmTQr7qGxeIrZNKGwFl0KxEl8j9X/aqxvfe8PRVYKlmtwblf5rybe
 TElQxz2XGRgk8g2yWQmnNoU6rfFHdZ4lTnCwfpFA1XE6/HBA64/1p22QTJUZvyOU
 pbQc1MRaoUncGV9UFAMY1j38JFsVar7YHHOcpp9YIJOjoyiAw4aatGDcntdWDCiG
 X1mCSLs10/xGRPaJJXulp786MH4aTR5qIeoNg8mu3Z3In4ElbBW5xr0wa3N8gs3O
 lEnK/gT2MHiQ1boa+Qy3W+XZmOjWtL69JgbOyRcOYS6lkHL4DFlGL2Nn5u8qGfL4
 hzohJzH36W5SUHDQiYTt1wLNu4iHpAECjxcnk9fCvlcHA5Yu1bqgyQ62i3C9RA6a
 /aO0B0yraHmvCAboemDsESwylxmpiRB3caqKtzlaZjoiOfPydcBwJM46ZfbzLNPh
 l+/YKK2tLOXWyRIhEv8183tVeu7mZ02xjsetPtLltZPJqR+SJKE=
 =8nLw
 -----END PGP SIGNATURE-----

Merge 5.10.56 into android12-5.10-lts

Changes in 5.10.56
	selftest: fix build error in tools/testing/selftests/vm/userfaultfd.c
	io_uring: fix null-ptr-deref in io_sq_offload_start()
	x86/asm: Ensure asm/proto.h can be included stand-alone
	pipe: make pipe writes always wake up readers
	btrfs: fix rw device counting in __btrfs_free_extra_devids
	btrfs: mark compressed range uptodate only if all bio succeed
	Revert "ACPI: resources: Add checks for ACPI IRQ override"
	ACPI: DPTF: Fix reading of attributes
	x86/kvm: fix vcpu-id indexed array sizes
	KVM: add missing compat KVM_CLEAR_DIRTY_LOG
	ocfs2: fix zero out valid data
	ocfs2: issue zeroout to EOF blocks
	can: j1939: j1939_xtp_rx_dat_one(): fix rxtimer value between consecutive TP.DT to 750ms
	can: raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
	can: peak_usb: pcan_usb_handle_bus_evt(): fix reading rxerr/txerr values
	can: mcba_usb_start(): add missing urb->transfer_dma initialization
	can: usb_8dev: fix memory leak
	can: ems_usb: fix memory leak
	can: esd_usb2: fix memory leak
	alpha: register early reserved memory in memblock
	HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDT
	NIU: fix incorrect error return, missed in previous revert
	drm/amd/display: ensure dentist display clock update finished in DCN20
	drm/amdgpu: Avoid printing of stack contents on firmware load error
	drm/amdgpu: Fix resource leak on probe error path
	blk-iocost: fix operation ordering in iocg_wake_fn()
	nfc: nfcsim: fix use after free during module unload
	cfg80211: Fix possible memory leak in function cfg80211_bss_update
	RDMA/bnxt_re: Fix stats counters
	bpf: Fix OOB read when printing XDP link fdinfo
	mac80211: fix enabling 4-address mode on a sta vif after assoc
	netfilter: conntrack: adjust stop timestamp to real expiry value
	netfilter: nft_nat: allow to specify layer 4 protocol NAT only
	i40e: Fix logic of disabling queues
	i40e: Fix firmware LLDP agent related warning
	i40e: Fix queue-to-TC mapping on Tx
	i40e: Fix log TC creation failure when max num of queues is exceeded
	tipc: fix implicit-connect for SYN+
	tipc: fix sleeping in tipc accept routine
	net: Set true network header for ECN decapsulation
	net: qrtr: fix memory leaks
	ionic: remove intr coalesce update from napi
	ionic: fix up dim accounting for tx and rx
	ionic: count csum_none when offload enabled
	tipc: do not write skb_shinfo frags when doing decrytion
	octeontx2-pf: Fix interface down flag on error
	mlx4: Fix missing error code in mlx4_load_one()
	KVM: x86: Check the right feature bit for MSR_KVM_ASYNC_PF_ACK access
	net: llc: fix skb_over_panic
	drm/msm/dpu: Fix sm8250_mdp register length
	drm/msm/dp: Initialize the INTF_CONFIG register
	skmsg: Make sk_psock_destroy() static
	net/mlx5: Fix flow table chaining
	net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev()
	sctp: fix return value check in __sctp_rcv_asconf_lookup
	tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
	sis900: Fix missing pci_disable_device() in probe and remove
	can: hi311x: fix a signedness bug in hi3110_cmd()
	bpf: Introduce BPF nospec instruction for mitigating Spectre v4
	bpf: Fix leakage due to insufficient speculative store bypass mitigation
	bpf: Remove superfluous aux sanitation on subprog rejection
	bpf: verifier: Allocate idmap scratch in verifier env
	bpf: Fix pointer arithmetic mask tightening under state pruning
	SMB3: fix readpage for large swap cache
	powerpc/pseries: Fix regression while building external modules
	Revert "perf map: Fix dso->nsinfo refcounting"
	i40e: Add additional info to PHY type error
	can: j1939: j1939_session_deactivate(): clarify lifetime of session object
	Linux 5.10.56

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib3c9244afb7ee5d6ee8d3235efe8956898f486c4
2021-08-04 15:02:23 +02:00
Goldwyn Rodrigues
0a421a2fc5 btrfs: mark compressed range uptodate only if all bio succeed
commit 240246f6b913b0c23733cfd2def1d283f8cc9bbe upstream.

In compression write endio sequence, the range which the compressed_bio
writes is marked as uptodate if the last bio of the compressed (sub)bios
is completed successfully. There could be previous bio which may
have failed which is recorded in cb->errors.

Set the writeback range as uptodate only if cb->errors is zero, as opposed
to checking only the last bio's status.

Backporting notes: in all versions up to 4.4 the last argument is always
replaced by "!cb->errors".

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-04 12:46:39 +02:00
Desmond Cheong Zhi Xi
4e1a57d752 btrfs: fix rw device counting in __btrfs_free_extra_devids
commit b2a616676839e2a6b02c8e40be7f886f882ed194 upstream.

When removing a writeable device in __btrfs_free_extra_devids, the rw
device count should be decremented.

This error was caught by Syzbot which reported a warning in
close_fs_devices:

  WARNING: CPU: 1 PID: 9355 at fs/btrfs/volumes.c:1168 close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168
  Modules linked in:
  CPU: 0 PID: 9355 Comm: syz-executor552 Not tainted 5.13.0-rc1-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168
  RSP: 0018:ffffc9000333f2f0 EFLAGS: 00010293
  RAX: ffffffff8365f5c3 RBX: 0000000000000001 RCX: ffff888029afd4c0
  RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
  RBP: ffff88802846f508 R08: ffffffff8365f525 R09: ffffed100337d128
  R10: ffffed100337d128 R11: 0000000000000000 R12: dffffc0000000000
  R13: ffff888019be8868 R14: 1ffff1100337d10d R15: 1ffff1100337d10a
  FS:  00007f6f53828700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000000000047c410 CR3: 00000000302a6000 CR4: 00000000001506f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   btrfs_close_devices+0xc9/0x450 fs/btrfs/volumes.c:1180
   open_ctree+0x8e1/0x3968 fs/btrfs/disk-io.c:3693
   btrfs_fill_super fs/btrfs/super.c:1382 [inline]
   btrfs_mount_root+0xac5/0xc60 fs/btrfs/super.c:1749
   legacy_get_tree+0xea/0x180 fs/fs_context.c:592
   vfs_get_tree+0x86/0x270 fs/super.c:1498
   fc_mount fs/namespace.c:993 [inline]
   vfs_kern_mount+0xc9/0x160 fs/namespace.c:1023
   btrfs_mount+0x3d3/0xb50 fs/btrfs/super.c:1809
   legacy_get_tree+0xea/0x180 fs/fs_context.c:592
   vfs_get_tree+0x86/0x270 fs/super.c:1498
   do_new_mount fs/namespace.c:2905 [inline]
   path_mount+0x196f/0x2be0 fs/namespace.c:3235
   do_mount fs/namespace.c:3248 [inline]
   __do_sys_mount fs/namespace.c:3456 [inline]
   __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433
   do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Because fs_devices->rw_devices was not 0 after
closing all devices. Here is the call trace that was observed:

  btrfs_mount_root():
    btrfs_scan_one_device():
      device_list_add();   <---------------- device added
    btrfs_open_devices():
      open_fs_devices():
        btrfs_open_one_device();   <-------- writable device opened,
	                                     rw device count ++
    btrfs_fill_super():
      open_ctree():
        btrfs_free_extra_devids():
	  __btrfs_free_extra_devids();  <--- writable device removed,
	                              rw device count not decremented
	  fail_tree_roots:
	    btrfs_close_devices():
	      close_fs_devices();   <------- rw device count off by 1

As a note, prior to commit cf89af146b ("btrfs: dev-replace: fail
mount if we don't have replace item with target device"), rw_devices
was decremented on removing a writable device in
__btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
was not set for the device. However, this check does not need to be
reinstated as it is now redundant and incorrect.

In __btrfs_free_extra_devids, we skip removing the device if it is the
target for replacement. This is done by checking whether device->devid
== BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set
only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
and so it's redundant to test for that bit.

Additionally, following commit 82372bc816 ("Btrfs: make
the logic of source device removing more clear"), rw_devices is
incremented whenever a writeable device is added to the alloc
list (including the target device in btrfs_dev_replace_finishing), so
all removals of writable devices from the alloc list should also be
accompanied by a decrement to rw_devices.

Reported-by: syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com
Fixes: cf89af146b ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
CC: stable@vger.kernel.org # 5.10+
Tested-by: syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-04 12:46:39 +02:00
Greg Kroah-Hartman
e4cac2c332 This is the 5.10.54 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmEBT00ACgkQONu9yGCS
 aT7svA/9HCRwW+pK3UpK1+0FK7gGH8DA3jSONj775zVEKhboDZNIwZsDqG0Ly+jm
 /JejWXPKZlekaDXgzBfZY3H59xgij/VwYHe8p7cdxfi1TlhmAQwFjLNZnWav8as6
 IyNkpsDJn8fMXmfHDi2u3cb8wrVi/aQDzTlwu88cUtREyZCaaYlo0Fdv9MJyhww/
 p6LWPYQoZ8TmFY+Y/2ORVxFos2UVuU0hhhMdGt9LrX2WNEGRNUZUqmbhcXYfdsX0
 ckSHbijIcWdcka3nQ6yOvdxw75rTqd8c/bP0y+yAteeJ0CykjVnI2cdK+M2ZEi4j
 /JqpGJrRWhsZf5MiO8b3k+I1K62JDa1GYBQ9Amp8FKKzjYLPTNeFAP9IsMyDc4oi
 oW98XM7XzoSEU9t/FSAIGT0hYK9k+lnPxw623LhxD6x3VPynnNAnQsLr+HirOgG6
 mZ79L4ZFu3lUvVsCuCgKn/uxwDopUNlhqo5B2/4M2kSWwe2Xu5bExpGc2bT9xCOP
 6fF9DmvmpG1UPGCXrOqaxemyEPmHqmyjKJpxDt6vZqlOL9vqHez4WmEEM1C+E2NZ
 5VKKbBk/KZDxNX9EiFOtI2HRFb1cghoI2Hcb/QjRoB9Dv3a6cHgjxDl0eKm8SiDN
 +1ytV0IFH3fT4aRiXJ7I3GBwkjKcDaX0sjYwtnCx9s5XZmm9PRQ=
 =HAyL
 -----END PGP SIGNATURE-----

Merge 5.10.54 into android12-5.10-lts

Changes in 5.10.54
	igc: Fix use-after-free error during reset
	igb: Fix use-after-free error during reset
	igc: change default return of igc_read_phy_reg()
	ixgbe: Fix an error handling path in 'ixgbe_probe()'
	igc: Fix an error handling path in 'igc_probe()'
	igb: Fix an error handling path in 'igb_probe()'
	fm10k: Fix an error handling path in 'fm10k_probe()'
	e1000e: Fix an error handling path in 'e1000_probe()'
	iavf: Fix an error handling path in 'iavf_probe()'
	igb: Check if num of q_vectors is smaller than max before array access
	igb: Fix position of assignment to *ring
	gve: Fix an error handling path in 'gve_probe()'
	net: add kcov handle to skb extensions
	bonding: fix suspicious RCU usage in bond_ipsec_add_sa()
	bonding: fix null dereference in bond_ipsec_add_sa()
	ixgbevf: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops
	bonding: fix suspicious RCU usage in bond_ipsec_del_sa()
	bonding: disallow setting nested bonding + ipsec offload
	bonding: Add struct bond_ipesc to manage SA
	bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()
	bonding: fix incorrect return value of bond_ipsec_offload_ok()
	ipv6: fix 'disable_policy' for fwd packets
	stmmac: platform: Fix signedness bug in stmmac_probe_config_dt()
	selftests: icmp_redirect: remove from checking for IPv6 route get
	selftests: icmp_redirect: IPv6 PMTU info should be cleared after redirect
	pwm: sprd: Ensure configuring period and duty_cycle isn't wrongly skipped
	cxgb4: fix IRQ free race during driver unload
	mptcp: fix warning in __skb_flow_dissect() when do syn cookie for subflow join
	nvme-pci: do not call nvme_dev_remove_admin from nvme_remove
	KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM
	perf inject: Fix dso->nsinfo refcounting
	perf map: Fix dso->nsinfo refcounting
	perf probe: Fix dso->nsinfo refcounting
	perf env: Fix sibling_dies memory leak
	perf test session_topology: Delete session->evlist
	perf test event_update: Fix memory leak of evlist
	perf dso: Fix memory leak in dso__new_map()
	perf test maps__merge_in: Fix memory leak of maps
	perf env: Fix memory leak of cpu_pmu_caps
	perf report: Free generated help strings for sort option
	perf script: Fix memory 'threads' and 'cpus' leaks on exit
	perf lzma: Close lzma stream on exit
	perf probe-file: Delete namelist in del_events() on the error path
	perf data: Close all files in close_dir()
	perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set
	ASoC: wm_adsp: Correct wm_coeff_tlv_get handling
	spi: imx: add a check for speed_hz before calculating the clock
	spi: stm32: fixes pm_runtime calls in probe/remove
	regulator: hi6421: Use correct variable type for regmap api val argument
	regulator: hi6421: Fix getting wrong drvdata
	spi: mediatek: fix fifo rx mode
	ASoC: rt5631: Fix regcache sync errors on resume
	bpf, test: fix NULL pointer dereference on invalid expected_attach_type
	bpf: Fix tail_call_reachable rejection for interpreter when jit failed
	xdp, net: Fix use-after-free in bpf_xdp_link_release
	timers: Fix get_next_timer_interrupt() with no timers pending
	liquidio: Fix unintentional sign extension issue on left shift of u16
	s390/bpf: Perform r1 range checking before accessing jit->seen_reg[r1]
	bpf, sockmap: Fix potential memory leak on unlikely error case
	bpf, sockmap, tcp: sk_prot needs inuse_idx set for proc stats
	bpf, sockmap, udp: sk_prot needs inuse_idx set for proc stats
	bpftool: Check malloc return value in mount_bpffs_for_pin
	net: fix uninit-value in caif_seqpkt_sendmsg
	usb: hso: fix error handling code of hso_create_net_device
	dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable}
	efi/tpm: Differentiate missing and invalid final event log table.
	net: decnet: Fix sleeping inside in af_decnet
	KVM: PPC: Book3S: Fix CONFIG_TRANSACTIONAL_MEM=n crash
	KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak
	net: sched: fix memory leak in tcindex_partial_destroy_work
	sctp: trim optlen when it's a huge value in sctp_setsockopt
	netrom: Decrease sock refcount when sock timers expire
	scsi: iscsi: Fix iface sysfs attr detection
	scsi: target: Fix protect handling in WRITE SAME(32)
	spi: cadence: Correct initialisation of runtime PM again
	ACPI: Kconfig: Fix table override from built-in initrd
	bnxt_en: don't disable an already disabled PCI device
	bnxt_en: Refresh RoCE capabilities in bnxt_ulp_probe()
	bnxt_en: Add missing check for BNXT_STATE_ABORT_ERR in bnxt_fw_rset_task()
	bnxt_en: Validate vlan protocol ID on RX packets
	bnxt_en: Check abort error state in bnxt_half_open_nic()
	net: hisilicon: rename CACHE_LINE_MASK to avoid redefinition
	net/tcp_fastopen: fix data races around tfo_active_disable_stamp
	ALSA: hda: intel-dsp-cfg: add missing ElkhartLake PCI ID
	net: hns3: fix possible mismatches resp of mailbox
	net: hns3: fix rx VLAN offload state inconsistent issue
	spi: spi-bcm2835: Fix deadlock
	net/sched: act_skbmod: Skip non-Ethernet packets
	ipv6: fix another slab-out-of-bounds in fib6_nh_flush_exceptions
	ceph: don't WARN if we're still opening a session to an MDS
	nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING
	Revert "USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem"
	afs: Fix tracepoint string placement with built-in AFS
	r8169: Avoid duplicate sysfs entry creation error
	nvme: set the PRACT bit when using Write Zeroes with T10 PI
	sctp: update active_key for asoc when old key is being replaced
	tcp: disable TFO blackhole logic by default
	net: dsa: sja1105: make VID 4095 a bridge VLAN too
	net: sched: cls_api: Fix the the wrong parameter
	drm/panel: raspberrypi-touchscreen: Prevent double-free
	cifs: only write 64kb at a time when fallocating a small region of a file
	cifs: fix fallocate when trying to allocate a hole.
	proc: Avoid mixing integer types in mem_rw()
	mmc: core: Don't allocate IDA for OF aliases
	s390/ftrace: fix ftrace_update_ftrace_func implementation
	s390/boot: fix use of expolines in the DMA code
	ALSA: usb-audio: Add missing proc text entry for BESPOKEN type
	ALSA: usb-audio: Add registration quirk for JBL Quantum headsets
	ALSA: sb: Fix potential ABBA deadlock in CSP driver
	ALSA: hda/realtek: Fix pop noise and 2 Front Mic issues on a machine
	ALSA: hdmi: Expose all pins on MSI MS-7C94 board
	ALSA: pcm: Call substream ack() method upon compat mmap commit
	ALSA: pcm: Fix mmap capability check
	Revert "usb: renesas-xhci: Fix handling of unknown ROM state"
	usb: xhci: avoid renesas_usb_fw.mem when it's unusable
	xhci: Fix lost USB 2 remote wake
	KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow
	KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state
	usb: hub: Disable USB 3 device initiated lpm if exit latency is too high
	usb: hub: Fix link power management max exit latency (MEL) calculations
	USB: usb-storage: Add LaCie Rugged USB3-FW to IGNORE_UAS
	usb: max-3421: Prevent corruption of freed memory
	usb: renesas_usbhs: Fix superfluous irqs happen after usb_pkt_pop()
	USB: serial: option: add support for u-blox LARA-R6 family
	USB: serial: cp210x: fix comments for GE CS1000
	USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
	usb: gadget: Fix Unbalanced pm_runtime_enable in tegra_xudc_probe
	usb: dwc2: gadget: Fix GOUTNAK flow for Slave mode.
	usb: dwc2: gadget: Fix sending zero length packet in DDMA mode.
	usb: typec: stusb160x: register role switch before interrupt registration
	firmware/efi: Tell memblock about EFI iomem reservations
	tracepoints: Update static_call before tp_funcs when adding a tracepoint
	tracing/histogram: Rename "cpu" to "common_cpu"
	tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop.
	tracing: Synthetic event field_pos is an index not a boolean
	btrfs: check for missing device in btrfs_trim_fs
	media: ngene: Fix out-of-bounds bug in ngene_command_config_free_buf()
	ixgbe: Fix packet corruption due to missing DMA sync
	bus: mhi: core: Validate channel ID when processing command completions
	posix-cpu-timers: Fix rearm racing against process tick
	selftest: use mmap instead of posix_memalign to allocate memory
	io_uring: explicitly count entries for poll reqs
	io_uring: remove double poll entry on arm failure
	userfaultfd: do not untag user pointers
	memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions
	hugetlbfs: fix mount mode command line processing
	rbd: don't hold lock_rwsem while running_list is being drained
	rbd: always kick acquire on "acquired" and "released" notifications
	misc: eeprom: at24: Always append device id even if label property is set.
	nds32: fix up stack guard gap
	driver core: Prevent warning when removing a device link from unregistered consumer
	drm: Return -ENOTTY for non-drm ioctls
	drm/amdgpu: update golden setting for sienna_cichlid
	net: dsa: mv88e6xxx: enable SerDes RX stats for Topaz
	net: dsa: mv88e6xxx: enable SerDes PCS register dump via ethtool -d on Topaz
	PCI: Mark AMD Navi14 GPU ATS as broken
	bonding: fix build issue
	skbuff: Release nfct refcount on napi stolen or re-used skbs
	Documentation: Fix intiramfs script name
	perf inject: Close inject.output on exit
	usb: ehci: Prevent missed ehci interrupts with edge-triggered MSI
	drm/i915/gvt: Clear d3_entered on elsp cmd submission.
	sfc: ensure correct number of XDP queues
	xhci: add xhci_get_virt_ep() helper
	skbuff: Fix build with SKB extensions disabled
	Linux 5.10.54

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifd2823b47ab1544cd1f168b138624ffe060a471e
2021-07-28 15:23:47 +02:00
Anand Jain
755971dc7e btrfs: check for missing device in btrfs_trim_fs
commit 16a200f66ede3f9afa2e51d90ade017aaa18d213 upstream.

A fstrim on a degraded raid1 can trigger the following null pointer
dereference:

  BTRFS info (device loop0): allowing degraded mounts
  BTRFS info (device loop0): disk space caching is enabled
  BTRFS info (device loop0): has skinny extents
  BTRFS warning (device loop0): devid 2 uuid 97ac16f7-e14d-4db1-95bc-3d489b424adb is missing
  BTRFS warning (device loop0): devid 2 uuid 97ac16f7-e14d-4db1-95bc-3d489b424adb is missing
  BTRFS info (device loop0): enabling ssd optimizations
  BUG: kernel NULL pointer dereference, address: 0000000000000620
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP NOPTI
  CPU: 0 PID: 4574 Comm: fstrim Not tainted 5.13.0-rc7+ #31
  Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
  RIP: 0010:btrfs_trim_fs+0x199/0x4a0 [btrfs]
  RSP: 0018:ffff959541797d28 EFLAGS: 00010293
  RAX: 0000000000000000 RBX: ffff946f84eca508 RCX: a7a67937adff8608
  RDX: ffff946e8122d000 RSI: 0000000000000000 RDI: ffffffffc02fdbf0
  RBP: ffff946ea4615000 R08: 0000000000000001 R09: 0000000000000000
  R10: 0000000000000000 R11: ffff946e8122d960 R12: 0000000000000000
  R13: ffff959541797db8 R14: ffff946e8122d000 R15: ffff959541797db8
  FS:  00007f55917a5080(0000) GS:ffff946f9bc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000620 CR3: 000000002d2c8001 CR4: 00000000000706f0
  Call Trace:
  btrfs_ioctl_fitrim+0x167/0x260 [btrfs]
  btrfs_ioctl+0x1c00/0x2fe0 [btrfs]
  ? selinux_file_ioctl+0x140/0x240
  ? syscall_trace_enter.constprop.0+0x188/0x240
  ? __x64_sys_ioctl+0x83/0xb0
  __x64_sys_ioctl+0x83/0xb0

Reproducer:

  $ mkfs.btrfs -fq -d raid1 -m raid1 /dev/loop0 /dev/loop1
  $ mount /dev/loop0 /btrfs
  $ umount /btrfs
  $ btrfs dev scan --forget
  $ mount -o degraded /dev/loop0 /btrfs

  $ fstrim /btrfs

The reason is we call btrfs_trim_free_extents() for the missing device,
which uses device->bdev (NULL for missing device) to find if the device
supports discard.

Fix is to check if the device is missing before calling
btrfs_trim_free_extents().

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-28 14:35:45 +02:00
Greg Kroah-Hartman
2df0fb4a4b This is the 5.10.50 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmDu+1UACgkQONu9yGCS
 aT7jQRAAuLDi7ejk3JUameYFMzVXGAUE6yPs392/lWJzey7IBf+2uLqz4FzqqUHp
 U1GkEKJVaCacEfi0+rpi7BxNFljUdZdg/F/P68ARtAWPvwqAeJ4QIh5u3A682UUO
 1M5h6e5/oY9F4kQIb5Kot04avqOeR6lTqrkA8jeP5h43ngyLWuS2d+5oOGmbCukS
 UgEaCC6CiKjcN51UUTj/fXMQ0X4IDHP5pD8rWwH0IvK0i7gduvk744un8LVB6aW1
 rNV88C3BEFFtkPQh2XySnXM5Ok8kYlhFoTDsqlpeAX7pA8hiUPYBoRzTg0MJtPZn
 N1L/Yqhvxmn5xs9HAw7mDOo8E8NWXzsT5FvZVaBeiCgtdKmcPszylXqmSt1oiOb0
 /EmkCWmlbG/3qWql24+LU4XP36iVPx32HQxAgg2XbnlNU5o0E1y2F98p6p/3JSWX
 NAjHtmg/MxueFQ+w8bDzhO8YzYn1dIU3V3qaXRvtpODrmaSYW+bwCyPtSjXe3/vL
 604zb3dOg9+tD/gKqfRb/UPMu24nNll8M/gnSRci05/thmIxwtYudPwoLNSejDqr
 e+a8vejISfIyp41XrpYQbUeKs1WOA+A7vgx6CZrT791afiT+6UgC/ecQfg1NFxhs
 8ayWpocaIszxyXxVGro1rfwZeQmTlbTCZ5wVdpn9sDPZfI7epts=
 =FCrA
 -----END PGP SIGNATURE-----

Merge 5.10.50 into android12-5.10-lts

Changes in 5.10.50
	Bluetooth: hci_qca: fix potential GPF
	Bluetooth: btqca: Don't modify firmware contents in-place
	Bluetooth: Remove spurious error message
	ALSA: usb-audio: fix rate on Ozone Z90 USB headset
	ALSA: usb-audio: Fix OOB access at proc output
	ALSA: firewire-motu: fix stream format for MOTU 8pre FireWire
	ALSA: usb-audio: scarlett2: Fix wrong resume call
	ALSA: intel8x0: Fix breakage at ac97 clock measurement
	ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 450 G8
	ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 445 G8
	ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 630 G8
	ALSA: hda/realtek: Add another ALC236 variant support
	ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook x360 830 G8
	ALSA: hda/realtek: Improve fixup for HP Spectre x360 15-df0xxx
	ALSA: hda/realtek: Fix bass speaker DAC mapping for Asus UM431D
	ALSA: hda/realtek: Apply LED fixup for HP Dragonfly G1, too
	ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 830 G8 Notebook PC
	media: dvb-usb: fix wrong definition
	Input: usbtouchscreen - fix control-request directions
	net: can: ems_usb: fix use-after-free in ems_usb_disconnect()
	usb: gadget: eem: fix echo command packet response issue
	usb: renesas-xhci: Fix handling of unknown ROM state
	USB: cdc-acm: blacklist Heimann USB Appset device
	usb: dwc3: Fix debugfs creation flow
	usb: typec: Add the missed altmode_id_remove() in typec_register_altmode()
	xhci: solve a double free problem while doing s4
	gfs2: Fix underflow in gfs2_page_mkwrite
	gfs2: Fix error handling in init_statfs
	ntfs: fix validity check for file name attribute
	selftests/lkdtm: Avoid needing explicit sub-shell
	copy_page_to_iter(): fix ITER_DISCARD case
	iov_iter_fault_in_readable() should do nothing in xarray case
	Input: joydev - prevent use of not validated data in JSIOCSBTNMAP ioctl
	crypto: nx - Fix memcpy() over-reading in nonce
	crypto: ccp - Annotate SEV Firmware file names
	arm_pmu: Fix write counter incorrect in ARMv7 big-endian mode
	ARM: dts: ux500: Fix LED probing
	ARM: dts: at91: sama5d4: fix pinctrl muxing
	btrfs: send: fix invalid path for unlink operations after parent orphanization
	btrfs: compression: don't try to compress if we don't have enough pages
	btrfs: clear defrag status of a root if starting transaction fails
	ext4: cleanup in-core orphan list if ext4_truncate() failed to get a transaction handle
	ext4: fix kernel infoleak via ext4_extent_header
	ext4: fix overflow in ext4_iomap_alloc()
	ext4: return error code when ext4_fill_flex_info() fails
	ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit
	ext4: remove check for zero nr_to_scan in ext4_es_scan()
	ext4: fix avefreec in find_group_orlov
	ext4: use ext4_grp_locked_error in mb_find_extent
	can: bcm: delay release of struct bcm_op after synchronize_rcu()
	can: gw: synchronize rcu operations before removing gw job entry
	can: isotp: isotp_release(): omit unintended hrtimer restart on socket release
	can: j1939: j1939_sk_init(): set SOCK_RCU_FREE to call sk_destruct() after RCU is done
	can: peak_pciefd: pucan_handle_status(): fix a potential starvation issue in TX path
	mac80211: remove iwlwifi specific workaround that broke sta NDP tx
	SUNRPC: Fix the batch tasks count wraparound.
	SUNRPC: Should wake up the privileged task firstly.
	bus: mhi: Wait for M2 state during system resume
	mm/gup: fix try_grab_compound_head() race with split_huge_page()
	perf/smmuv3: Don't trample existing events with global filter
	KVM: nVMX: Handle split-lock #AC exceptions that happen in L2
	KVM: PPC: Book3S HV: Workaround high stack usage with clang
	KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs
	KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in nested NPT walk
	s390/cio: dont call css_wait_for_slow_path() inside a lock
	s390: mm: Fix secure storage access exception handling
	f2fs: Prevent swap file in LFS mode
	clk: agilex/stratix10/n5x: fix how the bypass_reg is handled
	clk: agilex/stratix10: remove noc_clk
	clk: agilex/stratix10: fix bypass representation
	rtc: stm32: Fix unbalanced clk_disable_unprepare() on probe error path
	iio: frequency: adf4350: disable reg and clk on error in adf4350_probe()
	iio: light: tcs3472: do not free unallocated IRQ
	iio: ltr501: mark register holding upper 8 bits of ALS_DATA{0,1} and PS_DATA as volatile, too
	iio: ltr501: ltr559: fix initialization of LTR501_ALS_CONTR
	iio: ltr501: ltr501_read_ps(): add missing endianness conversion
	iio: accel: bma180: Fix BMA25x bandwidth register values
	serial: mvebu-uart: fix calculation of clock divisor
	serial: sh-sci: Stop dmaengine transfer in sci_stop_tx()
	serial_cs: Add Option International GSM-Ready 56K/ISDN modem
	serial_cs: remove wrong GLOBETROTTER.cis entry
	ath9k: Fix kernel NULL pointer dereference during ath_reset_internal()
	ssb: sdio: Don't overwrite const buffer if block_write fails
	rsi: Assign beacon rate settings to the correct rate_info descriptor field
	rsi: fix AP mode with WPA failure due to encrypted EAPOL
	tracing/histograms: Fix parsing of "sym-offset" modifier
	tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing
	seq_buf: Make trace_seq_putmem_hex() support data longer than 8
	powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()
	loop: Fix missing discard support when using LOOP_CONFIGURE
	evm: Execute evm_inode_init_security() only when an HMAC key is loaded
	evm: Refuse EVM_ALLOW_METADATA_WRITES only if an HMAC key is loaded
	fuse: Fix crash in fuse_dentry_automount() error path
	fuse: Fix crash if superblock of submount gets killed early
	fuse: Fix infinite loop in sget_fc()
	fuse: ignore PG_workingset after stealing
	fuse: check connected before queueing on fpq->io
	fuse: reject internal errno
	thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure
	spi: Make of_register_spi_device also set the fwnode
	Add a reference to ucounts for each cred
	staging: media: rkvdec: fix pm_runtime_get_sync() usage count
	media: marvel-ccic: fix some issues when getting pm_runtime
	media: mdk-mdp: fix pm_runtime_get_sync() usage count
	media: s5p: fix pm_runtime_get_sync() usage count
	media: am437x: fix pm_runtime_get_sync() usage count
	media: sh_vou: fix pm_runtime_get_sync() usage count
	media: mtk-vcodec: fix PM runtime get logic
	media: s5p-jpeg: fix pm_runtime_get_sync() usage count
	media: sunxi: fix pm_runtime_get_sync() usage count
	media: sti/bdisp: fix pm_runtime_get_sync() usage count
	media: exynos4-is: fix pm_runtime_get_sync() usage count
	media: exynos-gsc: fix pm_runtime_get_sync() usage count
	spi: spi-loopback-test: Fix 'tx_buf' might be 'rx_buf'
	spi: spi-topcliff-pch: Fix potential double free in pch_spi_process_messages()
	spi: omap-100k: Fix the length judgment problem
	regulator: uniphier: Add missing MODULE_DEVICE_TABLE
	sched/core: Initialize the idle task with preemption disabled
	hwrng: exynos - Fix runtime PM imbalance on error
	crypto: nx - add missing MODULE_DEVICE_TABLE
	media: sti: fix obj-$(config) targets
	media: cpia2: fix memory leak in cpia2_usb_probe
	media: cobalt: fix race condition in setting HPD
	media: hevc: Fix dependent slice segment flags
	media: pvrusb2: fix warning in pvr2_i2c_core_done
	media: imx: imx7_mipi_csis: Fix logging of only error event counters
	crypto: qat - check return code of qat_hal_rd_rel_reg()
	crypto: qat - remove unused macro in FW loader
	crypto: qce: skcipher: Fix incorrect sg count for dma transfers
	arm64: perf: Convert snprintf to sysfs_emit
	sched/fair: Fix ascii art by relpacing tabs
	media: i2c: ov2659: Use clk_{prepare_enable,disable_unprepare}() to set xvclk on/off
	media: bt878: do not schedule tasklet when it is not setup
	media: em28xx: Fix possible memory leak of em28xx struct
	media: hantro: Fix .buf_prepare
	media: cedrus: Fix .buf_prepare
	media: v4l2-core: Avoid the dangling pointer in v4l2_fh_release
	media: bt8xx: Fix a missing check bug in bt878_probe
	media: st-hva: Fix potential NULL pointer dereferences
	crypto: hisilicon/sec - fixup 3des minimum key size declaration
	Makefile: fix GDB warning with CONFIG_RELR
	media: dvd_usb: memory leak in cinergyt2_fe_attach
	memstick: rtsx_usb_ms: fix UAF
	mmc: sdhci-sprd: use sdhci_sprd_writew
	mmc: via-sdmmc: add a check against NULL pointer dereference
	spi: meson-spicc: fix a wrong goto jump for avoiding memory leak.
	spi: meson-spicc: fix memory leak in meson_spicc_probe
	crypto: shash - avoid comparing pointers to exported functions under CFI
	media: dvb_net: avoid speculation from net slot
	media: siano: fix device register error path
	media: imx-csi: Skip first few frames from a BT.656 source
	hwmon: (max31790) Report correct current pwm duty cycles
	hwmon: (max31790) Fix pwmX_enable attributes
	drivers/perf: fix the missed ida_simple_remove() in ddr_perf_probe()
	KVM: PPC: Book3S HV: Fix TLB management on SMT8 POWER9 and POWER10 processors
	btrfs: fix error handling in __btrfs_update_delayed_inode
	btrfs: abort transaction if we fail to update the delayed inode
	btrfs: sysfs: fix format string for some discard stats
	btrfs: don't clear page extent mapped if we're not invalidating the full page
	btrfs: disable build on platforms having page size 256K
	locking/lockdep: Fix the dep path printing for backwards BFS
	lockding/lockdep: Avoid to find wrong lock dep path in check_irq_usage()
	KVM: s390: get rid of register asm usage
	regulator: mt6358: Fix vdram2 .vsel_mask
	regulator: da9052: Ensure enough delay time for .set_voltage_time_sel
	media: Fix Media Controller API config checks
	ACPI: video: use native backlight for GA401/GA502/GA503
	HID: do not use down_interruptible() when unbinding devices
	EDAC/ti: Add missing MODULE_DEVICE_TABLE
	ACPI: processor idle: Fix up C-state latency if not ordered
	hv_utils: Fix passing zero to 'PTR_ERR' warning
	lib: vsprintf: Fix handling of number field widths in vsscanf
	Input: goodix - platform/x86: touchscreen_dmi - Move upside down quirks to touchscreen_dmi.c
	platform/x86: touchscreen_dmi: Add an extra entry for the upside down Goodix touchscreen on Teclast X89 tablets
	platform/x86: touchscreen_dmi: Add info for the Goodix GT912 panel of TM800A550L tablets
	ACPI: EC: Make more Asus laptops use ECDT _GPE
	block_dump: remove block_dump feature in mark_inode_dirty()
	blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
	blk-mq: clear stale request in tags->rq[] before freeing one request pool
	fs: dlm: cancel work sync othercon
	random32: Fix implicit truncation warning in prandom_seed_state()
	open: don't silently ignore unknown O-flags in openat2()
	drivers: hv: Fix missing error code in vmbus_connect()
	fs: dlm: fix memory leak when fenced
	ACPICA: Fix memory leak caused by _CID repair function
	ACPI: bus: Call kobject_put() in acpi_init() error path
	ACPI: resources: Add checks for ACPI IRQ override
	block: fix race between adding/removing rq qos and normal IO
	platform/x86: asus-nb-wmi: Revert "Drop duplicate DMI quirk structures"
	platform/x86: asus-nb-wmi: Revert "add support for ASUS ROG Zephyrus G14 and G15"
	platform/x86: toshiba_acpi: Fix missing error code in toshiba_acpi_setup_keyboard()
	nvme-pci: fix var. type for increasing cq_head
	nvmet-fc: do not check for invalid target port in nvmet_fc_handle_fcp_rqst()
	EDAC/Intel: Do not load EDAC driver when running as a guest
	PCI: hv: Add check for hyperv_initialized in init_hv_pci_drv()
	cifs: improve fallocate emulation
	ACPI: EC: trust DSDT GPE for certain HP laptop
	clocksource: Retry clock read if long delays detected
	clocksource: Check per-CPU clock synchronization when marked unstable
	tpm_tis_spi: add missing SPI device ID entries
	ACPI: tables: Add custom DSDT file as makefile prerequisite
	HID: wacom: Correct base usage for capacitive ExpressKey status bits
	cifs: fix missing spinlock around update to ses->status
	mailbox: qcom: Use PLATFORM_DEVID_AUTO to register platform device
	block: fix discard request merge
	kthread_worker: fix return value when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
	ia64: mca_drv: fix incorrect array size calculation
	writeback, cgroup: increment isw_nr_in_flight before grabbing an inode
	spi: Allow to have all native CSs in use along with GPIOs
	spi: Avoid undefined behaviour when counting unused native CSs
	media: venus: Rework error fail recover logic
	media: s5p_cec: decrement usage count if disabled
	media: hantro: do a PM resume earlier
	crypto: ixp4xx - dma_unmap the correct address
	crypto: ixp4xx - update IV after requests
	crypto: ux500 - Fix error return code in hash_hw_final()
	sata_highbank: fix deferred probing
	pata_rb532_cf: fix deferred probing
	media: I2C: change 'RST' to "RSET" to fix multiple build errors
	sched/uclamp: Fix wrong implementation of cpu.uclamp.min
	sched/uclamp: Fix locking around cpu_util_update_eff()
	kbuild: Fix objtool dependency for 'OBJECT_FILES_NON_STANDARD_<obj> := n'
	pata_octeon_cf: avoid WARN_ON() in ata_host_activate()
	evm: fix writing <securityfs>/evm overflow
	x86/elf: Use _BITUL() macro in UAPI headers
	crypto: sa2ul - Fix leaks on failure paths with sa_dma_init()
	crypto: sa2ul - Fix pm_runtime enable in sa_ul_probe()
	crypto: ccp - Fix a resource leak in an error handling path
	media: rc: i2c: Fix an error message
	pata_ep93xx: fix deferred probing
	locking/lockdep: Reduce LOCKDEP dependency list
	media: rkvdec: Fix .buf_prepare
	media: exynos4-is: Fix a use after free in isp_video_release
	media: au0828: fix a NULL vs IS_ERR() check
	media: tc358743: Fix error return code in tc358743_probe_of()
	media: gspca/gl860: fix zero-length control requests
	m68k: atari: Fix ATARI_KBD_CORE kconfig unmet dependency warning
	media: siano: Fix out-of-bounds warnings in smscore_load_firmware_family2()
	regulator: fan53880: Fix vsel_mask setting for FAN53880_BUCK
	crypto: nitrox - fix unchecked variable in nitrox_register_interrupts
	crypto: omap-sham - Fix PM reference leak in omap sham ops
	crypto: x86/curve25519 - fix cpu feature checking logic in mod_exit
	crypto: sm2 - remove unnecessary reset operations
	crypto: sm2 - fix a memory leak in sm2
	mmc: usdhi6rol0: fix error return code in usdhi6_probe()
	arm64: consistently use reserved_pg_dir
	arm64/mm: Fix ttbr0 values stored in struct thread_info for software-pan
	media: subdev: remove VIDIOC_DQEVENT_TIME32 handling
	media: s5p-g2d: Fix a memory leak on ctx->fh.m2m_ctx
	hwmon: (lm70) Use device_get_match_data()
	hwmon: (lm70) Revert "hwmon: (lm70) Add support for ACPI"
	hwmon: (max31722) Remove non-standard ACPI device IDs
	hwmon: (max31790) Fix fan speed reporting for fan7..12
	KVM: nVMX: Sync all PGDs on nested transition with shadow paging
	KVM: nVMX: Ensure 64-bit shift when checking VMFUNC bitmap
	KVM: nVMX: Don't clobber nested MMU's A/D status on EPTP switch
	KVM: x86/mmu: Fix return value in tdp_mmu_map_handle_target_level()
	perf/arm-cmn: Fix invalid pointer when access dtc object sharing the same IRQ number
	KVM: arm64: Don't zero the cycle count register when PMCR_EL0.P is set
	regulator: hi655x: Fix pass wrong pointer to config.driver_data
	btrfs: clear log tree recovering status if starting transaction fails
	x86/sev: Make sure IRQs are disabled while GHCB is active
	x86/sev: Split up runtime #VC handler for correct state tracking
	sched/rt: Fix RT utilization tracking during policy change
	sched/rt: Fix Deadline utilization tracking during policy change
	sched/uclamp: Fix uclamp_tg_restrict()
	lockdep: Fix wait-type for empty stack
	lockdep/selftests: Fix selftests vs PROVE_RAW_LOCK_NESTING
	spi: spi-sun6i: Fix chipselect/clock bug
	crypto: nx - Fix RCU warning in nx842_OF_upd_status
	psi: Fix race between psi_trigger_create/destroy
	media: v4l2-async: Clean v4l2_async_notifier_add_fwnode_remote_subdev
	media: video-mux: Skip dangling endpoints
	PM / devfreq: Add missing error code in devfreq_add_device()
	ACPI: PM / fan: Put fan device IDs into separate header file
	block: avoid double io accounting for flush request
	nvme-pci: look for StorageD3Enable on companion ACPI device instead
	ACPI: sysfs: Fix a buffer overrun problem with description_show()
	mark pstore-blk as broken
	clocksource/drivers/timer-ti-dm: Save and restore timer TIOCP_CFG
	extcon: extcon-max8997: Fix IRQ freeing at error path
	ACPI: APEI: fix synchronous external aborts in user-mode
	blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled()
	blk-wbt: make sure throttle is enabled properly
	ACPI: Use DEVICE_ATTR_<RW|RO|WO> macros
	ACPI: bgrt: Fix CFI violation
	cpufreq: Make cpufreq_online() call driver->offline() on errors
	blk-mq: update hctx->dispatch_busy in case of real scheduler
	ocfs2: fix snprintf() checking
	dax: fix ENOMEM handling in grab_mapping_entry()
	mm/debug_vm_pgtable/basic: add validation for dirtiness after write protect
	mm/debug_vm_pgtable/basic: iterate over entire protection_map[]
	mm/debug_vm_pgtable: ensure THP availability via has_transparent_hugepage()
	swap: fix do_swap_page() race with swapoff
	mm/shmem: fix shmem_swapin() race with swapoff
	mm: memcg/slab: properly set up gfp flags for objcg pointer array
	mm: page_alloc: refactor setup_per_zone_lowmem_reserve()
	mm/page_alloc: fix counting of managed_pages
	xfrm: xfrm_state_mtu should return at least 1280 for ipv6
	drm/bridge/sii8620: fix dependency on extcon
	drm/bridge: Fix the stop condition of drm_bridge_chain_pre_enable()
	drm/amd/dc: Fix a missing check bug in dm_dp_mst_detect()
	drm/ast: Fix missing conversions to managed API
	video: fbdev: imxfb: Fix an error message
	net: mvpp2: Put fwnode in error case during ->probe()
	net: pch_gbe: Propagate error from devm_gpio_request_one()
	pinctrl: renesas: r8a7796: Add missing bias for PRESET# pin
	pinctrl: renesas: r8a77990: JTAG pins do not have pull-down capabilities
	drm/vmwgfx: Mark a surface gpu-dirty after the SVGA3dCmdDXGenMips command
	drm/vmwgfx: Fix cpu updates of coherent multisample surfaces
	net: qrtr: ns: Fix error return code in qrtr_ns_init()
	clk: meson: g12a: fix gp0 and hifi ranges
	net: ftgmac100: add missing error return code in ftgmac100_probe()
	drm: rockchip: set alpha_en to 0 if it is not used
	drm/rockchip: cdn-dp-core: add missing clk_disable_unprepare() on error in cdn_dp_grf_write()
	drm/rockchip: dsi: move all lane config except LCDC mux to bind()
	drm/rockchip: lvds: Fix an error handling path
	drm/rockchip: cdn-dp: fix sign extension on an int multiply for a u64 result
	mptcp: fix pr_debug in mptcp_token_new_connect
	mptcp: generate subflow hmac after mptcp_finish_join()
	RDMA/srp: Fix a recently introduced memory leak
	RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats
	RDMA/rtrs: Do not reset hb_missed_max after re-connection
	RDMA/rtrs-srv: Fix memory leak of unfreed rtrs_srv_stats object
	RDMA/rtrs-srv: Fix memory leak when having multiple sessions
	RDMA/rtrs-clt: Check if the queue_depth has changed during a reconnection
	RDMA/rtrs-clt: Fix memory leak of not-freed sess->stats and stats->pcpu_stats
	ehea: fix error return code in ehea_restart_qps()
	clk: tegra30: Use 300MHz for video decoder by default
	xfrm: remove the fragment check for ipv6 beet mode
	net/sched: act_vlan: Fix modify to allow 0
	RDMA/core: Sanitize WQ state received from the userspace
	drm/pl111: depend on CONFIG_VEXPRESS_CONFIG
	RDMA/rxe: Fix failure during driver load
	drm/pl111: Actually fix CONFIG_VEXPRESS_CONFIG depends
	drm/vc4: hdmi: Fix error path of hpd-gpios
	clk: vc5: fix output disabling when enabling a FOD
	drm: qxl: ensure surf.data is ininitialized
	tools/bpftool: Fix error return code in do_batch()
	ath10k: go to path err_unsupported when chip id is not supported
	ath10k: add missing error return code in ath10k_pci_probe()
	wireless: carl9170: fix LEDS build errors & warnings
	ieee802154: hwsim: Fix possible memory leak in hwsim_subscribe_all_others
	clk: imx8mq: remove SYS PLL 1/2 clock gates
	wcn36xx: Move hal_buf allocation to devm_kmalloc in probe
	ssb: Fix error return code in ssb_bus_scan()
	brcmfmac: fix setting of station info chains bitmask
	brcmfmac: correctly report average RSSI in station info
	brcmfmac: Fix a double-free in brcmf_sdio_bus_reset
	brcmsmac: mac80211_if: Fix a resource leak in an error handling path
	cw1200: Revert unnecessary patches that fix unreal use-after-free bugs
	ath11k: Fix an error handling path in ath11k_core_fetch_board_data_api_n()
	ath10k: Fix an error code in ath10k_add_interface()
	ath11k: send beacon template after vdev_start/restart during csa
	netlabel: Fix memory leak in netlbl_mgmt_add_common
	RDMA/mlx5: Don't add slave port to unaffiliated list
	netfilter: nft_exthdr: check for IPv6 packet before further processing
	netfilter: nft_osf: check for TCP packet before further processing
	netfilter: nft_tproxy: restrict support to TCP and UDP transport protocols
	RDMA/rxe: Fix qp reference counting for atomic ops
	selftests/bpf: Whitelist test_progs.h from .gitignore
	xsk: Fix missing validation for skb and unaligned mode
	xsk: Fix broken Tx ring validation
	bpf: Fix libelf endian handling in resolv_btfids
	RDMA/rtrs-srv: Set minimal max_send_wr and max_recv_wr
	samples/bpf: Fix Segmentation fault for xdp_redirect command
	samples/bpf: Fix the error return code of xdp_redirect's main()
	mt76: fix possible NULL pointer dereference in mt76_tx
	mt76: mt7615: fix NULL pointer dereference in tx_prepare_skb()
	net: ethernet: aeroflex: fix UAF in greth_of_remove
	net: ethernet: ezchip: fix UAF in nps_enet_remove
	net: ethernet: ezchip: fix error handling
	vrf: do not push non-ND strict packets with a source LLA through packet taps again
	net: sched: add barrier to ensure correct ordering for lockless qdisc
	tls: prevent oversized sendfile() hangs by ignoring MSG_MORE
	netfilter: nf_tables_offload: check FLOW_DISSECTOR_KEY_BASIC in VLAN transfer logic
	pkt_sched: sch_qfq: fix qfq_change_class() error path
	xfrm: Fix xfrm offload fallback fail case
	iwlwifi: increase PNVM load timeout
	rtw88: 8822c: fix lc calibration timing
	vxlan: add missing rcu_read_lock() in neigh_reduce()
	ip6_tunnel: fix GRE6 segmentation
	net/ipv4: swap flow ports when validating source
	net: ti: am65-cpsw-nuss: Fix crash when changing number of TX queues
	tc-testing: fix list handling
	ieee802154: hwsim: Fix memory leak in hwsim_add_one
	ieee802154: hwsim: avoid possible crash in hwsim_del_edge_nl()
	bpf: Fix null ptr deref with mixed tail calls and subprogs
	drm/msm: Fix error return code in msm_drm_init()
	drm/msm/dpu: Fix error return code in dpu_mdss_init()
	mac80211: remove iwlwifi specific workaround NDPs of null_response
	net: bcmgenet: Fix attaching to PYH failed on RPi 4B
	ipv6: exthdrs: do not blindly use init_net
	can: j1939: j1939_sk_setsockopt(): prevent allocation of j1939 filter for optlen == 0
	bpf: Do not change gso_size during bpf_skb_change_proto()
	i40e: Fix error handling in i40e_vsi_open
	i40e: Fix autoneg disabling for non-10GBaseT links
	i40e: Fix missing rtnl locking when setting up pf switch
	Revert "ibmvnic: remove duplicate napi_schedule call in open function"
	ibmvnic: set ltb->buff to NULL after freeing
	ibmvnic: free tx_pool if tso_pool alloc fails
	RDMA/cma: Protect RMW with qp_mutex
	net: macsec: fix the length used to copy the key for offloading
	net: phy: mscc: fix macsec key length
	net: atlantic: fix the macsec key length
	ipv6: fix out-of-bound access in ip6_parse_tlv()
	e1000e: Check the PCIm state
	net: dsa: sja1105: fix NULL pointer dereference in sja1105_reload_cbs()
	bpfilter: Specify the log level for the kmsg message
	RDMA/cma: Fix incorrect Packet Lifetime calculation
	gve: Fix swapped vars when fetching max queues
	Revert "be2net: disable bh with spin_lock in be_process_mcc"
	Bluetooth: mgmt: Fix slab-out-of-bounds in tlv_data_is_valid
	Bluetooth: Fix not sending Set Extended Scan Response
	Bluetooth: Fix Set Extended (Scan Response) Data
	Bluetooth: Fix handling of HCI_LE_Advertising_Set_Terminated event
	clk: actions: Fix UART clock dividers on Owl S500 SoC
	clk: actions: Fix SD clocks factor table on Owl S500 SoC
	clk: actions: Fix bisp_factor_table based clocks on Owl S500 SoC
	clk: actions: Fix AHPPREDIV-H-AHB clock chain on Owl S500 SoC
	clk: qcom: clk-alpha-pll: fix CAL_L write in alpha_pll_fabia_prepare
	clk: si5341: Wait for DEVICE_READY on startup
	clk: si5341: Avoid divide errors due to bogus register contents
	clk: si5341: Check for input clock presence and PLL lock on startup
	clk: si5341: Update initialization magic
	writeback: fix obtain a reference to a freeing memcg css
	net: lwtunnel: handle MTU calculation in forwading
	net: sched: fix warning in tcindex_alloc_perfect_hash
	net: tipc: fix FB_MTU eat two pages
	RDMA/mlx5: Don't access NULL-cleared mpi pointer
	RDMA/core: Always release restrack object
	MIPS: Fix PKMAP with 32-bit MIPS huge page support
	staging: fbtft: Rectify GPIO handling
	staging: fbtft: Don't spam logs when probe is deferred
	ASoC: rt5682: Disable irq on shutdown
	rcu: Invoke rcu_spawn_core_kthreads() from rcu_spawn_gp_kthread()
	serial: fsl_lpuart: don't modify arbitrary data on lpuart32
	serial: fsl_lpuart: remove RTSCTS handling from get_mctrl()
	serial: 8250_omap: fix a timeout loop condition
	tty: nozomi: Fix a resource leak in an error handling function
	mwifiex: re-fix for unaligned accesses
	iio: adis_buffer: do not return ints in irq handlers
	iio: adis16400: do not return ints in irq handlers
	iio: adis16475: do not return ints in irq handlers
	iio: accel: bma180: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: accel: bma220: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: accel: hid: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: accel: kxcjk-1013: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: accel: mxc4005: Fix overread of data and alignment issue.
	iio: accel: stk8312: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: accel: stk8ba50: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: adc: ti-ads1015: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: adc: vf610: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: gyro: bmg160: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: humidity: am2315: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: prox: srf08: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: prox: pulsed-light: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: prox: as3935: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: magn: hmc5843: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: magn: bmc150: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: light: isl29125: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: light: tcs3414: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: light: tcs3472: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: chemical: atlas: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: cros_ec_sensors: Fix alignment of buffer in iio_push_to_buffers_with_timestamp()
	iio: potentiostat: lmp91000: Fix alignment of buffer in iio_push_to_buffers_with_timestamp()
	ASoC: rk3328: fix missing clk_disable_unprepare() on error in rk3328_platform_probe()
	ASoC: hisilicon: fix missing clk_disable_unprepare() on error in hi6210_i2s_startup()
	backlight: lm3630a_bl: Put fwnode in error case during ->probe()
	ASoC: rsnd: tidyup loop on rsnd_adg_clk_query()
	Input: hil_kbd - fix error return code in hil_dev_connect()
	perf scripting python: Fix tuple_set_u64()
	mtd: partitions: redboot: seek fis-index-block in the right node
	mtd: rawnand: arasan: Ensure proper configuration for the asserted target
	staging: mmal-vchiq: Fix incorrect static vchiq_instance.
	char: pcmcia: error out if 'num_bytes_read' is greater than 4 in set_protocol()
	firmware: stratix10-svc: Fix a resource leak in an error handling path
	tty: nozomi: Fix the error handling path of 'nozomi_card_init()'
	leds: class: The -ENOTSUPP should never be seen by user space
	leds: lm3532: select regmap I2C API
	leds: lm36274: Put fwnode in error case during ->probe()
	leds: lm3692x: Put fwnode in any case during ->probe()
	leds: lm3697: Don't spam logs when probe is deferred
	leds: lp50xx: Put fwnode in error case during ->probe()
	scsi: FlashPoint: Rename si_flags field
	scsi: iscsi: Flush block work before unblock
	mfd: mp2629: Select MFD_CORE to fix build error
	mfd: rn5t618: Fix IRQ trigger by changing it to level mode
	fsi: core: Fix return of error values on failures
	fsi: scom: Reset the FSI2PIB engine for any error
	fsi: occ: Don't accept response from un-initialized OCC
	fsi/sbefifo: Clean up correct FIFO when receiving reset request from SBE
	fsi/sbefifo: Fix reset timeout
	visorbus: fix error return code in visorchipset_init()
	iommu/amd: Fix extended features logging
	s390/irq: select HAVE_IRQ_EXIT_ON_IRQ_STACK
	s390: enable HAVE_IOREMAP_PROT
	s390: appldata depends on PROC_SYSCTL
	selftests: splice: Adjust for handler fallback removal
	iommu/dma: Fix IOVA reserve dma ranges
	ASoC: max98373-sdw: use first_hw_init flag on resume
	ASoC: rt1308-sdw: use first_hw_init flag on resume
	ASoC: rt5682-sdw: use first_hw_init flag on resume
	ASoC: rt700-sdw: use first_hw_init flag on resume
	ASoC: rt711-sdw: use first_hw_init flag on resume
	ASoC: rt715-sdw: use first_hw_init flag on resume
	ASoC: rt5682: fix getting the wrong device id when the suspend_stress_test
	ASoC: rt5682-sdw: set regcache_cache_only false before reading RT5682_DEVICE_ID
	ASoC: mediatek: mtk-btcvsd: Fix an error handling path in 'mtk_btcvsd_snd_probe()'
	usb: gadget: f_fs: Fix setting of device and driver data cross-references
	usb: dwc2: Don't reset the core after setting turnaround time
	eeprom: idt_89hpesx: Put fwnode in matching case during ->probe()
	eeprom: idt_89hpesx: Restore printing the unsupported fwnode name
	thunderbolt: Bond lanes only when dual_link_port != NULL in alloc_dev_default()
	iio: adc: at91-sama5d2: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: adc: hx711: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: adc: mxs-lradc: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: adc: ti-ads8688: Fix alignment of buffer in iio_push_to_buffers_with_timestamp()
	iio: magn: rm3100: Fix alignment of buffer in iio_push_to_buffers_with_timestamp()
	iio: light: vcnl4000: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	ASoC: fsl_spdif: Fix error handler with pm_runtime_enable
	staging: gdm724x: check for buffer overflow in gdm_lte_multi_sdu_pkt()
	staging: gdm724x: check for overflow in gdm_lte_netif_rx()
	staging: rtl8712: fix error handling in r871xu_drv_init
	staging: rtl8712: fix memory leak in rtl871x_load_fw_cb
	coresight: core: Fix use of uninitialized pointer
	staging: mt7621-dts: fix pci address for PCI memory range
	serial: 8250: Actually allow UPF_MAGIC_MULTIPLIER baud rates
	iio: light: vcnl4035: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	iio: prox: isl29501: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
	ASoC: cs42l42: Correct definition of CS42L42_ADC_PDN_MASK
	of: Fix truncation of memory sizes on 32-bit platforms
	mtd: rawnand: marvell: add missing clk_disable_unprepare() on error in marvell_nfc_resume()
	habanalabs: Fix an error handling path in 'hl_pci_probe()'
	scsi: mpt3sas: Fix error return value in _scsih_expander_add()
	soundwire: stream: Fix test for DP prepare complete
	phy: uniphier-pcie: Fix updating phy parameters
	phy: ti: dm816x: Fix the error handling path in 'dm816x_usb_phy_probe()
	extcon: sm5502: Drop invalid register write in sm5502_reg_data
	extcon: max8997: Add missing modalias string
	powerpc/powernv: Fix machine check reporting of async store errors
	ASoC: atmel-i2s: Fix usage of capture and playback at the same time
	configfs: fix memleak in configfs_release_bin_file
	ASoC: Intel: sof_sdw: add SOF_RT715_DAI_ID_FIX for AlderLake
	ASoC: fsl_spdif: Fix unexpected interrupt after suspend
	leds: as3645a: Fix error return code in as3645a_parse_node()
	leds: ktd2692: Fix an error handling path
	selftests/ftrace: fix event-no-pid on 1-core machine
	serial: 8250: 8250_omap: Disable RX interrupt after DMA enable
	serial: 8250: 8250_omap: Fix possible interrupt storm on K3 SoCs
	powerpc: Offline CPU in stop_this_cpu()
	powerpc/papr_scm: Properly handle UUID types and API
	powerpc/64s: Fix copy-paste data exposure into newly created tasks
	powerpc/papr_scm: Make 'perf_stats' invisible if perf-stats unavailable
	ALSA: firewire-lib: Fix 'amdtp_domain_start()' when no AMDTP_OUT_STREAM stream is found
	serial: mvebu-uart: do not allow changing baudrate when uartclk is not available
	serial: mvebu-uart: correctly calculate minimal possible baudrate
	arm64: dts: marvell: armada-37xx: Fix reg for standard variant of UART
	vfio/pci: Handle concurrent vma faults
	mm/pmem: avoid inserting hugepage PTE entry with fsdax if hugepage support is disabled
	mm/huge_memory.c: remove dedicated macro HPAGE_CACHE_INDEX_MASK
	mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled()
	mm/huge_memory.c: don't discard hugepage if other processes are mapping it
	mm/hugetlb: use helper huge_page_order and pages_per_huge_page
	mm/hugetlb: remove redundant check in preparing and destroying gigantic page
	hugetlb: remove prep_compound_huge_page cleanup
	include/linux/huge_mm.h: remove extern keyword
	mm/z3fold: fix potential memory leak in z3fold_destroy_pool()
	mm/z3fold: use release_z3fold_page_locked() to release locked z3fold page
	lib/math/rational.c: fix divide by zero
	selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
	selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
	selftests/vm/pkeys: refill shadow register after implicit kernel write
	perf llvm: Return -ENOMEM when asprintf() fails
	csky: fix syscache.c fallthrough warning
	csky: syscache: Fixup duplicate cache flush
	exfat: handle wrong stream entry size in exfat_readdir()
	scsi: fc: Correct RHBA attributes length
	scsi: target: cxgbit: Unmap DMA buffer before calling target_execute_cmd()
	mailbox: qcom-ipcc: Fix IPCC mbox channel exhaustion
	fscrypt: don't ignore minor_hash when hash is 0
	fscrypt: fix derivation of SipHash keys on big endian CPUs
	tpm: Replace WARN_ONCE() with dev_err_once() in tpm_tis_status()
	erofs: fix error return code in erofs_read_superblock()
	block: return the correct bvec when checking for gaps
	io_uring: fix blocking inline submission
	mmc: block: Disable CMDQ on the ioctl path
	mmc: vub3000: fix control-request direction
	media: exynos4-is: remove a now unused integer
	scsi: core: Retry I/O for Notify (Enable Spinup) Required error
	crypto: qce - fix error return code in qce_skcipher_async_req_handle()
	s390: preempt: Fix preempt_count initialization
	cred: add missing return error code when set_cred_ucounts() failed
	iommu/dma: Fix compile warning in 32-bit builds
	powerpc/preempt: Don't touch the idle task's preempt_count during hotplug
	Linux 5.10.50

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iec4eab24ea8eb5a6d79739a1aec8432d93a8f82c
2021-07-14 17:35:23 +02:00
David Sterba
eefebcda89 btrfs: clear log tree recovering status if starting transaction fails
[ Upstream commit 1aeb6b563aea18cd55c73cf666d1d3245a00f08c ]

When a log recovery is in progress, lots of operations have to take that
into account, so we keep this status per tree during the operation. Long
time ago error handling revamp patch 79787eaab4 ("btrfs: replace many
BUG_ONs with proper error handling") removed clearing of the status in
an error branch. Add it back as was intended in e02119d5a7 ("Btrfs:
Add a write ahead tree log to optimize synchronous operations").

There are probably no visible effects, log replay is done only during
mount and if it fails all structures are cleared so the stale status
won't be kept.

Fixes: 79787eaab4 ("btrfs: replace many BUG_ONs with proper error handling")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:09 +02:00
Christophe Leroy
9c0835c69d btrfs: disable build on platforms having page size 256K
[ Upstream commit b05fbcc36be1f8597a1febef4892053a0b2f3f60 ]

With a config having PAGE_SIZE set to 256K, BTRFS build fails
with the following message

  include/linux/compiler_types.h:326:38: error: call to
  '__compiletime_assert_791' declared with attribute error:
  BUILD_BUG_ON failed: (BTRFS_MAX_COMPRESSED % PAGE_SIZE) != 0

BTRFS_MAX_COMPRESSED being 128K, BTRFS cannot support platforms with
256K pages at the time being.

There are two platforms that can select 256K pages:
 - hexagon
 - powerpc

Disable BTRFS when 256K page size is selected. Supporting this would
require changes to the subpage mode that's currently being developed.
Given that 256K is many times larger than page sizes commonly used and
for what the algorithms and structures have been tuned, it's out of
scope and disabling build is a reasonable option.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:55:56 +02:00
Qu Wenruo
ad71a9ad74 btrfs: don't clear page extent mapped if we're not invalidating the full page
[ Upstream commit bcd77455d590eaa0422a5e84ae852007cfce574a ]

[BUG]
With current btrfs subpage rw support, the following script can lead to
fs hang:

  $ mkfs.btrfs -f -s 4k $dev
  $ mount $dev -o nospace_cache $mnt
  $ fsstress -w -n 100 -p 1 -s 1608140256 -v -d $mnt

The fs will hang at btrfs_start_ordered_extent().

[CAUSE]
In above test case, btrfs_invalidate() will be called with the following
parameters:

  offset = 0 length = 53248 page dirty = 1 subpage dirty bitmap = 0x2000

Since @offset is 0, btrfs_invalidate() will try to invalidate the full
page, and finally call clear_page_extent_mapped() which will detach
subpage structure from the page.

And since the page no longer has subpage structure, the subpage dirty
bitmap will be cleared, preventing the dirty range from being written
back, thus no way to wake up the ordered extent.

[FIX]
Just follow other filesystems, only to invalidate the page if the range
covers the full page.

There are cases like truncate_setsize() which can call
btrfs_invalidatepage() with offset == 0 and length != 0 for the last
page of an inode.

Although the old code will still try to invalidate the full page, we are
still safe to just wait for ordered extent to finish.
So it shouldn't cause extra problems.

Tested-by: Ritesh Harjani <riteshh@linux.ibm.com> # [ppc64]
Tested-by: Anand Jain <anand.jain@oracle.com> # [aarch64]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:55:55 +02:00
David Sterba
703b494a68 btrfs: sysfs: fix format string for some discard stats
[ Upstream commit 8c5ec995616f1202ab92e195fd75d6f60d86f85c ]

The type of discard_bitmap_bytes and discard_extent_bytes is u64 so the
format should be %llu, though the actual values would hardly ever
overflow to negative values.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:55:55 +02:00
Josef Bacik
8d05e30c97 btrfs: abort transaction if we fail to update the delayed inode
[ Upstream commit 04587ad9bef6ce9d510325b4ba9852b6129eebdb ]

If we fail to update the delayed inode we need to abort the transaction,
because we could leave an inode with the improper counts or some other
such corruption behind.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:55:55 +02:00
Josef Bacik
e0ffb169a3 btrfs: fix error handling in __btrfs_update_delayed_inode
[ Upstream commit bb385bedded3ccbd794559600de4a09448810f4a ]

If we get an error while looking up the inode item we'll simply bail
without cleaning up the delayed node.  This results in this style of
warning happening on commit:

  WARNING: CPU: 0 PID: 76403 at fs/btrfs/delayed-inode.c:1365 btrfs_assert_delayed_root_empty+0x5b/0x90
  CPU: 0 PID: 76403 Comm: fsstress Tainted: G        W         5.13.0-rc1+ #373
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  RIP: 0010:btrfs_assert_delayed_root_empty+0x5b/0x90
  RSP: 0018:ffffb8bb815a7e50 EFLAGS: 00010286
  RAX: 0000000000000000 RBX: ffff95d6d07e1888 RCX: ffff95d6c0fa3000
  RDX: 0000000000000002 RSI: 000000000029e91c RDI: ffff95d6c0fc8060
  RBP: ffff95d6c0fc8060 R08: 00008d6d701a2c1d R09: 0000000000000000
  R10: ffff95d6d1760ea0 R11: 0000000000000001 R12: ffff95d6c15a4d00
  R13: ffff95d6c0fa3000 R14: 0000000000000000 R15: ffffb8bb815a7e90
  FS:  00007f490e8dbb80(0000) GS:ffff95d73bc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f6e75555cb0 CR3: 00000001101ce001 CR4: 0000000000370ef0
  Call Trace:
   btrfs_commit_transaction+0x43c/0xb00
   ? finish_wait+0x80/0x80
   ? vfs_fsync_range+0x90/0x90
   iterate_supers+0x8c/0x100
   ksys_sync+0x50/0x90
   __do_sys_sync+0xa/0x10
   do_syscall_64+0x3d/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Because the iref isn't dropped and this leaves an elevated node->count,
so any release just re-queues it onto the delayed inodes list.  Fix this
by going to the out label to handle the proper cleanup of the delayed
node.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:55:55 +02:00
David Sterba
80d05ce58a btrfs: clear defrag status of a root if starting transaction fails
commit 6819703f5a365c95488b07066a8744841bf14231 upstream.

The defrag loop processes leaves in batches and starting transaction for
each. The whole defragmentation on a given root is protected by a bit
but in case the transaction fails, the bit is not cleared

In case the transaction fails the bit would prevent starting
defragmentation again, so make sure it's cleared.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-14 16:55:40 +02:00
David Sterba
6b00b1717f btrfs: compression: don't try to compress if we don't have enough pages
commit f2165627319ffd33a6217275e5690b1ab5c45763 upstream.

The early check if we should attempt compression does not take into
account the number of input pages. It can happen that there's only one
page, eg. a tail page after some ranges of the BTRFS_MAX_UNCOMPRESSED
have been processed, or an isolated page that won't be converted to an
inline extent.

The single page would be compressed but a later check would drop it
again because the result size must be at least one block shorter than
the input. That can never work with just one page.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-14 16:55:40 +02:00
Filipe Manana
34172f601a btrfs: send: fix invalid path for unlink operations after parent orphanization
commit d8ac76cdd1755b21e8c008c28d0b7251c0b14986 upstream.

During an incremental send operation, when processing the new references
for the current inode, we might send an unlink operation for another inode
that has a conflicting path and has more than one hard link. However this
path was computed and cached before we processed previous new references
for the current inode. We may have orphanized a directory of that path
while processing a previous new reference, in which case the path will
be invalid and cause the receiver process to fail.

The following reproducer triggers the problem and explains how/why it
happens in its comments:

  $ cat test-send-unlink.sh
  #!/bin/bash

  DEV=/dev/sdi
  MNT=/mnt/sdi

  mkfs.btrfs -f $DEV >/dev/null
  mount $DEV $MNT

  # Create our test files and directory. Inode 259 (file3) has two hard
  # links.
  touch $MNT/file1
  touch $MNT/file2
  touch $MNT/file3

  mkdir $MNT/A
  ln $MNT/file3 $MNT/A/hard_link

  # Filesystem looks like:
  #
  # .                                     (ino 256)
  # |----- file1                          (ino 257)
  # |----- file2                          (ino 258)
  # |----- file3                          (ino 259)
  # |----- A/                             (ino 260)
  #        |---- hard_link                (ino 259)
  #

  # Now create the base snapshot, which is going to be the parent snapshot
  # for a later incremental send.
  btrfs subvolume snapshot -r $MNT $MNT/snap1
  btrfs send -f /tmp/snap1.send $MNT/snap1

  # Move inode 257 into directory inode 260. This results in computing the
  # path for inode 260 as "/A" and caching it.
  mv $MNT/file1 $MNT/A/file1

  # Move inode 258 (file2) into directory inode 260, with a name of
  # "hard_link", moving first inode 259 away since it currently has that
  # location and name.
  mv $MNT/A/hard_link $MNT/tmp
  mv $MNT/file2 $MNT/A/hard_link

  # Now rename inode 260 to something else (B for example) and then create
  # a hard link for inode 258 that has the old name and location of inode
  # 260 ("/A").
  mv $MNT/A $MNT/B
  ln $MNT/B/hard_link $MNT/A

  # Filesystem now looks like:
  #
  # .                                     (ino 256)
  # |----- tmp                            (ino 259)
  # |----- file3                          (ino 259)
  # |----- B/                             (ino 260)
  # |      |---- file1                    (ino 257)
  # |      |---- hard_link                (ino 258)
  # |
  # |----- A                              (ino 258)

  # Create another snapshot of our subvolume and use it for an incremental
  # send.
  btrfs subvolume snapshot -r $MNT $MNT/snap2
  btrfs send -f /tmp/snap2.send -p $MNT/snap1 $MNT/snap2

  # Now unmount the filesystem, create a new one, mount it and try to
  # apply both send streams to recreate both snapshots.
  umount $DEV

  mkfs.btrfs -f $DEV >/dev/null

  mount $DEV $MNT

  # First add the first snapshot to the new filesystem by applying the
  # first send stream.
  btrfs receive -f /tmp/snap1.send $MNT

  # The incremental receive operation below used to fail with the
  # following error:
  #
  #    ERROR: unlink A/hard_link failed: No such file or directory
  #
  # This is because when send is processing inode 257, it generates the
  # path for inode 260 as "/A", since that inode is its parent in the send
  # snapshot, and caches that path.
  #
  # Later when processing inode 258, it first processes its new reference
  # that has the path of "/A", which results in orphanizing inode 260
  # because there is a a path collision. This results in issuing a rename
  # operation from "/A" to "/o260-6-0".
  #
  # Finally when processing the new reference "B/hard_link" for inode 258,
  # it notices that it collides with inode 259 (not yet processed, because
  # it has a higher inode number), since that inode has the name
  # "hard_link" under the directory inode 260. It also checks that inode
  # 259 has two hardlinks, so it decides to issue a unlink operation for
  # the name "hard_link" for inode 259. However the path passed to the
  # unlink operation is "/A/hard_link", which is incorrect since currently
  # "/A" does not exists, due to the orphanization of inode 260 mentioned
  # before. The path is incorrect because it was computed and cached
  # before the orphanization. This results in the receiver to fail with
  # the above error.
  btrfs receive -f /tmp/snap2.send $MNT

  umount $MNT

When running the test, it fails like this:

  $ ./test-send-unlink.sh
  Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
  At subvol /mnt/sdi/snap1
  Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
  At subvol /mnt/sdi/snap2
  At subvol snap1
  At snapshot snap2
  ERROR: unlink A/hard_link failed: No such file or directory

Fix this by recomputing a path before issuing an unlink operation when
processing the new references for the current inode if we previously
have orphanized a directory.

A test case for fstests will follow soon.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-14 16:55:40 +02:00
Greg Kroah-Hartman
82658bfd88 This is the 5.10.44 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmDJzHwACgkQONu9yGCS
 aT6opRAAuTY0BewZFxfx+tMNplEo6Z/AnerfZN5UxjmIWhvE4NBoIhxgj7ZKHzKE
 5xBP53Dunqa6MVrLv3VCyYstUMHO3qlDmYU6tGz/omstaQxqxB0y6b/8+Q3hkzkK
 SjpLeeMIzbpleZXt+zvi5LIwMb7WM2bVLmH4kPVEAW0+GmPWZRentaF7LtHNOOfU
 LBgAcRH2emxGZ3ucHJDI0xsrkdoWfe+sPZqkAiRI06wlK5EZEcc55uCy3OWzO47Z
 579j4s0GX1RP6iuC8tVfoPFPPATRGAHJABX46w2xqs34Ni4PCA9FZ+cTbXgJyZoV
 ZzMnjSvmtM6SjdJDz5JucUS70zU2ailHuTnjZk41XUUYoY5reG1DWSVxPP5LVh3e
 1eC9P5RTHjCKt/oHA+xvfJJzKL3VyBFLpkssJkh/LOjj1yCCMj2XK7u1q+swCd7O
 mJgkZS30c5bIVgV8tHjL2HPG8HnPRqR2+3vZ1yMCOSEDjQs3Qxhjlhohq33Jlfa3
 wHOuuDvJyxgx+c/G4fgofWjU1eYpCltUthDiATl4w9+sACKm0FRm+ZVUlX/xr6WI
 aCBEuk9hFbXQF6Jfbmg3RrhiyF1BTZJ/MKzxaOmEk8HLLEtW249qz5Up651+caqy
 cfppiiV9M/QWB/soemK9uLnoBNjxEdvP00KI362ED99cNwF3eZc=
 =LmrZ
 -----END PGP SIGNATURE-----

Merge 5.10.44 into android12-5.10-lts

Changes in 5.10.44
	proc: Track /proc/$pid/attr/ opener mm_struct
	ASoC: max98088: fix ni clock divider calculation
	ASoC: amd: fix for pcm_read() error
	spi: Fix spi device unregister flow
	spi: spi-zynq-qspi: Fix stack violation bug
	bpf: Forbid trampoline attach for functions with variable arguments
	net/nfc/rawsock.c: fix a permission check bug
	usb: cdns3: Fix runtime PM imbalance on error
	ASoC: Intel: bytcr_rt5640: Add quirk for the Glavey TM800A550L tablet
	ASoC: Intel: bytcr_rt5640: Add quirk for the Lenovo Miix 3-830 tablet
	vfio-ccw: Reset FSM state to IDLE inside FSM
	vfio-ccw: Serialize FSM IDLE state with I/O completion
	ASoC: sti-sas: add missing MODULE_DEVICE_TABLE
	spi: sprd: Add missing MODULE_DEVICE_TABLE
	usb: chipidea: udc: assign interrupt number to USB gadget structure
	isdn: mISDN: netjet: Fix crash in nj_probe:
	bonding: init notify_work earlier to avoid uninitialized use
	netlink: disable IRQs for netlink_lock_table()
	net: mdiobus: get rid of a BUG_ON()
	cgroup: disable controllers at parse time
	wq: handle VM suspension in stall detection
	net/qla3xxx: fix schedule while atomic in ql_sem_spinlock
	RDS tcp loopback connection can hang
	net:sfc: fix non-freed irq in legacy irq mode
	scsi: bnx2fc: Return failure if io_req is already in ABTS processing
	scsi: vmw_pvscsi: Set correct residual data length
	scsi: hisi_sas: Drop free_irq() of devm_request_irq() allocated irq
	scsi: target: qla2xxx: Wait for stop_phase1 at WWN removal
	net: macb: ensure the device is available before accessing GEMGXL control registers
	net: appletalk: cops: Fix data race in cops_probe1
	net: dsa: microchip: enable phy errata workaround on 9567
	nvme-fabrics: decode host pathing error for connect
	MIPS: Fix kernel hang under FUNCTION_GRAPH_TRACER and PREEMPT_TRACER
	dm verity: fix require_signatures module_param permissions
	bnx2x: Fix missing error code in bnx2x_iov_init_one()
	nvme-tcp: remove incorrect Kconfig dep in BLK_DEV_NVME
	nvmet: fix false keep-alive timeout when a controller is torn down
	powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P2041 i2c controllers
	powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P1010 i2c controllers
	spi: Don't have controller clean up spi device before driver unbind
	spi: Cleanup on failure of initial setup
	i2c: mpc: Make use of i2c_recover_bus()
	i2c: mpc: implement erratum A-004447 workaround
	ALSA: seq: Fix race of snd_seq_timer_open()
	ALSA: firewire-lib: fix the context to call snd_pcm_stop_xrun()
	ALSA: hda/realtek: headphone and mic don't work on an Acer laptop
	ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Elite Dragonfly G2
	ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP EliteBook x360 1040 G8
	ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 840 Aero G8
	ALSA: hda/realtek: fix mute/micmute LEDs for HP ZBook Power G8
	spi: bcm2835: Fix out-of-bounds access with more than 4 slaves
	Revert "ACPI: sleep: Put the FACS table after using it"
	drm: Fix use-after-free read in drm_getunique()
	drm: Lock pointer access in drm_master_release()
	perf/x86/intel/uncore: Fix M2M event umask for Ice Lake server
	KVM: X86: MMU: Use the correct inherited permissions to get shadow page
	kvm: avoid speculation-based attacks from out-of-range memslot accesses
	staging: rtl8723bs: Fix uninitialized variables
	async_xor: check src_offs is not NULL before updating it
	btrfs: return value from btrfs_mark_extent_written() in case of error
	btrfs: promote debugging asserts to full-fledged checks in validate_super
	cgroup1: don't allow '\n' in renaming
	ftrace: Do not blindly read the ip address in ftrace_bug()
	mmc: renesas_sdhi: abort tuning when timeout detected
	mmc: renesas_sdhi: Fix HS400 on R-Car M3-W+
	USB: f_ncm: ncm_bitrate (speed) is unsigned
	usb: f_ncm: only first packet of aggregate needs to start timer
	usb: pd: Set PD_T_SINK_WAIT_CAP to 310ms
	usb: dwc3-meson-g12a: fix usb2 PHY glue init when phy0 is disabled
	usb: dwc3: meson-g12a: Disable the regulator in the error handling path of the probe
	usb: dwc3: gadget: Bail from dwc3_gadget_exit() if dwc->gadget is NULL
	usb: dwc3: ep0: fix NULL pointer exception
	usb: musb: fix MUSB_QUIRK_B_DISCONNECT_99 handling
	usb: typec: wcove: Use LE to CPU conversion when accessing msg->header
	usb: typec: ucsi: Clear PPM capability data in ucsi_init() error path
	usb: typec: intel_pmc_mux: Put fwnode in error case during ->probe()
	usb: typec: intel_pmc_mux: Add missed error check for devm_ioremap_resource()
	usb: gadget: f_fs: Ensure io_completion_wq is idle during unbind
	USB: serial: ftdi_sio: add NovaTech OrionMX product ID
	USB: serial: omninet: add device id for Zyxel Omni 56K Plus
	USB: serial: quatech2: fix control-request directions
	USB: serial: cp210x: fix alternate function for CP2102N QFN20
	usb: gadget: eem: fix wrong eem header operation
	usb: fix various gadgets null ptr deref on 10gbps cabling.
	usb: fix various gadget panics on 10gbps cabling
	usb: typec: tcpm: cancel vdm and state machine hrtimer when unregister tcpm port
	usb: typec: tcpm: cancel frs hrtimer when unregister tcpm port
	regulator: core: resolve supply for boot-on/always-on regulators
	regulator: max77620: Use device_set_of_node_from_dev()
	regulator: bd718x7: Fix the BUCK7 voltage setting on BD71837
	regulator: fan53880: Fix missing n_voltages setting
	regulator: bd71828: Fix .n_voltages settings
	regulator: rtmv20: Fix .set_current_limit/.get_current_limit callbacks
	phy: usb: Fix misuse of IS_ENABLED
	usb: dwc3: gadget: Disable gadget IRQ during pullup disable
	usb: typec: mux: Fix copy-paste mistake in typec_mux_match
	drm/mcde: Fix off by 10^3 in calculation
	drm/msm/a6xx: fix incorrectly set uavflagprd_inv field for A650
	drm/msm/a6xx: update/fix CP_PROTECT initialization
	drm/msm/a6xx: avoid shadow NULL reference in failure path
	RDMA/ipoib: Fix warning caused by destroying non-initial netns
	RDMA/mlx4: Do not map the core_clock page to user space unless enabled
	ARM: cpuidle: Avoid orphan section warning
	vmlinux.lds.h: Avoid orphan section with !SMP
	tools/bootconfig: Fix error return code in apply_xbc()
	phy: cadence: Sierra: Fix error return code in cdns_sierra_phy_probe()
	ASoC: core: Fix Null-point-dereference in fmt_single_name()
	ASoC: meson: gx-card: fix sound-dai dt schema
	phy: ti: Fix an error code in wiz_probe()
	gpio: wcd934x: Fix shift-out-of-bounds error
	perf: Fix data race between pin_count increment/decrement
	sched/fair: Keep load_avg and load_sum synced
	sched/fair: Make sure to update tg contrib for blocked load
	sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling
	x86/nmi_watchdog: Fix old-style NMI watchdog regression on old Intel CPUs
	KVM: x86: Ensure liveliness of nested VM-Enter fail tracepoint message
	IB/mlx5: Fix initializing CQ fragments buffer
	NFS: Fix a potential NULL dereference in nfs_get_client()
	NFSv4: Fix deadlock between nfs4_evict_inode() and nfs4_opendata_get_inode()
	perf session: Correct buffer copying when peeking events
	kvm: fix previous commit for 32-bit builds
	NFS: Fix use-after-free in nfs4_init_client()
	NFSv4: Fix second deadlock in nfs4_evict_inode()
	NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on error.
	scsi: core: Fix error handling of scsi_host_alloc()
	scsi: core: Fix failure handling of scsi_add_host_with_dma()
	scsi: core: Put .shost_dev in failure path if host state changes to RUNNING
	scsi: core: Only put parent device if host state differs from SHOST_CREATED
	tracing: Correct the length check which causes memory corruption
	proc: only require mm_struct for writing
	Linux 5.10.44

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic64172b4e72ccb54d96000b3065dd8b33aa9fef5
2021-06-16 13:14:03 +02:00
Nikolay Borisov
31fe243a63 btrfs: promote debugging asserts to full-fledged checks in validate_super
commit aefd7f7065567a4666f42c0fc8cdb379d2e036bf upstream.

Syzbot managed to trigger this assert while performing its fuzzing.
Turns out it's better to have those asserts turned into full-fledged
checks so that in case buggy btrfs images are mounted the users gets
an error and mounting is stopped. Alternatively with CONFIG_BTRFS_ASSERT
disabled such image would have been erroneously allowed to be mounted.

Reported-by: syzbot+a6bf271c02e4fe66b4e4@syzkaller.appspotmail.com
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add uuids to the messages ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-16 12:01:40 +02:00
Ritesh Harjani
ca69dc891b btrfs: return value from btrfs_mark_extent_written() in case of error
commit e7b2ec3d3d4ebeb4cff7ae45cf430182fa6a49fb upstream.

We always return 0 even in case of an error in btrfs_mark_extent_written().
Fix it to return proper error value in case of a failure. All callers
handle it.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-16 12:01:40 +02:00
Greg Kroah-Hartman
9e08e97ec6 Merge 5.10.43 into android12-5.10
Changes in 5.10.43
	btrfs: tree-checker: do not error out if extent ref hash doesn't match
	net: usb: cdc_ncm: don't spew notifications
	hwmon: (dell-smm-hwmon) Fix index values
	hwmon: (pmbus/isl68137) remove READ_TEMPERATURE_3 for RAA228228
	netfilter: conntrack: unregister ipv4 sockopts on error unwind
	efi/fdt: fix panic when no valid fdt found
	efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared
	efi/libstub: prevent read overflow in find_file_option()
	efi: cper: fix snprintf() use in cper_dimm_err_location()
	vfio/pci: Fix error return code in vfio_ecap_init()
	vfio/pci: zap_vma_ptes() needs MMU
	samples: vfio-mdev: fix error handing in mdpy_fb_probe()
	vfio/platform: fix module_put call in error flow
	ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
	HID: logitech-hidpp: initialize level variable
	HID: pidff: fix error return code in hid_pidff_init()
	HID: i2c-hid: fix format string mismatch
	devlink: Correct VIRTUAL port to not have phys_port attributes
	net/sched: act_ct: Offload connections with commit action
	net/sched: act_ct: Fix ct template allocation for zone 0
	mptcp: always parse mptcp options for MPC reqsk
	nvme-rdma: fix in-casule data send for chained sgls
	ACPICA: Clean up context mutex during object deletion
	perf probe: Fix NULL pointer dereference in convert_variable_location()
	net: dsa: tag_8021q: fix the VLAN IDs used for encoding sub-VLANs
	net: sock: fix in-kernel mark setting
	net/tls: Replace TLS_RX_SYNC_RUNNING with RCU
	net/tls: Fix use-after-free after the TLS device goes down and up
	net/mlx5e: Fix incompatible casting
	net/mlx5: Check firmware sync reset requested is set before trying to abort it
	net/mlx5e: Check for needed capability for cvlan matching
	net/mlx5: DR, Create multi-destination flow table with level less than 64
	nvmet: fix freeing unallocated p2pmem
	netfilter: nft_ct: skip expectations for confirmed conntrack
	netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
	drm/i915/selftests: Fix return value check in live_breadcrumbs_smoketest()
	bpf: Simplify cases in bpf_base_func_proto
	bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks
	ieee802154: fix error return code in ieee802154_add_iface()
	ieee802154: fix error return code in ieee802154_llsec_getparams()
	igb: add correct exception tracing for XDP
	ixgbevf: add correct exception tracing for XDP
	cxgb4: fix regression with HASH tc prio value update
	ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions
	ice: Fix allowing VF to request more/less queues via virtchnl
	ice: Fix VFR issues for AVF drivers that expect ATQLEN cleared
	ice: handle the VF VSI rebuild failure
	ice: report supported and advertised autoneg using PHY capabilities
	ice: Allow all LLDP packets from PF to Tx
	i2c: qcom-geni: Add shutdown callback for i2c
	cxgb4: avoid link re-train during TC-MQPRIO configuration
	i40e: optimize for XDP_REDIRECT in xsk path
	i40e: add correct exception tracing for XDP
	ice: simplify ice_run_xdp
	ice: optimize for XDP_REDIRECT in xsk path
	ice: add correct exception tracing for XDP
	ixgbe: optimize for XDP_REDIRECT in xsk path
	ixgbe: add correct exception tracing for XDP
	arm64: dts: ti: j7200-main: Mark Main NAVSS as dma-coherent
	optee: use export_uuid() to copy client UUID
	bus: ti-sysc: Fix am335x resume hang for usb otg module
	arm64: dts: ls1028a: fix memory node
	arm64: dts: zii-ultra: fix 12V_MAIN voltage
	arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
	ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
	ARM: dts: imx7d-pico: Fix the 'tuning-step' property
	ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
	bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
	tipc: add extack messages for bearer/media failure
	tipc: fix unique bearer names sanity check
	serial: stm32: fix threaded interrupt handling
	riscv: vdso: fix and clean-up Makefile
	io_uring: fix link timeout refs
	io_uring: use better types for cflags
	drm/amdgpu/vcn3: add cancel_delayed_work_sync before power gate
	drm/amdgpu/jpeg2.5: add cancel_delayed_work_sync before power gate
	drm/amdgpu/jpeg3: add cancel_delayed_work_sync before power gate
	Bluetooth: fix the erroneous flush_work() order
	Bluetooth: use correct lock to prevent UAF of hdev object
	wireguard: do not use -O3
	wireguard: peer: allocate in kmem_cache
	wireguard: use synchronize_net rather than synchronize_rcu
	wireguard: selftests: remove old conntrack kconfig value
	wireguard: selftests: make sure rp_filter is disabled on vethc
	wireguard: allowedips: initialize list head in selftest
	wireguard: allowedips: remove nodes in O(1)
	wireguard: allowedips: allocate nodes in kmem_cache
	wireguard: allowedips: free empty intermediate nodes when removing single node
	net: caif: added cfserl_release function
	net: caif: add proper error handling
	net: caif: fix memory leak in caif_device_notify
	net: caif: fix memory leak in cfusbl_device_notify
	HID: i2c-hid: Skip ELAN power-on command after reset
	HID: magicmouse: fix NULL-deref on disconnect
	HID: multitouch: require Finger field to mark Win8 reports as MT
	gfs2: fix scheduling while atomic bug in glocks
	ALSA: timer: Fix master timer notification
	ALSA: hda: Fix for mute key LED for HP Pavilion 15-CK0xx
	ALSA: hda: update the power_state during the direct-complete
	ARM: dts: imx6dl-yapp4: Fix RGMII connection to QCA8334 switch
	ARM: dts: imx6q-dhcom: Add PU,VDD1P1,VDD2P5 regulators
	ext4: fix memory leak in ext4_fill_super
	ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
	ext4: fix fast commit alignment issues
	ext4: fix memory leak in ext4_mb_init_backend on error path.
	ext4: fix accessing uninit percpu counter variable with fast_commit
	usb: dwc2: Fix build in periphal-only mode
	pid: take a reference when initializing `cad_pid`
	ocfs2: fix data corruption by fallocate
	mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
	mm/page_alloc: fix counting of free pages after take off from buddy
	x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
	x86/sev: Check SME/SEV support in CPUID first
	nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
	drm/amdgpu: Don't query CE and UE errors
	drm/amdgpu: make sure we unpin the UVD BO
	x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
	powerpc/kprobes: Fix validation of prefixed instructions across page boundary
	btrfs: mark ordered extent and inode with error if we fail to finish
	btrfs: fix error handling in btrfs_del_csums
	btrfs: return errors from btrfs_del_csums in cleanup_ref_head
	btrfs: fixup error handling in fixup_inode_link_counts
	btrfs: abort in rename_exchange if we fail to insert the second ref
	btrfs: fix deadlock when cloning inline extents and low on available space
	mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
	drm/msm/dpu: always use mdp device to scale bandwidth
	btrfs: fix unmountable seed device after fstrim
	KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
	KVM: arm64: Fix debug register indexing
	x86/kvm: Teardown PV features on boot CPU as well
	x86/kvm: Disable kvmclock on all CPUs on shutdown
	x86/kvm: Disable all PV features on crash
	lib/lz4: explicitly support in-place decompression
	i2c: qcom-geni: Suspend and resume the bus during SYSTEM_SLEEP_PM ops
	netfilter: nf_tables: missing error reporting for not selected expressions
	xen-netback: take a reference to the RX task thread
	neighbour: allow NUD_NOARP entries to be forced GCed
	Linux 5.10.43

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8d7ec0878193e4e454076809b7fb71fcc4e3d810
2021-06-12 14:48:14 +02:00
Anand Jain
fe910d20e2 btrfs: fix unmountable seed device after fstrim
commit 5e753a817b2d5991dfe8a801b7b1e8e79a1c5a20 upstream.

The following test case reproduces an issue of wrongly freeing in-use
blocks on the readonly seed device when fstrim is called on the rw sprout
device. As shown below.

Create a seed device and add a sprout device to it:

  $ mkfs.btrfs -fq -dsingle -msingle /dev/loop0
  $ btrfstune -S 1 /dev/loop0
  $ mount /dev/loop0 /btrfs
  $ btrfs dev add -f /dev/loop1 /btrfs
  BTRFS info (device loop0): relocating block group 290455552 flags system
  BTRFS info (device loop0): relocating block group 1048576 flags system
  BTRFS info (device loop0): disk added /dev/loop1
  $ umount /btrfs

Mount the sprout device and run fstrim:

  $ mount /dev/loop1 /btrfs
  $ fstrim /btrfs
  $ umount /btrfs

Now try to mount the seed device, and it fails:

  $ mount /dev/loop0 /btrfs
  mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.

Block 5292032 is missing on the readonly seed device:

 $ dmesg -kt | tail
 <snip>
 BTRFS error (device loop0): bad tree block start, want 5292032 have 0
 BTRFS warning (device loop0): couldn't read-tree root
 BTRFS error (device loop0): open_ctree failed

>From the dump-tree of the seed device (taken before the fstrim). Block
5292032 belonged to the block group starting at 5242880:

  $ btrfs inspect dump-tree -e /dev/loop0 | grep -A1 BLOCK_GROUP
  <snip>
  item 3 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16169 itemsize 24
  	block group used 114688 chunk_objectid 256 flags METADATA
  <snip>

>From the dump-tree of the sprout device (taken before the fstrim).
fstrim used block-group 5242880 to find the related free space to free:

  $ btrfs inspect dump-tree -e /dev/loop1 | grep -A1 BLOCK_GROUP
  <snip>
  item 1 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16226 itemsize 24
  	block group used 32768 chunk_objectid 256 flags METADATA
  <snip>

BPF kernel tracing the fstrim command finds the missing block 5292032
within the range of the discarded blocks as below:

  kprobe:btrfs_discard_extent {
  	printf("freeing start %llu end %llu num_bytes %llu:\n",
  		arg1, arg1+arg2, arg2);
  }

  freeing start 5259264 end 5406720 num_bytes 147456
  <snip>

Fix this by avoiding the discard command to the readonly seed device.

Reported-by: Chris Murphy <lists@colorremedies.com>
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:28 +02:00
Filipe Manana
baa6763123 btrfs: fix deadlock when cloning inline extents and low on available space
commit 76a6d5cd74479e7ec8a7f9a29bce63d5549b6b2e upstream.

There are a few cases where cloning an inline extent requires copying data
into a page of the destination inode. For these cases we are allocating
the required data and metadata space while holding a leaf locked. This can
result in a deadlock when we are low on available space because allocating
the space may flush delalloc and two deadlock scenarios can happen:

1) When starting writeback for an inode with a very small dirty range that
   fits in an inline extent, we deadlock during the writeback when trying
   to insert the inline extent, at cow_file_range_inline(), if the extent
   is going to be located in the leaf for which we are already holding a
   read lock;

2) After successfully starting writeback, for non-inline extent cases,
   the async reclaim thread will hang waiting for an ordered extent to
   complete if the ordered extent completion needs to modify the leaf
   for which the clone task is holding a read lock (for adding or
   replacing file extent items). So the cloning task will wait forever
   on the async reclaim thread to make progress, which in turn is
   waiting for the ordered extent completion which in turn is waiting
   to acquire a write lock on the same leaf.

So fix this by making sure we release the path (and therefore the leaf)
every time we need to copy the inline extent's data into a page of the
destination inode, as by that time we do not need to have the leaf locked.

Fixes: 05a5a7621c ("Btrfs: implement full reflink support for inline extents")
CC: stable@vger.kernel.org # 5.10+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:28 +02:00
Josef Bacik
0df50d47d1 btrfs: abort in rename_exchange if we fail to insert the second ref
commit dc09ef3562726cd520c8338c1640872a60187af5 upstream.

Error injection stress uncovered a problem where we'd leave a dangling
inode ref if we failed during a rename_exchange.  This happens because
we insert the inode ref for one side of the rename, and then for the
other side.  If this second inode ref insert fails we'll leave the first
one dangling and leave a corrupt file system behind.  Fix this by
aborting if we did the insert for the first inode ref.

CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:28 +02:00
Josef Bacik
48568f3944 btrfs: fixup error handling in fixup_inode_link_counts
commit 011b28acf940eb61c000059dd9e2cfcbf52ed96b upstream.

This function has the following pattern

	while (1) {
		ret = whatever();
		if (ret)
			goto out;
	}
	ret = 0
out:
	return ret;

However several places in this while loop we simply break; when there's
a problem, thus clearing the return value, and in one case we do a
return -EIO, and leak the memory for the path.

Fix this by re-arranging the loop to deal with ret == 1 coming from
btrfs_search_slot, and then simply delete the

	ret = 0;
out:

bit so everybody can break if there is an error, which will allow for
proper error handling to occur.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:28 +02:00
Josef Bacik
466d83fdbb btrfs: return errors from btrfs_del_csums in cleanup_ref_head
commit 856bd270dc4db209c779ce1e9555c7641ffbc88e upstream.

We are unconditionally returning 0 in cleanup_ref_head, despite the fact
that btrfs_del_csums could fail.  We need to return the error so the
transaction gets aborted properly, fix this by returning ret from
btrfs_del_csums in cleanup_ref_head.

Reviewed-by: Qu Wenruo <wqu@suse.com>
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:28 +02:00
Josef Bacik
5a89982fa2 btrfs: fix error handling in btrfs_del_csums
commit b86652be7c83f70bf406bed18ecf55adb9bfb91b upstream.

Error injection stress would sometimes fail with checksums on disk that
did not have a corresponding extent.  This occurred because the pattern
in btrfs_del_csums was

	while (1) {
		ret = btrfs_search_slot();
		if (ret < 0)
			break;
	}
	ret = 0;
out:
	btrfs_free_path(path);
	return ret;

If we got an error from btrfs_search_slot we'd clear the error because
we were breaking instead of goto out.  Instead of using goto out, simply
handle the cases where we may leave a random value in ret, and get rid
of the

	ret = 0;
out:

pattern and simply allow break to have the proper error reporting.  With
this fix we properly abort the transaction and do not commit thinking we
successfully deleted the csum.

Reviewed-by: Qu Wenruo <wqu@suse.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:27 +02:00
Josef Bacik
b547a16b24 btrfs: mark ordered extent and inode with error if we fail to finish
commit d61bec08b904cf171835db98168f82bc338e92e4 upstream.

While doing error injection testing I saw that sometimes we'd get an
abort that wouldn't stop the current transaction commit from completing.
This abort was coming from finish ordered IO, but at this point in the
transaction commit we should have gotten an error and stopped.

It turns out the abort came from finish ordered io while trying to write
out the free space cache.  It occurred to me that any failure inside of
finish_ordered_io isn't actually raised to the person doing the writing,
so we could have any number of failures in this path and think the
ordered extent completed successfully and the inode was fine.

Fix this by marking the ordered extent with BTRFS_ORDERED_IOERR, and
marking the mapping of the inode with mapping_set_error, so any callers
that simply call fdatawait will also get the error.

With this we're seeing the IO error on the free space inode when we fail
to do the finish_ordered_io.

CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:27 +02:00
Josef Bacik
1d62b7ac83 btrfs: tree-checker: do not error out if extent ref hash doesn't match
commit 1119a72e223f3073a604f8fccb3a470ccd8a4416 upstream.

The tree checker checks the extent ref hash at read and write time to
make sure we do not corrupt the file system.  Generally extent
references go inline, but if we have enough of them we need to make an
item, which looks like

key.objectid	= <bytenr>
key.type	= <BTRFS_EXTENT_DATA_REF_KEY|BTRFS_TREE_BLOCK_REF_KEY>
key.offset	= hash(tree, owner, offset)

However if key.offset collide with an unrelated extent reference we'll
simply key.offset++ until we get something that doesn't collide.
Obviously this doesn't match at tree checker time, and thus we error
while writing out the transaction.  This is relatively easy to
reproduce, simply do something like the following

  xfs_io -f -c "pwrite 0 1M" file
  offset=2

  for i in {0..10000}
  do
	  xfs_io -c "reflink file 0 ${offset}M 1M" file
	  offset=$(( offset + 2 ))
  done

  xfs_io -c "reflink file 0 17999258914816 1M" file
  xfs_io -c "reflink file 0 35998517829632 1M" file
  xfs_io -c "reflink file 0 53752752058368 1M" file

  btrfs filesystem sync

And the sync will error out because we'll abort the transaction.  The
magic values above are used because they generate hash collisions with
the first file in the main subvol.

The fix for this is to remove the hash value check from tree checker, as
we have no idea which offset ours should belong to.

Reported-by: Tuomas Lähdekorpi <tuomas.lahdekorpi@gmail.com>
Fixes: 0785a9aacf ("btrfs: tree-checker: Add EXTENT_DATA_REF check")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:12 +02:00
Greg Kroah-Hartman
c5d480cd47 Merge 5.10.42 into android12-5.10
Changes in 5.10.42
	ALSA: hda/realtek: the bass speaker can't output sound on Yoga 9i
	ALSA: hda/realtek: Headphone volume is controlled by Front mixer
	ALSA: hda/realtek: Chain in pop reduction fixup for ThinkStation P340
	ALSA: hda/realtek: fix mute/micmute LEDs for HP 855 G8
	ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook G8
	ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook Fury 15 G8
	ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook Fury 17 G8
	ALSA: usb-audio: scarlett2: Fix device hang with ehci-pci
	ALSA: usb-audio: scarlett2: Improve driver startup messages
	cifs: set server->cipher_type to AES-128-CCM for SMB3.0
	NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()
	iommu/vt-d: Fix sysfs leak in alloc_iommu()
	perf intel-pt: Fix sample instruction bytes
	perf intel-pt: Fix transaction abort handling
	perf scripts python: exported-sql-viewer.py: Fix copy to clipboard from Top Calls by elapsed Time report
	perf scripts python: exported-sql-viewer.py: Fix Array TypeError
	perf scripts python: exported-sql-viewer.py: Fix warning display
	proc: Check /proc/$pid/attr/ writes against file opener
	net: hso: fix control-request directions
	net/sched: fq_pie: re-factor fix for fq_pie endless loop
	net/sched: fq_pie: fix OOB access in the traffic path
	netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check, fallback to non-AVX2 version
	mac80211: assure all fragments are encrypted
	mac80211: prevent mixed key and fragment cache attacks
	mac80211: properly handle A-MSDUs that start with an RFC 1042 header
	cfg80211: mitigate A-MSDU aggregation attacks
	mac80211: drop A-MSDUs on old ciphers
	mac80211: add fragment cache to sta_info
	mac80211: check defrag PN against current frame
	mac80211: prevent attacks on TKIP/WEP as well
	mac80211: do not accept/forward invalid EAPOL frames
	mac80211: extend protection against mixed key and fragment cache attacks
	ath10k: add CCMP PN replay protection for fragmented frames for PCIe
	ath10k: drop fragments with multicast DA for PCIe
	ath10k: drop fragments with multicast DA for SDIO
	ath10k: drop MPDU which has discard flag set by firmware for SDIO
	ath10k: Fix TKIP Michael MIC verification for PCIe
	ath10k: Validate first subframe of A-MSDU before processing the list
	ath11k: Clear the fragment cache during key install
	dm snapshot: properly fix a crash when an origin has no snapshots
	drm/amd/pm: correct MGpuFanBoost setting
	drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
	drm/amdkfd: correct sienna_cichlid SDMA RLC register offset error
	drm/amdgpu/vcn2.0: add cancel_delayed_work_sync before power gate
	drm/amdgpu/vcn2.5: add cancel_delayed_work_sync before power gate
	drm/amdgpu/jpeg2.0: add cancel_delayed_work_sync before power gate
	selftests/gpio: Use TEST_GEN_PROGS_EXTENDED
	selftests/gpio: Move include of lib.mk up
	selftests/gpio: Fix build when source tree is read only
	kgdb: fix gcc-11 warnings harder
	Documentation: seccomp: Fix user notification documentation
	seccomp: Refactor notification handler to prepare for new semantics
	serial: core: fix suspicious security_locked_down() call
	misc/uss720: fix memory leak in uss720_probe
	thunderbolt: usb4: Fix NVM read buffer bounds and offset issue
	thunderbolt: dma_port: Fix NVM read buffer bounds and offset issue
	KVM: X86: Fix vCPU preempted state from guest's point of view
	KVM: arm64: Prevent mixed-width VM creation
	mei: request autosuspend after sending rx flow control
	staging: iio: cdc: ad7746: avoid overwrite of num_channels
	iio: gyro: fxas21002c: balance runtime power in error path
	iio: dac: ad5770r: Put fwnode in error case during ->probe()
	iio: adc: ad7768-1: Fix too small buffer passed to iio_push_to_buffers_with_timestamp()
	iio: adc: ad7124: Fix missbalanced regulator enable / disable on error.
	iio: adc: ad7124: Fix potential overflow due to non sequential channel numbers
	iio: adc: ad7923: Fix undersized rx buffer.
	iio: adc: ad7793: Add missing error code in ad7793_setup()
	iio: adc: ad7192: Avoid disabling a clock that was never enabled.
	iio: adc: ad7192: handle regulator voltage error first
	serial: 8250: Add UART_BUG_TXRACE workaround for Aspeed VUART
	serial: 8250_dw: Add device HID for new AMD UART controller
	serial: 8250_pci: Add support for new HPE serial device
	serial: 8250_pci: handle FL_NOIRQ board flag
	USB: trancevibrator: fix control-request direction
	Revert "irqbypass: do not start cons/prod when failed connect"
	USB: usbfs: Don't WARN about excessively large memory allocations
	drivers: base: Fix device link removal
	serial: tegra: Fix a mask operation that is always true
	serial: sh-sci: Fix off-by-one error in FIFO threshold register setting
	serial: rp2: use 'request_firmware' instead of 'request_firmware_nowait'
	USB: serial: ti_usb_3410_5052: add startech.com device id
	USB: serial: option: add Telit LE910-S1 compositions 0x7010, 0x7011
	USB: serial: ftdi_sio: add IDs for IDS GmbH Products
	USB: serial: pl2303: add device id for ADLINK ND-6530 GC
	thermal/drivers/intel: Initialize RW trip to THERMAL_TEMP_INVALID
	usb: dwc3: gadget: Properly track pending and queued SG
	usb: gadget: udc: renesas_usb3: Fix a race in usb3_start_pipen()
	usb: typec: mux: Fix matching with typec_altmode_desc
	net: usb: fix memory leak in smsc75xx_bind
	Bluetooth: cmtp: fix file refcount when cmtp_attach_device fails
	fs/nfs: Use fatal_signal_pending instead of signal_pending
	NFS: fix an incorrect limit in filelayout_decode_layout()
	NFS: Fix an Oopsable condition in __nfs_pageio_add_request()
	NFS: Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce()
	NFSv4: Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config
	drm/meson: fix shutdown crash when component not probed
	net/mlx5e: reset XPS on error flow if netdev isn't registered yet
	net/mlx5e: Fix multipath lag activation
	net/mlx5e: Fix error path of updating netdev queues
	{net,vdpa}/mlx5: Configure interface MAC into mpfs L2 table
	net/mlx5e: Fix nullptr in add_vlan_push_action()
	net/mlx5: Set reformat action when needed for termination rules
	net/mlx5e: Fix null deref accessing lag dev
	net/mlx4: Fix EEPROM dump support
	net/mlx5: Set term table as an unmanaged flow table
	SUNRPC in case of backlog, hand free slots directly to waiting task
	Revert "net:tipc: Fix a double free in tipc_sk_mcast_rcv"
	tipc: wait and exit until all work queues are done
	tipc: skb_linearize the head skb when reassembling msgs
	spi: spi-fsl-dspi: Fix a resource leak in an error handling path
	netfilter: flowtable: Remove redundant hw refresh bit
	net: dsa: mt7530: fix VLAN traffic leaks
	net: dsa: fix a crash if ->get_sset_count() fails
	net: dsa: sja1105: update existing VLANs from the bridge VLAN list
	net: dsa: sja1105: use 4095 as the private VLAN for untagged traffic
	net: dsa: sja1105: error out on unsupported PHY mode
	net: dsa: sja1105: add error handling in sja1105_setup()
	net: dsa: sja1105: call dsa_unregister_switch when allocating memory fails
	net: dsa: sja1105: fix VL lookup command packing for P/Q/R/S
	i2c: s3c2410: fix possible NULL pointer deref on read message after write
	i2c: mediatek: Disable i2c start_en and clear intr_stat brfore reset
	i2c: i801: Don't generate an interrupt on bus reset
	i2c: sh_mobile: Use new clock calculation formulas for RZ/G2E
	afs: Fix the nlink handling of dir-over-dir rename
	perf jevents: Fix getting maximum number of fds
	nvmet-tcp: fix inline data size comparison in nvmet_tcp_queue_response
	mptcp: avoid error message on infinite mapping
	mptcp: drop unconditional pr_warn on bad opt
	mptcp: fix data stream corruption
	platform/x86: hp_accel: Avoid invoking _INI to speed up resume
	gpio: cadence: Add missing MODULE_DEVICE_TABLE
	Revert "crypto: cavium/nitrox - add an error message to explain the failure of pci_request_mem_regions"
	Revert "media: usb: gspca: add a missed check for goto_low_power"
	Revert "ALSA: sb: fix a missing check of snd_ctl_add"
	Revert "serial: max310x: pass return value of spi_register_driver"
	serial: max310x: unregister uart driver in case of failure and abort
	Revert "net: fujitsu: fix a potential NULL pointer dereference"
	net: fujitsu: fix potential null-ptr-deref
	Revert "net/smc: fix a NULL pointer dereference"
	net/smc: properly handle workqueue allocation failure
	Revert "net: caif: replace BUG_ON with recovery code"
	net: caif: remove BUG_ON(dev == NULL) in caif_xmit
	Revert "char: hpet: fix a missing check of ioremap"
	char: hpet: add checks after calling ioremap
	Revert "ALSA: gus: add a check of the status of snd_ctl_add"
	Revert "ALSA: usx2y: Fix potential NULL pointer dereference"
	Revert "isdn: mISDNinfineon: fix potential NULL pointer dereference"
	isdn: mISDNinfineon: check/cleanup ioremap failure correctly in setup_io
	Revert "ath6kl: return error code in ath6kl_wmi_set_roam_lrssi_cmd()"
	ath6kl: return error code in ath6kl_wmi_set_roam_lrssi_cmd()
	Revert "isdn: mISDN: Fix potential NULL pointer dereference of kzalloc"
	isdn: mISDN: correctly handle ph_info allocation failure in hfcsusb_ph_info
	Revert "dmaengine: qcom_hidma: Check for driver register failure"
	dmaengine: qcom_hidma: comment platform_driver_register call
	Revert "libertas: add checks for the return value of sysfs_create_group"
	libertas: register sysfs groups properly
	Revert "ASoC: cs43130: fix a NULL pointer dereference"
	ASoC: cs43130: handle errors in cs43130_probe() properly
	Revert "media: dvb: Add check on sp8870_readreg"
	media: dvb: Add check on sp8870_readreg return
	Revert "media: gspca: mt9m111: Check write_bridge for timeout"
	media: gspca: mt9m111: Check write_bridge for timeout
	Revert "media: gspca: Check the return value of write_bridge for timeout"
	media: gspca: properly check for errors in po1030_probe()
	Revert "net: liquidio: fix a NULL pointer dereference"
	net: liquidio: Add missing null pointer checks
	Revert "brcmfmac: add a check for the status of usb_register"
	brcmfmac: properly check for bus register errors
	btrfs: return whole extents in fiemap
	scsi: ufs: ufs-mediatek: Fix power down spec violation
	scsi: BusLogic: Fix 64-bit system enumeration error for Buslogic
	openrisc: Define memory barrier mb
	scsi: pm80xx: Fix drives missing during rmmod/insmod loop
	btrfs: release path before starting transaction when cloning inline extent
	btrfs: do not BUG_ON in link_to_fixup_dir
	platform/x86: hp-wireless: add AMD's hardware id to the supported list
	platform/x86: intel_punit_ipc: Append MODULE_DEVICE_TABLE for ACPI
	platform/x86: touchscreen_dmi: Add info for the Mediacom Winpad 7.0 W700 tablet
	SMB3: incorrect file id in requests compounded with open
	drm/amd/display: Disconnect non-DP with no EDID
	drm/amd/amdgpu: fix refcount leak
	drm/amdgpu: Fix a use-after-free
	drm/amd/amdgpu: fix a potential deadlock in gpu reset
	drm/amdgpu: stop touching sched.ready in the backend
	platform/x86: touchscreen_dmi: Add info for the Chuwi Hi10 Pro (CWI529) tablet
	block: fix a race between del_gendisk and BLKRRPART
	linux/bits.h: fix compilation error with GENMASK
	net: netcp: Fix an error message
	net: dsa: fix error code getting shifted with 4 in dsa_slave_get_sset_count
	interconnect: qcom: bcm-voter: add a missing of_node_put()
	interconnect: qcom: Add missing MODULE_DEVICE_TABLE
	ASoC: cs42l42: Regmap must use_single_read/write
	net: stmmac: Fix MAC WoL not working if PHY does not support WoL
	net: ipa: memory region array is variable size
	vfio-ccw: Check initialized flag in cp_init()
	spi: Assume GPIO CS active high in ACPI case
	net: really orphan skbs tied to closing sk
	net: packetmmap: fix only tx timestamp on request
	net: fec: fix the potential memory leak in fec_enet_init()
	chelsio/chtls: unlock on error in chtls_pt_recvmsg()
	net: mdio: thunder: Fix a double free issue in the .remove function
	net: mdio: octeon: Fix some double free issues
	cxgb4/ch_ktls: Clear resources when pf4 device is removed
	openvswitch: meter: fix race when getting now_ms.
	tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT
	net: sched: fix packet stuck problem for lockless qdisc
	net: sched: fix tx action rescheduling issue during deactivation
	net: sched: fix tx action reschedule issue with stopped queue
	net: hso: check for allocation failure in hso_create_bulk_serial_device()
	net: bnx2: Fix error return code in bnx2_init_board()
	bnxt_en: Include new P5 HV definition in VF check.
	bnxt_en: Fix context memory setup for 64K page size.
	mld: fix panic in mld_newpack()
	net/smc: remove device from smcd_dev_list after failed device_add()
	gve: Check TX QPL was actually assigned
	gve: Update mgmt_msix_idx if num_ntfy changes
	gve: Add NULL pointer checks when freeing irqs.
	gve: Upgrade memory barrier in poll routine
	gve: Correct SKB queue index validation.
	iommu/virtio: Add missing MODULE_DEVICE_TABLE
	net: hns3: fix incorrect resp_msg issue
	net: hns3: put off calling register_netdev() until client initialize complete
	iommu/vt-d: Use user privilege for RID2PASID translation
	cxgb4: avoid accessing registers when clearing filters
	staging: emxx_udc: fix loop in _nbu2ss_nuke()
	ASoC: cs35l33: fix an error code in probe()
	bpf, offload: Reorder offload callback 'prepare' in verifier
	bpf: Set mac_len in bpf_skb_change_head
	ixgbe: fix large MTU request from VF
	ASoC: qcom: lpass-cpu: Use optional clk APIs
	scsi: libsas: Use _safe() loop in sas_resume_port()
	net: lantiq: fix memory corruption in RX ring
	ipv6: record frag_max_size in atomic fragments in input path
	ALSA: usb-audio: scarlett2: snd_scarlett_gen2_controls_create() can be static
	net: ethernet: mtk_eth_soc: Fix packet statistics support for MT7628/88
	sch_dsmark: fix a NULL deref in qdisc_reset()
	net: hsr: fix mac_len checks
	MIPS: alchemy: xxs1500: add gpio-au1000.h header file
	MIPS: ralink: export rt_sysc_membase for rt2880_wdt.c
	net: zero-initialize tc skb extension on allocation
	net: mvpp2: add buffer header handling in RX
	i915: fix build warning in intel_dp_get_link_status()
	samples/bpf: Consider frame size in tx_only of xdpsock sample
	net: hns3: check the return of skb_checksum_help()
	bpftool: Add sock_release help info for cgroup attach/prog load command
	SUNRPC: More fixes for backlog congestion
	Revert "Revert "ALSA: usx2y: Fix potential NULL pointer dereference""
	net: hso: bail out on interrupt URB allocation failure
	scripts/clang-tools: switch explicitly to Python 3
	neighbour: Prevent Race condition in neighbour subsytem
	usb: core: reduce power-on-good delay time of root hub
	Linux 5.10.42

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I05d98d1355a080e0951b4b2ae77f0a9ccb6dfc5d
2021-06-03 18:47:38 +02:00
Josef Bacik
7e13db5039 btrfs: do not BUG_ON in link_to_fixup_dir
[ Upstream commit 91df99a6eb50d5a1bc70fff4a09a0b7ae6aab96d ]

While doing error injection testing I got the following panic

  kernel BUG at fs/btrfs/tree-log.c:1862!
  invalid opcode: 0000 [#1] SMP NOPTI
  CPU: 1 PID: 7836 Comm: mount Not tainted 5.13.0-rc1+ #305
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  RIP: 0010:link_to_fixup_dir+0xd5/0xe0
  RSP: 0018:ffffb5800180fa30 EFLAGS: 00010216
  RAX: fffffffffffffffb RBX: 00000000fffffffb RCX: ffff8f595287faf0
  RDX: ffffb5800180fa37 RSI: ffff8f5954978800 RDI: 0000000000000000
  RBP: ffff8f5953af9450 R08: 0000000000000019 R09: 0000000000000001
  R10: 000151f408682970 R11: 0000000120021001 R12: ffff8f5954978800
  R13: ffff8f595287faf0 R14: ffff8f5953c77dd0 R15: 0000000000000065
  FS:  00007fc5284c8c40(0000) GS:ffff8f59bbd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fc5287f47c0 CR3: 000000011275e002 CR4: 0000000000370ee0
  Call Trace:
   replay_one_buffer+0x409/0x470
   ? btree_read_extent_buffer_pages+0xd0/0x110
   walk_up_log_tree+0x157/0x1e0
   walk_log_tree+0xa6/0x1d0
   btrfs_recover_log_trees+0x1da/0x360
   ? replay_one_extent+0x7b0/0x7b0
   open_ctree+0x1486/0x1720
   btrfs_mount_root.cold+0x12/0xea
   ? __kmalloc_track_caller+0x12f/0x240
   legacy_get_tree+0x24/0x40
   vfs_get_tree+0x22/0xb0
   vfs_kern_mount.part.0+0x71/0xb0
   btrfs_mount+0x10d/0x380
   ? vfs_parse_fs_string+0x4d/0x90
   legacy_get_tree+0x24/0x40
   vfs_get_tree+0x22/0xb0
   path_mount+0x433/0xa10
   __x64_sys_mount+0xe3/0x120
   do_syscall_64+0x3d/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae

We can get -EIO or any number of legitimate errors from
btrfs_search_slot(), panicing here is not the appropriate response.  The
error path for this code handles errors properly, simply return the
error.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:44 +02:00
Filipe Manana
88f566beb1 btrfs: release path before starting transaction when cloning inline extent
[ Upstream commit 6416954ca75baed71640bf3828625bf165fb9b5e ]

When cloning an inline extent there are a few cases, such as when we have
an implicit hole at file offset 0, where we start a transaction while
holding a read lock on a leaf. Starting the transaction results in a call
to sb_start_intwrite(), which results in doing a read lock on a percpu
semaphore. Lockdep doesn't like this and complains about it:

  [46.580704] ======================================================
  [46.580752] WARNING: possible circular locking dependency detected
  [46.580799] 5.13.0-rc1 #28 Not tainted
  [46.580832] ------------------------------------------------------
  [46.580877] cloner/3835 is trying to acquire lock:
  [46.580918] c00000001301d638 (sb_internal#2){.+.+}-{0:0}, at: clone_copy_inline_extent+0xe4/0x5a0
  [46.581167]
  [46.581167] but task is already holding lock:
  [46.581217] c000000007fa2550 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x70/0x1d0
  [46.581293]
  [46.581293] which lock already depends on the new lock.
  [46.581293]
  [46.581351]
  [46.581351] the existing dependency chain (in reverse order) is:
  [46.581410]
  [46.581410] -> #1 (btrfs-tree-00){++++}-{3:3}:
  [46.581464]        down_read_nested+0x68/0x200
  [46.581536]        __btrfs_tree_read_lock+0x70/0x1d0
  [46.581577]        btrfs_read_lock_root_node+0x88/0x200
  [46.581623]        btrfs_search_slot+0x298/0xb70
  [46.581665]        btrfs_set_inode_index+0xfc/0x260
  [46.581708]        btrfs_new_inode+0x26c/0x950
  [46.581749]        btrfs_create+0xf4/0x2b0
  [46.581782]        lookup_open.isra.57+0x55c/0x6a0
  [46.581855]        path_openat+0x418/0xd20
  [46.581888]        do_filp_open+0x9c/0x130
  [46.581920]        do_sys_openat2+0x2ec/0x430
  [46.581961]        do_sys_open+0x90/0xc0
  [46.581993]        system_call_exception+0x3d4/0x410
  [46.582037]        system_call_common+0xec/0x278
  [46.582078]
  [46.582078] -> #0 (sb_internal#2){.+.+}-{0:0}:
  [46.582135]        __lock_acquire+0x1e90/0x2c50
  [46.582176]        lock_acquire+0x2b4/0x5b0
  [46.582263]        start_transaction+0x3cc/0x950
  [46.582308]        clone_copy_inline_extent+0xe4/0x5a0
  [46.582353]        btrfs_clone+0x5fc/0x880
  [46.582388]        btrfs_clone_files+0xd8/0x1c0
  [46.582434]        btrfs_remap_file_range+0x3d8/0x590
  [46.582481]        do_clone_file_range+0x10c/0x270
  [46.582558]        vfs_clone_file_range+0x1b0/0x310
  [46.582605]        ioctl_file_clone+0x90/0x130
  [46.582651]        do_vfs_ioctl+0x874/0x1ac0
  [46.582697]        sys_ioctl+0x6c/0x120
  [46.582733]        system_call_exception+0x3d4/0x410
  [46.582777]        system_call_common+0xec/0x278
  [46.582822]
  [46.582822] other info that might help us debug this:
  [46.582822]
  [46.582888]  Possible unsafe locking scenario:
  [46.582888]
  [46.582942]        CPU0                    CPU1
  [46.582984]        ----                    ----
  [46.583028]   lock(btrfs-tree-00);
  [46.583062]                                lock(sb_internal#2);
  [46.583119]                                lock(btrfs-tree-00);
  [46.583174]   lock(sb_internal#2);
  [46.583212]
  [46.583212]  *** DEADLOCK ***
  [46.583212]
  [46.583266] 6 locks held by cloner/3835:
  [46.583299]  #0: c00000001301d448 (sb_writers#12){.+.+}-{0:0}, at: ioctl_file_clone+0x90/0x130
  [46.583382]  #1: c00000000f6d3768 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}, at: lock_two_nondirectories+0x58/0xc0
  [46.583477]  #2: c00000000f6d72a8 (&sb->s_type->i_mutex_key#15/4){+.+.}-{3:3}, at: lock_two_nondirectories+0x9c/0xc0
  [46.583574]  #3: c00000000f6d7138 (&ei->i_mmap_lock){+.+.}-{3:3}, at: btrfs_remap_file_range+0xd0/0x590
  [46.583657]  #4: c00000000f6d35f8 (&ei->i_mmap_lock/1){+.+.}-{3:3}, at: btrfs_remap_file_range+0xe0/0x590
  [46.583743]  #5: c000000007fa2550 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x70/0x1d0
  [46.583828]
  [46.583828] stack backtrace:
  [46.583872] CPU: 1 PID: 3835 Comm: cloner Not tainted 5.13.0-rc1 #28
  [46.583931] Call Trace:
  [46.583955] [c0000000167c7200] [c000000000c1ee78] dump_stack+0xec/0x144 (unreliable)
  [46.584052] [c0000000167c7240] [c000000000274058] print_circular_bug.isra.32+0x3a8/0x400
  [46.584123] [c0000000167c72e0] [c0000000002741f4] check_noncircular+0x144/0x190
  [46.584191] [c0000000167c73b0] [c000000000278fc0] __lock_acquire+0x1e90/0x2c50
  [46.584259] [c0000000167c74f0] [c00000000027aa94] lock_acquire+0x2b4/0x5b0
  [46.584317] [c0000000167c75e0] [c000000000a0d6cc] start_transaction+0x3cc/0x950
  [46.584388] [c0000000167c7690] [c000000000af47a4] clone_copy_inline_extent+0xe4/0x5a0
  [46.584457] [c0000000167c77c0] [c000000000af525c] btrfs_clone+0x5fc/0x880
  [46.584514] [c0000000167c7990] [c000000000af5698] btrfs_clone_files+0xd8/0x1c0
  [46.584583] [c0000000167c7a00] [c000000000af5b58] btrfs_remap_file_range+0x3d8/0x590
  [46.584652] [c0000000167c7ae0] [c0000000005d81dc] do_clone_file_range+0x10c/0x270
  [46.584722] [c0000000167c7b40] [c0000000005d84f0] vfs_clone_file_range+0x1b0/0x310
  [46.584793] [c0000000167c7bb0] [c00000000058bf80] ioctl_file_clone+0x90/0x130
  [46.584861] [c0000000167c7c10] [c00000000058c894] do_vfs_ioctl+0x874/0x1ac0
  [46.584922] [c0000000167c7d10] [c00000000058db4c] sys_ioctl+0x6c/0x120
  [46.584978] [c0000000167c7d60] [c0000000000364a4] system_call_exception+0x3d4/0x410
  [46.585046] [c0000000167c7e10] [c00000000000d45c] system_call_common+0xec/0x278
  [46.585114] --- interrupt: c00 at 0x7ffff7e22990
  [46.585160] NIP:  00007ffff7e22990 LR: 00000001000010ec CTR: 0000000000000000
  [46.585224] REGS: c0000000167c7e80 TRAP: 0c00   Not tainted  (5.13.0-rc1)
  [46.585280] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28000244  XER: 00000000
  [46.585374] IRQMASK: 0
  [46.585374] GPR00: 0000000000000036 00007fffffffdec0 00007ffff7f17100 0000000000000004
  [46.585374] GPR04: 000000008020940d 00007fffffffdf40 0000000000000000 0000000000000000
  [46.585374] GPR08: 0000000000000004 0000000000000000 0000000000000000 0000000000000000
  [46.585374] GPR12: 0000000000000000 00007ffff7ffa940 0000000000000000 0000000000000000
  [46.585374] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  [46.585374] GPR20: 0000000000000000 000000009123683e 00007fffffffdf40 0000000000000000
  [46.585374] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000004
  [46.585374] GPR28: 0000000100030260 0000000100030280 0000000000000003 000000000000005f
  [46.585919] NIP [00007ffff7e22990] 0x7ffff7e22990
  [46.585964] LR [00000001000010ec] 0x1000010ec
  [46.586010] --- interrupt: c00

This should be a false positive, as both locks are acquired in read mode.
Nevertheless, we don't need to hold a leaf locked when we start the
transaction, so just release the leaf (path) before starting it.

Reported-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/linux-btrfs/20210513214404.xks77p566fglzgum@riteshh-domain/
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:44 +02:00
Boris Burkov
c7e0c6047c btrfs: return whole extents in fiemap
[ Upstream commit 15c7745c9a0078edad1f7df5a6bb7b80bc8cca23 ]

  `xfs_io -c 'fiemap <off> <len>' <file>`

can give surprising results on btrfs that differ from xfs.

btrfs prints out extents trimmed to fit the user input. If the user's
fiemap request has an offset, then rather than returning each whole
extent which intersects that range, we also trim the start extent to not
have start < off.

Documentation in filesystems/fiemap.txt and the xfs_io man page suggests
that returning the whole extent is expected.

Some cases which all yield the same fiemap in xfs, but not btrfs:
  dd if=/dev/zero of=$f bs=4k count=1
  sudo xfs_io -c 'fiemap 0 1024' $f
    0: [0..7]: 26624..26631
  sudo xfs_io -c 'fiemap 2048 1024' $f
    0: [4..7]: 26628..26631
  sudo xfs_io -c 'fiemap 2048 4096' $f
    0: [4..7]: 26628..26631
  sudo xfs_io -c 'fiemap 3584 512' $f
    0: [7..7]: 26631..26631
  sudo xfs_io -c 'fiemap 4091 5' $f
    0: [7..6]: 26631..26630

I believe this is a consequence of the logic for merging contiguous
extents represented by separate extent items. That logic needs to track
the last offset as it loops through the extent items, which happens to
pick up the start offset on the first iteration, and trim off the
beginning of the full extent. To fix it, start `off` at 0 rather than
`start` so that we keep the iteration/merging intact without cutting off
the start of the extent.

after the fix, all the above commands give:

  0: [0..7]: 26624..26631

The merging logic is exercised by fstest generic/483, and I have written
a new fstest for checking we don't have backwards or zero-length fiemaps
for cases like those above.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:43 +02:00
Greg Kroah-Hartman
4968ab31d1 Merge 5.10.40 into android12-5.10
Changes in 5.10.40
	firmware: arm_scpi: Prevent the ternary sign expansion bug
	openrisc: Fix a memory leak
	tee: amdtee: unload TA only when its refcount becomes 0
	RDMA/siw: Properly check send and receive CQ pointers
	RDMA/siw: Release xarray entry
	RDMA/core: Prevent divide-by-zero error triggered by the user
	RDMA/rxe: Clear all QP fields if creation failed
	scsi: ufs: core: Increase the usable queue depth
	scsi: qedf: Add pointer checks in qedf_update_link_speed()
	scsi: qla2xxx: Fix error return code in qla82xx_write_flash_dword()
	RDMA/mlx5: Recover from fatal event in dual port mode
	RDMA/core: Don't access cm_id after its destruction
	nvmet: remove unused ctrl->cqs
	nvmet: fix memory leak in nvmet_alloc_ctrl()
	nvme-loop: fix memory leak in nvme_loop_create_ctrl()
	nvme-tcp: rerun io_work if req_list is not empty
	nvme-fc: clear q_live at beginning of association teardown
	platform/mellanox: mlxbf-tmfifo: Fix a memory barrier issue
	platform/x86: intel_int0002_vgpio: Only call enable_irq_wake() when using s2idle
	platform/x86: dell-smbios-wmi: Fix oops on rmmod dell_smbios
	RDMA/mlx5: Fix query DCT via DEVX
	RDMA/uverbs: Fix a NULL vs IS_ERR() bug
	tools/testing/selftests/exec: fix link error
	powerpc/pseries: Fix hcall tracing recursion in pv queued spinlocks
	ptrace: make ptrace() fail if the tracee changed its pid unexpectedly
	nvmet: seset ns->file when open fails
	perf/x86: Avoid touching LBR_TOS MSR for Arch LBR
	locking/lockdep: Correct calling tracepoints
	locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to signal
	powerpc: Fix early setup to make early_ioremap() work
	btrfs: avoid RCU stalls while running delayed iputs
	cifs: fix memory leak in smb2_copychunk_range
	misc: eeprom: at24: check suspend status before disable regulator
	ALSA: dice: fix stream format for TC Electronic Konnekt Live at high sampling transfer frequency
	ALSA: intel8x0: Don't update period unless prepared
	ALSA: firewire-lib: fix amdtp_packet tracepoints event for packet_index field
	ALSA: line6: Fix racy initialization of LINE6 MIDI
	ALSA: dice: fix stream format at middle sampling rate for Alesis iO 26
	ALSA: firewire-lib: fix calculation for size of IR context payload
	ALSA: usb-audio: Validate MS endpoint descriptors
	ALSA: bebob/oxfw: fix Kconfig entry for Mackie d.2 Pro
	ALSA: hda: fixup headset for ASUS GU502 laptop
	Revert "ALSA: sb8: add a check for request_region"
	ALSA: firewire-lib: fix check for the size of isochronous packet payload
	ALSA: hda/realtek: reset eapd coeff to default value for alc287
	ALSA: hda/realtek: Add some CLOVE SSIDs of ALC293
	ALSA: hda/realtek: Fix silent headphone output on ASUS UX430UA
	ALSA: hda/realtek: Add fixup for HP OMEN laptop
	ALSA: hda/realtek: Add fixup for HP Spectre x360 15-df0xxx
	uio_hv_generic: Fix a memory leak in error handling paths
	Revert "rapidio: fix a NULL pointer dereference when create_workqueue() fails"
	rapidio: handle create_workqueue() failure
	Revert "serial: mvebu-uart: Fix to avoid a potential NULL pointer dereference"
	nvme-tcp: fix possible use-after-completion
	x86/sev-es: Move sev_es_put_ghcb() in prep for follow on patch
	x86/sev-es: Invalidate the GHCB after completing VMGEXIT
	x86/sev-es: Don't return NULL from sev_es_get_ghcb()
	x86/sev-es: Use __put_user()/__get_user() for data accesses
	x86/sev-es: Forward page-faults which happen during emulation
	drm/amdgpu: Fix GPU TLB update error when PAGE_SIZE > AMDGPU_PAGE_SIZE
	drm/amdgpu: disable 3DCGCG on picasso/raven1 to avoid compute hang
	drm/amdgpu: update gc golden setting for Navi12
	drm/amdgpu: update sdma golden setting for Navi12
	powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls
	powerpc/64s/syscall: Fix ptrace syscall info with scv syscalls
	mmc: sdhci-pci-gli: increase 1.8V regulator wait
	xen-pciback: redo VF placement in the virtual topology
	xen-pciback: reconfigure also from backend watch handler
	ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry
	dm snapshot: fix crash with transient storage and zero chunk size
	kcsan: Fix debugfs initcall return type
	Revert "video: hgafb: fix potential NULL pointer dereference"
	Revert "net: stmicro: fix a missing check of clk_prepare"
	Revert "leds: lp5523: fix a missing check of return value of lp55xx_read"
	Revert "hwmon: (lm80) fix a missing check of bus read in lm80 probe"
	Revert "video: imsttfb: fix potential NULL pointer dereferences"
	Revert "ecryptfs: replace BUG_ON with error handling code"
	Revert "scsi: ufs: fix a missing check of devm_reset_control_get"
	Revert "gdrom: fix a memory leak bug"
	cdrom: gdrom: deallocate struct gdrom_unit fields in remove_gdrom
	cdrom: gdrom: initialize global variable at init time
	Revert "media: rcar_drif: fix a memory disclosure"
	Revert "rtlwifi: fix a potential NULL pointer dereference"
	Revert "qlcnic: Avoid potential NULL pointer dereference"
	Revert "niu: fix missing checks of niu_pci_eeprom_read"
	ethernet: sun: niu: fix missing checks of niu_pci_eeprom_read()
	net: stmicro: handle clk_prepare() failure during init
	scsi: ufs: handle cleanup correctly on devm_reset_control_get error
	net: rtlwifi: properly check for alloc_workqueue() failure
	ics932s401: fix broken handling of errors when word reading fails
	leds: lp5523: check return value of lp5xx_read and jump to cleanup code
	qlcnic: Add null check after calling netdev_alloc_skb
	video: hgafb: fix potential NULL pointer dereference
	vgacon: Record video mode changes with VT_RESIZEX
	vt_ioctl: Revert VT_RESIZEX parameter handling removal
	vt: Fix character height handling with VT_RESIZEX
	tty: vt: always invoke vc->vc_sw->con_resize callback
	drm/i915/gt: Disable HiZ Raw Stall Optimization on broken gen7
	openrisc: mm/init.c: remove unused memblock_region variable in map_ram()
	x86/Xen: swap NX determination and GDT setup on BSP
	nvme-multipath: fix double initialization of ANA state
	rtc: pcf85063: fallback to parent of_node
	x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path
	nvmet: use new ana_log_size instead the old one
	video: hgafb: correctly handle card detect failure during probe
	Bluetooth: SMP: Fail if remote and local public keys are identical
	Linux 5.10.40

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4523cf43d1da6bea507e4027bd83bc491a574f41
2021-05-27 08:36:46 +02:00
Josef Bacik
56001dda03 btrfs: avoid RCU stalls while running delayed iputs
commit 71795ee590111e3636cc3c148289dfa9fa0a5fc3 upstream.

Generally a delayed iput is added when we might do the final iput, so
usually we'll end up sleeping while processing the delayed iputs
naturally.  However there's no guarantee of this, especially for small
files.  In production we noticed 5 instances of RCU stalls while testing
a kernel release overnight across 1000 machines, so this is relatively
common:

  host count: 5
  rcu: INFO: rcu_sched self-detected stall on CPU
  rcu: ....: (20998 ticks this GP) idle=59e/1/0x4000000000000002 softirq=12333372/12333372 fqs=3208
   	(t=21031 jiffies g=27810193 q=41075) NMI backtrace for cpu 1
  CPU: 1 PID: 1713 Comm: btrfs-cleaner Kdump: loaded Not tainted 5.6.13-0_fbk12_rc1_5520_gec92bffc1ec9 #1
  Call Trace:
    <IRQ> dump_stack+0x50/0x70
    nmi_cpu_backtrace.cold.6+0x30/0x65
    ? lapic_can_unplug_cpu.cold.30+0x40/0x40
    nmi_trigger_cpumask_backtrace+0xba/0xca
    rcu_dump_cpu_stacks+0x99/0xc7
    rcu_sched_clock_irq.cold.90+0x1b2/0x3a3
    ? trigger_load_balance+0x5c/0x200
    ? tick_sched_do_timer+0x60/0x60
    ? tick_sched_do_timer+0x60/0x60
    update_process_times+0x24/0x50
    tick_sched_timer+0x37/0x70
    __hrtimer_run_queues+0xfe/0x270
    hrtimer_interrupt+0xf4/0x210
    smp_apic_timer_interrupt+0x5e/0x120
    apic_timer_interrupt+0xf/0x20 </IRQ>
   RIP: 0010:queued_spin_lock_slowpath+0x17d/0x1b0
   RSP: 0018:ffffc9000da5fe48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
   RAX: 0000000000000000 RBX: ffff889fa81d0cd8 RCX: 0000000000000029
   RDX: ffff889fff86c0c0 RSI: 0000000000080000 RDI: ffff88bfc2da7200
   RBP: ffff888f2dcdd768 R08: 0000000001040000 R09: 0000000000000000
   R10: 0000000000000001 R11: ffffffff82a55560 R12: ffff88bfc2da7200
   R13: 0000000000000000 R14: ffff88bff6c2a360 R15: ffffffff814bd870
   ? kzalloc.constprop.57+0x30/0x30
   list_lru_add+0x5a/0x100
   inode_lru_list_add+0x20/0x40
   iput+0x1c1/0x1f0
   run_delayed_iput_locked+0x46/0x90
   btrfs_run_delayed_iputs+0x3f/0x60
   cleaner_kthread+0xf2/0x120
   kthread+0x10b/0x130

Fix this by adding a cond_resched_lock() to the loop processing delayed
iputs so we can avoid these sort of stalls.

CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-26 12:06:50 +02:00
Greg Kroah-Hartman
76002c201f Merge 5.10.38 into android12-5.10
Changes in 5.10.38
	KEYS: trusted: Fix memory leak on object td
	tpm: fix error return code in tpm2_get_cc_attrs_tbl()
	tpm, tpm_tis: Extend locality handling to TPM2 in tpm_tis_gen_interrupt()
	tpm, tpm_tis: Reserve locality in tpm_tis_resume()
	KVM: x86/mmu: Remove the defunct update_pte() paging hook
	KVM/VMX: Invoke NMI non-IST entry instead of IST entry
	ACPI: PM: Add ACPI ID of Alder Lake Fan
	PM: runtime: Fix unpaired parent child_count for force_resume
	cpufreq: intel_pstate: Use HWP if enabled by platform firmware
	kvm: Cap halt polling at kvm->max_halt_poll_ns
	ath11k: fix thermal temperature read
	fs: dlm: fix debugfs dump
	fs: dlm: add errno handling to check callback
	fs: dlm: check on minimum msglen size
	fs: dlm: flush swork on shutdown
	tipc: convert dest node's address to network order
	ASoC: Intel: bytcr_rt5640: Enable jack-detect support on Asus T100TAF
	net/mlx5e: Use net_prefetchw instead of prefetchw in MPWQE TX datapath
	net: stmmac: Set FIFO sizes for ipq806x
	ASoC: rsnd: core: Check convert rate in rsnd_hw_params
	Bluetooth: Fix incorrect status handling in LE PHY UPDATE event
	i2c: bail out early when RDWR parameters are wrong
	ALSA: hdsp: don't disable if not enabled
	ALSA: hdspm: don't disable if not enabled
	ALSA: rme9652: don't disable if not enabled
	ALSA: bebob: enable to deliver MIDI messages for multiple ports
	Bluetooth: Set CONF_NOT_COMPLETE as l2cap_chan default
	Bluetooth: initialize skb_queue_head at l2cap_chan_create()
	net/sched: cls_flower: use ntohs for struct flow_dissector_key_ports
	net: bridge: when suppression is enabled exclude RARP packets
	Bluetooth: check for zapped sk before connecting
	selftests/powerpc: Fix L1D flushing tests for Power10
	powerpc/32: Statically initialise first emergency context
	net: hns3: remediate a potential overflow risk of bd_num_list
	net: hns3: add handling for xmit skb with recursive fraglist
	ip6_vti: proper dev_{hold|put} in ndo_[un]init methods
	ASoC: Intel: bytcr_rt5640: Add quirk for the Chuwi Hi8 tablet
	ice: handle increasing Tx or Rx ring sizes
	Bluetooth: btusb: Enable quirk boolean flag for Mediatek Chip.
	ASoC: rt5670: Add a quirk for the Dell Venue 10 Pro 5055
	i2c: Add I2C_AQ_NO_REP_START adapter quirk
	MIPS: Loongson64: Use _CACHE_UNCACHED instead of _CACHE_UNCACHED_ACCELERATED
	coresight: Do not scan for graph if none is present
	IB/hfi1: Correct oversized ring allocation
	mac80211: clear the beacon's CRC after channel switch
	pinctrl: samsung: use 'int' for register masks in Exynos
	rtw88: 8822c: add LC calibration for RTL8822C
	mt76: mt7615: support loading EEPROM for MT7613BE
	mt76: mt76x0: disable GTK offloading
	mt76: mt7915: fix txpower init for TSSI off chips
	fuse: invalidate attrs when page writeback completes
	virtiofs: fix userns
	cuse: prevent clone
	iwlwifi: pcie: make cfg vs. trans_cfg more robust
	powerpc/mm: Add cond_resched() while removing hpte mappings
	ASoC: rsnd: call rsnd_ssi_master_clk_start() from rsnd_ssi_init()
	Revert "iommu/amd: Fix performance counter initialization"
	iommu/amd: Remove performance counter pre-initialization test
	drm/amd/display: Force vsync flip when reconfiguring MPCC
	selftests: Set CC to clang in lib.mk if LLVM is set
	kconfig: nconf: stop endless search loops
	ALSA: hda/realtek: Add quirk for Lenovo Ideapad S740
	ASoC: Intel: sof_sdw: add quirk for new ADL-P Rvp
	ALSA: hda/hdmi: fix race in handling acomp ELD notification at resume
	sctp: Fix out-of-bounds warning in sctp_process_asconf_param()
	flow_dissector: Fix out-of-bounds warning in __skb_flow_bpf_to_target()
	powerpc/smp: Set numa node before updating mask
	ASoC: rt286: Generalize support for ALC3263 codec
	ethtool: ioctl: Fix out-of-bounds warning in store_link_ksettings_for_user()
	net: sched: tapr: prevent cycle_time == 0 in parse_taprio_schedule
	samples/bpf: Fix broken tracex1 due to kprobe argument change
	powerpc/pseries: Stop calling printk in rtas_stop_self()
	drm/amd/display: fixed divide by zero kernel crash during dsc enablement
	drm/amd/display: add handling for hdcp2 rx id list validation
	drm/amdgpu: Add mem sync flag for IB allocated by SA
	mt76: mt7615: fix entering driver-own state on mt7663
	crypto: ccp: Free SEV device if SEV init fails
	wl3501_cs: Fix out-of-bounds warnings in wl3501_send_pkt
	wl3501_cs: Fix out-of-bounds warnings in wl3501_mgmt_join
	qtnfmac: Fix possible buffer overflow in qtnf_event_handle_external_auth
	powerpc/iommu: Annotate nested lock for lockdep
	iavf: remove duplicate free resources calls
	net: ethernet: mtk_eth_soc: fix RX VLAN offload
	selftests: mlxsw: Increase the tolerance of backlog buildup
	selftests: mlxsw: Fix mausezahn invocation in ERSPAN scale test
	kbuild: generate Module.symvers only when vmlinux exists
	bnxt_en: Add PCI IDs for Hyper-V VF devices.
	ia64: module: fix symbolizer crash on fdescr
	watchdog: rename __touch_watchdog() to a better descriptive name
	watchdog: explicitly update timestamp when reporting softlockup
	watchdog/softlockup: remove logic that tried to prevent repeated reports
	watchdog: fix barriers when printing backtraces from all CPUs
	ASoC: rt286: Make RT286_SET_GPIO_* readable and writable
	thermal: thermal_of: Fix error return code of thermal_of_populate_bind_params()
	f2fs: move ioctl interface definitions to separated file
	f2fs: fix compat F2FS_IOC_{MOVE,GARBAGE_COLLECT}_RANGE
	f2fs: fix to allow migrating fully valid segment
	f2fs: fix panic during f2fs_resize_fs()
	f2fs: fix a redundant call to f2fs_balance_fs if an error occurs
	remoteproc: qcom_q6v5_mss: Replace ioremap with memremap
	remoteproc: qcom_q6v5_mss: Validate p_filesz in ELF loader
	PCI: iproc: Fix return value of iproc_msi_irq_domain_alloc()
	PCI: Release OF node in pci_scan_device()'s error path
	ARM: 9064/1: hw_breakpoint: Do not directly check the event's overflow_handler hook
	f2fs: fix to align to section for fallocate() on pinned file
	f2fs: fix to update last i_size if fallocate partially succeeds
	PCI: endpoint: Make *_get_first_free_bar() take into account 64 bit BAR
	PCI: endpoint: Add helper API to get the 'next' unreserved BAR
	PCI: endpoint: Make *_free_bar() to return error codes on failure
	PCI: endpoint: Fix NULL pointer dereference for ->get_features()
	f2fs: fix to avoid touching checkpointed data in get_victim()
	f2fs: fix to cover __allocate_new_section() with curseg_lock
	f2fs: Fix a hungtask problem in atomic write
	f2fs: fix to avoid accessing invalid fio in f2fs_allocate_data_block()
	rpmsg: qcom_glink_native: fix error return code of qcom_glink_rx_data()
	NFS: nfs4_bitmask_adjust() must not change the server global bitmasks
	NFS: Fix attribute bitmask in _nfs42_proc_fallocate()
	NFSv4.2: Always flush out writes in nfs42_proc_fallocate()
	NFS: Deal correctly with attribute generation counter overflow
	PCI: endpoint: Fix missing destroy_workqueue()
	pNFS/flexfiles: fix incorrect size check in decode_nfs_fh()
	NFSv4.2 fix handling of sr_eof in SEEK's reply
	SUNRPC: Move fault injection call sites
	SUNRPC: Remove trace_xprt_transmit_queued
	SUNRPC: Handle major timeout in xprt_adjust_timeout()
	thermal/drivers/tsens: Fix missing put_device error
	NFSv4.x: Don't return NFS4ERR_NOMATCHING_LAYOUT if we're unmounting
	nfsd: ensure new clients break delegations
	rtc: fsl-ftm-alarm: add MODULE_TABLE()
	dmaengine: idxd: Fix potential null dereference on pointer status
	dmaengine: idxd: fix dma device lifetime
	dmaengine: idxd: fix cdev setup and free device lifetime issues
	SUNRPC: fix ternary sign expansion bug in tracing
	pwm: atmel: Fix duty cycle calculation in .get_state()
	xprtrdma: Avoid Receive Queue wrapping
	xprtrdma: Fix cwnd update ordering
	xprtrdma: rpcrdma_mr_pop() already does list_del_init()
	swiotlb: Fix the type of index
	ceph: fix inode leak on getattr error in __fh_to_dentry
	scsi: qla2xxx: Prevent PRLI in target mode
	scsi: ufs: core: Do not put UFS power into LPM if link is broken
	scsi: ufs: core: Cancel rpm_dev_flush_recheck_work during system suspend
	scsi: ufs: core: Narrow down fast path in system suspend path
	rtc: ds1307: Fix wday settings for rx8130
	net: hns3: fix incorrect configuration for igu_egu_hw_err
	net: hns3: initialize the message content in hclge_get_link_mode()
	net: hns3: add check for HNS3_NIC_STATE_INITED in hns3_reset_notify_up_enet()
	net: hns3: fix for vxlan gpe tx checksum bug
	net: hns3: use netif_tx_disable to stop the transmit queue
	net: hns3: disable phy loopback setting in hclge_mac_start_phy
	sctp: do asoc update earlier in sctp_sf_do_dupcook_a
	RISC-V: Fix error code returned by riscv_hartid_to_cpuid()
	sunrpc: Fix misplaced barrier in call_decode
	libbpf: Fix signed overflow in ringbuf_process_ring
	block/rnbd-clt: Change queue_depth type in rnbd_clt_session to size_t
	block/rnbd-clt: Check the return value of the function rtrs_clt_query
	ethernet:enic: Fix a use after free bug in enic_hard_start_xmit
	sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b
	netfilter: xt_SECMARK: add new revision to fix structure layout
	xsk: Fix for xp_aligned_validate_desc() when len == chunk_size
	net: stmmac: Clear receive all(RA) bit when promiscuous mode is off
	drm/radeon: Fix off-by-one power_state index heap overwrite
	drm/radeon: Avoid power table parsing memory leaks
	arm64: entry: factor irq triage logic into macros
	arm64: entry: always set GIC_PRIO_PSR_I_SET during entry
	khugepaged: fix wrong result value for trace_mm_collapse_huge_page_isolate()
	mm/hugeltb: handle the error case in hugetlb_fix_reserve_counts()
	mm/migrate.c: fix potential indeterminate pte entry in migrate_vma_insert_page()
	ksm: fix potential missing rmap_item for stable_node
	mm/gup: check every subpage of a compound page during isolation
	mm/gup: return an error on migration failure
	mm/gup: check for isolation errors
	ethtool: fix missing NLM_F_MULTI flag when dumping
	net: fix nla_strcmp to handle more then one trailing null character
	smc: disallow TCP_ULP in smc_setsockopt()
	netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check
	netfilter: nftables: Fix a memleak from userdata error path in new objects
	can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path
	can: mcp251x: fix resume from sleep before interface was brought up
	can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
	sched: Fix out-of-bound access in uclamp
	sched/fair: Fix unfairness caused by missing load decay
	fs/proc/generic.c: fix incorrect pde_is_permanent check
	kernel: kexec_file: fix error return code of kexec_calculate_store_digests()
	kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources
	kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources
	netfilter: nftables: avoid overflows in nft_hash_buckets()
	i40e: fix broken XDP support
	i40e: Fix use-after-free in i40e_client_subtask()
	i40e: fix the restart auto-negotiation after FEC modified
	i40e: Fix PHY type identifiers for 2.5G and 5G adapters
	mptcp: fix splat when closing unaccepted socket
	f2fs: avoid unneeded data copy in f2fs_ioc_move_range()
	ARC: entry: fix off-by-one error in syscall number validation
	ARC: mm: PAE: use 40-bit physical page mask
	ARC: mm: Use max_high_pfn as a HIGHMEM zone border
	powerpc/64s: Fix crashes when toggling stf barrier
	powerpc/64s: Fix crashes when toggling entry flush barrier
	hfsplus: prevent corruption in shrinking truncate
	squashfs: fix divide error in calculate_skip()
	userfaultfd: release page in error path to avoid BUG_ON
	kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled
	mm/hugetlb: fix F_SEAL_FUTURE_WRITE
	blk-iocost: fix weight updates of inner active iocgs
	arm64: mte: initialize RGSR_EL1.SEED in __cpu_setup
	arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()
	btrfs: fix race leading to unpersisted data and metadata on fsync
	drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are connected
	drm/amd/display: Initialize attribute for hdcp_srm sysfs file
	drm/i915: Avoid div-by-zero on gen2
	kvm: exit halt polling on need_resched() as well
	KVM: LAPIC: Accurately guarantee busy wait for timer to expire when using hv_timer
	drm/msm/dp: initialize audio_comp when audio starts
	KVM: x86: Cancel pvclock_gtod_work on module removal
	KVM: x86: Prevent deadlock against tk_core.seq
	dax: Add an enum for specifying dax wakup mode
	dax: Add a wakeup mode parameter to put_unlocked_entry()
	dax: Wake up all waiters after invalidating dax entry
	xen/unpopulated-alloc: consolidate pgmap manipulation
	xen/unpopulated-alloc: fix error return code in fill_list()
	perf tools: Fix dynamic libbpf link
	usb: dwc3: gadget: Free gadget structure only after freeing endpoints
	iio: light: gp2ap002: Fix rumtime PM imbalance on error
	iio: proximity: pulsedlight: Fix rumtime PM imbalance on error
	iio: hid-sensors: select IIO_TRIGGERED_BUFFER under HID_SENSOR_IIO_TRIGGER
	usb: fotg210-hcd: Fix an error message
	hwmon: (occ) Fix poll rate limiting
	usb: musb: Fix an error message
	ACPI: scan: Fix a memory leak in an error handling path
	kyber: fix out of bounds access when preempted
	nvmet: add lba to sect conversion helpers
	nvmet: fix inline bio check for bdev-ns
	nvmet-rdma: Fix NULL deref when SEND is completed with error
	f2fs: compress: fix to free compress page correctly
	f2fs: compress: fix race condition of overwrite vs truncate
	f2fs: compress: fix to assign cc.cluster_idx correctly
	nbd: Fix NULL pointer in flush_workqueue
	blk-mq: plug request for shared sbitmap
	blk-mq: Swap two calls in blk_mq_exit_queue()
	usb: dwc3: omap: improve extcon initialization
	usb: dwc3: pci: Enable usb2-gadget-lpm-disable for Intel Merrifield
	usb: xhci: Increase timeout for HC halt
	usb: dwc2: Fix gadget DMA unmap direction
	usb: core: hub: fix race condition about TRSMRCY of resume
	usb: dwc3: gadget: Enable suspend events
	usb: dwc3: gadget: Return success always for kick transfer in ep queue
	usb: typec: ucsi: Retrieve all the PDOs instead of just the first 4
	usb: typec: ucsi: Put fwnode in any case during ->probe()
	xhci-pci: Allow host runtime PM as default for Intel Alder Lake xHCI
	xhci: Do not use GFP_KERNEL in (potentially) atomic context
	xhci: Add reset resume quirk for AMD xhci controller.
	iio: gyro: mpu3050: Fix reported temperature value
	iio: tsl2583: Fix division by a zero lux_val
	cdc-wdm: untangle a circular dependency between callback and softint
	xen/gntdev: fix gntdev_mmap() error exit path
	KVM: x86: Emulate RDPID only if RDTSCP is supported
	KVM: x86: Move RDPID emulation intercept to its own enum
	KVM: nVMX: Always make an attempt to map eVMCS after migration
	KVM: VMX: Do not advertise RDPID if ENABLE_RDTSCP control is unsupported
	KVM: VMX: Disable preemption when probing user return MSRs
	Revert "iommu/vt-d: Remove WO permissions on second-level paging entries"
	Revert "iommu/vt-d: Preset Access/Dirty bits for IOVA over FL"
	iommu/vt-d: Preset Access/Dirty bits for IOVA over FL
	iommu/vt-d: Remove WO permissions on second-level paging entries
	mm: fix struct page layout on 32-bit systems
	MIPS: Reinstate platform `__div64_32' handler
	MIPS: Avoid DIVU in `__div64_32' is result would be zero
	MIPS: Avoid handcoded DIVU in `__div64_32' altogether
	clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue
	clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940
	ARM: 9011/1: centralize phys-to-virt conversion of DT/ATAGS address
	ARM: 9012/1: move device tree mapping out of linear region
	ARM: 9020/1: mm: use correct section size macro to describe the FDT virtual address
	ARM: 9027/1: head.S: explicitly map DT even if it lives in the first physical section
	usb: typec: tcpm: Fix error while calculating PPS out values
	kobject_uevent: remove warning in init_uevent_argv()
	drm/i915/gt: Fix a double free in gen8_preallocate_top_level_pdp
	drm/i915: Read C0DRB3/C1DRB3 as 16 bits again
	drm/i915/overlay: Fix active retire callback alignment
	drm/i915: Fix crash in auto_retire
	clk: exynos7: Mark aclk_fsys1_200 as critical
	media: rkvdec: Remove of_match_ptr()
	i2c: mediatek: Fix send master code at more than 1MHz
	dt-bindings: media: renesas,vin: Make resets optional on R-Car Gen1
	dt-bindings: serial: 8250: Remove duplicated compatible strings
	debugfs: Make debugfs_allow RO after init
	ext4: fix debug format string warning
	nvme: do not try to reconfigure APST when the controller is not live
	ASoC: rsnd: check all BUSIF status when error
	Linux 5.10.38

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia32e01283b488a38be48015c58a0e481f09aaf65
2021-05-20 15:35:25 +02:00
Filipe Manana
bccb7dd137 btrfs: fix race leading to unpersisted data and metadata on fsync
commit 626e9f41f7c281ba3e02843702f68471706aa6d9 upstream.

When doing a fast fsync on a file, there is a race which can result in the
fsync returning success to user space without logging the inode and without
durably persisting new data.

The following example shows one possible scenario for this:

   $ mkfs.btrfs -f /dev/sdc
   $ mount /dev/sdc /mnt

   $ touch /mnt/bar
   $ xfs_io -f -c "pwrite -S 0xab 0 1M" -c "fsync" /mnt/baz

   # Now we have:
   # file bar == inode 257
   # file baz == inode 258

   $ mv /mnt/baz /mnt/foo

   # Now we have:
   # file bar == inode 257
   # file foo == inode 258

   $ xfs_io -c "pwrite -S 0xcd 0 1M" /mnt/foo

   # fsync bar before foo, it is important to trigger the race.
   $ xfs_io -c "fsync" /mnt/bar
   $ xfs_io -c "fsync" /mnt/foo

   # After this:
   # inode 257, file bar, is empty
   # inode 258, file foo, has 1M filled with 0xcd

   <power failure>

   # Replay the log:
   $ mount /dev/sdc /mnt

   # After this point file foo should have 1M filled with 0xcd and not 0xab

The following steps explain how the race happens:

1) Before the first fsync of inode 258, when it has the "baz" name, its
   ->logged_trans is 0, ->last_sub_trans is 0 and ->last_log_commit is -1.
   The inode also has the full sync flag set;

2) After the first fsync, we set inode 258 ->logged_trans to 6, which is
   the generation of the current transaction, and set ->last_log_commit
   to 0, which is the current value of ->last_sub_trans (done at
   btrfs_log_inode()).

   The full sync flag is cleared from the inode during the fsync.

   The log sub transaction that was committed had an ID of 0 and when we
   synced the log, at btrfs_sync_log(), we incremented root->log_transid
   from 0 to 1;

3) During the rename:

   We update inode 258, through btrfs_update_inode(), and that causes its
   ->last_sub_trans to be set to 1 (the current log transaction ID), and
   ->last_log_commit remains with a value of 0.

   After updating inode 258, because we have previously logged the inode
   in the previous fsync, we log again the inode through the call to
   btrfs_log_new_name(). This results in updating the inode's
   ->last_log_commit from 0 to 1 (the current value of its
   ->last_sub_trans).

   The ->last_sub_trans of inode 257 is updated to 1, which is the ID of
   the next log transaction;

4) Then a buffered write against inode 258 is made. This leaves the value
   of ->last_sub_trans as 1 (the ID of the current log transaction, stored
   at root->log_transid);

5) Then an fsync against inode 257 (or any other inode other than 258),
   happens. This results in committing the log transaction with ID 1,
   which results in updating root->last_log_commit to 1 and bumping
   root->log_transid from 1 to 2;

6) Then an fsync against inode 258 starts. We flush delalloc and wait only
   for writeback to complete, since the full sync flag is not set in the
   inode's runtime flags - we do not wait for ordered extents to complete.

   Then, at btrfs_sync_file(), we call btrfs_inode_in_log() before the
   ordered extent completes. The call returns true:

     static inline bool btrfs_inode_in_log(...)
     {
         bool ret = false;

         spin_lock(&inode->lock);
         if (inode->logged_trans == generation &&
             inode->last_sub_trans <= inode->last_log_commit &&
             inode->last_sub_trans <= inode->root->last_log_commit)
                 ret = true;
         spin_unlock(&inode->lock);
         return ret;
     }

   generation has a value of 6 (fs_info->generation), ->logged_trans also
   has a value of 6 (set when we logged the inode during the first fsync
   and when logging it during the rename), ->last_sub_trans has a value
   of 1, set during the rename (step 3), ->last_log_commit also has a
   value of 1 (set in step 3) and root->last_log_commit has a value of 1,
   which was set in step 5 when fsyncing inode 257.

   As a consequence we don't log the inode, any new extents and do not
   sync the log, resulting in a data loss if a power failure happens
   after the fsync and before the current transaction commits.
   Also, because we do not log the inode, after a power failure the mtime
   and ctime of the inode do not match those we had before.

   When the ordered extent completes before we call btrfs_inode_in_log(),
   then the call returns false and we log the inode and sync the log,
   since at the end of ordered extent completion we update the inode and
   set ->last_sub_trans to 2 (the value of root->log_transid) and
   ->last_log_commit to 1.

This problem is found after removing the check for the emptiness of the
inode's list of modified extents in the recent commit 209ecbb8585bf6
("btrfs: remove stale comment and logic from btrfs_inode_in_log()"),
added in the 5.13 merge window. However checking the emptiness of the
list is not really the way to solve this problem, and was never intended
to, because while that solves the problem for COW writes, the problem
persists for NOCOW writes because in that case the list is always empty.

In the case of NOCOW writes, even though we wait for the writeback to
complete before returning from btrfs_sync_file(), we end up not logging
the inode, which has a new mtime/ctime, and because we don't sync the log,
we never issue disk barriers (send REQ_PREFLUSH to the device) since that
only happens when we sync the log (when we write super blocks at
btrfs_sync_log()). So effectively, for a NOCOW case, when we return from
btrfs_sync_file() to user space, we are not guaranteeing that the data is
durably persisted on disk.

Also, while the example above uses a rename exchange to show how the
problem happens, it is not the only way to trigger it. An alternative
could be adding a new hard link to inode 258, since that also results
in calling btrfs_log_new_name() and updating the inode in the log.
An example reproducer using the addition of a hard link instead of a
rename operation:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt

  $ touch /mnt/bar
  $ xfs_io -f -c "pwrite -S 0xab 0 1M" -c "fsync" /mnt/foo

  $ ln /mnt/foo /mnt/foo_link
  $ xfs_io -c "pwrite -S 0xcd 0 1M" /mnt/foo

  $ xfs_io -c "fsync" /mnt/bar
  $ xfs_io -c "fsync" /mnt/foo

  <power failure>

  # Replay the log:
  $ mount /dev/sdc /mnt

  # After this point file foo often has 1M filled with 0xab and not 0xcd

The reasons leading to the final fsync of file foo, inode 258, not
persisting the new data are the same as for the previous example with
a rename operation.

So fix by never skipping logging and log syncing when there are still any
ordered extents in flight. To avoid making the conditional if statement
that checks if logging an inode is needed harder to read, place all the
logic into an helper function with separate if statements to make it more
manageable and easier to read.

A test case for fstests will follow soon.

For NOCOW writes, the problem existed before commit b5e6c3e170
("btrfs: always wait on ordered extents at fsync time"), introduced in
kernel 4.19, then it went away with that commit since we started to always
wait for ordered extent completion before logging.

The problem came back again once the fast fsync path was changed again to
avoid waiting for ordered extent completion, in commit 487781796d
("btrfs: make fast fsyncs wait only for writeback"), added in kernel 5.10.

However, for COW writes, the race only happens after the recent
commit 209ecbb8585bf6 ("btrfs: remove stale comment and logic from
btrfs_inode_in_log()"), introduced in the 5.13 merge window. For NOCOW
writes, the bug existed before that commit. So tag 5.10+ as the release
for stable backports.

CC: stable@vger.kernel.org # 5.10+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19 10:13:11 +02:00
Greg Kroah-Hartman
a1ac3f3093 Merge 5.10.36 into android12-5.10
Changes in 5.10.36
	bus: mhi: core: Fix check for syserr at power_up
	bus: mhi: core: Clear configuration from channel context during reset
	bus: mhi: core: Sanity check values from remote device before use
	nitro_enclaves: Fix stale file descriptors on failed usercopy
	dyndbg: fix parsing file query without a line-range suffix
	s390/disassembler: increase ebpf disasm buffer size
	s390/zcrypt: fix zcard and zqueue hot-unplug memleak
	vhost-vdpa: fix vm_flags for virtqueue doorbell mapping
	tpm: acpi: Check eventlog signature before using it
	ACPI: custom_method: fix potential use-after-free issue
	ACPI: custom_method: fix a possible memory leak
	ftrace: Handle commands when closing set_ftrace_filter file
	ARM: 9056/1: decompressor: fix BSS size calculation for LLVM ld.lld
	arm64: dts: marvell: armada-37xx: add syscon compatible to NB clk node
	arm64: dts: mt8173: fix property typo of 'phys' in dsi node
	ecryptfs: fix kernel panic with null dev_name
	fs/epoll: restore waking from ep_done_scan()
	mtd: spi-nor: core: Fix an issue of releasing resources during read/write
	Revert "mtd: spi-nor: macronix: Add support for mx25l51245g"
	mtd: spinand: core: add missing MODULE_DEVICE_TABLE()
	mtd: rawnand: atmel: Update ecc_stats.corrected counter
	mtd: physmap: physmap-bt1-rom: Fix unintentional stack access
	erofs: add unsupported inode i_format check
	spi: stm32-qspi: fix pm_runtime usage_count counter
	spi: spi-ti-qspi: Free DMA resources
	scsi: qla2xxx: Fix crash in qla2xxx_mqueuecommand()
	scsi: mpt3sas: Block PCI config access from userspace during reset
	mmc: uniphier-sd: Fix an error handling path in uniphier_sd_probe()
	mmc: uniphier-sd: Fix a resource leak in the remove function
	mmc: sdhci: Check for reset prior to DMA address unmap
	mmc: sdhci-pci: Fix initialization of some SD cards for Intel BYT-based controllers
	mmc: sdhci-tegra: Add required callbacks to set/clear CQE_EN bit
	mmc: block: Update ext_csd.cache_ctrl if it was written
	mmc: block: Issue a cache flush only when it's enabled
	mmc: core: Do a power cycle when the CMD11 fails
	mmc: core: Set read only for SD cards with permanent write protect bit
	mmc: core: Fix hanging on I/O during system suspend for removable cards
	irqchip/gic-v3: Do not enable irqs when handling spurious interrups
	cifs: Return correct error code from smb2_get_enc_key
	cifs: fix out-of-bound memory access when calling smb3_notify() at mount point
	cifs: detect dead connections only when echoes are enabled.
	smb2: fix use-after-free in smb2_ioctl_query_info()
	btrfs: handle remount to no compress during compression
	x86/build: Disable HIGHMEM64G selection for M486SX
	btrfs: fix metadata extent leak after failure to create subvolume
	intel_th: pci: Add Rocket Lake CPU support
	btrfs: fix race between transaction aborts and fsyncs leading to use-after-free
	posix-timers: Preserve return value in clock_adjtime32()
	fbdev: zero-fill colormap in fbcmap.c
	cpuidle: tegra: Fix C7 idling state on Tegra114
	bus: ti-sysc: Probe for l4_wkup and l4_cfg interconnect devices first
	staging: wimax/i2400m: fix byte-order issue
	spi: ath79: always call chipselect function
	spi: ath79: remove spi-master setup and cleanup assignment
	bus: mhi: core: Destroy SBL devices when moving to mission mode
	crypto: api - check for ERR pointers in crypto_destroy_tfm()
	crypto: qat - fix unmap invalid dma address
	usb: gadget: uvc: add bInterval checking for HS mode
	usb: webcam: Invalid size of Processing Unit Descriptor
	x86/sev: Do not require Hypervisor CPUID bit for SEV guests
	crypto: hisilicon/sec - fixes a printing error
	genirq/matrix: Prevent allocation counter corruption
	usb: gadget: f_uac2: validate input parameters
	usb: gadget: f_uac1: validate input parameters
	usb: dwc3: gadget: Ignore EP queue requests during bus reset
	usb: xhci: Fix port minor revision
	kselftest/arm64: mte: Fix compilation with native compiler
	ARM: tegra: acer-a500: Rename avdd to vdda of touchscreen node
	PCI: PM: Do not read power state in pci_enable_device_flags()
	kselftest/arm64: mte: Fix MTE feature detection
	ARM: dts: BCM5301X: fix "reg" formatting in /memory node
	ARM: dts: ux500: Fix up TVK R3 sensors
	x86/build: Propagate $(CLANG_FLAGS) to $(REALMODE_FLAGS)
	x86/boot: Add $(CLANG_FLAGS) to compressed KBUILD_CFLAGS
	efi/libstub: Add $(CLANG_FLAGS) to x86 flags
	soc/tegra: pmc: Fix completion of power-gate toggling
	arm64: dts: imx8mq-librem5-r3: Mark buck3 as always on
	tee: optee: do not check memref size on return from Secure World
	soundwire: cadence: only prepare attached devices on clock stop
	perf/arm_pmu_platform: Use dev_err_probe() for IRQ errors
	perf/arm_pmu_platform: Fix error handling
	random: initialize ChaCha20 constants with correct endianness
	usb: xhci-mtk: support quirk to disable usb2 lpm
	fpga: dfl: pci: add DID for D5005 PAC cards
	xhci: check port array allocation was successful before dereferencing it
	xhci: check control context is valid before dereferencing it.
	xhci: fix potential array out of bounds with several interrupters
	bus: mhi: core: Clear context for stopped channels from remove()
	ARM: dts: at91: change the key code of the gpio key
	tools/power/x86/intel-speed-select: Increase string size
	platform/x86: ISST: Account for increased timeout in some cases
	spi: dln2: Fix reference leak to master
	spi: omap-100k: Fix reference leak to master
	spi: qup: fix PM reference leak in spi_qup_remove()
	usb: gadget: tegra-xudc: Fix possible use-after-free in tegra_xudc_remove()
	usb: musb: fix PM reference leak in musb_irq_work()
	usb: core: hub: Fix PM reference leak in usb_port_resume()
	usb: dwc3: gadget: Check for disabled LPM quirk
	tty: n_gsm: check error while registering tty devices
	intel_th: Consistency and off-by-one fix
	phy: phy-twl4030-usb: Fix possible use-after-free in twl4030_usb_remove()
	crypto: sun8i-ss - Fix PM reference leak when pm_runtime_get_sync() fails
	crypto: sun8i-ce - Fix PM reference leak in sun8i_ce_probe()
	crypto: stm32/hash - Fix PM reference leak on stm32-hash.c
	crypto: stm32/cryp - Fix PM reference leak on stm32-cryp.c
	crypto: sa2ul - Fix PM reference leak in sa_ul_probe()
	crypto: omap-aes - Fix PM reference leak on omap-aes.c
	platform/x86: intel_pmc_core: Don't use global pmcdev in quirks
	spi: sync up initial chipselect state
	btrfs: do proper error handling in create_reloc_root
	btrfs: do proper error handling in btrfs_update_reloc_root
	btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s
	drm: Added orientation quirk for OneGX1 Pro
	drm/qxl: do not run release if qxl failed to init
	drm/qxl: release shadow on shutdown
	drm/ast: Fix invalid usage of AST_MAX_HWC_WIDTH in cursor atomic_check
	drm/amd/display: changing sr exit latency
	drm/ast: fix memory leak when unload the driver
	drm/amd/display: Check for DSC support instead of ASIC revision
	drm/amd/display: Don't optimize bandwidth before disabling planes
	drm/amdgpu/display: buffer INTERRUPT_LOW_IRQ_CONTEXT interrupt work
	drm/amd/display/dc/dce/dce_aux: Remove duplicate line causing 'field overwritten' issue
	scsi: lpfc: Fix incorrect dbde assignment when building target abts wqe
	scsi: lpfc: Fix pt2pt connection does not recover after LOGO
	drm/amdgpu: Fix some unload driver issues
	sched/pelt: Fix task util_est update filtering
	kvfree_rcu: Use same set of GFP flags as does single-argument
	scsi: target: pscsi: Fix warning in pscsi_complete_cmd()
	media: ite-cir: check for receive overflow
	media: drivers: media: pci: sta2x11: fix Kconfig dependency on GPIOLIB
	media: imx: capture: Return -EPIPE from __capture_legacy_try_fmt()
	atomisp: don't let it go past pipes array
	power: supply: bq27xxx: fix power_avg for newer ICs
	extcon: arizona: Fix some issues when HPDET IRQ fires after the jack has been unplugged
	extcon: arizona: Fix various races on driver unbind
	media: media/saa7164: fix saa7164_encoder_register() memory leak bugs
	media: gspca/sq905.c: fix uninitialized variable
	power: supply: Use IRQF_ONESHOT
	backlight: qcom-wled: Use sink_addr for sync toggle
	backlight: qcom-wled: Fix FSC update issue for WLED5
	drm/amdgpu: mask the xgmi number of hops reported from psp to kfd
	drm/amdkfd: Fix UBSAN shift-out-of-bounds warning
	drm/amdgpu : Fix asic reset regression issue introduce by 8f211fe8ac7c4f
	drm/amd/pm: fix workload mismatch on vega10
	drm/amd/display: Fix UBSAN warning for not a valid value for type '_Bool'
	drm/amd/display: DCHUB underflow counter increasing in some scenarios
	drm/amd/display: fix dml prefetch validation
	scsi: qla2xxx: Always check the return value of qla24xx_get_isp_stats()
	drm/vkms: fix misuse of WARN_ON
	scsi: qla2xxx: Fix use after free in bsg
	mmc: sdhci-esdhc-imx: validate pinctrl before use it
	mmc: sdhci-pci: Add PCI IDs for Intel LKF
	mmc: sdhci-brcmstb: Remove CQE quirk
	ata: ahci: Disable SXS for Hisilicon Kunpeng920
	drm/komeda: Fix bit check to import to value of proper type
	nvmet: return proper error code from discovery ctrl
	selftests/resctrl: Enable gcc checks to detect buffer overflows
	selftests/resctrl: Fix compilation issues for global variables
	selftests/resctrl: Fix compilation issues for other global variables
	selftests/resctrl: Clean up resctrl features check
	selftests/resctrl: Fix missing options "-n" and "-p"
	selftests/resctrl: Use resctrl/info for feature detection
	selftests/resctrl: Fix incorrect parsing of iMC counters
	selftests/resctrl: Fix checking for < 0 for unsigned values
	power: supply: cpcap-charger: Add usleep to cpcap charger to avoid usb plug bounce
	scsi: smartpqi: Use host-wide tag space
	scsi: smartpqi: Correct request leakage during reset operations
	scsi: smartpqi: Add new PCI IDs
	scsi: scsi_dh_alua: Remove check for ASC 24h in alua_rtpg()
	media: em28xx: fix memory leak
	media: vivid: update EDID
	drm/msm/dp: Fix incorrect NULL check kbot warnings in DP driver
	clk: socfpga: arria10: Fix memory leak of socfpga_clk on error return
	power: supply: generic-adc-battery: fix possible use-after-free in gab_remove()
	power: supply: s3c_adc_battery: fix possible use-after-free in s3c_adc_bat_remove()
	media: tc358743: fix possible use-after-free in tc358743_remove()
	media: adv7604: fix possible use-after-free in adv76xx_remove()
	media: i2c: adv7511-v4l2: fix possible use-after-free in adv7511_remove()
	media: i2c: tda1997: Fix possible use-after-free in tda1997x_remove()
	media: i2c: adv7842: fix possible use-after-free in adv7842_remove()
	media: platform: sti: Fix runtime PM imbalance in regs_show
	media: sun8i-di: Fix runtime PM imbalance in deinterlace_start_streaming
	media: dvb-usb: fix memory leak in dvb_usb_adapter_init
	media: gscpa/stv06xx: fix memory leak
	sched/fair: Ignore percpu threads for imbalance pulls
	drm/msm/mdp5: Configure PP_SYNC_HEIGHT to double the vtotal
	drm/msm/mdp5: Do not multiply vclk line count by 100
	drm/amdgpu/ttm: Fix memory leak userptr pages
	drm/radeon/ttm: Fix memory leak userptr pages
	drm/amd/display: Fix debugfs link_settings entry
	drm/amd/display: Fix UBSAN: shift-out-of-bounds warning
	drm/amdkfd: Fix cat debugfs hang_hws file causes system crash bug
	amdgpu: avoid incorrect %hu format string
	drm/amd/display: Try YCbCr420 color when YCbCr444 fails
	drm/amdgpu: fix NULL pointer dereference
	scsi: lpfc: Fix crash when a REG_RPI mailbox fails triggering a LOGO response
	scsi: lpfc: Fix error handling for mailboxes completed in MBX_POLL mode
	scsi: lpfc: Remove unsupported mbox PORT_CAPABILITIES logic
	mfd: intel-m10-bmc: Fix the register access range
	mfd: da9063: Support SMBus and I2C mode
	mfd: arizona: Fix rumtime PM imbalance on error
	scsi: libfc: Fix a format specifier
	perf: Rework perf_event_exit_event()
	sched,fair: Alternative sched_slice()
	block/rnbd-clt: Fix missing a memory free when unloading the module
	s390/archrandom: add parameter check for s390_arch_random_generate
	sched,psi: Handle potential task count underflow bugs more gracefully
	power: supply: cpcap-battery: fix invalid usage of list cursor
	ALSA: emu8000: Fix a use after free in snd_emu8000_create_mixer
	ALSA: hda/conexant: Re-order CX5066 quirk table entries
	ALSA: sb: Fix two use after free in snd_sb_qsound_build
	ALSA: usb-audio: Explicitly set up the clock selector
	ALSA: usb-audio: Add dB range mapping for Sennheiser Communications Headset PC 8
	ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 445 G7
	ALSA: hda/realtek: GA503 use same quirks as GA401
	ALSA: hda/realtek: fix mic boost on Intel NUC 8
	ALSA: hda/realtek - Headset Mic issue on HP platform
	ALSA: hda/realtek: fix static noise on ALC285 Lenovo laptops
	ALSA: hda/realtek: Add quirk for Intel Clevo PCx0Dx
	tools/power/turbostat: Fix turbostat for AMD Zen CPUs
	btrfs: fix race when picking most recent mod log operation for an old root
	arm64/vdso: Discard .note.gnu.property sections in vDSO
	Makefile: Move -Wno-unused-but-set-variable out of GCC only block
	fs: fix reporting supported extra file attributes for statx()
	virtiofs: fix memory leak in virtio_fs_probe()
	kcsan, debugfs: Move debugfs file creation out of early init
	ubifs: Only check replay with inode type to judge if inode linked
	f2fs: fix error handling in f2fs_end_enable_verity()
	f2fs: fix to avoid out-of-bounds memory access
	mlxsw: spectrum_mr: Update egress RIF list before route's action
	openvswitch: fix stack OOB read while fragmenting IPv4 packets
	ACPI: GTDT: Don't corrupt interrupt mappings on watchdow probe failure
	NFS: fs_context: validate UDP retrans to prevent shift out-of-bounds
	NFS: Don't discard pNFS layout segments that are marked for return
	NFSv4: Don't discard segments marked for return in _pnfs_return_layout()
	Input: ili210x - add missing negation for touch indication on ili210x
	jffs2: Fix kasan slab-out-of-bounds problem
	jffs2: Hook up splice_write callback
	powerpc/powernv: Enable HAIL (HV AIL) for ISA v3.1 processors
	powerpc/eeh: Fix EEH handling for hugepages in ioremap space.
	powerpc/kexec_file: Use current CPU info while setting up FDT
	powerpc/32: Fix boot failure with CONFIG_STACKPROTECTOR
	powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h
	intel_th: pci: Add Alder Lake-M support
	tpm: efi: Use local variable for calculating final log size
	tpm: vtpm_proxy: Avoid reading host log when using a virtual device
	crypto: arm/curve25519 - Move '.fpu' after '.arch'
	crypto: rng - fix crypto_rng_reset() refcounting when !CRYPTO_STATS
	md/raid1: properly indicate failure when ending a failed write request
	dm raid: fix inconclusive reshape layout on fast raid4/5/6 table reload sequences
	fuse: fix write deadlock
	exfat: fix erroneous discard when clear cluster bit
	sfc: farch: fix TX queue lookup in TX flush done handling
	sfc: farch: fix TX queue lookup in TX event handling
	security: commoncap: fix -Wstringop-overread warning
	Fix misc new gcc warnings
	jffs2: check the validity of dstlen in jffs2_zlib_compress()
	smb3: when mounting with multichannel include it in requested capabilities
	smb3: do not attempt multichannel to server which does not support it
	Revert 337f13046f ("futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op")
	futex: Do not apply time namespace adjustment on FUTEX_LOCK_PI
	x86/cpu: Initialize MSR_TSC_AUX if RDTSCP *or* RDPID is supported
	kbuild: update config_data.gz only when the content of .config is changed
	ext4: annotate data race in start_this_handle()
	ext4: annotate data race in jbd2_journal_dirty_metadata()
	ext4: fix check to prevent false positive report of incorrect used inodes
	ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()
	ext4: fix error code in ext4_commit_super
	ext4: fix ext4_error_err save negative errno into superblock
	ext4: fix error return code in ext4_fc_perform_commit()
	ext4: allow the dax flag to be set and cleared on inline directories
	ext4: Fix occasional generic/418 failure
	media: dvbdev: Fix memory leak in dvb_media_device_free()
	media: dvb-usb: Fix use-after-free access
	media: dvb-usb: Fix memory leak at error in dvb_usb_device_init()
	media: staging/intel-ipu3: Fix memory leak in imu_fmt
	media: staging/intel-ipu3: Fix set_fmt error handling
	media: staging/intel-ipu3: Fix race condition during set_fmt
	media: v4l2-ctrls: fix reference to freed memory
	media: venus: hfi_parser: Don't initialize parser on v1
	usb: gadget: dummy_hcd: fix gpf in gadget_setup
	usb: gadget: Fix double free of device descriptor pointers
	usb: gadget/function/f_fs string table fix for multiple languages
	usb: dwc3: gadget: Remove FS bInterval_m1 limitation
	usb: dwc3: gadget: Fix START_TRANSFER link state check
	usb: dwc3: core: Do core softreset when switch mode
	usb: dwc2: Fix session request interrupt handler
	tty: fix memory leak in vc_deallocate
	rsi: Use resume_noirq for SDIO
	tools/power turbostat: Fix offset overflow issue in index converting
	tracing: Map all PIDs to command lines
	tracing: Restructure trace_clock_global() to never block
	dm persistent data: packed struct should have an aligned() attribute too
	dm space map common: fix division bug in sm_ll_find_free_block()
	dm integrity: fix missing goto in bitmap_flush_interval error handling
	dm rq: fix double free of blk_mq_tag_set in dev remove after table load fails
	lib/vsprintf.c: remove leftover 'f' and 'F' cases from bstr_printf()
	thermal/drivers/cpufreq_cooling: Fix slab OOB issue
	thermal/core/fair share: Lock the thermal zone while looping over instances
	Linux 5.10.36

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7b8075de5edd8de69205205cddb9a3273d7d0810
2021-05-13 14:22:11 +02:00
Filipe Manana
1d852d6bb4 btrfs: fix race when picking most recent mod log operation for an old root
[ Upstream commit f9690f426b2134cc3e74bfc5d9dfd6a4b2ca5281 ]

Commit dbcc7d57bffc0c ("btrfs: fix race when cloning extent buffer during
rewind of an old root"), fixed a race when we need to rewind the extent
buffer of an old root. It was caused by picking a new mod log operation
for the extent buffer while getting a cloned extent buffer with an outdated
number of items (off by -1), because we cloned the extent buffer without
locking it first.

However there is still another similar race, but in the opposite direction.
The cloned extent buffer has a number of items that does not match the
number of tree mod log operations that are going to be replayed. This is
because right after we got the last (most recent) tree mod log operation to
replay and before locking and cloning the extent buffer, another task adds
a new pointer to the extent buffer, which results in adding a new tree mod
log operation and incrementing the number of items in the extent buffer.
So after cloning we have mismatch between the number of items in the extent
buffer and the number of mod log operations we are going to apply to it.
This results in hitting a BUG_ON() that produces the following stack trace:

   ------------[ cut here ]------------
   kernel BUG at fs/btrfs/tree-mod-log.c:675!
   invalid opcode: 0000 [#1] SMP KASAN PTI
   CPU: 3 PID: 4811 Comm: crawl_1215 Tainted: G        W         5.12.0-7d1efdf501f8-misc-next+ #99
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
   RIP: 0010:tree_mod_log_rewind+0x3b1/0x3c0
   Code: 05 48 8d 74 10 (...)
   RSP: 0018:ffffc90001027090 EFLAGS: 00010293
   RAX: 0000000000000000 RBX: ffff8880a8514600 RCX: ffffffffaa9e59b6
   RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8880a851462c
   RBP: ffffc900010270e0 R08: 00000000000000c0 R09: ffffed1004333417
   R10: ffff88802199a0b7 R11: ffffed1004333416 R12: 000000000000000e
   R13: ffff888135af8748 R14: ffff88818766ff00 R15: ffff8880a851462c
   FS:  00007f29acf62700(0000) GS:ffff8881f2200000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 00007f0e6013f718 CR3: 000000010d42e003 CR4: 0000000000170ee0
   Call Trace:
    btrfs_get_old_root+0x16a/0x5c0
    ? lock_downgrade+0x400/0x400
    btrfs_search_old_slot+0x192/0x520
    ? btrfs_search_slot+0x1090/0x1090
    ? free_extent_buffer.part.61+0xd7/0x140
    ? free_extent_buffer+0x13/0x20
    resolve_indirect_refs+0x3e9/0xfc0
    ? lock_downgrade+0x400/0x400
    ? __kasan_check_read+0x11/0x20
    ? add_prelim_ref.part.11+0x150/0x150
    ? lock_downgrade+0x400/0x400
    ? __kasan_check_read+0x11/0x20
    ? lock_acquired+0xbb/0x620
    ? __kasan_check_write+0x14/0x20
    ? do_raw_spin_unlock+0xa8/0x140
    ? rb_insert_color+0x340/0x360
    ? prelim_ref_insert+0x12d/0x430
    find_parent_nodes+0x5c3/0x1830
    ? stack_trace_save+0x87/0xb0
    ? resolve_indirect_refs+0xfc0/0xfc0
    ? fs_reclaim_acquire+0x67/0xf0
    ? __kasan_check_read+0x11/0x20
    ? lockdep_hardirqs_on_prepare+0x210/0x210
    ? fs_reclaim_acquire+0x67/0xf0
    ? __kasan_check_read+0x11/0x20
    ? ___might_sleep+0x10f/0x1e0
    ? __kasan_kmalloc+0x9d/0xd0
    ? trace_hardirqs_on+0x55/0x120
    btrfs_find_all_roots_safe+0x142/0x1e0
    ? find_parent_nodes+0x1830/0x1830
    ? trace_hardirqs_on+0x55/0x120
    ? ulist_free+0x1f/0x30
    ? btrfs_inode_flags_to_xflags+0x50/0x50
    iterate_extent_inodes+0x20e/0x580
    ? tree_backref_for_extent+0x230/0x230
    ? release_extent_buffer+0x225/0x280
    ? read_extent_buffer+0xdd/0x110
    ? lock_downgrade+0x400/0x400
    ? __kasan_check_read+0x11/0x20
    ? lock_acquired+0xbb/0x620
    ? __kasan_check_write+0x14/0x20
    ? do_raw_spin_unlock+0xa8/0x140
    ? _raw_spin_unlock+0x22/0x30
    ? release_extent_buffer+0x225/0x280
    iterate_inodes_from_logical+0x129/0x170
    ? iterate_inodes_from_logical+0x129/0x170
    ? btrfs_inode_flags_to_xflags+0x50/0x50
    ? iterate_extent_inodes+0x580/0x580
    ? __vmalloc_node+0x92/0xb0
    ? init_data_container+0x34/0xb0
    ? init_data_container+0x34/0xb0
    ? kvmalloc_node+0x60/0x80
    btrfs_ioctl_logical_to_ino+0x158/0x230
    btrfs_ioctl+0x2038/0x4360
    ? __kasan_check_write+0x14/0x20
    ? mmput+0x3b/0x220
    ? btrfs_ioctl_get_supported_features+0x30/0x30
    ? __kasan_check_read+0x11/0x20
    ? __kasan_check_read+0x11/0x20
    ? lock_release+0xc8/0x650
    ? __might_fault+0x64/0xd0
    ? __kasan_check_read+0x11/0x20
    ? lock_downgrade+0x400/0x400
    ? lockdep_hardirqs_on_prepare+0x210/0x210
    ? lockdep_hardirqs_on_prepare+0x13/0x210
    ? _raw_spin_unlock_irqrestore+0x51/0x63
    ? __kasan_check_read+0x11/0x20
    ? do_vfs_ioctl+0xfc/0x9d0
    ? ioctl_file_clone+0xe0/0xe0
    ? lock_downgrade+0x400/0x400
    ? lockdep_hardirqs_on_prepare+0x210/0x210
    ? __kasan_check_read+0x11/0x20
    ? lock_release+0xc8/0x650
    ? __task_pid_nr_ns+0xd3/0x250
    ? __kasan_check_read+0x11/0x20
    ? __fget_files+0x160/0x230
    ? __fget_light+0xf2/0x110
    __x64_sys_ioctl+0xc3/0x100
    do_syscall_64+0x37/0x80
    entry_SYSCALL_64_after_hwframe+0x44/0xae
   RIP: 0033:0x7f29ae85b427
   Code: 00 00 90 48 8b (...)
   RSP: 002b:00007f29acf5fcf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
   RAX: ffffffffffffffda RBX: 00007f29acf5ff40 RCX: 00007f29ae85b427
   RDX: 00007f29acf5ff48 RSI: 00000000c038943b RDI: 0000000000000003
   RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f29acf60120
   R10: 00005640d5fc7b00 R11: 0000000000000246 R12: 0000000000000003
   R13: 00007f29acf5ff48 R14: 00007f29acf5ff40 R15: 00007f29acf5fef8
   Modules linked in:
   ---[ end trace 85e5fce078dfbe04 ]---

  (gdb) l *(tree_mod_log_rewind+0x3b1)
  0xffffffff819e5b21 is in tree_mod_log_rewind (fs/btrfs/tree-mod-log.c:675).
  670                      * the modification. As we're going backwards, we do the
  671                      * opposite of each operation here.
  672                      */
  673                     switch (tm->op) {
  674                     case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
  675                             BUG_ON(tm->slot < n);
  676                             fallthrough;
  677                     case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_MOVING:
  678                     case BTRFS_MOD_LOG_KEY_REMOVE:
  679                             btrfs_set_node_key(eb, &tm->key, tm->slot);
  (gdb) quit

The following steps explain in more detail how it happens:

1) We have one tree mod log user (through fiemap or the logical ino ioctl),
   with a sequence number of 1, so we have fs_info->tree_mod_seq == 1.
   This is task A;

2) Another task is at ctree.c:balance_level() and we have eb X currently as
   the root of the tree, and we promote its single child, eb Y, as the new
   root.

   Then, at ctree.c:balance_level(), we call:

      ret = btrfs_tree_mod_log_insert_root(root->node, child, true);

3) At btrfs_tree_mod_log_insert_root() we create a tree mod log operation
   of type BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING, with a ->logical field
   pointing to ebX->start. We only have one item in eb X, so we create
   only one tree mod log operation, and store in the "tm_list" array;

4) Then, still at btrfs_tree_mod_log_insert_root(), we create a tree mod
   log element of operation type BTRFS_MOD_LOG_ROOT_REPLACE, ->logical set
   to ebY->start, ->old_root.logical set to ebX->start, ->old_root.level
   set to the level of eb X and ->generation set to the generation of eb X;

5) Then btrfs_tree_mod_log_insert_root() calls tree_mod_log_free_eb() with
   "tm_list" as argument. After that, tree_mod_log_free_eb() calls
   tree_mod_log_insert(). This inserts the mod log operation of type
   BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING from step 3 into the rbtree
   with a sequence number of 2 (and fs_info->tree_mod_seq set to 2);

6) Then, after inserting the "tm_list" single element into the tree mod
   log rbtree, the BTRFS_MOD_LOG_ROOT_REPLACE element is inserted, which
   gets the sequence number 3 (and fs_info->tree_mod_seq set to 3);

7) Back to ctree.c:balance_level(), we free eb X by calling
   btrfs_free_tree_block() on it. Because eb X was created in the current
   transaction, has no other references and writeback did not happen for
   it, we add it back to the free space cache/tree;

8) Later some other task B allocates the metadata extent from eb X, since
   it is marked as free space in the space cache/tree, and uses it as a
   node for some other btree;

9) The tree mod log user task calls btrfs_search_old_slot(), which calls
   btrfs_get_old_root(), and finally that calls tree_mod_log_oldest_root()
   with time_seq == 1 and eb_root == eb Y;

10) The first iteration of the while loop finds the tree mod log element
    with sequence number 3, for the logical address of eb Y and of type
    BTRFS_MOD_LOG_ROOT_REPLACE;

11) Because the operation type is BTRFS_MOD_LOG_ROOT_REPLACE, we don't
    break out of the loop, and set root_logical to point to
    tm->old_root.logical, which corresponds to the logical address of
    eb X;

12) On the next iteration of the while loop, the call to
    tree_mod_log_search_oldest() returns the smallest tree mod log element
    for the logical address of eb X, which has a sequence number of 2, an
    operation type of BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING and
    corresponds to the old slot 0 of eb X (eb X had only 1 item in it
    before being freed at step 7);

13) We then break out of the while loop and return the tree mod log
    operation of type BTRFS_MOD_LOG_ROOT_REPLACE (eb Y), and not the one
    for slot 0 of eb X, to btrfs_get_old_root();

14) At btrfs_get_old_root(), we process the BTRFS_MOD_LOG_ROOT_REPLACE
    operation and set "logical" to the logical address of eb X, which was
    the old root. We then call tree_mod_log_search() passing it the logical
    address of eb X and time_seq == 1;

15) But before calling tree_mod_log_search(), task B locks eb X, adds a
    key to eb X, which results in adding a tree mod log operation of type
    BTRFS_MOD_LOG_KEY_ADD, with a sequence number of 4, to the tree mod
    log, and increments the number of items in eb X from 0 to 1.
    Now fs_info->tree_mod_seq has a value of 4;

16) Task A then calls tree_mod_log_search(), which returns the most recent
    tree mod log operation for eb X, which is the one just added by task B
    at the previous step, with a sequence number of 4, a type of
    BTRFS_MOD_LOG_KEY_ADD and for slot 0;

17) Before task A locks and clones eb X, task A adds another key to eb X,
    which results in adding a new BTRFS_MOD_LOG_KEY_ADD mod log operation,
    with a sequence number of 5, for slot 1 of eb X, increments the
    number of items in eb X from 1 to 2, and unlocks eb X.
    Now fs_info->tree_mod_seq has a value of 5;

18) Task A then locks eb X and clones it. The clone has a value of 2 for
    the number of items and the pointer "tm" points to the tree mod log
    operation with sequence number 4, not the most recent one with a
    sequence number of 5, so there is mismatch between the number of
    mod log operations that are going to be applied to the cloned version
    of eb X and the number of items in the clone;

19) Task A then calls tree_mod_log_rewind() with the clone of eb X, the
    tree mod log operation with sequence number 4 and a type of
    BTRFS_MOD_LOG_KEY_ADD, and time_seq == 1;

20) At tree_mod_log_rewind(), we set the local variable "n" with a value
    of 2, which is the number of items in the clone of eb X.

    Then in the first iteration of the while loop, we process the mod log
    operation with sequence number 4, which is targeted at slot 0 and has
    a type of BTRFS_MOD_LOG_KEY_ADD. This results in decrementing "n" from
    2 to 1.

    Then we pick the next tree mod log operation for eb X, which is the
    tree mod log operation with a sequence number of 2, a type of
    BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING and for slot 0, it is the one
    added in step 5 to the tree mod log tree.

    We go back to the top of the loop to process this mod log operation,
    and because its slot is 0 and "n" has a value of 1, we hit the BUG_ON:

        (...)
        switch (tm->op) {
        case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
                BUG_ON(tm->slot < n);
                fallthrough;
	(...)

Fix this by checking for a more recent tree mod log operation after locking
and cloning the extent buffer of the old root node, and use it as the first
operation to apply to the cloned extent buffer when rewinding it.

Stable backport notes: due to moved code and renames, in =< 5.11 the
change should be applied to ctree.c:get_old_root.

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Link: https://lore.kernel.org/linux-btrfs/20210404040732.GZ32440@hungrycats.org/
Fixes: 834328a849 ("Btrfs: tree mod log's old roots could still be part of the tree")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11 14:47:33 +02:00
Josef Bacik
9c60c881d6 btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s
[ Upstream commit 7a9213a93546e7eaef90e6e153af6b8fc7553f10 ]

A few BUG_ON()'s in replace_path are purely to keep us from making
logical mistakes, so replace them with ASSERT()'s.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11 14:47:22 +02:00
Josef Bacik
f32b84d7c9 btrfs: do proper error handling in btrfs_update_reloc_root
[ Upstream commit 592fbcd50c99b8adf999a2a54f9245caff333139 ]

We call btrfs_update_root in btrfs_update_reloc_root, which can fail for
all sorts of reasons, including IO errors.  Instead of panicing the box
lets return the error, now that all callers properly handle those
errors.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11 14:47:22 +02:00
Josef Bacik
224c654a2e btrfs: do proper error handling in create_reloc_root
[ Upstream commit 84c50ba5214c2f3c1be4a931d521ec19f55dfdc8 ]

We do memory allocations here, read blocks from disk, all sorts of
operations that could easily fail at any given point.  Instead of
panicing the box, simply return the error back up the chain, all callers
at this point have proper error handling.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11 14:47:22 +02:00
Filipe Manana
a4794be7b0 btrfs: fix race between transaction aborts and fsyncs leading to use-after-free
commit 061dde8245356d8864d29e25207aa4daa0be4d3c upstream.

There is a race between a task aborting a transaction during a commit,
a task doing an fsync and the transaction kthread, which leads to an
use-after-free of the log root tree. When this happens, it results in a
stack trace like the following:

  BTRFS info (device dm-0): forced readonly
  BTRFS warning (device dm-0): Skipping commit of aborted transaction.
  BTRFS: error (device dm-0) in cleanup_transaction:1958: errno=-5 IO failure
  BTRFS warning (device dm-0): lost page write due to IO error on /dev/mapper/error-test (-5)
  BTRFS warning (device dm-0): Skipping commit of aborted transaction.
  BTRFS warning (device dm-0): direct IO failed ino 261 rw 0,0 sector 0xa4e8 len 4096 err no 10
  BTRFS error (device dm-0): error writing primary super block to device 1
  BTRFS warning (device dm-0): direct IO failed ino 261 rw 0,0 sector 0x12e000 len 4096 err no 10
  BTRFS warning (device dm-0): direct IO failed ino 261 rw 0,0 sector 0x12e008 len 4096 err no 10
  BTRFS warning (device dm-0): direct IO failed ino 261 rw 0,0 sector 0x12e010 len 4096 err no 10
  BTRFS: error (device dm-0) in write_all_supers:4110: errno=-5 IO failure (1 errors while writing supers)
  BTRFS: error (device dm-0) in btrfs_sync_log:3308: errno=-5 IO failure
  general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b68: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
  CPU: 2 PID: 2458471 Comm: fsstress Not tainted 5.12.0-rc5-btrfs-next-84 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  RIP: 0010:__mutex_lock+0x139/0xa40
  Code: c0 74 19 (...)
  RSP: 0018:ffff9f18830d7b00 EFLAGS: 00010202
  RAX: 6b6b6b6b6b6b6b68 RBX: 0000000000000001 RCX: 0000000000000002
  RDX: ffffffffb9c54d13 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff9f18830d7bc0 R08: 0000000000000000 R09: 0000000000000000
  R10: ffff9f18830d7be0 R11: 0000000000000001 R12: ffff8c6cd199c040
  R13: ffff8c6c95821358 R14: 00000000fffffffb R15: ffff8c6cbcf01358
  FS:  00007fa9140c2b80(0000) GS:ffff8c6fac600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fa913d52000 CR3: 000000013d2b4003 CR4: 0000000000370ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ? __btrfs_handle_fs_error+0xde/0x146 [btrfs]
   ? btrfs_sync_log+0x7c1/0xf20 [btrfs]
   ? btrfs_sync_log+0x7c1/0xf20 [btrfs]
   btrfs_sync_log+0x7c1/0xf20 [btrfs]
   btrfs_sync_file+0x40c/0x580 [btrfs]
   do_fsync+0x38/0x70
   __x64_sys_fsync+0x10/0x20
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7fa9142a55c3
  Code: 8b 15 09 (...)
  RSP: 002b:00007fff26278d48 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
  RAX: ffffffffffffffda RBX: 0000563c83cb4560 RCX: 00007fa9142a55c3
  RDX: 00007fff26278cb0 RSI: 00007fff26278cb0 RDI: 0000000000000005
  RBP: 0000000000000005 R08: 0000000000000001 R09: 00007fff26278d5c
  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000340
  R13: 00007fff26278de0 R14: 00007fff26278d96 R15: 0000563c83ca57c0
  Modules linked in: btrfs dm_zero dm_snapshot dm_thin_pool (...)
  ---[ end trace ee2f1b19327d791d ]---

The steps that lead to this crash are the following:

1) We are at transaction N;

2) We have two tasks with a transaction handle attached to transaction N.
   Task A and Task B. Task B is doing an fsync;

3) Task B is at btrfs_sync_log(), and has saved fs_info->log_root_tree
   into a local variable named 'log_root_tree' at the top of
   btrfs_sync_log(). Task B is about to call write_all_supers(), but
   before that...

4) Task A calls btrfs_commit_transaction(), and after it sets the
   transaction state to TRANS_STATE_COMMIT_START, an error happens before
   it waits for the transaction's 'num_writers' counter to reach a value
   of 1 (no one else attached to the transaction), so it jumps to the
   label "cleanup_transaction";

5) Task A then calls cleanup_transaction(), where it aborts the
   transaction, setting BTRFS_FS_STATE_TRANS_ABORTED on fs_info->fs_state,
   setting the ->aborted field of the transaction and the handle to an
   errno value and also setting BTRFS_FS_STATE_ERROR on fs_info->fs_state.

   After that, at cleanup_transaction(), it deletes the transaction from
   the list of transactions (fs_info->trans_list), sets the transaction
   to the state TRANS_STATE_COMMIT_DOING and then waits for the number
   of writers to go down to 1, as it's currently 2 (1 for task A and 1
   for task B);

6) The transaction kthread is running and sees that BTRFS_FS_STATE_ERROR
   is set in fs_info->fs_state, so it calls btrfs_cleanup_transaction().

   There it sees the list fs_info->trans_list is empty, and then proceeds
   into calling btrfs_drop_all_logs(), which frees the log root tree with
   a call to btrfs_free_log_root_tree();

7) Task B calls write_all_supers() and, shortly after, under the label
   'out_wake_log_root', it deferences the pointer stored in
   'log_root_tree', which was already freed in the previous step by the
   transaction kthread. This results in a use-after-free leading to a
   crash.

Fix this by deleting the transaction from the list of transactions at
cleanup_transaction() only after setting the transaction state to
TRANS_STATE_COMMIT_DOING and waiting for all existing tasks that are
attached to the transaction to release their transaction handles.
This makes the transaction kthread wait for all the tasks attached to
the transaction to be done with the transaction before dropping the
log roots and doing other cleanups.

Fixes: ef67963dac ("btrfs: drop logs when we've aborted a transaction")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11 14:47:15 +02:00
Filipe Manana
97f30747b2 btrfs: fix metadata extent leak after failure to create subvolume
commit 67addf29004c5be9fa0383c82a364bb59afc7f84 upstream.

When creating a subvolume we allocate an extent buffer for its root node
after starting a transaction. We setup a root item for the subvolume that
points to that extent buffer and then attempt to insert the root item into
the root tree - however if that fails, due to ENOMEM for example, we do
not free the extent buffer previously allocated and we do not abort the
transaction (as at that point we did nothing that can not be undone).

This means that we effectively do not return the metadata extent back to
the free space cache/tree and we leave a delayed reference for it which
causes a metadata extent item to be added to the extent tree, in the next
transaction commit, without having backreferences. When this happens
'btrfs check' reports the following:

  $ btrfs check /dev/sdi
  Opening filesystem to check...
  Checking filesystem on /dev/sdi
  UUID: dce2cb9d-025f-4b05-a4bf-cee0ad3785eb
  [1/7] checking root items
  [2/7] checking extents
  ref mismatch on [30425088 16384] extent item 1, found 0
  backref 30425088 root 256 not referenced back 0x564a91c23d70
  incorrect global backref count on 30425088 found 1 wanted 0
  backpointer mismatch on [30425088 16384]
  owner ref check failed [30425088 16384]
  ERROR: errors found in extent allocation tree or chunk allocation
  [3/7] checking free space cache
  [4/7] checking fs roots
  [5/7] checking only csums items (without verifying data)
  [6/7] checking root refs
  [7/7] checking quota groups skipped (not enabled on this FS)
  found 212992 bytes used, error(s) found
  total csum bytes: 0
  total tree bytes: 131072
  total fs tree bytes: 32768
  total extent tree bytes: 16384
  btree space waste bytes: 124669
  file data blocks allocated: 65536
   referenced 65536

So fix this by freeing the metadata extent if btrfs_insert_root() returns
an error.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11 14:47:15 +02:00
Qu Wenruo
dba16ca6f3 btrfs: handle remount to no compress during compression
commit 1d8ba9e7e785b6625f4d8e978e8a284b144a7077 upstream.

[BUG]
When running btrfs/071 with inode_need_compress() removed from
compress_file_range(), we got the following crash:

  BUG: kernel NULL pointer dereference, address: 0000000000000018
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
  RIP: 0010:compress_file_range+0x476/0x7b0 [btrfs]
  Call Trace:
   ? submit_compressed_extents+0x450/0x450 [btrfs]
   async_cow_start+0x16/0x40 [btrfs]
   btrfs_work_helper+0xf2/0x3e0 [btrfs]
   process_one_work+0x278/0x5e0
   worker_thread+0x55/0x400
   ? process_one_work+0x5e0/0x5e0
   kthread+0x168/0x190
   ? kthread_create_worker_on_cpu+0x70/0x70
   ret_from_fork+0x22/0x30
  ---[ end trace 65faf4eae941fa7d ]---

This is already after the patch "btrfs: inode: fix NULL pointer
dereference if inode doesn't need compression."

[CAUSE]
@pages is firstly created by kcalloc() in compress_file_extent():
                pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);

Then passed to btrfs_compress_pages() to be utilized there:

                ret = btrfs_compress_pages(...
                                           pages,
                                           &nr_pages,
                                           ...);

btrfs_compress_pages() will initialize each page as output, in
zlib_compress_pages() we have:

                        pages[nr_pages] = out_page;
                        nr_pages++;

Normally this is completely fine, but there is a special case which
is in btrfs_compress_pages() itself:

        switch (type) {
        default:
                return -E2BIG;
        }

In this case, we didn't modify @pages nor @out_pages, leaving them
untouched, then when we cleanup pages, the we can hit NULL pointer
dereference again:

        if (pages) {
                for (i = 0; i < nr_pages; i++) {
                        WARN_ON(pages[i]->mapping);
                        put_page(pages[i]);
                }
        ...
        }

Since pages[i] are all initialized to zero, and btrfs_compress_pages()
doesn't change them at all, accessing pages[i]->mapping would lead to
NULL pointer dereference.

This is not possible for current kernel, as we check
inode_need_compress() before doing pages allocation.
But if we're going to remove that inode_need_compress() in
compress_file_extent(), then it's going to be a problem.

[FIX]
When btrfs_compress_pages() hits its default case, modify @out_pages to
0 to prevent such problem from happening.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212331
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11 14:47:15 +02:00
Greg Kroah-Hartman
b9a61f9a56 Merge 5.10.27 into android12-5.10
Changes in 5.10.27
	mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument
	mm/memcg: set memcg when splitting page
	mt76: fix tx skb error handling in mt76_dma_tx_queue_skb
	net: stmmac: fix dma physical address of descriptor when display ring
	net: fec: ptp: avoid register access when ipg clock is disabled
	powerpc/4xx: Fix build errors from mfdcr()
	atm: eni: dont release is never initialized
	atm: lanai: dont run lanai_dev_close if not open
	Revert "r8152: adjust the settings about MAC clock speed down for RTL8153"
	ALSA: hda: ignore invalid NHLT table
	ixgbe: Fix memleak in ixgbe_configure_clsu32
	scsi: ufs: ufs-qcom: Disable interrupt in reset path
	blk-cgroup: Fix the recursive blkg rwstat
	net: tehuti: fix error return code in bdx_probe()
	net: intel: iavf: fix error return code of iavf_init_get_resources()
	sun/niu: fix wrong RXMAC_BC_FRM_CNT_COUNT count
	gianfar: fix jumbo packets+napi+rx overrun crash
	cifs: ask for more credit on async read/write code paths
	gfs2: fix use-after-free in trans_drain
	cpufreq: blacklist Arm Vexpress platforms in cpufreq-dt-platdev
	gpiolib: acpi: Add missing IRQF_ONESHOT
	nfs: fix PNFS_FLEXFILE_LAYOUT Kconfig default
	NFS: Correct size calculation for create reply length
	net: hisilicon: hns: fix error return code of hns_nic_clear_all_rx_fetch()
	net: wan: fix error return code of uhdlc_init()
	net: davicom: Use platform_get_irq_optional()
	net: enetc: set MAC RX FIFO to recommended value
	atm: uPD98402: fix incorrect allocation
	atm: idt77252: fix null-ptr-dereference
	cifs: change noisy error message to FYI
	irqchip/ingenic: Add support for the JZ4760
	kbuild: add image_name to no-sync-config-targets
	kbuild: dummy-tools: fix inverted tests for gcc
	umem: fix error return code in mm_pci_probe()
	sparc64: Fix opcode filtering in handling of no fault loads
	habanalabs: Call put_pid() when releasing control device
	staging: rtl8192e: fix kconfig dependency on CRYPTO
	u64_stats,lockdep: Fix u64_stats_init() vs lockdep
	kselftest: arm64: Fix exit code of sve-ptrace
	regulator: qcom-rpmh: Correct the pmic5_hfsmps515 buck
	block: Fix REQ_OP_ZONE_RESET_ALL handling
	drm/amd/display: Revert dram_clock_change_latency for DCN2.1
	drm/amdgpu: fb BO should be ttm_bo_type_device
	drm/radeon: fix AGP dependency
	nvme: simplify error logic in nvme_validate_ns()
	nvme: add NVME_REQ_CANCELLED flag in nvme_cancel_request()
	nvme-fc: set NVME_REQ_CANCELLED in nvme_fc_terminate_exchange()
	nvme-fc: return NVME_SC_HOST_ABORTED_CMD when a command has been aborted
	nvme-core: check ctrl css before setting up zns
	nvme-rdma: Fix a use after free in nvmet_rdma_write_data_done
	nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a
	nfs: we don't support removing system.nfs4_acl
	block: Suppress uevent for hidden device when removed
	mm/fork: clear PASID for new mm
	ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
	ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
	static_call: Pull some static_call declarations to the type headers
	static_call: Allow module use without exposing static_call_key
	static_call: Fix the module key fixup
	static_call: Fix static_call_set_init()
	KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish
	btrfs: fix sleep while in non-sleep context during qgroup removal
	selinux: don't log MAC_POLICY_LOAD record on failed policy load
	selinux: fix variable scope issue in live sidtab conversion
	netsec: restore phy power state after controller reset
	platform/x86: intel-vbtn: Stop reporting SW_DOCK events
	psample: Fix user API breakage
	z3fold: prevent reclaim/free race for headless pages
	squashfs: fix inode lookup sanity checks
	squashfs: fix xattr id and id lookup sanity checks
	hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
	kasan: fix per-page tags for non-page_alloc pages
	gcov: fix clang-11+ support
	ACPI: video: Add missing callback back for Sony VPCEH3U1E
	ACPICA: Always create namespace nodes using acpi_ns_create_node()
	arm64: stacktrace: don't trace arch_stack_walk()
	arm64: dts: ls1046a: mark crypto engine dma coherent
	arm64: dts: ls1012a: mark crypto engine dma coherent
	arm64: dts: ls1043a: mark crypto engine dma coherent
	ARM: dts: at91: sam9x60: fix mux-mask for PA7 so it can be set to A, B and C
	ARM: dts: at91: sam9x60: fix mux-mask to match product's datasheet
	ARM: dts: at91-sama5d27_som1: fix phy address to 7
	integrity: double check iint_cache was initialized
	drm/etnaviv: Use FOLL_FORCE for userptr
	drm/amd/pm: workaround for audio noise issue
	drm/amdgpu/display: restore AUX_DPHY_TX_CONTROL for DCN2.x
	drm/amdgpu: Add additional Sienna Cichlid PCI ID
	drm/i915: Fix the GT fence revocation runtime PM logic
	dm verity: fix DM_VERITY_OPTS_MAX value
	dm ioctl: fix out of bounds array access when no devices
	bus: omap_l3_noc: mark l3 irqs as IRQF_NO_THREAD
	ARM: OMAP2+: Fix smartreflex init regression after dropping legacy data
	soc: ti: omap-prm: Fix occasional abort on reset deassert for dra7 iva
	veth: Store queue_mapping independently of XDP prog presence
	bpf: Change inode_storage's lookup_elem return value from NULL to -EBADF
	libbpf: Fix INSTALL flag order
	net/mlx5e: RX, Mind the MPWQE gaps when calculating offsets
	net/mlx5e: When changing XDP program without reset, take refs for XSK RQs
	net/mlx5e: Don't match on Geneve options in case option masks are all zero
	ipv6: fix suspecious RCU usage warning
	drop_monitor: Perform cleanup upon probe registration failure
	macvlan: macvlan_count_rx() needs to be aware of preemption
	net: sched: validate stab values
	net: dsa: bcm_sf2: Qualify phydev->dev_flags based on port
	igc: reinit_locked() should be called with rtnl_lock
	igc: Fix Pause Frame Advertising
	igc: Fix Supported Pause Frame Link Setting
	igc: Fix igc_ptp_rx_pktstamp()
	e1000e: add rtnl_lock() to e1000_reset_task
	e1000e: Fix error handling in e1000_set_d0_lplu_state_82571
	net/qlcnic: Fix a use after free in qlcnic_83xx_get_minidump_template
	net: phy: broadcom: Add power down exit reset state delay
	ftgmac100: Restart MAC HW once
	clk: qcom: gcc-sc7180: Use floor ops for the correct sdcc1 clk
	net: ipa: terminate message handler arrays
	net: qrtr: fix a kernel-infoleak in qrtr_recvmsg()
	flow_dissector: fix byteorder of dissected ICMP ID
	selftests/bpf: Set gopt opt_class to 0 if get tunnel opt failed
	netfilter: ctnetlink: fix dump of the expect mask attribute
	net: hdlc_x25: Prevent racing between "x25_close" and "x25_xmit"/"x25_rx"
	net: phylink: Fix phylink_err() function name error in phylink_major_config
	tipc: better validate user input in tipc_nl_retrieve_key()
	tcp: relookup sock for RST+ACK packets handled by obsolete req sock
	can: isotp: isotp_setsockopt(): only allow to set low level TX flags for CAN-FD
	can: isotp: TX-path: ensure that CAN frame flags are initialized
	can: peak_usb: add forgotten supported devices
	can: flexcan: flexcan_chip_freeze(): fix chip freeze for missing bitrate
	can: kvaser_pciefd: Always disable bus load reporting
	can: c_can_pci: c_can_pci_remove(): fix use-after-free
	can: c_can: move runtime PM enable/disable to c_can_platform
	can: m_can: m_can_do_rx_poll(): fix extraneous msg loss warning
	can: m_can: m_can_rx_peripheral(): fix RX being blocked by errors
	mac80211: fix rate mask reset
	mac80211: Allow HE operation to be longer than expected.
	selftests/net: fix warnings on reuseaddr_ports_exhausted
	nfp: flower: fix unsupported pre_tunnel flows
	nfp: flower: add ipv6 bit to pre_tunnel control message
	nfp: flower: fix pre_tun mask id allocation
	ftrace: Fix modify_ftrace_direct.
	drm/msm/dsi: fix check-before-set in the 7nm dsi_pll code
	ionic: linearize tso skb with too many frags
	net/sched: cls_flower: fix only mask bit check in the validate_ct_state
	netfilter: nftables: report EOPNOTSUPP on unsupported flowtable flags
	netfilter: nftables: allow to update flowtable flags
	netfilter: flowtable: Make sure GC works periodically in idle system
	libbpf: Fix error path in bpf_object__elf_init()
	libbpf: Use SOCK_CLOEXEC when opening the netlink socket
	ARM: dts: imx6ull: fix ubi filesystem mount failed
	ipv6: weaken the v4mapped source check
	octeontx2-af: Formatting debugfs entry rsrc_alloc.
	octeontx2-af: Modify default KEX profile to extract TX packet fields
	octeontx2-af: Remove TOS field from MKEX TX
	octeontx2-af: Fix irq free in rvu teardown
	octeontx2-pf: Clear RSS enable flag on interace down
	octeontx2-af: fix infinite loop in unmapping NPC counter
	net: check all name nodes in __dev_alloc_name
	net: cdc-phonet: fix data-interface release on probe failure
	igb: check timestamp validity
	r8152: limit the RX buffer size of RTL8153A for USB 2.0
	net: stmmac: dwmac-sun8i: Provide TX and RX fifo sizes
	selinux: vsock: Set SID for socket returned by accept()
	selftests: forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value
	libbpf: Fix BTF dump of pointer-to-array-of-struct
	bpf: Fix umd memory leak in copy_process()
	can: isotp: tx-path: zero initialize outgoing CAN frames
	drm/msm: fix shutdown hook in case GPU components failed to bind
	drm/msm: Fix suspend/resume on i.MX5
	arm64: kdump: update ppos when reading elfcorehdr
	PM: runtime: Defer suspending suppliers
	net/mlx5: Add back multicast stats for uplink representor
	net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP
	net/mlx5e: Offload tuple rewrite for non-CT flows
	net/mlx5e: Fix error path for ethtool set-priv-flag
	PM: EM: postpone creating the debugfs dir till fs_initcall
	net: bridge: don't notify switchdev for local FDB addresses
	octeontx2-af: Fix memory leak of object buf
	xen/x86: make XEN_BALLOON_MEMORY_HOTPLUG_LIMIT depend on MEMORY_HOTPLUG
	RDMA/cxgb4: Fix adapter LE hash errors while destroying ipv6 listening server
	bpf: Don't do bpf_cgroup_storage_set() for kuprobe/tp programs
	net: Consolidate common blackhole dst ops
	net, bpf: Fix ip6ip6 crash with collect_md populated skbs
	igb: avoid premature Rx buffer reuse
	net: axienet: Properly handle PCS/PMA PHY for 1000BaseX mode
	net: axienet: Fix probe error cleanup
	net: phy: introduce phydev->port
	net: phy: broadcom: Avoid forward for bcm54xx_config_clock_delay()
	net: phy: broadcom: Set proper 1000BaseX/SGMII interface mode for BCM54616S
	net: phy: broadcom: Fix RGMII delays for BCM50160 and BCM50610M
	Revert "netfilter: x_tables: Switch synchronization to RCU"
	netfilter: x_tables: Use correct memory barriers.
	dm table: Fix zoned model check and zone sectors check
	mm/mmu_notifiers: ensure range_end() is paired with range_start()
	Revert "netfilter: x_tables: Update remaining dereference to RCU"
	ACPI: scan: Rearrange memory allocation in acpi_device_add()
	ACPI: scan: Use unique number for instance_no
	perf auxtrace: Fix auxtrace queue conflict
	perf synthetic events: Avoid write of uninitialized memory when generating PERF_RECORD_MMAP* records
	io_uring: fix provide_buffers sign extension
	block: recalculate segment count for multi-segment discards correctly
	scsi: Revert "qla2xxx: Make sure that aborted commands are freed"
	scsi: qedi: Fix error return code of qedi_alloc_global_queues()
	scsi: mpt3sas: Fix error return code of mpt3sas_base_attach()
	smb3: fix cached file size problems in duplicate extents (reflink)
	cifs: Adjust key sizes and key generation routines for AES256 encryption
	locking/mutex: Fix non debug version of mutex_lock_io_nested()
	x86/mem_encrypt: Correct physical address calculation in __set_clr_pte_enc()
	mm/memcg: fix 5.10 backport of splitting page memcg
	fs/cachefiles: Remove wait_bit_key layout dependency
	ch_ktls: fix enum-conversion warning
	can: dev: Move device back to init netns on owning netns delete
	r8169: fix DMA being used after buffer free if WoL is enabled
	net: dsa: b53: VLAN filtering is global to all users
	mac80211: fix double free in ibss_leave
	ext4: add reclaim checks to xattr code
	fs/ext4: fix integer overflow in s_log_groups_per_flex
	Revert "xen: fix p2m size in dom0 for disabled memory hotplug case"
	Revert "net: bonding: fix error return code of bond_neigh_init()"
	nvme: fix the nsid value to print in nvme_validate_or_alloc_ns
	can: peak_usb: Revert "can: peak_usb: add forgotten supported devices"
	xen-blkback: don't leak persistent grants from xen_blkbk_map()
	Linux 5.10.27

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7eafe976fd6bf33db6db4adb8ebf2ff087294a23
2021-04-02 15:25:50 +02:00
Filipe Manana
3b87d0c583 btrfs: fix sleep while in non-sleep context during qgroup removal
commit 0bb788300990d3eb5582d3301a720f846c78925c upstream.

While removing a qgroup's sysfs entry we end up taking the kernfs_mutex,
through kobject_del(), while holding the fs_info->qgroup_lock spinlock,
producing the following trace:

  [821.843637] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281
  [821.843641] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 28214, name: podman
  [821.843644] CPU: 3 PID: 28214 Comm: podman Tainted: G        W         5.11.6 #15
  [821.843646] Hardware name: Dell Inc. PowerEdge R330/084XW4, BIOS 2.11.0 12/08/2020
  [821.843647] Call Trace:
  [821.843650]  dump_stack+0xa1/0xfb
  [821.843656]  ___might_sleep+0x144/0x160
  [821.843659]  mutex_lock+0x17/0x40
  [821.843662]  kernfs_remove_by_name_ns+0x1f/0x80
  [821.843666]  sysfs_remove_group+0x7d/0xe0
  [821.843668]  sysfs_remove_groups+0x28/0x40
  [821.843670]  kobject_del+0x2a/0x80
  [821.843672]  btrfs_sysfs_del_one_qgroup+0x2b/0x40 [btrfs]
  [821.843685]  __del_qgroup_rb+0x12/0x150 [btrfs]
  [821.843696]  btrfs_remove_qgroup+0x288/0x2a0 [btrfs]
  [821.843707]  btrfs_ioctl+0x3129/0x36a0 [btrfs]
  [821.843717]  ? __mod_lruvec_page_state+0x5e/0xb0
  [821.843719]  ? page_add_new_anon_rmap+0xbc/0x150
  [821.843723]  ? kfree+0x1b4/0x300
  [821.843725]  ? mntput_no_expire+0x55/0x330
  [821.843728]  __x64_sys_ioctl+0x5a/0xa0
  [821.843731]  do_syscall_64+0x33/0x70
  [821.843733]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [821.843736] RIP: 0033:0x4cd3fb
  [821.843741] RSP: 002b:000000c000906b20 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
  [821.843744] RAX: ffffffffffffffda RBX: 000000c000050000 RCX: 00000000004cd3fb
  [821.843745] RDX: 000000c000906b98 RSI: 000000004010942a RDI: 000000000000000f
  [821.843747] RBP: 000000c000907cd0 R08: 000000c000622901 R09: 0000000000000000
  [821.843748] R10: 000000c000d992c0 R11: 0000000000000206 R12: 000000000000012d
  [821.843749] R13: 000000000000012c R14: 0000000000000200 R15: 0000000000000049

Fix this by removing the qgroup sysfs entry while not holding the spinlock,
since the spinlock is only meant for protection of the qgroup rbtree.

Reported-by: Stuart Shelton <srcshelton@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/7A5485BB-0628-419D-A4D3-27B1AF47E25A@gmail.com/
Fixes: 49e5fb4621 ("btrfs: qgroup: export qgroups in sysfs")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-30 14:31:53 +02:00
Greg Kroah-Hartman
57b60a3a15 Merge 5.10.26 into android12-5.10-lts
Changes in 5.10.26
	ASoC: ak4458: Add MODULE_DEVICE_TABLE
	ASoC: ak5558: Add MODULE_DEVICE_TABLE
	spi: cadence: set cqspi to the driver_data field of struct device
	ALSA: dice: fix null pointer dereference when node is disconnected
	ALSA: hda/realtek: apply pin quirk for XiaomiNotebook Pro
	ALSA: hda: generic: Fix the micmute led init state
	ALSA: hda/realtek: Apply headset-mic quirks for Xiaomi Redmibook Air
	ALSA: hda/realtek: fix mute/micmute LEDs for HP 840 G8
	ALSA: hda/realtek: fix mute/micmute LEDs for HP 440 G8
	ALSA: hda/realtek: fix mute/micmute LEDs for HP 850 G8
	Revert "PM: runtime: Update device status before letting suppliers suspend"
	s390/vtime: fix increased steal time accounting
	s390/pci: refactor zpci_create_device()
	s390/pci: remove superfluous zdev->zbus check
	s390/pci: fix leak of PCI device structure
	zonefs: Fix O_APPEND async write handling
	zonefs: prevent use of seq files as swap file
	zonefs: fix to update .i_wr_refcnt correctly in zonefs_open_zone()
	btrfs: fix race when cloning extent buffer during rewind of an old root
	btrfs: fix slab cache flags for free space tree bitmap
	vhost-vdpa: fix use-after-free of v->config_ctx
	vhost-vdpa: set v->config_ctx to NULL if eventfd_ctx_fdget() fails
	drm/amd/display: Correct algorithm for reversed gamma
	ASoC: fsl_ssi: Fix TDM slot setup for I2S mode
	ASoC: Intel: bytcr_rt5640: Fix HP Pavilion x2 10-p0XX OVCD current threshold
	ASoC: SOF: Intel: unregister DMIC device on probe error
	ASoC: SOF: intel: fix wrong poll bits in dsp power down
	ASoC: qcom: sdm845: Fix array out of bounds access
	ASoC: qcom: sdm845: Fix array out of range on rx slim channels
	ASoC: codecs: wcd934x: add a sanity check in set channel map
	ASoC: qcom: lpass-cpu: Fix lpass dai ids parse
	ASoC: simple-card-utils: Do not handle device clock
	afs: Fix accessing YFS xattrs on a non-YFS server
	afs: Stop listxattr() from listing "afs.*" attributes
	ALSA: usb-audio: Fix unintentional sign extension issue
	nvme: fix Write Zeroes limitations
	nvme-tcp: fix misuse of __smp_processor_id with preemption enabled
	nvme-tcp: fix possible hang when failing to set io queues
	nvme-tcp: fix a NULL deref when receiving a 0-length r2t PDU
	nvmet: don't check iosqes,iocqes for discovery controllers
	nfsd: Don't keep looking up unhashed files in the nfsd file cache
	nfsd: don't abort copies early
	NFSD: Repair misuse of sv_lock in 5.10.16-rt30.
	NFSD: fix dest to src mount in inter-server COPY
	svcrdma: disable timeouts on rdma backchannel
	vfio: IOMMU_API should be selected
	vhost_vdpa: fix the missing irq_bypass_unregister_producer() invocation
	sunrpc: fix refcount leak for rpc auth modules
	i915/perf: Start hrtimer only if sampling the OA buffer
	pstore: Fix warning in pstore_kill_sb()
	io_uring: ensure that SQPOLL thread is started for exit
	net/qrtr: fix __netdev_alloc_skb call
	kbuild: Fix <linux/version.h> for empty SUBLEVEL or PATCHLEVEL again
	cifs: fix allocation size on newly created files
	riscv: Correct SPARSEMEM configuration
	scsi: lpfc: Fix some error codes in debugfs
	scsi: myrs: Fix a double free in myrs_cleanup()
	scsi: ufs: ufs-mediatek: Correct operator & -> &&
	RISC-V: correct enum sbi_ext_rfence_fid
	counter: stm32-timer-cnt: Report count function when SLAVE_MODE_DISABLED
	gpiolib: Assign fwnode to parent's if no primary one provided
	nvme-rdma: fix possible hang when failing to set io queues
	ibmvnic: add some debugs
	ibmvnic: serialize access to work queue on remove
	tty: serial: stm32-usart: Remove set but unused 'cookie' variables
	serial: stm32: fix DMA initialization error handling
	bpf: Declare __bpf_free_used_maps() unconditionally
	RDMA/rtrs: Remove unnecessary argument dir of rtrs_iu_free
	RDMA/rtrs-srv: Jump to dereg_mr label if allocate iu fails
	RDMA/rtrs: Introduce rtrs_post_send
	RDMA/rtrs: Fix KASAN: stack-out-of-bounds bug
	module: merge repetitive strings in module_sig_check()
	module: avoid *goto*s in module_sig_check()
	module: harden ELF info handling
	scsi: pm80xx: Make mpi_build_cmd locking consistent
	scsi: pm80xx: Make running_req atomic
	scsi: pm80xx: Fix pm8001_mpi_get_nvmd_resp() race condition
	scsi: pm8001: Neaten debug logging macros and uses
	scsi: libsas: Remove notifier indirection
	scsi: libsas: Introduce a _gfp() variant of event notifiers
	scsi: mvsas: Pass gfp_t flags to libsas event notifiers
	scsi: isci: Pass gfp_t flags in isci_port_link_down()
	scsi: isci: Pass gfp_t flags in isci_port_link_up()
	scsi: isci: Pass gfp_t flags in isci_port_bc_change_received()
	RDMA/mlx5: Allow creating all QPs even when non RDMA profile is used
	powerpc/sstep: Fix load-store and update emulation
	powerpc/sstep: Fix darn emulation
	i40e: Fix endianness conversions
	net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8081
	MIPS: compressed: fix build with enabled UBSAN
	drm/amd/display: turn DPMS off on connector unplug
	iwlwifi: Add a new card for MA family
	io_uring: fix inconsistent lock state
	media: cedrus: h264: Support profile controls
	ibmvnic: remove excessive irqsave
	s390/qeth: schedule TX NAPI on QAOB completion
	drm/amd/pm: fulfill the Polaris implementation for get_clock_by_type_with_latency()
	io_uring: don't attempt IO reissue from the ring exit path
	io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return
	net: bonding: fix error return code of bond_neigh_init()
	regulator: pca9450: Add SD_VSEL GPIO for LDO5
	regulator: pca9450: Enable system reset on WDOG_B assertion
	regulator: pca9450: Clear PRESET_EN bit to fix BUCK1/2/3 voltage setting
	gfs2: Add common helper for holding and releasing the freeze glock
	gfs2: move freeze glock outside the make_fs_rw and _ro functions
	gfs2: bypass signal_our_withdraw if no journal
	powerpc: Force inlining of cpu_has_feature() to avoid build failure
	usb-storage: Add quirk to defeat Kindle's automatic unload
	usbip: Fix incorrect double assignment to udc->ud.tcp_rx
	usb: gadget: configfs: Fix KASAN use-after-free
	usb: typec: Remove vdo[3] part of tps6598x_rx_identity_reg struct
	usb: typec: tcpm: Invoke power_supply_changed for tcpm-source-psy-
	usb: dwc3: gadget: Allow runtime suspend if UDC unbinded
	usb: dwc3: gadget: Prevent EP queuing while stopping transfers
	thunderbolt: Initialize HopID IDAs in tb_switch_alloc()
	thunderbolt: Increase runtime PM reference count on DP tunnel discovery
	iio:adc:stm32-adc: Add HAS_IOMEM dependency
	iio:adc:qcom-spmi-vadc: add default scale to LR_MUX2_BAT_ID channel
	iio: adis16400: Fix an error code in adis16400_initial_setup()
	iio: gyro: mpu3050: Fix error handling in mpu3050_trigger_handler
	iio: adc: ab8500-gpadc: Fix off by 10 to 3
	iio: adc: ad7949: fix wrong ADC result due to incorrect bit mask
	iio: adc: adi-axi-adc: add proper Kconfig dependencies
	iio: hid-sensor-humidity: Fix alignment issue of timestamp channel
	iio: hid-sensor-prox: Fix scale not correct issue
	iio: hid-sensor-temperature: Fix issues of timestamp channel
	counter: stm32-timer-cnt: fix ceiling write max value
	counter: stm32-timer-cnt: fix ceiling miss-alignment with reload register
	PCI: rpadlpar: Fix potential drc_name corruption in store functions
	perf/x86/intel: Fix a crash caused by zero PEBS status
	perf/x86/intel: Fix unchecked MSR access error caused by VLBR_EVENT
	x86/ioapic: Ignore IRQ2 again
	kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()
	x86: Move TS_COMPAT back to asm/thread_info.h
	x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall()
	efivars: respect EFI_UNSUPPORTED return from firmware
	ext4: fix error handling in ext4_end_enable_verity()
	ext4: find old entry again if failed to rename whiteout
	ext4: stop inode update before return
	ext4: do not try to set xattr into ea_inode if value is empty
	ext4: fix potential error in ext4_do_update_inode
	ext4: fix rename whiteout with fast commit
	MAINTAINERS: move some real subsystems off of the staging mailing list
	MAINTAINERS: move the staging subsystem to lists.linux.dev
	static_call: Fix static_call_update() sanity check
	efi: use 32-bit alignment for efi_guid_t literals
	firmware/efi: Fix a use after bug in efi_mem_reserve_persistent
	genirq: Disable interrupts for force threaded handlers
	x86/apic/of: Fix CPU devicetree-node lookups
	cifs: Fix preauth hash corruption
	Linux 5.10.26

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6f6bdd1dc46dc744c848e778f9edd0be558b46ac
2021-03-25 17:15:27 +01:00
David Sterba
2c8d6a9474 btrfs: fix slab cache flags for free space tree bitmap
commit 34e49994d0dcdb2d31d4d2908d04f4e9ce57e4d7 upstream.

The free space tree bitmap slab cache is created with SLAB_RED_ZONE but
that's a debugging flag and not always enabled. Also the other slabs are
created with at least SLAB_MEM_SPREAD that we want as well to average
the memory placement cost.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 3acd48507d ("btrfs: fix allocation of free space cache v1 bitmap pages")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-25 09:04:06 +01:00
Filipe Manana
38ffe9eaeb btrfs: fix race when cloning extent buffer during rewind of an old root
commit dbcc7d57bffc0c8cac9dac11bec548597d59a6a5 upstream.

While resolving backreferences, as part of a logical ino ioctl call or
fiemap, we can end up hitting a BUG_ON() when replaying tree mod log
operations of a root, triggering a stack trace like the following:

  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.c:1210!
  invalid opcode: 0000 [#1] SMP KASAN PTI
  CPU: 1 PID: 19054 Comm: crawl_335 Tainted: G        W         5.11.0-2d11c0084b02-misc-next+ #89
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  RIP: 0010:__tree_mod_log_rewind+0x3b1/0x3c0
  Code: 05 48 8d 74 10 (...)
  RSP: 0018:ffffc90001eb70b8 EFLAGS: 00010297
  RAX: 0000000000000000 RBX: ffff88812344e400 RCX: ffffffffb28933b6
  RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff88812344e42c
  RBP: ffffc90001eb7108 R08: 1ffff11020b60a20 R09: ffffed1020b60a20
  R10: ffff888105b050f9 R11: ffffed1020b60a1f R12: 00000000000000ee
  R13: ffff8880195520c0 R14: ffff8881bc958500 R15: ffff88812344e42c
  FS:  00007fd1955e8700(0000) GS:ffff8881f5600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007efdb7928718 CR3: 000000010103a006 CR4: 0000000000170ee0
  Call Trace:
   btrfs_search_old_slot+0x265/0x10d0
   ? lock_acquired+0xbb/0x600
   ? btrfs_search_slot+0x1090/0x1090
   ? free_extent_buffer.part.61+0xd7/0x140
   ? free_extent_buffer+0x13/0x20
   resolve_indirect_refs+0x3e9/0xfc0
   ? lock_downgrade+0x3d0/0x3d0
   ? __kasan_check_read+0x11/0x20
   ? add_prelim_ref.part.11+0x150/0x150
   ? lock_downgrade+0x3d0/0x3d0
   ? __kasan_check_read+0x11/0x20
   ? lock_acquired+0xbb/0x600
   ? __kasan_check_write+0x14/0x20
   ? do_raw_spin_unlock+0xa8/0x140
   ? rb_insert_color+0x30/0x360
   ? prelim_ref_insert+0x12d/0x430
   find_parent_nodes+0x5c3/0x1830
   ? resolve_indirect_refs+0xfc0/0xfc0
   ? lock_release+0xc8/0x620
   ? fs_reclaim_acquire+0x67/0xf0
   ? lock_acquire+0xc7/0x510
   ? lock_downgrade+0x3d0/0x3d0
   ? lockdep_hardirqs_on_prepare+0x160/0x210
   ? lock_release+0xc8/0x620
   ? fs_reclaim_acquire+0x67/0xf0
   ? lock_acquire+0xc7/0x510
   ? poison_range+0x38/0x40
   ? unpoison_range+0x14/0x40
   ? trace_hardirqs_on+0x55/0x120
   btrfs_find_all_roots_safe+0x142/0x1e0
   ? find_parent_nodes+0x1830/0x1830
   ? btrfs_inode_flags_to_xflags+0x50/0x50
   iterate_extent_inodes+0x20e/0x580
   ? tree_backref_for_extent+0x230/0x230
   ? lock_downgrade+0x3d0/0x3d0
   ? read_extent_buffer+0xdd/0x110
   ? lock_downgrade+0x3d0/0x3d0
   ? __kasan_check_read+0x11/0x20
   ? lock_acquired+0xbb/0x600
   ? __kasan_check_write+0x14/0x20
   ? _raw_spin_unlock+0x22/0x30
   ? __kasan_check_write+0x14/0x20
   iterate_inodes_from_logical+0x129/0x170
   ? iterate_inodes_from_logical+0x129/0x170
   ? btrfs_inode_flags_to_xflags+0x50/0x50
   ? iterate_extent_inodes+0x580/0x580
   ? __vmalloc_node+0x92/0xb0
   ? init_data_container+0x34/0xb0
   ? init_data_container+0x34/0xb0
   ? kvmalloc_node+0x60/0x80
   btrfs_ioctl_logical_to_ino+0x158/0x230
   btrfs_ioctl+0x205e/0x4040
   ? __might_sleep+0x71/0xe0
   ? btrfs_ioctl_get_supported_features+0x30/0x30
   ? getrusage+0x4b6/0x9c0
   ? __kasan_check_read+0x11/0x20
   ? lock_release+0xc8/0x620
   ? __might_fault+0x64/0xd0
   ? lock_acquire+0xc7/0x510
   ? lock_downgrade+0x3d0/0x3d0
   ? lockdep_hardirqs_on_prepare+0x210/0x210
   ? lockdep_hardirqs_on_prepare+0x210/0x210
   ? __kasan_check_read+0x11/0x20
   ? do_vfs_ioctl+0xfc/0x9d0
   ? ioctl_file_clone+0xe0/0xe0
   ? lock_downgrade+0x3d0/0x3d0
   ? lockdep_hardirqs_on_prepare+0x210/0x210
   ? __kasan_check_read+0x11/0x20
   ? lock_release+0xc8/0x620
   ? __task_pid_nr_ns+0xd3/0x250
   ? lock_acquire+0xc7/0x510
   ? __fget_files+0x160/0x230
   ? __fget_light+0xf2/0x110
   __x64_sys_ioctl+0xc3/0x100
   do_syscall_64+0x37/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7fd1976e2427
  Code: 00 00 90 48 8b 05 (...)
  RSP: 002b:00007fd1955e5cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007fd1955e5f40 RCX: 00007fd1976e2427
  RDX: 00007fd1955e5f48 RSI: 00000000c038943b RDI: 0000000000000004
  RBP: 0000000001000000 R08: 0000000000000000 R09: 00007fd1955e6120
  R10: 0000557835366b00 R11: 0000000000000246 R12: 0000000000000004
  R13: 00007fd1955e5f48 R14: 00007fd1955e5f40 R15: 00007fd1955e5ef8
  Modules linked in:
  ---[ end trace ec8931a1c36e57be ]---

  (gdb) l *(__tree_mod_log_rewind+0x3b1)
  0xffffffff81893521 is in __tree_mod_log_rewind (fs/btrfs/ctree.c:1210).
  1205                     * the modification. as we're going backwards, we do the
  1206                     * opposite of each operation here.
  1207                     */
  1208                    switch (tm->op) {
  1209                    case MOD_LOG_KEY_REMOVE_WHILE_FREEING:
  1210                            BUG_ON(tm->slot < n);
  1211                            fallthrough;
  1212                    case MOD_LOG_KEY_REMOVE_WHILE_MOVING:
  1213                    case MOD_LOG_KEY_REMOVE:
  1214                            btrfs_set_node_key(eb, &tm->key, tm->slot);

Here's what happens to hit that BUG_ON():

1) We have one tree mod log user (through fiemap or the logical ino ioctl),
   with a sequence number of 1, so we have fs_info->tree_mod_seq == 1;

2) Another task is at ctree.c:balance_level() and we have eb X currently as
   the root of the tree, and we promote its single child, eb Y, as the new
   root.

   Then, at ctree.c:balance_level(), we call:

      tree_mod_log_insert_root(eb X, eb Y, 1);

3) At tree_mod_log_insert_root() we create tree mod log elements for each
   slot of eb X, of operation type MOD_LOG_KEY_REMOVE_WHILE_FREEING each
   with a ->logical pointing to ebX->start. These are placed in an array
   named tm_list.
   Lets assume there are N elements (N pointers in eb X);

4) Then, still at tree_mod_log_insert_root(), we create a tree mod log
   element of operation type MOD_LOG_ROOT_REPLACE, ->logical set to
   ebY->start, ->old_root.logical set to ebX->start, ->old_root.level set
   to the level of eb X and ->generation set to the generation of eb X;

5) Then tree_mod_log_insert_root() calls tree_mod_log_free_eb() with
   tm_list as argument. After that, tree_mod_log_free_eb() calls
   __tree_mod_log_insert() for each member of tm_list in reverse order,
   from highest slot in eb X, slot N - 1, to slot 0 of eb X;

6) __tree_mod_log_insert() sets the sequence number of each given tree mod
   log operation - it increments fs_info->tree_mod_seq and sets
   fs_info->tree_mod_seq as the sequence number of the given tree mod log
   operation.

   This means that for the tm_list created at tree_mod_log_insert_root(),
   the element corresponding to slot 0 of eb X has the highest sequence
   number (1 + N), and the element corresponding to the last slot has the
   lowest sequence number (2);

7) Then, after inserting tm_list's elements into the tree mod log rbtree,
   the MOD_LOG_ROOT_REPLACE element is inserted, which gets the highest
   sequence number, which is N + 2;

8) Back to ctree.c:balance_level(), we free eb X by calling
   btrfs_free_tree_block() on it. Because eb X was created in the current
   transaction, has no other references and writeback did not happen for
   it, we add it back to the free space cache/tree;

9) Later some other task T allocates the metadata extent from eb X, since
   it is marked as free space in the space cache/tree, and uses it as a
   node for some other btree;

10) The tree mod log user task calls btrfs_search_old_slot(), which calls
    get_old_root(), and finally that calls __tree_mod_log_oldest_root()
    with time_seq == 1 and eb_root == eb Y;

11) First iteration of the while loop finds the tree mod log element with
    sequence number N + 2, for the logical address of eb Y and of type
    MOD_LOG_ROOT_REPLACE;

12) Because the operation type is MOD_LOG_ROOT_REPLACE, we don't break out
    of the loop, and set root_logical to point to tm->old_root.logical
    which corresponds to the logical address of eb X;

13) On the next iteration of the while loop, the call to
    tree_mod_log_search_oldest() returns the smallest tree mod log element
    for the logical address of eb X, which has a sequence number of 2, an
    operation type of MOD_LOG_KEY_REMOVE_WHILE_FREEING and corresponds to
    the old slot N - 1 of eb X (eb X had N items in it before being freed);

14) We then break out of the while loop and return the tree mod log operation
    of type MOD_LOG_ROOT_REPLACE (eb Y), and not the one for slot N - 1 of
    eb X, to get_old_root();

15) At get_old_root(), we process the MOD_LOG_ROOT_REPLACE operation
    and set "logical" to the logical address of eb X, which was the old
    root. We then call tree_mod_log_search() passing it the logical
    address of eb X and time_seq == 1;

16) Then before calling tree_mod_log_search(), task T adds a key to eb X,
    which results in adding a tree mod log operation of type
    MOD_LOG_KEY_ADD to the tree mod log - this is done at
    ctree.c:insert_ptr() - but after adding the tree mod log operation
    and before updating the number of items in eb X from 0 to 1...

17) The task at get_old_root() calls tree_mod_log_search() and gets the
    tree mod log operation of type MOD_LOG_KEY_ADD just added by task T.
    Then it enters the following if branch:

    if (old_root && tm && tm->op != MOD_LOG_KEY_REMOVE_WHILE_FREEING) {
       (...)
    } (...)

    Calls read_tree_block() for eb X, which gets a reference on eb X but
    does not lock it - task T has it locked.
    Then it clones eb X while it has nritems set to 0 in its header, before
    task T sets nritems to 1 in eb X's header. From hereupon we use the
    clone of eb X which no other task has access to;

18) Then we call __tree_mod_log_rewind(), passing it the MOD_LOG_KEY_ADD
    mod log operation we just got from tree_mod_log_search() in the
    previous step and the cloned version of eb X;

19) At __tree_mod_log_rewind(), we set the local variable "n" to the number
    of items set in eb X's clone, which is 0. Then we enter the while loop,
    and in its first iteration we process the MOD_LOG_KEY_ADD operation,
    which just decrements "n" from 0 to (u32)-1, since "n" is declared with
    a type of u32. At the end of this iteration we call rb_next() to find the
    next tree mod log operation for eb X, that gives us the mod log operation
    of type MOD_LOG_KEY_REMOVE_WHILE_FREEING, for slot 0, with a sequence
    number of N + 1 (steps 3 to 6);

20) Then we go back to the top of the while loop and trigger the following
    BUG_ON():

        (...)
        switch (tm->op) {
        case MOD_LOG_KEY_REMOVE_WHILE_FREEING:
                 BUG_ON(tm->slot < n);
                 fallthrough;
        (...)

    Because "n" has a value of (u32)-1 (4294967295) and tm->slot is 0.

Fix this by taking a read lock on the extent buffer before cloning it at
ctree.c:get_old_root(). This should be done regardless of the extent
buffer having been freed and reused, as a concurrent task might be
modifying it (while holding a write lock on it).

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Link: https://lore.kernel.org/linux-btrfs/20210227155037.GN28049@hungrycats.org/
Fixes: 834328a849 ("Btrfs: tree mod log's old roots could still be part of the tree")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-25 09:04:06 +01:00
Greg Kroah-Hartman
3c8179ccc6 Merge 5.10.23 into android12-5.10
Changes in 5.10.23
	ACPICA: Fix race in generic_serial_bus (I2C) and GPIO op_region parameter handling
	ASoC: SOF: Intel: broadwell: fix mutual exclusion with catpt driver
	nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state
	parisc: Enable -mlong-calls gcc option with CONFIG_COMPILE_TEST
	arm64: Make CPU_BIG_ENDIAN depend on ld.bfd or ld.lld 13.0.0+
	btrfs: export and rename qgroup_reserve_meta
	btrfs: don't flush from btrfs_delayed_inode_reserve_metadata
	iommu/amd: Fix sleeping in atomic in increase_address_space()
	Bluetooth: btqca: Add valid le states quirk
	mwifiex: pcie: skip cancel_work_sync() on reset failure path
	ASoC: Intel: sof_sdw: add quirk for new TigerLake-SDCA device
	bus: ti-sysc: Implement GPMC debug quirk to drop platform data
	platform/x86: acer-wmi: Cleanup ACER_CAP_FOO defines
	platform/x86: acer-wmi: Cleanup accelerometer device handling
	platform/x86: acer-wmi: Add new force_caps module parameter
	platform/x86: acer-wmi: Add ACER_CAP_SET_FUNCTION_MODE capability flag
	platform/x86: acer-wmi: Add support for SW_TABLET_MODE on Switch devices
	platform/x86: acer-wmi: Add ACER_CAP_KBD_DOCK quirk for the Aspire Switch 10E SW3-016
	HID: mf: add support for 0079:1846 Mayflash/Dragonrise USB Gamecube Adapter
	media: cx23885: add more quirks for reset DMA on some AMD IOMMU
	ACPI: video: Add DMI quirk for GIGABYTE GB-BXBT-2807
	ASoC: Intel: bytcr_rt5640: Add quirk for ARCHOS Cesium 140
	usb: cdns3: host: add .suspend_quirk for xhci-plat.c
	usb: cdns3: host: add xhci_plat_priv quirk XHCI_SKIP_PHY_INIT
	usb: cdns3: add quirk for enable runtime pm by default
	usb: cdns3: fix NULL pointer dereference on no platform data
	PCI: Add function 1 DMA alias quirk for Marvell 9215 SATA controller
	KVM: x86: Supplement __cr4_reserved_bits() with X86_FEATURE_PCID check
	ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A32
	scsi: ufs-mediatek: Enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
	scsi: ufs: Add a quirk to permit overriding UniPro defaults
	misc: eeprom_93xx46: Add quirk to support Microchip 93LC46B eeprom
	scsi: ufs: Introduce a quirk to allow only page-aligned sg entries
	scsi: ufs: ufs-exynos: Apply vendor-specific values for three timeouts
	scsi: ufs: ufs-exynos: Use UFSHCD_QUIRK_ALIGN_SG_WITH_PAGE_SIZE
	drm/msm/a5xx: Remove overwriting A5XX_PC_DBG_ECO_CNTL register
	mmc: sdhci-of-dwcmshc: set SDHCI_QUIRK2_PRESET_VALUE_BROKEN
	HID: i2c-hid: Add I2C_HID_QUIRK_NO_IRQ_AFTER_RESET for ITE8568 EC on Voyo Winpad A15
	ALSA: usb-audio: Add DJM750 to Pioneer mixer quirk
	ALSA: usb-audio: add mixer quirks for Pioneer DJM-900NXS2
	PCI: cadence: Retrain Link to work around Gen2 training defect
	ASoC: Intel: sof_sdw: reorganize quirks by generation
	ASoC: Intel: sof_sdw: add quirk for HP Spectre x360 convertible
	scsi: ufs: Fix a duplicate dev quirk number
	KVM: SVM: Clear the CR4 register on reset
	nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.
	nvme-pci: add quirks for Lexar 256GB SSD
	Linux 5.10.23

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I99cf2bf0177194c562bc3e6c599a1909a0756c50
2021-03-11 19:36:03 +01:00
Greg Kroah-Hartman
6bd3d0211c Merge 5.10.22 into android12-5.10
Changes in 5.10.22
	ALSA: hda/realtek: Enable headset mic of Acer SWIFT with ALC256
	ALSA: usb-audio: use Corsair Virtuoso mapping for Corsair Virtuoso SE
	ALSA: usb-audio: Drop bogus dB range in too low level
	tpm, tpm_tis: Decorate tpm_tis_gen_interrupt() with request_locality()
	tpm, tpm_tis: Decorate tpm_get_timeouts() with request_locality()
	btrfs: avoid double put of block group when emptying cluster
	btrfs: fix raid6 qstripe kmap
	btrfs: fix race between writes to swap files and scrub
	btrfs: fix race between swap file activation and snapshot creation
	btrfs: fix stale data exposure after cloning a hole with NO_HOLES enabled
	btrfs: fix race between extent freeing/allocation when using bitmaps
	btrfs: validate qgroup inherit for SNAP_CREATE_V2 ioctl
	btrfs: free correct amount of space in btrfs_delayed_inode_reserve_metadata
	btrfs: unlock extents in btrfs_zero_range in case of quota reservation errors
	btrfs: fix warning when creating a directory with smack enabled
	PM: runtime: Update device status before letting suppliers suspend
	ring-buffer: Force before_stamp and write_stamp to be different on discard
	io_uring: ignore double poll add on the same waitqueue head
	dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size
	dm verity: fix FEC for RS roots unaligned to block size
	drm/amdgpu:disable VCN for Navi12 SKU
	drm/amdgpu: fix parameter error of RREG32_PCIE() in amdgpu_regs_pcie
	crypto - shash: reduce minimum alignment of shash_desc structure
	arm64: mm: Move reserve_crashkernel() into mem_init()
	arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
	of/address: Introduce of_dma_get_max_cpu_address()
	of: unittest: Add test for of_dma_get_max_cpu_address()
	arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
	arm64: mm: Set ZONE_DMA size based on early IORT scan
	mm: Remove examples from enum zone_type comment
	ALSA: ctxfi: cthw20k2: fix mask on conf to allow 4 bits
	RDMA/cm: Fix IRQ restore in ib_send_cm_sidr_rep
	RDMA/rxe: Fix missing kconfig dependency on CRYPTO
	IB/mlx5: Add missing error code
	ALSA: hda: intel-nhlt: verify config type
	ftrace: Have recordmcount use w8 to read relp->r_info in arm64_is_fake_mcount
	rsxx: Return -EFAULT if copy_to_user() fails
	iommu/vt-d: Fix status code for Allocate/Free PASID command
	Revert "arm64: dts: amlogic: add missing ethernet reset ID"
	of: unittest: Fix build on architectures without CONFIG_OF_ADDRESS
	tomoyo: recognize kernel threads correctly
	r8169: fix resuming from suspend on RTL8105e if machine runs on battery
	Linux 5.10.22

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6adb109dd12fde9737f46fe7b496c94999e5c335
2021-03-11 19:35:01 +01:00
Nikolay Borisov
bf6dd437c3 btrfs: don't flush from btrfs_delayed_inode_reserve_metadata
commit 4d14c5cde5c268a2bc26addecf09489cb953ef64 upstream

Calling btrfs_qgroup_reserve_meta_prealloc from
btrfs_delayed_inode_reserve_metadata can result in flushing delalloc
while holding a transaction and delayed node locks. This is deadlock
prone. In the past multiple commits:

 * ae5e070eaca9 ("btrfs: qgroup: don't try to wait flushing if we're
already holding a transaction")

 * 6f23277a49 ("btrfs: qgroup: don't commit transaction when we already
 hold the handle")

Tried to solve various aspects of this but this was always a
whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock
scenario involving btrfs_delayed_node::mutex. Namely, one thread
can call btrfs_dirty_inode as a result of reading a file and modifying
its atime:

  PID: 6963   TASK: ffff8c7f3f94c000  CPU: 2   COMMAND: "test"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_timeout at ffffffffa52a1bdd
  #3  wait_for_completion at ffffffffa529eeea             <-- sleeps with delayed node mutex held
  #4  start_delalloc_inodes at ffffffffc0380db5
  #5  btrfs_start_delalloc_snapshot at ffffffffc0393836
  #6  try_flush_qgroup at ffffffffc03f04b2
  #7  __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6     <-- tries to reserve space and starts delalloc inodes.
  #8  btrfs_delayed_update_inode at ffffffffc03e31aa      <-- acquires delayed node mutex
  #9  btrfs_update_inode at ffffffffc0385ba8
 #10  btrfs_dirty_inode at ffffffffc038627b               <-- TRANSACTIION OPENED
 #11  touch_atime at ffffffffa4cf0000
 #12  generic_file_read_iter at ffffffffa4c1f123
 #13  new_sync_read at ffffffffa4ccdc8a
 #14  vfs_read at ffffffffa4cd0849
 #15  ksys_read at ffffffffa4cd0bd1
 #16  do_syscall_64 at ffffffffa4a052eb
 #17  entry_SYSCALL_64_after_hwframe at ffffffffa540008c

This will cause an asynchronous work to flush the delalloc inodes to
happen which can try to acquire the same delayed_node mutex:

  PID: 455    TASK: ffff8c8085fa4000  CPU: 5   COMMAND: "kworker/u16:30"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_preempt_disabled at ffffffffa529e80a
  #3  __mutex_lock at ffffffffa529fdcb                    <-- goes to sleep, never wakes up.
  #4  btrfs_delayed_update_inode at ffffffffc03e3143      <-- tries to acquire the mutex
  #5  btrfs_update_inode at ffffffffc0385ba8              <-- this is the same inode that pid 6963 is holding
  #6  cow_file_range_inline.constprop.78 at ffffffffc0386be7
  #7  cow_file_range at ffffffffc03879c1
  #8  btrfs_run_delalloc_range at ffffffffc038894c
  #9  writepage_delalloc at ffffffffc03a3c8f
 #10  __extent_writepage at ffffffffc03a4c01
 #11  extent_write_cache_pages at ffffffffc03a500b
 #12  extent_writepages at ffffffffc03a6de2
 #13  do_writepages at ffffffffa4c277eb
 #14  __filemap_fdatawrite_range at ffffffffa4c1e5bb
 #15  btrfs_run_delalloc_work at ffffffffc0380987         <-- starts running delayed nodes
 #16  normal_work_helper at ffffffffc03b706c
 #17  process_one_work at ffffffffa4aba4e4
 #18  worker_thread at ffffffffa4aba6fd
 #19  kthread at ffffffffa4ac0a3d
 #20  ret_from_fork at ffffffffa54001ff

To fully address those cases the complete fix is to never issue any
flushing while holding the transaction or the delayed node lock. This
patch achieves it by calling qgroup_reserve_meta directly which will
either succeed without flushing or will fail and return -EDQUOT. In the
latter case that return value is going to be propagated to
btrfs_dirty_inode which will fallback to start a new transaction. That's
fine as the majority of time we expect the inode will have
BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly
copying the in-memory state.

Fixes: c53e965360 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-11 14:17:22 +01:00
Nikolay Borisov
cf9317ceb5 btrfs: export and rename qgroup_reserve_meta
commit 80e9baed722c853056e0c5374f51524593cb1031 upstream

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-11 14:17:22 +01:00
Filipe Manana
ae971992e9 btrfs: fix warning when creating a directory with smack enabled
commit fd57a98d6f0c98fa295813087f13afb26c224e73 upstream.

When we have smack enabled, during the creation of a directory smack may
attempt to add a "smack transmute" xattr on the inode, which results in
the following warning and trace:

  WARNING: CPU: 3 PID: 2548 at fs/btrfs/transaction.c:537 start_transaction+0x489/0x4f0
  Modules linked in: nft_objref nf_conntrack_netbios_ns (...)
  CPU: 3 PID: 2548 Comm: mkdir Not tainted 5.9.0-rc2smack+ #81
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  RIP: 0010:start_transaction+0x489/0x4f0
  Code: e9 be fc ff ff (...)
  RSP: 0018:ffffc90001887d10 EFLAGS: 00010202
  RAX: ffff88816f1e0000 RBX: 0000000000000201 RCX: 0000000000000003
  RDX: 0000000000000201 RSI: 0000000000000002 RDI: ffff888177849000
  RBP: ffff888177849000 R08: 0000000000000001 R09: 0000000000000004
  R10: ffffffff825e8f7a R11: 0000000000000003 R12: ffffffffffffffe2
  R13: 0000000000000000 R14: ffff88803d884270 R15: ffff8881680d8000
  FS:  00007f67317b8440(0000) GS:ffff88817bcc0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f67247a22a8 CR3: 000000004bfbc002 CR4: 0000000000370ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ? slab_free_freelist_hook+0xea/0x1b0
   ? trace_hardirqs_on+0x1c/0xe0
   btrfs_setxattr_trans+0x3c/0xf0
   __vfs_setxattr+0x63/0x80
   smack_d_instantiate+0x2d3/0x360
   security_d_instantiate+0x29/0x40
   d_instantiate_new+0x38/0x90
   btrfs_mkdir+0x1cf/0x1e0
   vfs_mkdir+0x14f/0x200
   do_mkdirat+0x6d/0x110
   do_syscall_64+0x2d/0x40
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f673196ae6b
  Code: 8b 05 11 (...)
  RSP: 002b:00007ffc3c679b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
  RAX: ffffffffffffffda RBX: 00000000000001ff RCX: 00007f673196ae6b
  RDX: 0000000000000000 RSI: 00000000000001ff RDI: 00007ffc3c67a30d
  RBP: 00007ffc3c67a30d R08: 00000000000001ff R09: 0000000000000000
  R10: 000055d3e39fe930 R11: 0000000000000246 R12: 0000000000000000
  R13: 00007ffc3c679cd8 R14: 00007ffc3c67a30d R15: 00007ffc3c679ce0
  irq event stamp: 11029
  hardirqs last  enabled at (11037): [<ffffffff81153fe6>] console_unlock+0x486/0x670
  hardirqs last disabled at (11044): [<ffffffff81153c01>] console_unlock+0xa1/0x670
  softirqs last  enabled at (8864): [<ffffffff81e0102f>] asm_call_on_stack+0xf/0x20
  softirqs last disabled at (8851): [<ffffffff81e0102f>] asm_call_on_stack+0xf/0x20

This happens because at btrfs_mkdir() we call d_instantiate_new() while
holding a transaction handle, which results in the following call chain:

  btrfs_mkdir()
     trans = btrfs_start_transaction(root, 5);

     d_instantiate_new()
        smack_d_instantiate()
            __vfs_setxattr()
                btrfs_setxattr_trans()
                   btrfs_start_transaction()
                      start_transaction()
                         WARN_ON()
                           --> a tansaction start has TRANS_EXTWRITERS
                               set in its type
                         h->orig_rsv = h->block_rsv
                         h->block_rsv = NULL

     btrfs_end_transaction(trans)

Besides the warning triggered at start_transaction, we set the handle's
block_rsv to NULL which may cause some surprises later on.

So fix this by making btrfs_setxattr_trans() not start a transaction when
we already have a handle on one, stored in current->journal_info, and use
that handle. We are good to use the handle because at btrfs_mkdir() we did
reserve space for the xattr and the inode item.

Reported-by: Casey Schaufler <casey@schaufler-ca.com>
CC: stable@vger.kernel.org # 5.4+
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Link: https://lore.kernel.org/linux-btrfs/434d856f-bd7b-4889-a6ec-e81aaebfa735@schaufler-ca.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-09 11:11:12 +01:00
Nikolay Borisov
e6ba61aaff btrfs: unlock extents in btrfs_zero_range in case of quota reservation errors
commit 4f6a49de64fd1b1dba5229c02047376da7cf24fd upstream.

If btrfs_qgroup_reserve_data returns an error (i.e quota limit reached)
the handling logic directly goes to the 'out' label without first
unlocking the extent range between lockstart, lockend. This results in
deadlocks as other processes try to lock the same extent.

Fixes: a7f8b1c2ac ("btrfs: file: reserve qgroup space after the hole punch range is locked")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-09 11:11:11 +01:00
Nikolay Borisov
37ffce9668 btrfs: free correct amount of space in btrfs_delayed_inode_reserve_metadata
commit 0f9c03d824f6f522d3bc43629635c9765546ebc5 upstream.

Following commit f218ea6c47 ("btrfs: delayed-inode: Remove wrong
qgroup meta reservation calls") this function now reserves num_bytes,
rather than the fixed amount of nodesize. As such this requires the
same amount to be freed in case of failure. Fix this by adjusting
the amount we are freeing.

Fixes: f218ea6c47 ("btrfs: delayed-inode: Remove wrong qgroup meta reservation calls")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-09 11:11:11 +01:00
Dan Carpenter
a64ad80223 btrfs: validate qgroup inherit for SNAP_CREATE_V2 ioctl
commit 5011c5a663b9c6d6aff3d394f11049b371199627 upstream.

The problem is we're copying "inherit" from user space but we don't
necessarily know that we're copying enough data for a 64 byte
struct.  Then the next problem is that 'inherit' has a variable size
array at the end, and we have to verify that array is the size we
expected.

Fixes: 6f72c7e20d ("Btrfs: add qgroup inheritance")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-09 11:11:11 +01:00
Nikolay Borisov
e82407d249 btrfs: fix race between extent freeing/allocation when using bitmaps
commit 3c17916510428dbccdf657de050c34e208347089 upstream.

During allocation the allocator will try to allocate an extent using
cluster policy. Once the current cluster is exhausted it will remove the
entry under btrfs_free_cluster::lock and subsequently acquire
btrfs_free_space_ctl::tree_lock to dispose of the already-deleted entry
and adjust btrfs_free_space_ctl::total_bitmap. This poses a problem
because there exists a race condition between removing the entry under
one lock and doing the necessary accounting holding a different lock
since extent freeing only uses the 2nd lock. This can result in the
following situation:

T1:                                    T2:
btrfs_alloc_from_cluster               insert_into_bitmap <holds tree_lock>
 if (entry->bytes == 0)                   if (block_group && !list_empty(&block_group->cluster_list)) {
    rb_erase(entry)

 spin_unlock(&cluster->lock);
   (total_bitmaps is still 4)           spin_lock(&cluster->lock);
                                         <doesn't find entry in cluster->root>
 spin_lock(&ctl->tree_lock);             <goes to new_bitmap label, adds
<blocked since T2 holds tree_lock>       <a new entry and calls add_new_bitmap>
					    recalculate_thresholds  <crashes,
                                              due to total_bitmaps
					      becoming 5 and triggering
					      an ASSERT>

To fix this ensure that once depleted, the cluster entry is deleted when
both cluster lock and tree locks are held in the allocator (T1), this
ensures that even if there is a race with a concurrent
insert_into_bitmap call it will correctly find the entry in the cluster
and add the new space to it.

CC: <stable@vger.kernel.org> # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-09 11:11:11 +01:00
Filipe Manana
1559d94fec btrfs: fix stale data exposure after cloning a hole with NO_HOLES enabled
commit 3660d0bcdb82807d434da9d2e57d88b37331182d upstream.

When using the NO_HOLES feature, if we clone a file range that spans only
a hole into a range that is at or beyond the current i_size of the
destination file, we end up not setting the full sync runtime flag on the
inode. As a result, if we then fsync the destination file and have a power
failure, after log replay we can end up exposing stale data instead of
having a hole for that range.

The conditions for this to happen are the following:

1) We have a file with a size of, for example, 1280K;

2) There is a written (non-prealloc) extent for the file range from 1024K
   to 1280K with a length of 256K;

3) This particular file extent layout is durably persisted, so that the
   existing superblock persisted on disk points to a subvolume root where
   the file has that exact file extent layout and state;

4) The file is truncated to a smaller size, to an offset lower than the
   start offset of its last extent, for example to 800K. The truncate sets
   the full sync runtime flag on the inode;

6) Fsync the file to log it and clear the full sync runtime flag;

7) Clone a region that covers only a hole (implicit hole due to NO_HOLES)
   into the file with a destination offset that starts at or beyond the
   256K file extent item we had - for example to offset 1024K;

8) Since the clone operation does not find extents in the source range,
   we end up in the if branch at the bottom of btrfs_clone() where we
   punch a hole for the file range starting at offset 1024K by calling
   btrfs_replace_file_extents(). There we end up not setting the full
   sync flag on the inode, because we don't know we are being called in
   a clone context (and not fallocate's punch hole operation), and
   neither do we create an extent map to represent a hole because the
   requested range is beyond eof;

9) A further fsync to the file will be a fast fsync, since the clone
   operation did not set the full sync flag, and therefore it relies on
   modified extent maps to correctly log the file layout. But since
   it does not find any extent map marking the range from 1024K (the
   previous eof) to the new eof, it does not log a file extent item
   for that range representing the hole;

10) After a power failure no hole for the range starting at 1024K is
   punched and we end up exposing stale data from the old 256K extent.

Turning this into exact steps:

  $ mkfs.btrfs -f -O no-holes /dev/sdi
  $ mount /dev/sdi /mnt

  # Create our test file with 3 extents of 256K and a 256K hole at offset
  # 256K. The file has a size of 1280K.
  $ xfs_io -f -s \
              -c "pwrite -S 0xab -b 256K 0 256K" \
              -c "pwrite -S 0xcd -b 256K 512K 256K" \
              -c "pwrite -S 0xef -b 256K 768K 256K" \
              -c "pwrite -S 0x73 -b 256K 1024K 256K" \
              /mnt/sdi/foobar

  # Make sure it's durably persisted. We want the last committed super
  # block to point to this particular file extent layout.
  sync

  # Now truncate our file to a smaller size, falling within a position of
  # the second extent. This sets the full sync runtime flag on the inode.
  # Then fsync the file to log it and clear the full sync flag from the
  # inode. The third extent is no longer part of the file and therefore
  # it is not logged.
  $ xfs_io -c "truncate 800K" -c "fsync" /mnt/foobar

  # Now do a clone operation that only clones the hole and sets back the
  # file size to match the size it had before the truncate operation
  # (1280K).
  $ xfs_io \
        -c "reflink /mnt/foobar 256K 1024K 256K" \
        -c "fsync" \
        /mnt/foobar

  # File data before power failure:
  $ od -A d -t x1 /mnt/foobar
  0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
  *
  0262144 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  0524288 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
  *
  0786432 ef ef ef ef ef ef ef ef ef ef ef ef ef ef ef ef
  *
  0819200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  1310720

  <power fail>

  # Mount the fs again to replay the log tree.
  $ mount /dev/sdi /mnt

  # File data after power failure:
  $ od -A d -t x1 /mnt/foobar
  0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
  *
  0262144 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  0524288 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
  *
  0786432 ef ef ef ef ef ef ef ef ef ef ef ef ef ef ef ef
  *
  0819200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  1048576 73 73 73 73 73 73 73 73 73 73 73 73 73 73 73 73
  *
  1310720

The range from 1024K to 1280K should correspond to a hole but instead it
points to stale data, to the 256K extent that should not exist after the
truncate operation.

The issue does not exists when not using NO_HOLES, because for that case
we use file extent items to represent holes, these are found and copied
during the loop that iterates over extents at btrfs_clone(), and that
causes btrfs_replace_file_extents() to be called with a non-NULL
extent_info argument and therefore set the full sync runtime flag on the
inode.

So fix this by making the code that deals with a trailing hole during
cloning, at btrfs_clone(), to set the full sync flag on the inode, if the
range starts at or beyond the current i_size.

A test case for fstests will follow soon.

Backporting notes: for kernel 5.4 the change goes to ioctl.c into
btrfs_clone before the last call to btrfs_punch_hole_range.

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-09 11:11:11 +01:00
Filipe Manana
6fc9e5866c btrfs: fix race between swap file activation and snapshot creation
commit dd0734f2a866f9d619d4abf97c3d71bcdee40ea9 upstream.

When creating a snapshot we check if the current number of swap files, in
the root, is non-zero, and if it is, we error out and warn that we can not
create the snapshot because there are active swap files.

However this is racy because when a task started activation of a swap
file, another task might have started already snapshot creation and might
have seen the counter for the number of swap files as zero. This means
that after the swap file is activated we may end up with a snapshot of the
same root successfully created, and therefore when the first write to the
swap file happens it has to fall back into COW mode, which should never
happen for active swap files.

Basically what can happen is:

1) Task A starts snapshot creation and enters ioctl.c:create_snapshot().
   There it sees that root->nr_swapfiles has a value of 0 so it continues;

2) Task B enters btrfs_swap_activate(). It is not aware that another task
   started snapshot creation but it did not finish yet. It increments
   root->nr_swapfiles from 0 to 1;

3) Task B checks that the file meets all requirements to be an active
   swap file - it has NOCOW set, there are no snapshots for the inode's
   root at the moment, no file holes, no reflinked extents, etc;

4) Task B returns success and now the file is an active swap file;

5) Task A commits the transaction to create the snapshot and finishes.
   The swap file's extents are now shared between the original root and
   the snapshot;

6) A write into an extent of the swap file is attempted - there is a
   snapshot of the file's root, so we fall back to COW mode and therefore
   the physical location of the extent changes on disk.

So fix this by taking the snapshot lock during swap file activation before
locking the extent range, as that is the order in which we lock these
during buffered writes.

Fixes: ed46ff3d42 ("Btrfs: support swap files")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-09 11:11:11 +01:00
Filipe Manana
501fdd1cef btrfs: fix race between writes to swap files and scrub
commit 195a49eaf655eb914896c92cecd96bc863c9feb3 upstream.

When we active a swap file, at btrfs_swap_activate(), we acquire the
exclusive operation lock to prevent the physical location of the swap
file extents to be changed by operations such as balance and device
replace/resize/remove. We also call there can_nocow_extent() which,
among other things, checks if the block group of a swap file extent is
currently RO, and if it is we can not use the extent, since a write
into it would result in COWing the extent.

However we have no protection against a scrub operation running after we
activate the swap file, which can result in the swap file extents to be
COWed while the scrub is running and operating on the respective block
group, because scrub turns a block group into RO before it processes it
and then back again to RW mode after processing it. That means an attempt
to write into a swap file extent while scrub is processing the respective
block group, will result in COWing the extent, changing its physical
location on disk.

Fix this by making sure that block groups that have extents that are used
by active swap files can not be turned into RO mode, therefore making it
not possible for a scrub to turn them into RO mode. When a scrub finds a
block group that can not be turned to RO due to the existence of extents
used by swap files, it proceeds to the next block group and logs a warning
message that mentions the block group was skipped due to active swap
files - this is the same approach we currently use for balance.

Fixes: ed46ff3d42 ("Btrfs: support swap files")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-09 11:11:11 +01:00
Ira Weiny
b2a4876132 btrfs: fix raid6 qstripe kmap
commit d70cef0d46729808dc53f145372c02b145c92604 upstream.

When a qstripe is required an extra page is allocated and mapped.  There
were 3 problems:

1) There is no corresponding call of kunmap() for the qstripe page.
2) There is no reason to map the qstripe page more than once if the
   number of bits set in rbio->dbitmap is greater than one.
3) There is no reason to map the parity page and unmap it each time
   through the loop.

The page memory can continue to be reused with a single mapping on each
iteration by raid6_call.gen_syndrome() without remapping.  So map the
page for the duration of the loop.

Similarly, improve the algorithm by mapping the parity page just 1 time.

Fixes: 5a6ac9eacb ("Btrfs, raid56: support parity scrub on raid56")
CC: stable@vger.kernel.org # 4.4.x: c17af96554: btrfs: raid56: simplify tracking of Q stripe presence
CC: stable@vger.kernel.org # 4.4.x
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-09 11:11:10 +01:00
Josef Bacik
a01415e5e8 btrfs: avoid double put of block group when emptying cluster
commit 95c85fba1f64c3249c67f0078a29f8a125078189 upstream.

It's wrong calling btrfs_put_block_group in
__btrfs_return_cluster_to_free_space if the block group passed is
different than the block group the cluster represents. As this means the
cluster doesn't have a reference to the passed block group. This results
in double put and a use-after-free bug.

Fix this by simply bailing if the block group we passed in does not
match the block group on the cluster.

Fixes: fa9c0d795f ("Btrfs: rework allocation clustering")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-09 11:11:10 +01:00
Greg Kroah-Hartman
28454baf9c This is the 5.10.21 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmBEulcACgkQONu9yGCS
 aT5+8BAAtgxUCbSKsQOd8wObQSeasrc0DD3r3pAmf08gxVRAwtzupxmeJvNcxU6G
 XSIeKJZ0c/3JVQhh7390mZKS5IYtEFWaBjbbkwogrx9Vsn12ewDKEjfTqeOjli7B
 iaOI8qaRN/zYhEgi2J6wWn81WBKlmjWh+bB2LnX7LzOATguVehMAmLsF6KHrHBne
 ERml4aRXxrj2EwiG55BqDY1jMD6sPPECEqMhQYdSeEJR9jCkvUdpa0SCQ7UwBOav
 nS/Rmxnx35LjIPUIO4Jk+CLQJnFlsCJjjdW222yJIgFW5NeBbuN4xyONVWwXUddj
 X6prMXjKMf1fCfxQ124JwiIp98mSz+1nIhu7zMQuALbDh/S1GMufKEq/cHg57Jlh
 vliQcO4kv9Vq0GPcn667P1OAL+DXAgSyTk8Hajv+do9Cb5c3897LpuiDPgadh+F6
 c6xO6MT8zX5L54xRu2ITtaFh8LRaCgMHEFrDcBwIKGvlGwrzIvYxxuoAFIyiN/yL
 ZIfZWbs1y8XHsHC69f0B7MTC99+TPr/M+a1gnQ9c9B7HBlzGEjoTTgFFhkvKs1lr
 dLe63Z2NF5ZQhHOSW47lqBkEHhYVW+cSQdPcEUFJu5qs/xuzXyzVZWf0443nHOMN
 RLgMvIR1DgrLSoWIl8A0cfBVWd2+FUa7IBBtrz6P+RyRLe+ybQ8=
 =ZowT
 -----END PGP SIGNATURE-----

Merge 5.10.21 into android12-5.10

Changes in 5.10.21
	net: usb: qmi_wwan: support ZTE P685M modem
	Input: elantech - fix protocol errors for some trackpoints in SMBus mode
	Input: elan_i2c - add new trackpoint report type 0x5F
	drm/virtio: use kvmalloc for large allocations
	x86/build: Treat R_386_PLT32 relocation as R_386_PC32
	JFS: more checks for invalid superblock
	sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled
	udlfb: Fix memory leak in dlfb_usb_probe
	media: mceusb: sanity check for prescaler value
	erofs: fix shift-out-of-bounds of blkszbits
	media: v4l2-ctrls.c: fix shift-out-of-bounds in std_validate
	xfs: Fix assert failure in xfs_setattr_size()
	net/af_iucv: remove WARN_ONCE on malformed RX packets
	smackfs: restrict bytes count in smackfs write functions
	tomoyo: ignore data race while checking quota
	net: fix up truesize of cloned skb in skb_prepare_for_shift()
	riscv: Get rid of MAX_EARLY_MAPPING_SIZE
	nbd: handle device refs for DESTROY_ON_DISCONNECT properly
	mm/hugetlb.c: fix unnecessary address expansion of pmd sharing
	RDMA/rtrs: Do not signal for heatbeat
	RDMA/rtrs-clt: Use bitmask to check sess->flags
	RDMA/rtrs-srv: Do not signal REG_MR
	tcp: fix tcp_rmem documentation
	mptcp: do not wakeup listener for MPJ subflows
	net: bridge: use switchdev for port flags set through sysfs too
	net/sched: cls_flower: Reject invalid ct_state flags rules
	net: dsa: tag_rtl4_a: Support also egress tags
	net: ag71xx: remove unnecessary MTU reservation
	net: hsr: add support for EntryForgetTime
	net: psample: Fix netlink skb length with tunnel info
	net: fix dev_ifsioc_locked() race condition
	dt-bindings: ethernet-controller: fix fixed-link specification
	dt-bindings: net: btusb: DT fix s/interrupt-name/interrupt-names/
	ASoC: qcom: Remove useless debug print
	rsi: Fix TX EAPOL packet handling against iwlwifi AP
	rsi: Move card interrupt handling to RX thread
	EDAC/amd64: Do not load on family 0x15, model 0x13
	staging: fwserial: Fix error handling in fwserial_create
	x86/reboot: Add Zotac ZBOX CI327 nano PCI reboot quirk
	vt/consolemap: do font sum unsigned
	wlcore: Fix command execute failure 19 for wl12xx
	Bluetooth: hci_h5: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for btrtl
	Bluetooth: btusb: fix memory leak on suspend and resume
	mt76: mt7615: reset token when mac_reset happens
	pktgen: fix misuse of BUG_ON() in pktgen_thread_worker()
	ath10k: fix wmi mgmt tx queue full due to race condition
	net: sfp: add mode quirk for GPON module Ubiquiti U-Fiber Instant
	Bluetooth: Add new HCI_QUIRK_NO_SUSPEND_NOTIFIER quirk
	Bluetooth: Fix null pointer dereference in amp_read_loc_assoc_final_data
	staging: most: sound: add sanity check for function argument
	staging: bcm2835-audio: Replace unsafe strcpy() with strscpy()
	brcmfmac: Add DMI nvram filename quirk for Predia Basic tablet
	brcmfmac: Add DMI nvram filename quirk for Voyo winpad A15 tablet
	drm/hisilicon: Fix use-after-free
	crypto: tcrypt - avoid signed overflow in byte count
	fs: make unlazy_walk() error handling consistent
	drm/amdgpu: Add check to prevent IH overflow
	PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse
	ASoC: Intel: bytcr_rt5640: Add new BYT_RT5640_NO_SPEAKERS quirk-flag
	drm/amd/display: Guard against NULL pointer deref when get_i2c_info fails
	drm/amd/amdgpu: add error handling to amdgpu_virt_read_pf2vf_data
	media: uvcvideo: Allow entities with no pads
	f2fs: handle unallocated section and zone on pinned/atgc
	f2fs: fix to set/clear I_LINKABLE under i_lock
	nvme-core: add cancel tagset helpers
	nvme-rdma: add clean action for failed reconnection
	nvme-tcp: add clean action for failed reconnection
	ASoC: Intel: Add DMI quirk table to soc_intel_is_byt_cr()
	btrfs: fix error handling in commit_fs_roots
	perf/x86/kvm: Add Cascade Lake Xeon steppings to isolation_ucodes[]
	ASoC: Intel: sof-sdw: indent and add quirks consistently
	ASoC: Intel: sof_sdw: detect DMIC number based on mach params
	parisc: Bump 64-bit IRQ stack size to 64 KB
	sched/features: Fix hrtick reprogramming
	ASoC: Intel: bytcr_rt5640: Add quirk for the Estar Beauty HD MID 7316R tablet
	ASoC: Intel: bytcr_rt5640: Add quirk for the Voyo Winpad A15 tablet
	ASoC: Intel: bytcr_rt5651: Add quirk for the Jumper EZpad 7 tablet
	ASoC: Intel: bytcr_rt5640: Add quirk for the Acer One S1002 tablet
	scsi: iscsi: Restrict sessions and handles to admin capabilities
	scsi: iscsi: Ensure sysfs attributes are limited to PAGE_SIZE
	scsi: iscsi: Verify lengths on passthrough PDUs
	Xen/gnttab: handle p2m update errors on a per-slot basis
	xen-netback: respect gnttab_map_refs()'s return value
	xen: fix p2m size in dom0 for disabled memory hotplug case
	zsmalloc: account the number of compacted pages correctly
	remoteproc/mediatek: Fix kernel test robot warning
	swap: fix swapfile read/write offset
	powerpc/sstep: Check instruction validity against ISA version before emulation
	powerpc/sstep: Fix incorrect return from analyze_instr()
	tty: fix up iterate_tty_read() EOVERFLOW handling
	tty: fix up hung_up_tty_read() conversion
	tty: clean up legacy leftovers from n_tty line discipline
	tty: teach n_tty line discipline about the new "cookie continuations"
	tty: teach the n_tty ICANON case about the new "cookie continuations" too
	media: v4l: ioctl: Fix memory leak in video_usercopy
	ALSA: hda/realtek: Add quirk for Clevo NH55RZQ
	ALSA: hda/realtek: Add quirk for Intel NUC 10
	ALSA: hda/realtek: Apply dual codec quirks for MSI Godlike X570 board
	net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround
	net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips
	Linux 5.10.21

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I52b1105b73d893779b3886b577accfabe9f83a16
2021-03-07 12:53:30 +01:00
Josef Bacik
5aa2717b6b btrfs: fix error handling in commit_fs_roots
[ Upstream commit 4f4317c13a40194940acf4a71670179c4faca2b5 ]

While doing error injection I would sometimes get a corrupt file system.
This is because I was injecting errors at btrfs_search_slot, but would
only do it one time per stack.  This uncovered a problem in
commit_fs_roots, where if we get an error we would just break.  However
we're in a nested loop, the first loop being a loop to find all the
dirty fs roots, and then subsequent root updates would succeed clearing
the error value.

This isn't likely to happen in real scenarios, however we could
potentially get a random ENOMEM once and then not again, and we'd end up
with a corrupted file system.  Fix this by moving the error checking
around a bit to the main loop, as this is the only place where something
will fail, and return the error as soon as it occurs.

With this patch my reproducer no longer corrupts the file system.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-07 12:34:13 +01:00
Greg Kroah-Hartman
d8c7f0a3cd Merge 5.10.20 into android12-5.10
Changes in 5.10.20
	vmlinux.lds.h: add DWARF v5 sections
	vdpa/mlx5: fix param validation in mlx5_vdpa_get_config()
	debugfs: be more robust at handling improper input in debugfs_lookup()
	debugfs: do not attempt to create a new file before the filesystem is initalized
	scsi: libsas: docs: Remove notify_ha_event()
	scsi: qla2xxx: Fix mailbox Ch erroneous error
	kdb: Make memory allocations more robust
	w1: w1_therm: Fix conversion result for negative temperatures
	PCI: qcom: Use PHY_REFCLK_USE_PAD only for ipq8064
	PCI: Decline to resize resources if boot config must be preserved
	virt: vbox: Do not use wait_event_interruptible when called from kernel context
	bfq: Avoid false bfq queue merging
	ALSA: usb-audio: Fix PCM buffer allocation in non-vmalloc mode
	MIPS: vmlinux.lds.S: add missing PAGE_ALIGNED_DATA() section
	vmlinux.lds.h: Define SANTIZER_DISCARDS with CONFIG_GCOV_KERNEL=y
	random: fix the RNDRESEEDCRNG ioctl
	ALSA: pcm: Call sync_stop at disconnection
	ALSA: pcm: Assure sync with the pending stop operation at suspend
	ALSA: pcm: Don't call sync_stop if it hasn't been stopped
	drm/i915/gt: One more flush for Baytrail clear residuals
	ath10k: Fix error handling in case of CE pipe init failure
	Bluetooth: btqcomsmd: Fix a resource leak in error handling paths in the probe function
	Bluetooth: hci_uart: Fix a race for write_work scheduling
	Bluetooth: Fix initializing response id after clearing struct
	arm64: dts: renesas: beacon kit: Fix choppy Bluetooth Audio
	arm64: dts: renesas: beacon: Fix audio-1.8V pin enable
	ARM: dts: exynos: correct PMIC interrupt trigger level on Artik 5
	ARM: dts: exynos: correct PMIC interrupt trigger level on Monk
	ARM: dts: exynos: correct PMIC interrupt trigger level on Rinato
	ARM: dts: exynos: correct PMIC interrupt trigger level on Spring
	ARM: dts: exynos: correct PMIC interrupt trigger level on Arndale Octa
	ARM: dts: exynos: correct PMIC interrupt trigger level on Odroid XU3 family
	arm64: dts: exynos: correct PMIC interrupt trigger level on TM2
	arm64: dts: exynos: correct PMIC interrupt trigger level on Espresso
	memory: mtk-smi: Fix PM usage counter unbalance in mtk_smi ops
	Bluetooth: hci_qca: Fix memleak in qca_controller_memdump
	staging: vchiq: Fix bulk userdata handling
	staging: vchiq: Fix bulk transfers on 64-bit builds
	arm64: dts: qcom: msm8916-samsung-a5u: Fix iris compatible
	net: stmmac: dwmac-meson8b: fix enabling the timing-adjustment clock
	bpf: Add bpf_patch_call_args prototype to include/linux/bpf.h
	bpf: Avoid warning when re-casting __bpf_call_base into __bpf_call_base_args
	firmware: arm_scmi: Fix call site of scmi_notification_exit
	arm64: dts: allwinner: A64: properly connect USB PHY to port 0
	arm64: dts: allwinner: H6: properly connect USB PHY to port 0
	arm64: dts: allwinner: Drop non-removable from SoPine/LTS SD card
	arm64: dts: allwinner: H6: Allow up to 150 MHz MMC bus frequency
	arm64: dts: allwinner: A64: Limit MMC2 bus frequency to 150 MHz
	arm64: dts: qcom: msm8916-samsung-a2015: Fix sensors
	cpufreq: brcmstb-avs-cpufreq: Free resources in error path
	cpufreq: brcmstb-avs-cpufreq: Fix resource leaks in ->remove()
	arm64: dts: rockchip: rk3328: Add clock_in_out property to gmac2phy node
	ACPICA: Fix exception code class checks
	usb: gadget: u_audio: Free requests only after callback
	arm64: dts: qcom: sdm845-db845c: Fix reset-pin of ov8856 node
	soc: qcom: socinfo: Fix an off by one in qcom_show_pmic_model()
	soc: ti: pm33xx: Fix some resource leak in the error handling paths of the probe function
	staging: media: atomisp: Fix size_t format specifier in hmm_alloc() debug statemenet
	Bluetooth: drop HCI device reference before return
	Bluetooth: Put HCI device if inquiry procedure interrupts
	memory: ti-aemif: Drop child node when jumping out loop
	ARM: dts: Configure missing thermal interrupt for 4430
	usb: dwc2: Do not update data length if it is 0 on inbound transfers
	usb: dwc2: Abort transaction after errors with unknown reason
	usb: dwc2: Make "trimming xfer length" a debug message
	staging: rtl8723bs: wifi_regd.c: Fix incorrect number of regulatory rules
	x86/MSR: Filter MSR writes through X86_IOC_WRMSR_REGS ioctl too
	arm64: dts: renesas: beacon: Fix EEPROM compatible value
	can: mcp251xfd: mcp251xfd_probe(): fix errata reference
	ARM: dts: armada388-helios4: assign pinctrl to LEDs
	ARM: dts: armada388-helios4: assign pinctrl to each fan
	arm64: dts: armada-3720-turris-mox: rename u-boot mtd partition to a53-firmware
	opp: Correct debug message in _opp_add_static_v2()
	Bluetooth: btusb: Fix memory leak in btusb_mtk_wmt_recv
	soc: qcom: ocmem: don't return NULL in of_get_ocmem
	arm64: dts: msm8916: Fix reserved and rfsa nodes unit address
	arm64: dts: meson: fix broken wifi node for Khadas VIM3L
	iwlwifi: mvm: set enabled in the PPAG command properly
	ARM: s3c: fix fiq for clang IAS
	optee: simplify i2c access
	staging: wfx: fix possible panic with re-queued frames
	ARM: at91: use proper asm syntax in pm_suspend
	ath10k: Fix suspicious RCU usage warning in ath10k_wmi_tlv_parse_peer_stats_info()
	ath10k: Fix lockdep assertion warning in ath10k_sta_statistics
	ath11k: fix a locking bug in ath11k_mac_op_start()
	soc: aspeed: snoop: Add clock control logic
	iwlwifi: mvm: fix the type we use in the PPAG table validity checks
	iwlwifi: mvm: store PPAG enabled/disabled flag properly
	iwlwifi: mvm: send stored PPAG command instead of local
	iwlwifi: mvm: assign SAR table revision to the command later
	iwlwifi: mvm: don't check if CSA event is running before removing
	bpf_lru_list: Read double-checked variable once without lock
	iwlwifi: pnvm: set the PNVM again if it was already loaded
	iwlwifi: pnvm: increment the pointer before checking the TLV
	ath9k: fix data bus crash when setting nf_override via debugfs
	selftests/bpf: Convert test_xdp_redirect.sh to bash
	ibmvnic: Set to CLOSED state even on error
	bnxt_en: reverse order of TX disable and carrier off
	bnxt_en: Fix devlink info's stored fw.psid version format.
	xen/netback: fix spurious event detection for common event case
	dpaa2-eth: fix memory leak in XDP_REDIRECT
	net: phy: consider that suspend2ram may cut off PHY power
	net/mlx5e: Don't change interrupt moderation params when DIM is enabled
	net/mlx5e: Change interrupt moderation channel params also when channels are closed
	net/mlx5: Fix health error state handling
	net/mlx5e: Replace synchronize_rcu with synchronize_net
	net/mlx5e: kTLS, Use refcounts to free kTLS RX priv context
	net/mlx5: Disable devlink reload for multi port slave device
	net/mlx5: Disallow RoCE on multi port slave device
	net/mlx5: Disallow RoCE on lag device
	net/mlx5: Disable devlink reload for lag devices
	net/mlx5e: CT: manage the lifetime of the ct entry object
	net/mlx5e: Check tunnel offload is required before setting SWP
	mac80211: fix potential overflow when multiplying to u32 integers
	libbpf: Ignore non function pointer member in struct_ops
	bpf: Fix an unitialized value in bpf_iter
	bpf, devmap: Use GFP_KERNEL for xdp bulk queue allocation
	bpf: Fix bpf_fib_lookup helper MTU check for SKB ctx
	selftests: mptcp: fix ACKRX debug message
	tcp: fix SO_RCVLOWAT related hangs under mem pressure
	net: axienet: Handle deferred probe on clock properly
	cxgb4/chtls/cxgbit: Keeping the max ofld immediate data size same in cxgb4 and ulds
	b43: N-PHY: Fix the update of coef for the PHY revision >= 3case
	bpf: Clear subreg_def for global function return values
	ibmvnic: add memory barrier to protect long term buffer
	ibmvnic: skip send_request_unmap for timeout reset
	net: dsa: felix: perform teardown in reverse order of setup
	net: dsa: felix: don't deinitialize unused ports
	net: phy: mscc: adding LCPLL reset to VSC8514
	net: amd-xgbe: Reset the PHY rx data path when mailbox command timeout
	net: amd-xgbe: Fix NETDEV WATCHDOG transmit queue timeout warning
	net: amd-xgbe: Reset link when the link never comes back
	net: amd-xgbe: Fix network fluctuations when using 1G BELFUSE SFP
	net: mvneta: Remove per-cpu queue mapping for Armada 3700
	net: enetc: fix destroyed phylink dereference during unbind
	tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer
	tty: implement read_iter
	fbdev: aty: SPARC64 requires FB_ATY_CT
	drm/gma500: Fix error return code in psb_driver_load()
	gma500: clean up error handling in init
	drm/fb-helper: Add missed unlocks in setcmap_legacy()
	drm/panel: mantix: Tweak init sequence
	drm/vc4: hdmi: Take into account the clock doubling flag in atomic_check
	crypto: sun4i-ss - linearize buffers content must be kept
	crypto: sun4i-ss - fix kmap usage
	crypto: arm64/aes-ce - really hide slower algos when faster ones are enabled
	hwrng: ingenic - Fix a resource leak in an error handling path
	media: allegro: Fix use after free on error
	kcsan: Rewrite kcsan_prandom_u32_max() without prandom_u32_state()
	drm: rcar-du: Fix PM reference leak in rcar_cmm_enable()
	drm: rcar-du: Fix crash when using LVDS1 clock for CRTC
	drm: rcar-du: Fix the return check of of_parse_phandle and of_find_device_by_node
	drm/amdgpu: Fix macro name _AMDGPU_TRACE_H_ in preprocessor if condition
	MIPS: c-r4k: Fix section mismatch for loongson2_sc_init
	MIPS: lantiq: Explicitly compare LTQ_EBU_PCC_ISTAT against 0
	drm/virtio: make sure context is created in gem open
	drm/fourcc: fix Amlogic format modifier masks
	media: ipu3-cio2: Build only for x86
	media: i2c: ov5670: Fix PIXEL_RATE minimum value
	media: imx: Unregister csc/scaler only if registered
	media: imx: Fix csc/scaler unregister
	media: mtk-vcodec: fix error return code in vdec_vp9_decode()
	media: camss: missing error code in msm_video_register()
	media: vsp1: Fix an error handling path in the probe function
	media: em28xx: Fix use-after-free in em28xx_alloc_urbs
	media: media/pci: Fix memleak in empress_init
	media: tm6000: Fix memleak in tm6000_start_stream
	media: aspeed: fix error return code in aspeed_video_setup_video()
	ASoC: cs42l56: fix up error handling in probe
	ASoC: qcom: qdsp6: Move frontend AIFs to q6asm-dai
	evm: Fix memleak in init_desc
	crypto: bcm - Rename struct device_private to bcm_device_private
	sched/fair: Avoid stale CPU util_est value for schedutil in task dequeue
	drm/sun4i: tcon: fix inverted DCLK polarity
	media: imx7: csi: Fix regression for parallel cameras on i.MX6UL
	media: imx7: csi: Fix pad link validation
	media: ti-vpe: cal: fix write to unallocated memory
	MIPS: properly stop .eh_frame generation
	MIPS: Compare __SYNC_loongson3_war against 0
	drm/tegra: Fix reference leak when pm_runtime_get_sync() fails
	drm/amdgpu: toggle on DF Cstate after finishing xgmi injection
	bsg: free the request before return error code
	macintosh/adb-iop: Use big-endian autopoll mask
	drm/amd/display: Fix 10/12 bpc setup in DCE output bit depth reduction.
	drm/amd/display: Fix HDMI deep color output for DCE 6-11.
	media: software_node: Fix refcounts in software_node_get_next_child()
	media: lmedm04: Fix misuse of comma
	media: vidtv: psi: fix missing crc for PMT
	media: atomisp: Fix a buffer overflow in debug code
	media: qm1d1c0042: fix error return code in qm1d1c0042_init()
	media: cx25821: Fix a bug when reallocating some dma memory
	media: mtk-vcodec: fix argument used when DEBUG is defined
	media: pxa_camera: declare variable when DEBUG is defined
	media: uvcvideo: Accept invalid bFormatIndex and bFrameIndex values
	sched/eas: Don't update misfit status if the task is pinned
	f2fs: compress: fix potential deadlock
	ASoC: qcom: lpass-cpu: Remove bit clock state check
	ASoC: SOF: Intel: hda: cancel D0i3 work during runtime suspend
	perf/arm-cmn: Fix PMU instance naming
	perf/arm-cmn: Move IRQs when migrating context
	mtd: parser: imagetag: fix error codes in bcm963xx_parse_imagetag_partitions()
	crypto: talitos - Work around SEC6 ERRATA (AES-CTR mode data size error)
	crypto: talitos - Fix ctr(aes) on SEC1
	drm/nouveau: bail out of nouveau_channel_new if channel init fails
	mm: proc: Invalidate TLB after clearing soft-dirty page state
	ata: ahci_brcm: Add back regulators management
	ASoC: cpcap: fix microphone timeslot mask
	ASoC: codecs: add missing max_register in regmap config
	mtd: parsers: afs: Fix freeing the part name memory in failure
	f2fs: fix to avoid inconsistent quota data
	drm/amdgpu: Prevent shift wrapping in amdgpu_read_mask()
	f2fs: fix a wrong condition in __submit_bio
	ASoC: qcom: Fix typo error in HDMI regmap config callbacks
	KVM: nSVM: Don't strip host's C-bit from guest's CR3 when reading PDPTRs
	drm/mediatek: Check if fb is null
	Drivers: hv: vmbus: Avoid use-after-free in vmbus_onoffer_rescind()
	ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A5E
	ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A3E
	locking/lockdep: Avoid unmatched unlock
	ASoC: qcom: lpass: Fix i2s ctl register bit map
	ASoC: rt5682: Fix panic in rt5682_jack_detect_handler happening during system shutdown
	ASoC: SOF: debug: Fix a potential issue on string buffer termination
	btrfs: clarify error returns values in __load_free_space_cache
	btrfs: fix double accounting of ordered extent for subpage case in btrfs_invalidapge
	KVM: x86: Restore all 64 bits of DR6 and DR7 during RSM on x86-64
	s390/zcrypt: return EIO when msg retry limit reached
	drm/vc4: hdmi: Move hdmi reset to bind
	drm/vc4: hdmi: Fix register offset with longer CEC messages
	drm/vc4: hdmi: Fix up CEC registers
	drm/vc4: hdmi: Restore cec physical address on reconnect
	drm/vc4: hdmi: Compute the CEC clock divider from the clock rate
	drm/vc4: hdmi: Update the CEC clock divider on HSM rate change
	drm/lima: fix reference leak in lima_pm_busy
	drm/dp_mst: Don't cache EDIDs for physical ports
	hwrng: timeriomem - Fix cooldown period calculation
	crypto: ecdh_helper - Ensure 'len >= secret.len' in decode_key()
	io_uring: fix possible deadlock in io_uring_poll
	nvmet-tcp: fix receive data digest calculation for multiple h2cdata PDUs
	nvmet-tcp: fix potential race of tcp socket closing accept_work
	nvme-multipath: set nr_zones for zoned namespaces
	nvmet: remove extra variable in identify ns
	nvmet: set status to 0 in case for invalid nsid
	ASoC: SOF: sof-pci-dev: add missing Up-Extreme quirk
	ima: Free IMA measurement buffer on error
	ima: Free IMA measurement buffer after kexec syscall
	ASoC: simple-card-utils: Fix device module clock
	fs/jfs: fix potential integer overflow on shift of a int
	jffs2: fix use after free in jffs2_sum_write_data()
	ubifs: Fix memleak in ubifs_init_authentication
	ubifs: replay: Fix high stack usage, again
	ubifs: Fix error return code in alloc_wbufs()
	irqchip/imx: IMX_INTMUX should not default to y, unconditionally
	smp: Process pending softirqs in flush_smp_call_function_from_idle()
	drm/amdgpu/display: remove hdcp_srm sysfs on device removal
	capabilities: Don't allow writing ambiguous v3 file capabilities
	HSI: Fix PM usage counter unbalance in ssi_hw_init
	power: supply: cpcap: Add missing IRQF_ONESHOT to fix regression
	clk: meson: clk-pll: fix initializing the old rate (fallback) for a PLL
	clk: meson: clk-pll: make "ret" a signed integer
	clk: meson: clk-pll: propagate the error from meson_clk_pll_set_rate()
	selftests/powerpc: Make the test check in eeh-basic.sh posix compliant
	regulator: qcom-rpmh-regulator: add pm8009-1 chip revision
	arm64: dts: qcom: qrb5165-rb5: fix pm8009 regulators
	quota: Fix memory leak when handling corrupted quota file
	i2c: iproc: handle only slave interrupts which are enabled
	i2c: iproc: update slave isr mask (ISR_MASK_SLAVE)
	i2c: iproc: handle master read request
	spi: cadence-quadspi: Abort read if dummy cycles required are too many
	clk: sunxi-ng: h6: Fix CEC clock
	clk: renesas: r8a779a0: Remove non-existent S2 clock
	clk: renesas: r8a779a0: Fix parent of CBFUSA clock
	HID: core: detect and skip invalid inputs to snto32()
	RDMA/siw: Fix handling of zero-sized Read and Receive Queues.
	dmaengine: fsldma: Fix a resource leak in the remove function
	dmaengine: fsldma: Fix a resource leak in an error handling path of the probe function
	dmaengine: owl-dma: Fix a resource leak in the remove function
	dmaengine: hsu: disable spurious interrupt
	mfd: bd9571mwv: Use devm_mfd_add_devices()
	power: supply: cpcap-charger: Fix missing power_supply_put()
	power: supply: cpcap-battery: Fix missing power_supply_put()
	power: supply: cpcap-charger: Fix power_supply_put on null battery pointer
	fdt: Properly handle "no-map" field in the memory region
	of/fdt: Make sure no-map does not remove already reserved regions
	RDMA/rtrs: Extend ibtrs_cq_qp_create
	RDMA/rtrs-srv: Release lock before call into close_sess
	RDMA/rtrs-srv: Use sysfs_remove_file_self for disconnect
	RDMA/rtrs-clt: Set mininum limit when create QP
	RDMA/rtrs: Call kobject_put in the failure path
	RDMA/rtrs-srv: Fix missing wr_cqe
	RDMA/rtrs-clt: Refactor the failure cases in alloc_clt
	RDMA/rtrs-srv: Init wr_cnt as 1
	power: reset: at91-sama5d2_shdwc: fix wkupdbc mask
	rtc: s5m: select REGMAP_I2C
	dmaengine: idxd: set DMA channel to be private
	power: supply: fix sbs-charger build, needs REGMAP_I2C
	clocksource/drivers/ixp4xx: Select TIMER_OF when needed
	clocksource/drivers/mxs_timer: Add missing semicolon when DEBUG is defined
	spi: imx: Don't print error on -EPROBEDEFER
	RDMA/mlx5: Use the correct obj_id upon DEVX TIR creation
	IB/mlx5: Add mutex destroy call to cap_mask_mutex mutex
	clk: sunxi-ng: h6: Fix clock divider range on some clocks
	platform/chrome: cros_ec_proto: Use EC_HOST_EVENT_MASK not BIT
	platform/chrome: cros_ec_proto: Add LID and BATTERY to default mask
	regulator: axp20x: Fix reference cout leak
	watch_queue: Drop references to /dev/watch_queue
	certs: Fix blacklist flag type confusion
	regulator: s5m8767: Fix reference count leak
	spi: atmel: Put allocated master before return
	regulator: s5m8767: Drop regulators OF node reference
	power: supply: axp20x_usb_power: Init work before enabling IRQs
	power: supply: smb347-charger: Fix interrupt usage if interrupt is unavailable
	regulator: core: Avoid debugfs: Directory ... already present! error
	isofs: release buffer head before return
	watchdog: intel-mid_wdt: Postpone IRQ handler registration till SCU is ready
	auxdisplay: ht16k33: Fix refresh rate handling
	objtool: Fix error handling for STD/CLD warnings
	objtool: Fix retpoline detection in asm code
	objtool: Fix ".cold" section suffix check for newer versions of GCC
	scsi: lpfc: Fix ancient double free
	iommu: Switch gather->end to the inclusive end
	IB/umad: Return EIO in case of when device disassociated
	IB/umad: Return EPOLLERR in case of when device disassociated
	KVM: PPC: Make the VMX instruction emulation routines static
	powerpc/47x: Disable 256k page size
	powerpc/time: Enable sched clock for irqtime
	mmc: owl-mmc: Fix a resource leak in an error handling path and in the remove function
	mmc: sdhci-sprd: Fix some resource leaks in the remove function
	mmc: usdhi6rol0: Fix a resource leak in the error handling path of the probe
	mmc: renesas_sdhi_internal_dmac: Fix DMA buffer alignment from 8 to 128-bytes
	ARM: 9046/1: decompressor: Do not clear SCTLR.nTLSMD for ARMv7+ cores
	i2c: qcom-geni: Store DMA mapping data in geni_i2c_dev struct
	amba: Fix resource leak for drivers without .remove
	iommu: Move iotlb_sync_map out from __iommu_map
	iommu: Properly pass gfp_t in _iommu_map() to avoid atomic sleeping
	IB/mlx5: Return appropriate error code instead of ENOMEM
	IB/cm: Avoid a loop when device has 255 ports
	tracepoint: Do not fail unregistering a probe due to memory failure
	rtc: zynqmp: depend on HAS_IOMEM
	perf tools: Fix DSO filtering when not finding a map for a sampled address
	perf vendor events arm64: Fix Ampere eMag event typo
	RDMA/rxe: Fix coding error in rxe_recv.c
	RDMA/rxe: Fix coding error in rxe_rcv_mcast_pkt
	RDMA/rxe: Correct skb on loopback path
	spi: stm32: properly handle 0 byte transfer
	mfd: altera-sysmgr: Fix physical address storing more
	mfd: wm831x-auxadc: Prevent use after free in wm831x_auxadc_read_irq()
	powerpc/pseries/dlpar: handle ibm, configure-connector delay status
	powerpc/8xx: Fix software emulation interrupt
	clk: qcom: gcc-msm8998: Fix Alpha PLL type for all GPLLs
	kunit: tool: fix unit test cleanup handling
	kselftests: dmabuf-heaps: Fix Makefile's inclusion of the kernel's usr/include dir
	RDMA/hns: Fixed wrong judgments in the goto branch
	RDMA/siw: Fix calculation of tx_valid_cpus size
	RDMA/hns: Fix type of sq_signal_bits
	RDMA/hns: Disable RQ inline by default
	clk: divider: fix initialization with parent_hw
	spi: pxa2xx: Fix the controller numbering for Wildcat Point
	powerpc/uaccess: Avoid might_fault() when user access is enabled
	powerpc/kuap: Restore AMR after replaying soft interrupts
	regulator: qcom-rpmh: fix pm8009 ldo7
	clk: aspeed: Fix APLL calculate formula from ast2600-A2
	selftests/ftrace: Update synthetic event syntax errors
	perf symbols: Use (long) for iterator for bfd symbols
	regulator: bd718x7, bd71828, Fix dvs voltage levels
	spi: dw: Avoid stack content exposure
	spi: Skip zero-length transfers in spi_transfer_one_message()
	printk: avoid prb_first_valid_seq() where possible
	perf symbols: Fix return value when loading PE DSO
	nfsd: register pernet ops last, unregister first
	svcrdma: Hold private mutex while invoking rdma_accept()
	ceph: fix flush_snap logic after putting caps
	RDMA/hns: Fixes missing error code of CMDQ
	RDMA/ucma: Fix use-after-free bug in ucma_create_uevent
	RDMA/rtrs-srv: Fix stack-out-of-bounds
	RDMA/rtrs: Only allow addition of path to an already established session
	RDMA/rtrs-srv: fix memory leak by missing kobject free
	RDMA/rtrs-srv-sysfs: fix missing put_device
	RDMA/rtrs-srv: Do not pass a valid pointer to PTR_ERR()
	Input: sur40 - fix an error code in sur40_probe()
	perf record: Fix continue profiling after draining the buffer
	perf intel-pt: Fix missing CYC processing in PSB
	perf intel-pt: Fix premature IPC
	perf intel-pt: Fix IPC with CYC threshold
	perf test: Fix unaligned access in sample parsing test
	Input: elo - fix an error code in elo_connect()
	sparc64: only select COMPAT_BINFMT_ELF if BINFMT_ELF is set
	sparc: fix led.c driver when PROC_FS is not enabled
	Input: zinitix - fix return type of zinitix_init_touch()
	ARM: 9065/1: OABI compat: fix build when EPOLL is not enabled
	misc: eeprom_93xx46: Fix module alias to enable module autoprobe
	phy: rockchip-emmc: emmc_phy_init() always return 0
	phy: cadence-torrent: Fix error code in cdns_torrent_phy_probe()
	misc: eeprom_93xx46: Add module alias to avoid breaking support for non device tree users
	PCI: rcar: Always allocate MSI addresses in 32bit space
	soundwire: cadence: fix ACK/NAK handling
	pwm: rockchip: Enable APB clock during register access while probing
	pwm: rockchip: rockchip_pwm_probe(): Remove superfluous clk_unprepare()
	pwm: rockchip: Eliminate potential race condition when probing
	PCI: xilinx-cpm: Fix reference count leak on error path
	VMCI: Use set_page_dirty_lock() when unregistering guest memory
	PCI: Align checking of syscall user config accessors
	mei: hbm: call mei_set_devstate() on hbm stop response
	drm/msm: Fix MSM_INFO_GET_IOVA with carveout
	drm/msm/dsi: Correct io_start for MSM8994 (20nm PHY)
	drm/msm/mdp5: Fix wait-for-commit for cmd panels
	drm/msm: Fix race of GPU init vs timestamp power management.
	drm/msm: Fix races managing the OOB state for timestamp vs timestamps.
	drm/msm/dp: trigger unplug event in msm_dp_display_disable
	vfio/iommu_type1: Populate full dirty when detach non-pinned group
	vfio/iommu_type1: Fix some sanity checks in detach group
	vfio-pci/zdev: fix possible segmentation fault issue
	ext4: fix potential htree index checksum corruption
	phy: USB_LGM_PHY should depend on X86
	coresight: etm4x: Skip accessing TRCPDCR in save/restore
	nvmem: core: Fix a resource leak on error in nvmem_add_cells_from_of()
	nvmem: core: skip child nodes not matching binding
	soundwire: bus: use sdw_update_no_pm when initializing a device
	soundwire: bus: use sdw_write_no_pm when setting the bus scale registers
	soundwire: export sdw_write/read_no_pm functions
	soundwire: bus: fix confusion on device used by pm_runtime
	misc: fastrpc: fix incorrect usage of dma_map_sgtable
	remoteproc/mediatek: acknowledge watchdog IRQ after handled
	regmap: sdw: use _no_pm functions in regmap_read/write
	ext: EXT4_KUNIT_TESTS should depend on EXT4_FS instead of selecting it
	mailbox: sprd: correct definition of SPRD_OUTBOX_FIFO_FULL
	device-dax: Fix default return code of range_parse()
	PCI: pci-bridge-emul: Fix array overruns, improve safety
	PCI: cadence: Fix DMA range mapping early return error
	i40e: Fix flow for IPv6 next header (extension header)
	i40e: Add zero-initialization of AQ command structures
	i40e: Fix overwriting flow control settings during driver loading
	i40e: Fix addition of RX filters after enabling FW LLDP agent
	i40e: Fix VFs not created
	Take mmap lock in cacheflush syscall
	nios2: fixed broken sys_clone syscall
	i40e: Fix add TC filter for IPv6
	octeontx2-af: Fix an off by one in rvu_dbg_qsize_write()
	pwm: iqs620a: Fix overflow and optimize calculations
	vfio/type1: Use follow_pte()
	ice: report correct max number of TCs
	ice: Account for port VLAN in VF max packet size calculation
	ice: Fix state bits on LLDP mode switch
	ice: update the number of available RSS queues
	net: stmmac: fix CBS idleslope and sendslope calculation
	net/mlx4_core: Add missed mlx4_free_cmd_mailbox()
	PCI: rockchip: Make 'ep-gpios' DT property optional
	vxlan: move debug check after netdev unregister
	wireguard: device: do not generate ICMP for non-IP packets
	wireguard: kconfig: use arm chacha even with no neon
	ocfs2: fix a use after free on error
	mm: memcontrol: fix NR_ANON_THPS accounting in charge moving
	mm: memcontrol: fix slub memory accounting
	mm/memory.c: fix potential pte_unmap_unlock pte error
	mm/hugetlb: fix potential double free in hugetlb_register_node() error path
	mm/hugetlb: suppress wrong warning info when alloc gigantic page
	mm/compaction: fix misbehaviors of fast_find_migrateblock()
	r8169: fix jumbo packet handling on RTL8168e
	NFSv4: Fixes for nfs4_bitmask_adjust()
	KVM: SVM: Intercept INVPCID when it's disabled to inject #UD
	KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pages
	arm64: Add missing ISB after invalidating TLB in __primary_switch
	i2c: brcmstb: Fix brcmstd_send_i2c_cmd condition
	i2c: exynos5: Preserve high speed master code
	mm,thp,shmem: make khugepaged obey tmpfs mount flags
	mm: fix memory_failure() handling of dax-namespace metadata
	mm/rmap: fix potential pte_unmap on an not mapped pte
	proc: use kvzalloc for our kernel buffer
	csky: Fix a size determination in gpr_get()
	scsi: bnx2fc: Fix Kconfig warning & CNIC build errors
	scsi: sd: sd_zbc: Don't pass GFP_NOIO to kvcalloc
	block: reopen the device in blkdev_reread_part
	ide/falconide: Fix module unload
	scsi: sd: Fix Opal support
	blk-settings: align max_sectors on "logical_block_size" boundary
	soundwire: intel: fix possible crash when no device is detected
	ACPI: property: Fix fwnode string properties matching
	ACPI: configfs: add missing check after configfs_register_default_group()
	cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known
	HID: logitech-dj: add support for keyboard events in eQUAD step 4 Gaming
	HID: wacom: Ignore attempts to overwrite the touch_max value from HID
	Input: raydium_ts_i2c - do not send zero length
	Input: xpad - add support for PowerA Enhanced Wired Controller for Xbox Series X|S
	Input: joydev - prevent potential read overflow in ioctl
	Input: i8042 - add ASUS Zenbook Flip to noselftest list
	media: mceusb: Fix potential out-of-bounds shift
	USB: serial: option: update interface mapping for ZTE P685M
	usb: musb: Fix runtime PM race in musb_queue_resume_work
	usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1
	usb: dwc3: gadget: Fix dep->interval for fullspeed interrupt
	USB: serial: ftdi_sio: fix FTX sub-integer prescaler
	USB: serial: pl2303: fix line-speed handling on newer chips
	USB: serial: mos7840: fix error code in mos7840_write()
	USB: serial: mos7720: fix error code in mos7720_write()
	phy: lantiq: rcu-usb2: wait after clock enable
	ALSA: fireface: fix to parse sync status register of latter protocol
	ALSA: hda: Add another CometLake-H PCI ID
	ALSA: hda/hdmi: Drop bogus check at closing a stream
	ALSA: hda/realtek: modify EAPD in the ALC886
	ALSA: hda/realtek: Quirk for HP Spectre x360 14 amp setup
	MIPS: Ingenic: Disable HPTLB for D0 XBurst CPUs too
	MIPS: Support binutils configured with --enable-mips-fix-loongson3-llsc=yes
	MIPS: VDSO: Use CLANG_FLAGS instead of filtering out '--target='
	Revert "MIPS: Octeon: Remove special handling of CONFIG_MIPS_ELF_APPENDED_DTB=y"
	Revert "bcache: Kill btree_io_wq"
	bcache: Give btree_io_wq correct semantics again
	bcache: Move journal work to new flush wq
	Revert "drm/amd/display: Update NV1x SR latency values"
	drm/amd/display: Add FPU wrappers to dcn21_validate_bandwidth()
	drm/amd/display: Remove Assert from dcn10_get_dig_frontend
	drm/amd/display: Add vupdate_no_lock interrupts for DCN2.1
	drm/amdkfd: Fix recursive lock warnings
	drm/amdgpu: Set reference clock to 100Mhz on Renoir (v2)
	drm/nouveau/kms: handle mDP connectors
	drm/modes: Switch to 64bit maths to avoid integer overflow
	drm/sched: Cancel and flush all outstanding jobs before finish.
	drm/panel: kd35t133: allow using non-continuous dsi clock
	drm/rockchip: Require the YTR modifier for AFBC
	ASoC: siu: Fix build error by a wrong const prefix
	selinux: fix inconsistency between inode_getxattr and inode_listsecurity
	erofs: initialized fields can only be observed after bit is set
	tpm_tis: Fix check_locality for correct locality acquisition
	tpm_tis: Clean up locality release
	KEYS: trusted: Fix incorrect handling of tpm_get_random()
	KEYS: trusted: Fix migratable=1 failing
	KEYS: trusted: Reserve TPM for seal and unseal operations
	btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node
	btrfs: do not warn if we can't find the reloc root when looking up backref
	btrfs: add asserts for deleting backref cache nodes
	btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root
	btrfs: fix reloc root leak with 0 ref reloc roots on recovery
	btrfs: splice remaining dirty_bg's onto the transaction dirty bg list
	btrfs: handle space_info::total_bytes_pinned inside the delayed ref itself
	btrfs: account for new extents being deleted in total_bytes_pinned
	btrfs: fix extent buffer leak on failure to copy root
	drm/i915/gt: Flush before changing register state
	drm/i915/gt: Correct surface base address for renderclear
	crypto: arm64/sha - add missing module aliases
	crypto: aesni - prevent misaligned buffers on the stack
	crypto: michael_mic - fix broken misalignment handling
	crypto: sun4i-ss - checking sg length is not sufficient
	crypto: sun4i-ss - IV register does not work on A10 and A13
	crypto: sun4i-ss - handle BigEndian for cipher
	crypto: sun4i-ss - initialize need_fallback
	soc: samsung: exynos-asv: don't defer early on not-supported SoCs
	soc: samsung: exynos-asv: handle reading revision register error
	seccomp: Add missing return in non-void function
	arm64: ptrace: Fix seccomp of traced syscall -1 (NO_SYSCALL)
	misc: rtsx: init of rts522a add OCP power off when no card is present
	drivers/misc/vmw_vmci: restrict too big queue size in qp_host_alloc_queue
	pstore: Fix typo in compression option name
	dts64: mt7622: fix slow sd card access
	arm64: dts: agilex: fix phy interface bit shift for gmac1 and gmac2
	staging/mt7621-dma: mtk-hsdma.c->hsdma-mt7621.c
	staging: gdm724x: Fix DMA from stack
	staging: rtl8188eu: Add Edimax EW-7811UN V2 to device table
	floppy: reintroduce O_NDELAY fix
	media: i2c: max9286: fix access to unallocated memory
	media: ir_toy: add another IR Droid device
	media: ipu3-cio2: Fix mbus_code processing in cio2_subdev_set_fmt()
	media: marvell-ccic: power up the device on mclk enable
	media: smipcie: fix interrupt handling and IR timeout
	x86/virt: Eat faults on VMXOFF in reboot flows
	x86/reboot: Force all cpus to exit VMX root if VMX is supported
	x86/fault: Fix AMD erratum #91 errata fixup for user code
	x86/entry: Fix instrumentation annotation
	powerpc/prom: Fix "ibm,arch-vec-5-platform-support" scan
	rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
	rcu/nocb: Perform deferred wake up before last idle's need_resched() check
	kprobes: Fix to delay the kprobes jump optimization
	arm64: Extend workaround for erratum 1024718 to all versions of Cortex-A55
	iommu/arm-smmu-qcom: Fix mask extraction for bootloader programmed SMRs
	arm64: kexec_file: fix memory leakage in create_dtb() when fdt_open_into() fails
	arm64: uprobe: Return EOPNOTSUPP for AARCH32 instruction probing
	arm64 module: set plt* section addresses to 0x0
	arm64: spectre: Prevent lockdep splat on v4 mitigation enable path
	riscv: Disable KSAN_SANITIZE for vDSO
	watchdog: qcom: Remove incorrect usage of QCOM_WDT_ENABLE_IRQ
	watchdog: mei_wdt: request stop on unregister
	coresight: etm4x: Handle accesses to TRCSTALLCTLR
	mtd: spi-nor: sfdp: Fix last erase region marking
	mtd: spi-nor: sfdp: Fix wrong erase type bitmask for overlaid region
	mtd: spi-nor: core: Fix erase type discovery for overlaid region
	mtd: spi-nor: core: Add erase size check for erase command initialization
	mtd: spi-nor: hisi-sfc: Put child node np on error path
	fs/affs: release old buffer head on error path
	seq_file: document how per-entry resources are managed.
	x86: fix seq_file iteration for pat/memtype.c
	mm: memcontrol: fix swap undercounting in cgroup2
	mm: memcontrol: fix get_active_memcg return value
	hugetlb: fix update_and_free_page contig page struct assumption
	hugetlb: fix copy_huge_page_from_user contig page struct assumption
	mm/vmscan: restore zone_reclaim_mode ABI
	mm, compaction: make fast_isolate_freepages() stay within zone
	KVM: nSVM: fix running nested guests when npt=0
	nvmem: qcom-spmi-sdam: Fix uninitialized pdev pointer
	module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols
	mmc: sdhci-esdhc-imx: fix kernel panic when remove module
	mmc: sdhci-pci-o2micro: Bug fix for SDR104 HW tuning failure
	powerpc/32: Preserve cr1 in exception prolog stack check to fix build error
	powerpc/kexec_file: fix FDT size estimation for kdump kernel
	powerpc/32s: Add missing call to kuep_lock on syscall entry
	spmi: spmi-pmic-arb: Fix hw_irq overflow
	mei: fix transfer over dma with extended header
	mei: me: emmitsburg workstation DID
	mei: me: add adler lake point S DID
	mei: me: add adler lake point LP DID
	gpio: pcf857x: Fix missing first interrupt
	mfd: gateworks-gsc: Fix interrupt type
	printk: fix deadlock when kernel panic
	exfat: fix shift-out-of-bounds in exfat_fill_super()
	zonefs: Fix file size of zones in full condition
	kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
	thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error
	cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks
	cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument
	cpufreq: intel_pstate: Get per-CPU max freq via MSR_HWP_CAPABILITIES if available
	proc: don't allow async path resolution of /proc/thread-self components
	s390/vtime: fix inline assembly clobber list
	virtio/s390: implement virtio-ccw revision 2 correctly
	um: mm: check more comprehensively for stub changes
	um: defer killing userspace on page table update failures
	irqchip/loongson-pch-msi: Use bitmap_zalloc() to allocate bitmap
	f2fs: fix out-of-repair __setattr_copy()
	f2fs: enforce the immutable flag on open files
	f2fs: flush data when enabling checkpoint back
	sparc32: fix a user-triggerable oops in clear_user()
	spi: fsl: invert spisel_boot signal on MPC8309
	spi: spi-synquacer: fix set_cs handling
	gfs2: fix glock confusion in function signal_our_withdraw
	gfs2: Don't skip dlm unlock if glock has an lvb
	gfs2: Lock imbalance on error path in gfs2_recover_one
	gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end
	dm: fix deadlock when swapping to encrypted device
	dm table: fix iterate_devices based device capability checks
	dm table: fix DAX iterate_devices based device capability checks
	dm table: fix zoned iterate_devices based device capability checks
	dm writecache: fix performance degradation in ssd mode
	dm writecache: return the exact table values that were set
	dm writecache: fix writing beyond end of underlying device when shrinking
	dm era: Recover committed writeset after crash
	dm era: Update in-core bitset after committing the metadata
	dm era: Verify the data block size hasn't changed
	dm era: Fix bitset memory leaks
	dm era: Use correct value size in equality function of writeset tree
	dm era: Reinitialize bitset cache before digesting a new writeset
	dm era: only resize metadata in preresume
	drm/i915: Reject 446-480MHz HDMI clock on GLK
	kgdb: fix to kill breakpoints on initmem after boot
	ipv6: silence compilation warning for non-IPV6 builds
	net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending
	wireguard: selftests: test multiple parallel streams
	wireguard: queueing: get rid of per-peer ring buffers
	net: sched: fix police ext initialization
	net: qrtr: Fix memory leak in qrtr_tun_open
	net_sched: fix RTNL deadlock again caused by request_module()
	ARM: dts: aspeed: Add LCLK to lpc-snoop
	Linux 5.10.20

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3fbcecd9413ce212dac68d5cc800c9457feba56a
2021-03-07 12:33:33 +01:00
Filipe Manana
de3ea5be51 btrfs: fix extent buffer leak on failure to copy root
commit 72c9925f87c8b74f36f8e75a4cd93d964538d3ca upstream.

At btrfs_copy_root(), if the call to btrfs_inc_ref() fails we end up
returning without unlocking and releasing our reference on the extent
buffer named "cow" we previously allocated with btrfs_alloc_tree_block().

So fix that by unlocking the extent buffer and dropping our reference on
it before returning.

Fixes: be20aa9dba ("Btrfs: Add mount option to turn off data cow")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04 11:38:30 +01:00
Josef Bacik
9a739917ef btrfs: account for new extents being deleted in total_bytes_pinned
commit 81e75ac74ecba929d1e922bf93f9fc467232e39f upstream.

My recent patch set "A variety of lock contention fixes", found here

https://lore.kernel.org/linux-btrfs/cover.1608319304.git.josef@toxicpanda.com/
(Tracked in https://github.com/btrfs/linux/issues/86)

that reduce lock contention on the extent root by running delayed refs
less often resulted in a regression in generic/371.  This test
fallocate()'s the fs until it's full, deletes all the files, and then
tries to fallocate() until full again.

Before these patches we would run all of the delayed refs during
flushing, and then would commit the transaction because we had plenty of
pinned space to recover in order to allocate.  However my patches made
it so we weren't running the delayed refs as aggressively, which meant
that we appeared to have less pinned space when we were deciding to
commit the transaction.

We use the space_info->total_bytes_pinned to approximate how much space
we have pinned.  It's approximate because if we remove a reference to an
extent we may free it, but there may be more references to it than we
know of at that point, but we account it as pinned at the creation time,
and then it's properly accounted when the delayed ref runs.

The way we account for pinned space is if the
delayed_ref_head->total_ref_mod is < 0, because that is clearly a
freeing option.  However there is another case, and that is where
->total_ref_mod == 0 && ->must_insert_reserved == 1.

When we allocate a new extent, we have ->total_ref_mod == 1 and we have
->must_insert_reserved == 1.  This is used to indicate that it is a
brand new extent and will need to have its extent entry added before we
modify any references on the delayed ref head.  But if we subsequently
remove that extent reference, our ->total_ref_mod will be 0, and that
space will be pinned and freed.  Accounting for this case properly
allows for generic/371 to pass with my delayed refs patches applied.

It's important to note that this problem exists without the referenced
patches, it just was uncovered by them.

CC: stable@vger.kernel.org # 5.10
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04 11:38:30 +01:00
Josef Bacik
7ec1536e80 btrfs: handle space_info::total_bytes_pinned inside the delayed ref itself
commit 2187374f35fe9cadbddaa9fcf0c4121365d914e8 upstream.

Currently we pass things around to figure out if we maybe freeing data
based on the state of the delayed refs head.  This makes the accounting
sort of confusing and hard to follow, as it's distinctly separate from
the delayed ref heads stuff, but also depends on it entirely.

Fix this by explicitly adjusting the space_info->total_bytes_pinned in
the delayed refs code.  We now have two places where we modify this
counter, once where we create the delayed and destroy the delayed refs,
and once when we pin and unpin the extents.  This means there is a
slight overlap between delayed refs and the pin/unpin mechanisms, but
this is simply used by the ENOSPC infrastructure to determine if we need
to commit the transaction, so there's no adverse affect from this, we
might simply commit thinking it will give us enough space when it might
not.

CC: stable@vger.kernel.org # 5.10
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04 11:38:30 +01:00
Josef Bacik
acaeedb193 btrfs: splice remaining dirty_bg's onto the transaction dirty bg list
commit 938fcbfb0cbcf532a1869efab58e6009446b1ced upstream.

While doing error injection testing with my relocation patches I hit the
following assert:

  assertion failed: list_empty(&block_group->dirty_list), in fs/btrfs/block-group.c:3356
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.h:3357!
  invalid opcode: 0000 [#1] SMP NOPTI
  CPU: 0 PID: 24351 Comm: umount Tainted: G        W         5.10.0-rc3+ #193
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  RIP: 0010:assertfail.constprop.0+0x18/0x1a
  RSP: 0018:ffffa09b019c7e00 EFLAGS: 00010282
  RAX: 0000000000000056 RBX: ffff8f6492c18000 RCX: 0000000000000000
  RDX: ffff8f64fbc27c60 RSI: ffff8f64fbc19050 RDI: ffff8f64fbc19050
  RBP: ffff8f6483bbdc00 R08: 0000000000000000 R09: 0000000000000000
  R10: ffffa09b019c7c38 R11: ffffffff85d70928 R12: ffff8f6492c18100
  R13: ffff8f6492c18148 R14: ffff8f6483bbdd70 R15: dead000000000100
  FS:  00007fbfda4cdc40(0000) GS:ffff8f64fbc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fbfda666fd0 CR3: 000000013cf66002 CR4: 0000000000370ef0
  Call Trace:
   btrfs_free_block_groups.cold+0x55/0x55
   close_ctree+0x2c5/0x306
   ? fsnotify_destroy_marks+0x14/0x100
   generic_shutdown_super+0x6c/0x100
   kill_anon_super+0x14/0x30
   btrfs_kill_super+0x12/0x20
   deactivate_locked_super+0x36/0xa0
   cleanup_mnt+0x12d/0x190
   task_work_run+0x5c/0xa0
   exit_to_user_mode_prepare+0x1b1/0x1d0
   syscall_exit_to_user_mode+0x54/0x280
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happened because I injected an error in btrfs_cow_block() while
running the dirty block groups.  When we run the dirty block groups, we
splice the list onto a local list to process.  However if an error
occurs, we only cleanup the transactions dirty block group list, not any
pending block groups we have on our locally spliced list.

In fact if we fail to allocate a path in this function we'll also fail
to clean up the splice list.

Fix this by splicing the list back onto the transaction dirty block
group list so that the block groups are cleaned up.  Then add a 'out'
label and have the error conditions jump to out so that the errors are
handled properly.  This also has the side-effect of fixing a problem
where we would clear 'ret' on error because we unconditionally ran
btrfs_run_delayed_refs().

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04 11:38:30 +01:00
Josef Bacik
c717ca57a4 btrfs: fix reloc root leak with 0 ref reloc roots on recovery
commit c78a10aebb275c38d0cfccae129a803fe622e305 upstream.

When recovering a relocation, if we run into a reloc root that has 0
refs we simply add it to the reloc_control->reloc_roots list, and then
clean it up later.  The problem with this is __del_reloc_root() doesn't
do anything if the root isn't in the radix tree, which in this case it
won't be because we never call __add_reloc_root() on the reloc_root.

This exit condition simply isn't correct really.  During normal
operation we can remove ourselves from the rb tree and then we're meant
to clean up later at merge_reloc_roots() time, and this happens
correctly.  During recovery we're depending on free_reloc_roots() to
drop our references, but we're short-circuiting.

Fix this by continuing to check if we're on the list and dropping
ourselves from the reloc_control root list and dropping our reference
appropriately.  Change the corresponding BUG_ON() to an ASSERT() that
does the correct thing if we aren't in the rb tree.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04 11:38:30 +01:00
Josef Bacik
4d3edf72d6 btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root
commit 867ed321f90d06aaba84e2c91de51cd3038825ef upstream.

While testing my error handling patches, I added a error injection site
at btrfs_inc_extent_ref, to validate the error handling I added was
doing the correct thing.  However I hit a pretty ugly corruption while
doing this check, with the following error injection stack trace:

btrfs_inc_extent_ref
  btrfs_copy_root
    create_reloc_root
      btrfs_init_reloc_root
	btrfs_record_root_in_trans
	  btrfs_start_transaction
	    btrfs_update_inode
	      btrfs_update_time
		touch_atime
		  file_accessed
		    btrfs_file_mmap

This is because we do not catch the error from btrfs_inc_extent_ref,
which in practice would be ENOMEM, which means we lose the extent
references for a root that has already been allocated and inserted,
which is the problem.  Fix this by aborting the transaction if we fail
to do the reference modification.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04 11:38:29 +01:00
Josef Bacik
a1a5cc2548 btrfs: add asserts for deleting backref cache nodes
commit eddda68d97732ce05ca145f8e85e8a447f65cdad upstream.

A weird KASAN problem that Zygo reported could have been easily caught
if we checked for basic things in our backref freeing code.  We have two
methods of freeing a backref node

- btrfs_backref_free_node: this just is kfree() essentially.
- btrfs_backref_drop_node: this actually unlinks the node and cleans up
  everything and then calls btrfs_backref_free_node().

We should mostly be using btrfs_backref_drop_node(), to make sure the
node is properly unlinked from the backref cache, and only use
btrfs_backref_free_node() when we know the node isn't actually linked to
the backref cache.  We made a mistake here and thus got the KASAN splat.

Make this style of issue easier to find by adding some ASSERT()'s to
btrfs_backref_free_node() and adjusting our deletion stuff to properly
init the list so we can rely on list_empty() checks working properly.

  BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420
  Read of size 8 at addr ffff888112402950 by task btrfs/28836

  CPU: 0 PID: 28836 Comm: btrfs Tainted: G        W         5.10.0-e35f27394290-for-next+ #23
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  Call Trace:
   dump_stack+0xbc/0xf9
   ? btrfs_backref_cleanup_node+0x18a/0x420
   print_address_description.constprop.8+0x21/0x210
   ? record_print_text.cold.34+0x11/0x11
   ? btrfs_backref_cleanup_node+0x18a/0x420
   ? btrfs_backref_cleanup_node+0x18a/0x420
   kasan_report.cold.10+0x20/0x37
   ? btrfs_backref_cleanup_node+0x18a/0x420
   __asan_load8+0x69/0x90
   btrfs_backref_cleanup_node+0x18a/0x420
   btrfs_backref_release_cache+0x83/0x1b0
   relocate_block_group+0x394/0x780
   ? merge_reloc_roots+0x4a0/0x4a0
   btrfs_relocate_block_group+0x26e/0x4c0
   btrfs_relocate_chunk+0x52/0x120
   btrfs_balance+0xe2e/0x1900
   ? check_flags.part.50+0x6c/0x1e0
   ? btrfs_relocate_chunk+0x120/0x120
   ? kmem_cache_alloc_trace+0xa06/0xcb0
   ? _copy_from_user+0x83/0xc0
   btrfs_ioctl_balance+0x3a7/0x460
   btrfs_ioctl+0x24c8/0x4360
   ? __kasan_check_read+0x11/0x20
   ? check_chain_key+0x1f4/0x2f0
   ? __asan_loadN+0xf/0x20
   ? btrfs_ioctl_get_supported_features+0x30/0x30
   ? kvm_sched_clock_read+0x18/0x30
   ? check_chain_key+0x1f4/0x2f0
   ? lock_downgrade+0x3f0/0x3f0
   ? handle_mm_fault+0xad6/0x2150
   ? do_vfs_ioctl+0xfc/0x9d0
   ? ioctl_file_clone+0xe0/0xe0
   ? check_flags.part.50+0x6c/0x1e0
   ? check_flags.part.50+0x6c/0x1e0
   ? check_flags+0x26/0x30
   ? lock_is_held_type+0xc3/0xf0
   ? syscall_enter_from_user_mode+0x1b/0x60
   ? do_syscall_64+0x13/0x80
   ? rcu_read_lock_sched_held+0xa1/0xd0
   ? __kasan_check_read+0x11/0x20
   ? __fget_light+0xae/0x110
   __x64_sys_ioctl+0xc3/0x100
   do_syscall_64+0x37/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f4c4bdfe427
  RSP: 002b:00007fff33ee6df8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007fff33ee6e98 RCX: 00007f4c4bdfe427
  RDX: 00007fff33ee6e98 RSI: 00000000c4009420 RDI: 0000000000000003
  RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000078
  R10: fffffffffffff59d R11: 0000000000000202 R12: 0000000000000001
  R13: 0000000000000000 R14: 00007fff33ee8a34 R15: 0000000000000001

  Allocated by task 28836:
   kasan_save_stack+0x21/0x50
   __kasan_kmalloc.constprop.18+0xbe/0xd0
   kasan_kmalloc+0x9/0x10
   kmem_cache_alloc_trace+0x410/0xcb0
   btrfs_backref_alloc_node+0x46/0xf0
   btrfs_backref_add_tree_node+0x60d/0x11d0
   build_backref_tree+0xc5/0x700
   relocate_tree_blocks+0x2be/0xb90
   relocate_block_group+0x2eb/0x780
   btrfs_relocate_block_group+0x26e/0x4c0
   btrfs_relocate_chunk+0x52/0x120
   btrfs_balance+0xe2e/0x1900
   btrfs_ioctl_balance+0x3a7/0x460
   btrfs_ioctl+0x24c8/0x4360
   __x64_sys_ioctl+0xc3/0x100
   do_syscall_64+0x37/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  Freed by task 28836:
   kasan_save_stack+0x21/0x50
   kasan_set_track+0x20/0x30
   kasan_set_free_info+0x1f/0x30
   __kasan_slab_free+0xf3/0x140
   kasan_slab_free+0xe/0x10
   kfree+0xde/0x200
   btrfs_backref_error_cleanup+0x452/0x530
   build_backref_tree+0x1a5/0x700
   relocate_tree_blocks+0x2be/0xb90
   relocate_block_group+0x2eb/0x780
   btrfs_relocate_block_group+0x26e/0x4c0
   btrfs_relocate_chunk+0x52/0x120
   btrfs_balance+0xe2e/0x1900
   btrfs_ioctl_balance+0x3a7/0x460
   btrfs_ioctl+0x24c8/0x4360
   __x64_sys_ioctl+0xc3/0x100
   do_syscall_64+0x37/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  The buggy address belongs to the object at ffff888112402900
   which belongs to the cache kmalloc-128 of size 128
  The buggy address is located 80 bytes inside of
   128-byte region [ffff888112402900, ffff888112402980)
  The buggy address belongs to the page:
  page:0000000028b1cd08 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888131c810c0 pfn:0x112402
  flags: 0x17ffe0000000200(slab)
  raw: 017ffe0000000200 ffffea000424f308 ffffea0007d572c8 ffff888100040440
  raw: ffff888131c810c0 ffff888112402000 0000000100000009 0000000000000000
  page dumped because: kasan: bad access detected

  Memory state around the buggy address:
   ffff888112402800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ffff888112402880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  >ffff888112402900: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                   ^
   ffff888112402980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   ffff888112402a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Link: https://lore.kernel.org/linux-btrfs/20201208194607.GI31381@hungrycats.org/
CC: stable@vger.kernel.org # 5.10+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04 11:38:29 +01:00
Josef Bacik
52f93e5ee7 btrfs: do not warn if we can't find the reloc root when looking up backref
commit f78743fbdae1bb31bc9c9233c3590a5048782381 upstream.

The backref code is looking for a reloc_root that corresponds to the
given fs root.  However any number of things could have gone wrong while
initializing that reloc_root, like ENOMEM while trying to allocate the
root itself, or EIO while trying to write the root item.  This would
result in no corresponding reloc_root being in the reloc root cache, and
thus would return NULL when we do the find_reloc_root() call.

Because of this we do not want to WARN_ON().  This presumably was meant
to catch developer errors, cases where we messed up adding the reloc
root.  However we can easily hit this case with error injection, and
thus should not do a WARN_ON().

CC: stable@vger.kernel.org # 5.10+
Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04 11:38:29 +01:00
Josef Bacik
02785bae77 btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node
commit 7e2a870a599d4699a626ec26430c7a1ab14a2a49 upstream.

Zygo reported the following panic when testing my error handling patches
for relocation:

  kernel BUG at fs/btrfs/backref.c:2545!
  invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 3 PID: 8472 Comm: btrfs Tainted: G        W 14
  Hardware name: QEMU Standard PC (i440FX + PIIX,

  Call Trace:
   btrfs_backref_error_cleanup+0x4df/0x530
   build_backref_tree+0x1a5/0x700
   ? _raw_spin_unlock+0x22/0x30
   ? release_extent_buffer+0x225/0x280
   ? free_extent_buffer.part.52+0xd7/0x140
   relocate_tree_blocks+0x2a6/0xb60
   ? kasan_unpoison_shadow+0x35/0x50
   ? do_relocation+0xc10/0xc10
   ? kasan_kmalloc+0x9/0x10
   ? kmem_cache_alloc_trace+0x6a3/0xcb0
   ? free_extent_buffer.part.52+0xd7/0x140
   ? rb_insert_color+0x342/0x360
   ? add_tree_block.isra.36+0x236/0x2b0
   relocate_block_group+0x2eb/0x780
   ? merge_reloc_roots+0x470/0x470
   btrfs_relocate_block_group+0x26e/0x4c0
   btrfs_relocate_chunk+0x52/0x120
   btrfs_balance+0xe2e/0x18f0
   ? pvclock_clocksource_read+0xeb/0x190
   ? btrfs_relocate_chunk+0x120/0x120
   ? lock_contended+0x620/0x6e0
   ? do_raw_spin_lock+0x1e0/0x1e0
   ? do_raw_spin_unlock+0xa8/0x140
   btrfs_ioctl_balance+0x1f9/0x460
   btrfs_ioctl+0x24c8/0x4380
   ? __kasan_check_read+0x11/0x20
   ? check_chain_key+0x1f4/0x2f0
   ? __asan_loadN+0xf/0x20
   ? btrfs_ioctl_get_supported_features+0x30/0x30
   ? kvm_sched_clock_read+0x18/0x30
   ? check_chain_key+0x1f4/0x2f0
   ? lock_downgrade+0x3f0/0x3f0
   ? handle_mm_fault+0xad6/0x2150
   ? do_vfs_ioctl+0xfc/0x9d0
   ? ioctl_file_clone+0xe0/0xe0
   ? check_flags.part.50+0x6c/0x1e0
   ? check_flags.part.50+0x6c/0x1e0
   ? check_flags+0x26/0x30
   ? lock_is_held_type+0xc3/0xf0
   ? syscall_enter_from_user_mode+0x1b/0x60
   ? do_syscall_64+0x13/0x80
   ? rcu_read_lock_sched_held+0xa1/0xd0
   ? __kasan_check_read+0x11/0x20
   ? __fget_light+0xae/0x110
   __x64_sys_ioctl+0xc3/0x100
   do_syscall_64+0x37/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This occurs because of this check

  if (RB_EMPTY_NODE(&upper->rb_node))
	  BUG_ON(!list_empty(&node->upper));

As we are dropping the backref node, if we discover that our upper node
in the edge we just cleaned up isn't linked into the cache that we are
now done with this node, thus the BUG_ON().

However this is an erroneous assumption, as we will look up all the
references for a node first, and then process the pending edges.  All of
the 'upper' nodes in our pending edges won't be in the cache's rb_tree
yet, because they haven't been processed.  We could very well have many
edges still left to cleanup on this node.

The fact is we simply do not need this check, we can just process all of
the edges only for this node, because below this check we do the
following

  if (list_empty(&upper->lower)) {
	  list_add_tail(&upper->lower, &cache->leaves);
	  upper->lowest = 1;
  }

If the upper node truly isn't used yet, then we add it to the
cache->leaves list to be cleaned up later.  If it is still used then the
last child node that has it linked into its node will add it to the
leaves list and then it will be cleaned up.

Fix this problem by dropping this logic altogether.  With this fix I no
longer see the panic when testing with error injection in the backref
code.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04 11:38:29 +01:00
Qu Wenruo
6a402b937e btrfs: fix double accounting of ordered extent for subpage case in btrfs_invalidapge
[ Upstream commit 951c80f83d61bd4b21794c8aba829c3c1a45c2d0 ]

Commit dbfdb6d1b3 ("Btrfs: Search for all ordered extents that could
span across a page") make btrfs_invalidapage() to search all ordered
extents.

The offending code looks like this:

  again:
	  start = page_start;
	  ordered = btrfs_lookup_ordered_range(inode, start, page_end - start + 1);
	  if (ordred) {
		  end = min(page_end,
			    ordered->file_offset + ordered->num_bytes - 1);

		  /* Do the cleanup */

		  start = end + 1;
		  if (start < page_end)
			  goto again;
	  }

The behavior is indeed necessary for the incoming subpage support, but
when it iterates through all the ordered extents, it also resets the
search range @start.

This means, for the following cases, we can double account the ordered
extents, causing its bytes_left underflow:

	Page offset
	0		16K		32K
	|<--- OE 1  --->|<--- OE 2 ---->|

As the first iteration will find ordered extent (OE) 1, which doesn't
cover the full page, thus after cleanup code, we need to retry again.
But again label will reset start to page_start, and we got OE 1 again,
which causes double accounting on OE 1, and cause OE 1's byte_left to
underflow.

This problem can only happen for subpage case, as for regular sectorsize
== PAGE_SIZE case, we will always find a OE ends at or after page end,
thus no way to trigger the problem.

Move the again label after start = page_start.  There will be more
comprehensive rework to convert the open coded loop to a proper while
loop for subpage support.

Fixes: dbfdb6d1b3 ("Btrfs: Search for all ordered extents that could span across a page")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04 11:37:48 +01:00
Zhihao Cheng
006ef266c2 btrfs: clarify error returns values in __load_free_space_cache
[ Upstream commit 3cc64e7ebfb0d7faaba2438334c43466955a96e8 ]

Return value in __load_free_space_cache is not properly set after
(unlikely) memory allocation failures and 0 is returned instead.
This is not a problem for the caller load_free_space_cache because only
value 1 is considered as 'cache loaded' but for clarity it's better
to set the errors accordingly.

Fixes: a67509c300 ("Btrfs: add a io_ctl struct and helpers for dealing with the space cache")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04 11:37:47 +01:00
Greg Kroah-Hartman
59e0bda9e2 Merge 5.10.18 into android12-5.10
Changes in 5.10.18
	vdpa_sim: remove hard-coded virtq count
	vdpa_sim: add struct vdpasim_dev_attr for device attributes
	vdpa_sim: store parsed MAC address in a buffer
	vdpa_sim: make 'config' generic and usable for any device type
	vdpa_sim: add get_config callback in vdpasim_dev_attr
	IB/isert: add module param to set sg_tablesize for IO cmd
	net: qrtr: Fix port ID for control messages
	mptcp: skip to next candidate if subflow has unacked data
	net/sched: fix miss init the mru in qdisc_skb_cb
	mt76: mt7915: fix endian issues
	mt76: mt7615: fix rdd mcu cmd endianness
	net: sched: incorrect Kconfig dependencies on Netfilter modules
	net: openvswitch: fix TTL decrement exception action execution
	net: bridge: Fix a warning when del bridge sysfs
	net: fix proc_fs init handling in af_packet and tls
	Xen/x86: don't bail early from clear_foreign_p2m_mapping()
	Xen/x86: also check kernel mapping in set_foreign_p2m_mapping()
	Xen/gntdev: correct dev_bus_addr handling in gntdev_map_grant_pages()
	Xen/gntdev: correct error checking in gntdev_map_grant_pages()
	xen/arm: don't ignore return errors from set_phys_to_machine
	xen-blkback: don't "handle" error by BUG()
	xen-netback: don't "handle" error by BUG()
	xen-scsiback: don't "handle" error by BUG()
	xen-blkback: fix error handling in xen_blkbk_map()
	tty: protect tty_write from odd low-level tty disciplines
	Bluetooth: btusb: Always fallback to alt 1 for WBS
	btrfs: fix backport of 2175bf57dc in 5.10.13
	btrfs: fix crash after non-aligned direct IO write with O_DSYNC
	media: pwc: Use correct device for DMA
	Linux 5.10.18

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7ef79a45f7dc711800fb62419bee1cabfad277a7
2021-02-25 07:35:32 +01:00
Filipe Manana
a6703c7115 btrfs: fix crash after non-aligned direct IO write with O_DSYNC
Whenever we attempt to do a non-aligned direct IO write with O_DSYNC, we
end up triggering an assertion and crashing. Example reproducer:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdj
  MNT=/mnt/sdj

  mkfs.btrfs -f $DEV > /dev/null
  mount $DEV $MNT

  # Do a direct IO write with O_DSYNC into a non-aligned range...
  xfs_io -f -d -s -c "pwrite -S 0xab -b 64K 1111 64K" $MNT/foobar

  umount $MNT

When running the reproducer an assertion fails and produces the following
trace:

  [ 2418.403134] assertion failed: !current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA, in fs/btrfs/space-info.c:1467
  [ 2418.403745] ------------[ cut here ]------------
  [ 2418.404306] kernel BUG at fs/btrfs/ctree.h:3286!
  [ 2418.404862] invalid opcode: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC PTI
  [ 2418.405451] CPU: 1 PID: 64705 Comm: xfs_io Tainted: G      D           5.10.15-btrfs-next-87 #1
  [ 2418.406026] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  [ 2418.407228] RIP: 0010:assertfail.constprop.0+0x18/0x26 [btrfs]
  [ 2418.407835] Code: e6 48 c7 (...)
  [ 2418.409078] RSP: 0018:ffffb06080d13c98 EFLAGS: 00010246
  [ 2418.409696] RAX: 000000000000006c RBX: ffff994c1debbf08 RCX: 0000000000000000
  [ 2418.410302] RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff
  [ 2418.410904] RBP: ffff994c21770000 R08: 0000000000000000 R09: 0000000000000000
  [ 2418.411504] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000010000
  [ 2418.412111] R13: ffff994c22198400 R14: ffff994c21770000 R15: 0000000000000000
  [ 2418.412713] FS:  00007f54fd7aff00(0000) GS:ffff994d35200000(0000) knlGS:0000000000000000
  [ 2418.413326] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 2418.413933] CR2: 000056549596d000 CR3: 000000010b928003 CR4: 0000000000370ee0
  [ 2418.414528] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 2418.415109] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 2418.415669] Call Trace:
  [ 2418.416254]  btrfs_reserve_data_bytes.cold+0x22/0x22 [btrfs]
  [ 2418.416812]  btrfs_check_data_free_space+0x4c/0xa0 [btrfs]
  [ 2418.417380]  btrfs_buffered_write+0x1b0/0x7f0 [btrfs]
  [ 2418.418315]  btrfs_file_write_iter+0x2a9/0x770 [btrfs]
  [ 2418.418920]  new_sync_write+0x11f/0x1c0
  [ 2418.419430]  vfs_write+0x2bb/0x3b0
  [ 2418.419972]  __x64_sys_pwrite64+0x90/0xc0
  [ 2418.420486]  do_syscall_64+0x33/0x80
  [ 2418.420979]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 2418.421486] RIP: 0033:0x7f54fda0b986
  [ 2418.421981] Code: 48 c7 c0 (...)
  [ 2418.423019] RSP: 002b:00007ffc40569c38 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
  [ 2418.423547] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f54fda0b986
  [ 2418.424075] RDX: 0000000000010000 RSI: 000056549595e000 RDI: 0000000000000003
  [ 2418.424596] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000400
  [ 2418.425119] R10: 0000000000000400 R11: 0000000000000246 R12: 00000000ffffffff
  [ 2418.425644] R13: 0000000000000400 R14: 0000000000010000 R15: 0000000000000000
  [ 2418.426148] Modules linked in: btrfs blake2b_generic (...)
  [ 2418.429540] ---[ end trace ef2aeb44dc0afa34 ]---

1) At btrfs_file_write_iter() we set current->journal_info to
   BTRFS_DIO_SYNC_STUB;

2) We then call __btrfs_direct_write(), which calls btrfs_direct_IO();

3) We can't do the direct IO write because it starts at a non-aligned
   offset (1111). So at btrfs_direct_IO() we return -EINVAL (coming from
   check_direct_IO() which does the alignment check), but we leave
   current->journal_info set to BTRFS_DIO_SYNC_STUB - we only clear it
   at btrfs_dio_iomap_begin(), because we assume we always get there;

4) Then at __btrfs_direct_write() we see that the attempt to do the
   direct IO write was not successful, 0 bytes written, so we fallback
   to a buffered write by calling btrfs_buffered_write();

5) There we call btrfs_check_data_free_space() which in turn calls
   btrfs_alloc_data_chunk_ondemand() and that calls
   btrfs_reserve_data_bytes() with flush == BTRFS_RESERVE_FLUSH_DATA;

6) Then at btrfs_reserve_data_bytes() we have current->journal_info set to
   BTRFS_DIO_SYNC_STUB, therefore not NULL, and flush has the value
   BTRFS_RESERVE_FLUSH_DATA, triggering the second assertion:

  int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
                               enum btrfs_reserve_flush_enum flush)
  {
      struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
      int ret;

      ASSERT(flush == BTRFS_RESERVE_FLUSH_DATA ||
             flush == BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE);
      ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA);
  (...)

So fix that by setting the journal to NULL whenever check_direct_IO()
returns a failure.

This bug only affects 5.10 kernels, and the regression was introduced in
5.10-rc1 by commit 0eb79294db ("btrfs: dio iomap DSYNC workaround").
The bug does not exist in 5.11 kernels due to commit ecfdc08b8cc65d
("btrfs: remove dio iomap DSYNC workaround"), which depends on a large
patchset that went into the merge window for 5.11. So this is a fix only
for 5.10.x stable kernels, as there are people hitting this bug.

Fixes: 0eb79294db ("btrfs: dio iomap DSYNC workaround")
CC: stable@vger.kernel.org # 5.10 (and only 5.10)
Acked-by: David Sterba <dsterba@suse.com>
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1181605
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23 15:53:25 +01:00
David Sterba
aa0fd921d2 btrfs: fix backport of 2175bf57dc in 5.10.13
There's a mistake in backport of upstream commit 2175bf57dc ("btrfs:
fix possible free space tree corruption with online conversion") as
5.10.13 commit 2175bf57dc.

The enum value BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED has been added to the
wrong enum set, colliding with value of BTRFS_FS_QUOTA_ENABLE. This
could cause problems during the tree conversion, where the quotas
wouldn't be set up properly but the related code executed anyway due to
the bit set.

Link: https://lore.kernel.org/linux-btrfs/20210219111741.95DD.409509F4@e16-tech.com
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
CC: stable@vger.kernel.org # 5.10.13+
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23 15:53:25 +01:00
Greg Kroah-Hartman
cf5b2483a5 Merge 5.10.13 into android12-5.10
Changes in 5.10.13
	iwlwifi: provide gso_type to GSO packets
	nbd: freeze the queue while we're adding connections
	tty: avoid using vfs_iocb_iter_write() for redirected console writes
	ACPI: sysfs: Prefer "compatible" modalias
	ACPI: thermal: Do not call acpi_thermal_check() directly
	kernel: kexec: remove the lock operation of system_transition_mutex
	ALSA: hda/realtek: Enable headset of ASUS B1400CEPE with ALC256
	ALSA: hda/via: Apply the workaround generically for Clevo machines
	parisc: Enable -mlong-calls gcc option by default when !CONFIG_MODULES
	media: cec: add stm32 driver
	media: cedrus: Fix H264 decoding
	media: hantro: Fix reset_raw_fmt initialization
	media: rc: fix timeout handling after switch to microsecond durations
	media: rc: ite-cir: fix min_timeout calculation
	media: rc: ensure that uevent can be read directly after rc device register
	ARM: dts: tbs2910: rename MMC node aliases
	ARM: dts: ux500: Reserve memory carveouts
	ARM: dts: imx6qdl-gw52xx: fix duplicate regulator naming
	wext: fix NULL-ptr-dereference with cfg80211's lack of commit()
	x86/xen: avoid warning in Xen pv guest with CONFIG_AMD_MEM_ENCRYPT enabled
	ASoC: AMD Renoir - refine DMI entries for some Lenovo products
	Revert "drm/amdgpu/swsmu: drop set_fan_speed_percent (v2)"
	drm/nouveau/kms/gk104-gp1xx: Fix > 64x64 cursors
	drm/i915: Always flush the active worker before returning from the wait
	drm/i915/gt: Always try to reserve GGTT address 0x0
	drivers/nouveau/kms/nv50-: Reject format modifiers for cursor planes
	bcache: only check feature sets when sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES
	net: usb: qmi_wwan: added support for Thales Cinterion PLSx3 modem family
	s390: uv: Fix sysfs max number of VCPUs reporting
	s390/vfio-ap: No need to disable IRQ after queue reset
	PM: hibernate: flush swap writer after marking
	x86/entry: Emit a symbol for register restoring thunk
	efi/apple-properties: Reinstate support for boolean properties
	crypto: marvel/cesa - Fix tdma descriptor on 64-bit
	drivers: soc: atmel: Avoid calling at91_soc_init on non AT91 SoCs
	drivers: soc: atmel: add null entry at the end of at91_soc_allowed_list[]
	btrfs: fix lockdep warning due to seqcount_mutex on 32bit arch
	btrfs: fix possible free space tree corruption with online conversion
	KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[]
	KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh()
	KVM: arm64: Filter out v8.1+ events on v8.0 HW
	KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit
	KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX
	KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration
	KVM: x86: get smi pending status correctly
	KVM: Forbid the use of tagged userspace addresses for memslots
	io_uring: fix wqe->lock/completion_lock deadlock
	xen: Fix XenStore initialisation for XS_LOCAL
	leds: trigger: fix potential deadlock with libata
	arm64: dts: broadcom: Fix USB DMA address translation for Stingray
	mt7601u: fix kernel crash unplugging the device
	mt76: mt7663s: fix rx buffer refcounting
	mt7601u: fix rx buffer refcounting
	iwlwifi: Fix IWL_SUBDEVICE_NO_160 macro to use the correct bit.
	drm/i915/gt: Clear CACHE_MODE prior to clearing residuals
	drm/i915/pmu: Don't grab wakeref when enabling events
	net/mlx5e: Fix IPSEC stats
	ARM: dts: imx6qdl-kontron-samx6i: fix pwms for lcd-backlight
	drm/nouveau/svm: fail NOUVEAU_SVM_INIT ioctl on unsupported devices
	drm/vc4: Correct lbm size and calculation
	drm/vc4: Correct POS1_SCL for hvs5
	drm/nouveau/dispnv50: Restore pushing of all data.
	drm/i915: Check for all subplatform bits
	drm/i915/selftest: Fix potential memory leak
	uapi: fix big endian definition of ipv6_rpl_sr_hdr
	KVM: Documentation: Fix spec for KVM_CAP_ENABLE_CAP_VM
	tee: optee: replace might_sleep with cond_resched
	xen-blkfront: allow discard-* nodes to be optional
	blk-mq: test QUEUE_FLAG_HCTX_ACTIVE for sbitmap_shared in hctx_may_queue
	clk: imx: fix Kconfig warning for i.MX SCU clk
	clk: mmp2: fix build without CONFIG_PM
	clk: qcom: gcc-sm250: Use floor ops for sdcc clks
	ARM: imx: build suspend-imx6.S with arm instruction set
	ARM: zImage: atags_to_fdt: Fix node names on added root nodes
	netfilter: nft_dynset: add timeout extension to template
	Revert "RDMA/mlx5: Fix devlink deadlock on net namespace deletion"
	Revert "block: simplify set_init_blocksize" to regain lost performance
	xfrm: Fix oops in xfrm_replay_advance_bmp
	xfrm: fix disable_xfrm sysctl when used on xfrm interfaces
	selftests: xfrm: fix test return value override issue in xfrm_policy.sh
	xfrm: Fix wraparound in xfrm_policy_addr_delta()
	arm64: dts: ls1028a: fix the offset of the reset register
	ARM: imx: fix imx8m dependencies
	ARM: dts: imx6qdl-kontron-samx6i: fix i2c_lcd/cam default status
	ARM: dts: imx6qdl-sr-som: fix some cubox-i platforms
	arm64: dts: imx8mp: Correct the gpio ranges of gpio3
	firmware: imx: select SOC_BUS to fix firmware build
	RDMA/cxgb4: Fix the reported max_recv_sge value
	ASoC: dt-bindings: lpass: Fix and common up lpass dai ids
	ASoC: qcom: Fix incorrect volatile registers
	ASoC: qcom: Fix broken support to MI2S TERTIARY and QUATERNARY
	ASoC: qcom: lpass-ipq806x: fix bitwidth regmap field
	spi: altera: Fix memory leak on error path
	ASoC: Intel: Skylake: skl-topology: Fix OOPs ib skl_tplg_complete
	powerpc/64s: prevent recursive replay_soft_interrupts causing superfluous interrupt
	pNFS/NFSv4: Fix a layout segment leak in pnfs_layout_process()
	pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn
	ASoC: SOF: Intel: soundwire: fix select/depend unmet dependencies
	ASoC: qcom: lpass: Fix out-of-bounds DAI ID lookup
	iwlwifi: pcie: avoid potential PNVM leaks
	iwlwifi: pnvm: don't skip everything when not reloading
	iwlwifi: pnvm: don't try to load after failures
	iwlwifi: pcie: set LTR on more devices
	iwlwifi: pcie: use jiffies for memory read spin time limit
	iwlwifi: pcie: reschedule in long-running memory reads
	mac80211: pause TX while changing interface type
	ice: fix FDir IPv6 flexbyte
	ice: Implement flow for IPv6 next header (extension header)
	ice: update dev_addr in ice_set_mac_address even if HW filter exists
	ice: Don't allow more channels than LAN MSI-X available
	ice: Fix MSI-X vector fallback logic
	i40e: acquire VSI pointer only after VF is initialized
	igc: fix link speed advertising
	net/mlx5: Fix memory leak on flow table creation error flow
	net/mlx5e: E-switch, Fix rate calculation for overflow
	net/mlx5e: free page before return
	net/mlx5e: Reduce tc unsupported key print level
	net/mlx5: Maintain separate page trees for ECPF and PF functions
	net/mlx5e: Disable hw-tc-offload when MLX5_CLS_ACT config is disabled
	net/mlx5e: Fix CT rule + encap slow path offload and deletion
	net/mlx5e: Correctly handle changing the number of queues when the interface is down
	net/mlx5e: Revert parameters on errors when changing trust state without reset
	net/mlx5e: Revert parameters on errors when changing MTU and LRO state without reset
	net/mlx5: CT: Fix incorrect removal of tuple_nat_node from nat rhashtable
	can: dev: prevent potential information leak in can_fill_info()
	ACPI/IORT: Do not blindly trust DMA masks from firmware
	of/device: Update dma_range_map only when dev has valid dma-ranges
	iommu/amd: Use IVHD EFR for early initialization of IOMMU features
	iommu/vt-d: Correctly check addr alignment in qi_flush_dev_iotlb_pasid()
	nvme-multipath: Early exit if no path is available
	selftests: forwarding: Specify interface when invoking mausezahn
	rxrpc: Fix memory leak in rxrpc_lookup_local
	NFC: fix resource leak when target index is invalid
	NFC: fix possible resource leak
	ASoC: mediatek: mt8183-da7219: ignore TDM DAI link by default
	ASoC: mediatek: mt8183-mt6358: ignore TDM DAI link by default
	ASoC: topology: Properly unregister DAI on removal
	ASoC: topology: Fix memory corruption in soc_tplg_denum_create_values()
	scsi: qla2xxx: Fix description for parameter ql2xenforce_iocb_limit
	team: protect features update by RCU to avoid deadlock
	tcp: make TCP_USER_TIMEOUT accurate for zero window probes
	tcp: fix TLP timer not set when CA_STATE changes from DISORDER to OPEN
	vsock: fix the race conditions in multi-transport support
	Linux 5.10.13

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I75f419b25f24da559e446d62f75ce6bb9b0a5396
2021-02-05 10:38:34 +01:00
Josef Bacik
2175bf57dc btrfs: fix possible free space tree corruption with online conversion
commit 2f96e40212d435b328459ba6b3956395eed8fa9f upstream.

While running btrfs/011 in a loop I would often ASSERT() while trying to
add a new free space entry that already existed, or get an EEXIST while
adding a new block to the extent tree, which is another indication of
double allocation.

This occurs because when we do the free space tree population, we create
the new root and then populate the tree and commit the transaction.
The problem is when you create a new root, the root node and commit root
node are the same.  During this initial transaction commit we will run
all of the delayed refs that were paused during the free space tree
generation, and thus begin to cache block groups.  While caching block
groups the caching thread will be reading from the main root for the
free space tree, so as we make allocations we'll be changing the free
space tree, which can cause us to add the same range twice which results
in either the ASSERT(ret != -EEXIST); in __btrfs_add_free_space, or in a
variety of different errors when running delayed refs because of a
double allocation.

Fix this by marking the fs_info as unsafe to load the free space tree,
and fall back on the old slow method.  We could be smarter than this,
for example caching the block group while we're populating the free
space tree, but since this is a serious problem I've opted for the
simplest solution.

CC: stable@vger.kernel.org # 4.9+
Fixes: a5ed918285 ("Btrfs: implement the free space B-tree")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-03 23:28:40 +01:00
Su Yue
f343bf1aaf btrfs: fix lockdep warning due to seqcount_mutex on 32bit arch
commit c41ec4529d3448df8998950d7bada757a1b321cf upstream.

This effectively reverts commit d5c8238849 ("btrfs: convert
data_seqcount to seqcount_mutex_t").

While running fstests on 32 bits test box, many tests failed because of
warnings in dmesg. One of those warnings (btrfs/003):

  [66.441317] WARNING: CPU: 6 PID: 9251 at include/linux/seqlock.h:279 btrfs_remove_chunk+0x58b/0x7b0 [btrfs]
  [66.441446] CPU: 6 PID: 9251 Comm: btrfs Tainted: G           O      5.11.0-rc4-custom+ #5
  [66.441449] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
  [66.441451] EIP: btrfs_remove_chunk+0x58b/0x7b0 [btrfs]
  [66.441472] EAX: 00000000 EBX: 00000001 ECX: c576070c EDX: c6b15803
  [66.441475] ESI: 10000000 EDI: 00000000 EBP: c56fbcfc ESP: c56fbc70
  [66.441477] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
  [66.441481] CR0: 80050033 CR2: 05c8da20 CR3: 04b20000 CR4: 00350ed0
  [66.441485] Call Trace:
  [66.441510]  btrfs_relocate_chunk+0xb1/0x100 [btrfs]
  [66.441529]  ? btrfs_lookup_block_group+0x17/0x20 [btrfs]
  [66.441562]  btrfs_balance+0x8ed/0x13b0 [btrfs]
  [66.441586]  ? btrfs_ioctl_balance+0x333/0x3c0 [btrfs]
  [66.441619]  ? __this_cpu_preempt_check+0xf/0x11
  [66.441643]  btrfs_ioctl_balance+0x333/0x3c0 [btrfs]
  [66.441664]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
  [66.441683]  btrfs_ioctl+0x414/0x2ae0 [btrfs]
  [66.441700]  ? __lock_acquire+0x35f/0x2650
  [66.441717]  ? lockdep_hardirqs_on+0x87/0x120
  [66.441720]  ? lockdep_hardirqs_on_prepare+0xd0/0x1e0
  [66.441724]  ? call_rcu+0x2d3/0x530
  [66.441731]  ? __might_fault+0x41/0x90
  [66.441736]  ? kvm_sched_clock_read+0x15/0x50
  [66.441740]  ? sched_clock+0x8/0x10
  [66.441745]  ? sched_clock_cpu+0x13/0x180
  [66.441750]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
  [66.441750]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
  [66.441768]  __ia32_sys_ioctl+0x165/0x8a0
  [66.441773]  ? __this_cpu_preempt_check+0xf/0x11
  [66.441785]  ? __might_fault+0x89/0x90
  [66.441791]  __do_fast_syscall_32+0x54/0x80
  [66.441796]  do_fast_syscall_32+0x32/0x70
  [66.441801]  do_SYSENTER_32+0x15/0x20
  [66.441805]  entry_SYSENTER_32+0x9f/0xf2
  [66.441808] EIP: 0xab7b5549
  [66.441814] EAX: ffffffda EBX: 00000003 ECX: c4009420 EDX: bfa91f5c
  [66.441816] ESI: 00000003 EDI: 00000001 EBP: 00000000 ESP: bfa91e98
  [66.441818] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000292
  [66.441833] irq event stamp: 42579
  [66.441835] hardirqs last  enabled at (42585): [<c60eb065>] console_unlock+0x495/0x590
  [66.441838] hardirqs last disabled at (42590): [<c60eafd5>] console_unlock+0x405/0x590
  [66.441840] softirqs last  enabled at (41698): [<c601b76c>] call_on_stack+0x1c/0x60
  [66.441843] softirqs last disabled at (41681): [<c601b76c>] call_on_stack+0x1c/0x60

  ========================================================================
  btrfs_remove_chunk+0x58b/0x7b0:
  __seqprop_mutex_assert at linux/./include/linux/seqlock.h:279
  (inlined by) btrfs_device_set_bytes_used at linux/fs/btrfs/volumes.h:212
  (inlined by) btrfs_remove_chunk at linux/fs/btrfs/volumes.c:2994
  ========================================================================

The warning is produced by lockdep_assert_held() in
__seqprop_mutex_assert() if CONFIG_LOCKDEP is enabled.
And "olumes.c:2994 is btrfs_device_set_bytes_used() with mutex lock
fs_info->chunk_mutex held already.

After adding some debug prints, the cause was found that many
__alloc_device() are called with NULL @fs_info (during scanning ioctl).
Inside the function, btrfs_device_data_ordered_init() is expanded to
seqcount_mutex_init().  In this scenario, its second
parameter info->chunk_mutex  is &NULL->chunk_mutex which equals
to offsetof(struct btrfs_fs_info, chunk_mutex) unexpectedly. Thus,
seqcount_mutex_init() is called in wrong way. And later
btrfs_device_get/set helpers trigger lockdep warnings.

The device and filesystem object lifetimes are different and we'd have
to synchronize initialization of the btrfs_device::data_seqcount with
the fs_info, possibly using some additional synchronization. It would
still not prevent concurrent access to the seqcount lock when it's used
for read and initialization.

Commit d5c8238849 ("btrfs: convert data_seqcount to seqcount_mutex_t")
does not mention a particular problem being fixed so revert should not
cause any harm and we'll get the lockdep warning fixed.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=210139
Reported-by: Erhard F <erhard_f@mailbox.org>
Fixes: d5c8238849 ("btrfs: convert data_seqcount to seqcount_mutex_t")
CC: stable@vger.kernel.org # 5.10
CC: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-03 23:28:40 +01:00
Greg Kroah-Hartman
ba152773be This is the 5.10.11 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmARRrcACgkQONu9yGCS
 aT7n8xAA0lDCM7V1v/EAYw+7ZN1HNXCZhkOZzwrbFz2SSRnu0UGRbA6B9tcdHWA4
 FX3ZgNcEaut3HgT4TYa3xZfhq6tB4w1vfHXyvcuDlgVv+0+LGbPaZiH/pq6wIE00
 s3buqbtasRIQw4UEokos6ZJCiXHr9EmbCauQYViSf8YWR5tDjbIaqUB2TI/Cw4Io
 xYYqpnHJ51KItNvOPaFoUWYA6zrD7R963aAf+vmvBfDrgcs0UgH5OTO0XY6FCYKR
 ZgVMX/1pioWTAF8TMq2pzU55FP7SlohlcISC8NQrygq8JrKHVmd9c5U/MTd4ytaz
 nBaEizEA8Ewz3eTSSED173jJ1OMA06prfx1NQ4fO90U11eTj+IG9qkLePsB7UBe8
 7FEJ8lTKgWDDwC7UzctHaI7mIyzAX5EiItRjjlln++bnQZNjjd/TBVGqXu4FAGBP
 0yBamdBrNVLVmjJQ7Wudt2iZU/gMkt9R+frq24IZLjW+5XB3L3G3S3vAm+8IYthV
 OVh23SfD/w2kPJcvw4z8ZYeOigyDd7CgoLeVwrdH6jM9N1gEEElkt6gsJugPhnxH
 odRRfUaRp+3j6M6ZCf8ai2QzsE0l3Uu+mJHdRaWiXOMmbsIuDFrYW/iNlLyHy8i8
 EZ01/chEyjs7ek8TNdkbhpHow3wI3Uvpczt/BG7OxH/F+DpTmkE=
 =tvv2
 -----END PGP SIGNATURE-----

Merge 5.10.11 into android12-5.10

Changes in 5.10.11
	scsi: target: tcmu: Fix use-after-free of se_cmd->priv
	mtd: rawnand: gpmi: fix dst bit offset when extracting raw payload
	mtd: rawnand: nandsim: Fix the logic when selecting Hamming soft ECC engine
	i2c: tegra: Wait for config load atomically while in ISR
	i2c: bpmp-tegra: Ignore unknown I2C_M flags
	platform/x86: i2c-multi-instantiate: Don't create platform device for INT3515 ACPI nodes
	platform/x86: ideapad-laptop: Disable touchpad_switch for ELAN0634
	ALSA: seq: oss: Fix missing error check in snd_seq_oss_synth_make_info()
	ALSA: hda/realtek - Limit int mic boost on Acer Aspire E5-575T
	ALSA: hda/via: Add minimum mute flag
	crypto: xor - Fix divide error in do_xor_speed()
	dm crypt: fix copy and paste bug in crypt_alloc_req_aead
	ACPI: scan: Make acpi_bus_get_device() clear return pointer on error
	btrfs: don't get an EINTR during drop_snapshot for reloc
	btrfs: do not double free backref nodes on error
	btrfs: fix lockdep splat in btrfs_recover_relocation
	btrfs: don't clear ret in btrfs_start_dirty_block_groups
	btrfs: send: fix invalid clone operations when cloning from the same file and root
	fs: fix lazytime expiration handling in __writeback_single_inode()
	pinctrl: ingenic: Fix JZ4760 support
	mmc: core: don't initialize block size from ext_csd if not present
	mmc: sdhci-of-dwcmshc: fix rpmb access
	mmc: sdhci-xenon: fix 1.8v regulator stabilization
	mmc: sdhci-brcmstb: Fix mmc timeout errors on S5 suspend
	dm: avoid filesystem lookup in dm_get_dev_t()
	dm integrity: fix a crash if "recalculate" used without "internal_hash"
	dm integrity: conditionally disable "recalculate" feature
	drm/atomic: put state on error path
	drm/syncobj: Fix use-after-free
	drm/amdgpu: remove gpu info firmware of green sardine
	drm/amd/display: DCN2X Find Secondary Pipe properly in MPO + ODM Case
	drm/i915/gt: Prevent use of engine->wa_ctx after error
	drm/i915: Check for rq->hwsp validity after acquiring RCU lock
	ASoC: Intel: haswell: Add missing pm_ops
	ASoC: rt711: mutex between calibration and power state changes
	SUNRPC: Handle TCP socket sends with kernel_sendpage() again
	HID: multitouch: Enable multi-input for Synaptics pointstick/touchpad device
	HID: sony: select CONFIG_CRC32
	dm integrity: select CRYPTO_SKCIPHER
	x86/hyperv: Fix kexec panic/hang issues
	scsi: ufs: Relax the condition of UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
	scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback
	scsi: qedi: Correct max length of CHAP secret
	scsi: scsi_debug: Fix memleak in scsi_debug_init()
	scsi: sd: Suppress spurious errors when WRITE SAME is being disabled
	riscv: Fix kernel time_init()
	riscv: Fix sifive serial driver
	riscv: Enable interrupts during syscalls with M-Mode
	HID: logitech-dj: add the G602 receiver
	HID: Ignore battery for Elan touchscreen on ASUS UX550
	clk: tegra30: Add hda clock default rates to clock driver
	ALSA: hda/tegra: fix tegra-hda on tegra30 soc
	riscv: cacheinfo: Fix using smp_processor_id() in preemptible
	arm64: make atomic helpers __always_inline
	xen: Fix event channel callback via INTX/GSI
	x86/xen: Add xen_no_vector_callback option to test PCI INTX delivery
	x86/xen: Fix xen_hvm_smp_init() when vector callback not available
	dts: phy: fix missing mdio device and probe failure of vsc8541-01 device
	dts: phy: add GPIO number and active state used for phy reset
	riscv: defconfig: enable gpio support for HiFive Unleashed
	drm/amdgpu/psp: fix psp gfx ctrl cmds
	drm/amd/display: disable dcn10 pipe split by default
	HID: logitech-hidpp: Add product ID for MX Ergo in Bluetooth mode
	drm/amd/display: Fix to be able to stop crc calculation
	drm/nouveau/bios: fix issue shadowing expansion ROMs
	drm/nouveau/privring: ack interrupts the same way as RM
	drm/nouveau/i2c/gm200: increase width of aux semaphore owner fields
	drm/nouveau/mmu: fix vram heap sizing
	drm/nouveau/kms/nv50-: fix case where notifier buffer is at offset 0
	io_uring: flush timeouts that should already have expired
	libperf tests: If a test fails return non-zero
	libperf tests: Fail when failing to get a tracepoint id
	RISC-V: Set current memblock limit
	RISC-V: Fix maximum allowed phsyical memory for RV32
	x86/xen: fix 'nopvspin' build error
	nfsd: Fixes for nfsd4_encode_read_plus_data()
	nfsd: Don't set eof on a truncated READ_PLUS
	gpiolib: cdev: fix frame size warning in gpio_ioctl()
	pinctrl: aspeed: g6: Fix PWMG0 pinctrl setting
	pinctrl: mediatek: Fix fallback call path
	RDMA/ucma: Do not miss ctx destruction steps in some cases
	btrfs: print the actual offset in btrfs_root_name
	scsi: megaraid_sas: Fix MEGASAS_IOC_FIRMWARE regression
	scsi: ufs: ufshcd-pltfrm depends on HAS_IOMEM
	scsi: ufs: Fix tm request when non-fatal error happens
	crypto: omap-sham - Fix link error without crypto-engine
	bpf: Prevent double bpf_prog_put call from bpf_tracing_prog_attach
	powerpc: Use the common INIT_DATA_SECTION macro in vmlinux.lds.S
	powerpc: Fix alignment bug within the init sections
	arm64: entry: remove redundant IRQ flag tracing
	bpf: Reject too big ctx_size_in for raw_tp test run
	drm/amdkfd: Fix out-of-bounds read in kdf_create_vcrat_image_cpu()
	RDMA/umem: Avoid undefined behavior of rounddown_pow_of_two()
	RDMA/cma: Fix error flow in default_roce_mode_store
	printk: ringbuffer: fix line counting
	printk: fix kmsg_dump_get_buffer length calulations
	iov_iter: fix the uaccess area in copy_compat_iovec_from_user
	i2c: octeon: check correct size of maximum RECV_LEN packet
	drm/vc4: Unify PCM card's driver_name
	platform/x86: intel-vbtn: Drop HP Stream x360 Convertible PC 11 from allow-list
	platform/x86: hp-wmi: Don't log a warning on HPWMI_RET_UNKNOWN_COMMAND errors
	gpio: sifive: select IRQ_DOMAIN_HIERARCHY rather than depend on it
	ALSA: hda: Balance runtime/system PM if direct-complete is disabled
	xsk: Clear pool even for inactive queues
	selftests: net: fib_tests: remove duplicate log test
	can: dev: can_restart: fix use after free bug
	can: vxcan: vxcan_xmit: fix use after free bug
	can: peak_usb: fix use after free bugs
	perf evlist: Fix id index for heterogeneous systems
	i2c: sprd: depend on COMMON_CLK to fix compile tests
	iio: common: st_sensors: fix possible infinite loop in st_sensors_irq_thread
	iio: ad5504: Fix setting power-down state
	drivers: iio: temperature: Add delay after the addressed reset command in mlx90632.c
	iio: adc: ti_am335x_adc: remove omitted iio_kfifo_free()
	counter:ti-eqep: remove floor
	powerpc/64s: fix scv entry fallback flush vs interrupt
	cifs: do not fail __smb_send_rqst if non-fatal signals are pending
	irqchip/mips-cpu: Set IPI domain parent chip
	x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
	x86/topology: Make __max_die_per_package available unconditionally
	x86/mmx: Use KFPU_387 for MMX string operations
	x86/setup: don't remove E820_TYPE_RAM for pfn 0
	proc_sysctl: fix oops caused by incorrect command parameters
	mm: memcg/slab: optimize objcg stock draining
	mm: memcg: fix memcg file_dirty numa stat
	mm: fix numa stats for thp migration
	io_uring: iopoll requests should also wake task ->in_idle state
	io_uring: fix SQPOLL IORING_OP_CLOSE cancelation state
	io_uring: fix short read retries for non-reg files
	intel_th: pci: Add Alder Lake-P support
	stm class: Fix module init return on allocation failure
	serial: mvebu-uart: fix tx lost characters at power off
	ehci: fix EHCI host controller initialization sequence
	USB: ehci: fix an interrupt calltrace error
	usb: gadget: aspeed: fix stop dma register setting.
	USB: gadget: dummy-hcd: Fix errors in port-reset handling
	usb: udc: core: Use lock when write to soft_connect
	usb: bdc: Make bdc pci driver depend on BROKEN
	usb: cdns3: imx: fix writing read-only memory issue
	usb: cdns3: imx: fix can't create core device the second time issue
	xhci: make sure TRB is fully written before giving it to the controller
	xhci: tegra: Delay for disabling LFPS detector
	drivers core: Free dma_range_map when driver probe failed
	driver core: Fix device link device name collision
	driver core: Extend device_is_dependent()
	drm/i915: s/intel_dp_sink_dpms/intel_dp_set_power/
	drm/i915: Only enable DFP 4:4:4->4:2:0 conversion when outputting YCbCr 4:4:4
	x86/entry: Fix noinstr fail
	x86/cpu/amd: Set __max_die_per_package on AMD
	cls_flower: call nla_ok() before nla_next()
	netfilter: rpfilter: mask ecn bits before fib lookup
	tools: gpio: fix %llu warning in gpio-event-mon.c
	tools: gpio: fix %llu warning in gpio-watch.c
	drm/i915/hdcp: Update CP property in update_pipe
	sh: dma: fix kconfig dependency for G2_DMA
	sh: Remove unused HAVE_COPY_THREAD_TLS macro
	locking/lockdep: Cure noinstr fail
	ASoC: SOF: Intel: fix page fault at probe if i915 init fails
	octeontx2-af: Fix missing check bugs in rvu_cgx.c
	net: dsa: mv88e6xxx: also read STU state in mv88e6250_g1_vtu_getnext
	selftests/powerpc: Fix exit status of pkey tests
	sh_eth: Fix power down vs. is_opened flag ordering
	nvme-pci: refactor nvme_unmap_data
	nvme-pci: fix error unwind in nvme_map_data
	cachefiles: Drop superfluous readpages aops NULL check
	lightnvm: fix memory leak when submit fails
	skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too
	kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow
	kasan: fix incorrect arguments passing in kasan_add_zero_shadow
	tcp: fix TCP socket rehash stats mis-accounting
	net_sched: gen_estimator: support large ewma log
	udp: mask TOS bits in udp_v4_early_demux()
	ipv6: create multicast route with RTPROT_KERNEL
	net_sched: avoid shift-out-of-bounds in tcindex_set_parms()
	net_sched: reject silly cell_log in qdisc_get_rtab()
	ipv6: set multicast flag on the multicast route
	net: mscc: ocelot: allow offloading of bridge on top of LAG
	net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled
	net: dsa: b53: fix an off by one in checking "vlan->vid"
	tcp: do not mess with cloned skbs in tcp_add_backlog()
	tcp: fix TCP_USER_TIMEOUT with zero window
	net: mscc: ocelot: Fix multicast to the CPU port
	net: core: devlink: use right genl user_ptr when handling port param get/set
	pinctrl: qcom: Allow SoCs to specify a GPIO function that's not 0
	pinctrl: qcom: No need to read-modify-write the interrupt status
	pinctrl: qcom: Properly clear "intr_ack_high" interrupts when unmasking
	pinctrl: qcom: Don't clear pending interrupts when enabling
	x86/sev: Fix nonistr violation
	tty: implement write_iter
	tty: fix up hung_up_tty_write() conversion
	net: systemport: free dev before on error path
	x86/sev-es: Handle string port IO to kernel memory properly
	tcp: Fix potential use-after-free due to double kfree()
	ASoC: SOF: Intel: hda: Avoid checking jack on system suspend
	drm/i915/hdcp: Get conn while content_type changed
	bpf: Local storage helpers should check nullness of owner ptr passed
	kernfs: implement ->read_iter
	kernfs: implement ->write_iter
	kernfs: wire up ->splice_read and ->splice_write
	interconnect: imx8mq: Use icc_sync_state
	fs/pipe: allow sendfile() to pipe again
	Commit 9bb48c82aced ("tty: implement write_iter") converted the tty layer to use write_iter. Fix the redirected_tty_write declaration also in n_tty and change the comparisons to use write_iter instead of write. also in n_tty and change the comparisons to use write_iter instead of write.
	mm: fix initialization of struct page for holes in memory layout
	Revert "mm: fix initialization of struct page for holes in memory layout"
	Linux 5.10.11

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I502239d06f9bcd68b59149376f3c796c64de5942
2021-01-27 12:12:33 +01:00
Josef Bacik
dbba7a38b0 btrfs: print the actual offset in btrfs_root_name
[ Upstream commit 71008734d27f2276fcef23a5e546d358430f2d52 ]

We're supposed to print the root_key.offset in btrfs_root_name in the
case of a reloc root, not the objectid.  Fix this helper to take the key
so we have access to the offset when we need it.

Fixes: 457f1864b5 ("btrfs: pretty print leaked root name")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-27 11:55:06 +01:00
Filipe Manana
adc11110d1 btrfs: send: fix invalid clone operations when cloning from the same file and root
commit 518837e65068c385dddc0a87b3e577c8be7c13b1 upstream.

When an incremental send finds an extent that is shared, it checks which
file extent items in the range refer to that extent, and for those it
emits clone operations, while for others it emits regular write operations
to avoid corruption at the destination (as described and fixed by commit
d906d49fc5 ("Btrfs: send, fix file corruption due to incorrect cloning
operations")).

However when the root we are cloning from is the send root, we are cloning
from the inode currently being processed and the source file range has
several extent items that partially point to the desired extent, with an
offset smaller than the offset in the file extent item for the range we
want to clone into, it can cause the algorithm to issue a clone operation
that starts at the current eof of the file being processed in the receiver
side, in which case the receiver will fail, with EINVAL, when attempting
to execute the clone operation.

Example reproducer:

  $ cat test-send-clone.sh
  #!/bin/bash

  DEV=/dev/sdi
  MNT=/mnt/sdi

  mkfs.btrfs -f $DEV >/dev/null
  mount $DEV $MNT

  # Create our test file with a single and large extent (1M) and with
  # different content for different file ranges that will be reflinked
  # later.
  xfs_io -f \
         -c "pwrite -S 0xab 0 128K" \
         -c "pwrite -S 0xcd 128K 128K" \
         -c "pwrite -S 0xef 256K 256K" \
         -c "pwrite -S 0x1a 512K 512K" \
         $MNT/foobar

  btrfs subvolume snapshot -r $MNT $MNT/snap1
  btrfs send -f /tmp/snap1.send $MNT/snap1

  # Now do a series of changes to our file such that we end up with
  # different parts of the extent reflinked into different file offsets
  # and we overwrite a large part of the extent too, so no file extent
  # items refer to that part that was overwritten. This used to confuse
  # the algorithm used by the kernel to figure out which file ranges to
  # clone, making it attempt to clone from a source range starting at
  # the current eof of the file, resulting in the receiver to fail since
  # it is an invalid clone operation.
  #
  xfs_io -c "reflink $MNT/foobar 64K 1M 960K" \
         -c "reflink $MNT/foobar 0K 512K 256K" \
         -c "reflink $MNT/foobar 512K 128K 256K" \
         -c "pwrite -S 0x73 384K 640K" \
         $MNT/foobar

  btrfs subvolume snapshot -r $MNT $MNT/snap2
  btrfs send -f /tmp/snap2.send -p $MNT/snap1 $MNT/snap2

  echo -e "\nFile digest in the original filesystem:"
  md5sum $MNT/snap2/foobar

  # Now unmount the filesystem, create a new one, mount it and try to
  # apply both send streams to recreate both snapshots.
  umount $DEV

  mkfs.btrfs -f $DEV >/dev/null
  mount $DEV $MNT

  btrfs receive -f /tmp/snap1.send $MNT
  btrfs receive -f /tmp/snap2.send $MNT

  # Must match what we got in the original filesystem of course.
  echo -e "\nFile digest in the new filesystem:"
  md5sum $MNT/snap2/foobar

  umount $MNT

When running the reproducer, the incremental send operation fails due to
an invalid clone operation:

  $ ./test-send-clone.sh
  wrote 131072/131072 bytes at offset 0
  128 KiB, 32 ops; 0.0015 sec (80.906 MiB/sec and 20711.9741 ops/sec)
  wrote 131072/131072 bytes at offset 131072
  128 KiB, 32 ops; 0.0013 sec (90.514 MiB/sec and 23171.6148 ops/sec)
  wrote 262144/262144 bytes at offset 262144
  256 KiB, 64 ops; 0.0025 sec (98.270 MiB/sec and 25157.2327 ops/sec)
  wrote 524288/524288 bytes at offset 524288
  512 KiB, 128 ops; 0.0052 sec (95.730 MiB/sec and 24506.9883 ops/sec)
  Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
  At subvol /mnt/sdi/snap1
  linked 983040/983040 bytes at offset 1048576
  960 KiB, 1 ops; 0.0006 sec (1.419 GiB/sec and 1550.3876 ops/sec)
  linked 262144/262144 bytes at offset 524288
  256 KiB, 1 ops; 0.0020 sec (120.192 MiB/sec and 480.7692 ops/sec)
  linked 262144/262144 bytes at offset 131072
  256 KiB, 1 ops; 0.0018 sec (133.833 MiB/sec and 535.3319 ops/sec)
  wrote 655360/655360 bytes at offset 393216
  640 KiB, 160 ops; 0.0093 sec (66.781 MiB/sec and 17095.8436 ops/sec)
  Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
  At subvol /mnt/sdi/snap2

  File digest in the original filesystem:
  9c13c61cb0b9f5abf45344375cb04dfa  /mnt/sdi/snap2/foobar
  At subvol snap1
  At snapshot snap2
  ERROR: failed to clone extents to foobar: Invalid argument

  File digest in the new filesystem:
  132f0396da8f48d2e667196bff882cfc  /mnt/sdi/snap2/foobar

The clone operation is invalid because its source range starts at the
current eof of the file in the receiver, causing the receiver to get
an EINVAL error from the clone operation when attempting it.

For the example above, what happens is the following:

1) When processing the extent at file offset 1M, the algorithm checks that
   the extent is shared and can be (fully or partially) found at file
   offset 0.

   At this point the file has a size (and eof) of 1M at the receiver;

2) It finds that our extent item at file offset 1M has a data offset of
   64K and, since the file extent item at file offset 0 has a data offset
   of 0, it issues a clone operation, from the same file and root, that
   has a source range offset of 64K, destination offset of 1M and a length
   of 64K, since the extent item at file offset 0 refers only to the first
   128K of the shared extent.

   After this clone operation, the file size (and eof) at the receiver is
   increased from 1M to 1088K (1M + 64K);

3) Now there's still 896K (960K - 64K) of data left to clone or write, so
   it checks for the next file extent item, which starts at file offset
   128K. This file extent item has a data offset of 0 and a length of
   256K, so a clone operation with a source range offset of 256K, a
   destination offset of 1088K (1M + 64K) and length of 128K is issued.

   After this operation the file size (and eof) at the receiver increases
   from 1088K to 1216K (1088K + 128K);

4) Now there's still 768K (896K - 128K) of data left to clone or write, so
   it checks for the next file extent item, located at file offset 384K.
   This file extent item points to a different extent, not the one we want
   to clone, with a length of 640K. So we issue a write operation into the
   file range 1216K (1088K + 128K, end of the last clone operation), with
   a length of 640K and with a data matching the one we can find for that
   range in send root.

   After this operation, the file size (and eof) at the receiver increases
   from 1216K to 1856K (1216K + 640K);

5) Now there's still 128K (768K - 640K) of data left to clone or write, so
   we look into the file extent item, which is for file offset 1M and it
   points to the extent we want to clone, with a data offset of 64K and a
   length of 960K.

   However this matches the file offset we started with, the start of the
   range to clone into. So we can't for sure find any file extent item
   from here onwards with the rest of the data we want to clone, yet we
   proceed and since the file extent item points to the shared extent,
   with a data offset of 64K, we issue a clone operation with a source
   range starting at file offset 1856K, which matches the file extent
   item's offset, 1M, plus the amount of data cloned and written so far,
   which is 64K (step 2) + 128K (step 3) + 640K (step 4). This clone
   operation is invalid since the source range offset matches the current
   eof of the file in the receiver. We should have stopped looking for
   extents to clone at this point and instead fallback to write, which
   would simply the contain the data in the file range from 1856K to
   1856K + 128K.

So fix this by stopping the loop that looks for file ranges to clone at
clone_range() when we reach the current eof of the file being processed,
if we are cloning from the same file and using the send root as the clone
root. This ensures any data not yet cloned will be sent to the receiver
through a write operation.

A test case for fstests will follow soon.

Reported-by: Massimo B. <massimo.b@gmx.net>
Link: https://lore.kernel.org/linux-btrfs/6ae34776e85912960a253a8327068a892998e685.camel@gmx.net/
Fixes: 11f2069c11 ("Btrfs: send, allow clone operations within the same file")
CC: stable@vger.kernel.org # 5.5+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:54:53 +01:00
Josef Bacik
018abb5089 btrfs: don't clear ret in btrfs_start_dirty_block_groups
commit 34d1eb0e599875064955a74712f08ff14c8e3d5f upstream.

If we fail to update a block group item in the loop we'll break, however
we'll do btrfs_run_delayed_refs and lose our error value in ret, and
thus not clean up properly.  Fix this by only running the delayed refs
if there was no failure.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:54:53 +01:00
Josef Bacik
14e17e90bf btrfs: fix lockdep splat in btrfs_recover_relocation
commit fb286100974e7239af243bc2255a52f29442f9c8 upstream.

While testing the error paths of relocation I hit the following lockdep
splat:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.10.0-rc6+ #217 Not tainted
  ------------------------------------------------------
  mount/779 is trying to acquire lock:
  ffffa0e676945418 (&fs_info->balance_mutex){+.+.}-{3:3}, at: btrfs_recover_balance+0x2f0/0x340

  but task is already holding lock:
  ffffa0e60ee31da8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x100

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (btrfs-root-00){++++}-{3:3}:
	 down_read_nested+0x43/0x130
	 __btrfs_tree_read_lock+0x27/0x100
	 btrfs_read_lock_root_node+0x31/0x40
	 btrfs_search_slot+0x462/0x8f0
	 btrfs_update_root+0x55/0x2b0
	 btrfs_drop_snapshot+0x398/0x750
	 clean_dirty_subvols+0xdf/0x120
	 btrfs_recover_relocation+0x534/0x5a0
	 btrfs_start_pre_rw_mount+0xcb/0x170
	 open_ctree+0x151f/0x1726
	 btrfs_mount_root.cold+0x12/0xea
	 legacy_get_tree+0x30/0x50
	 vfs_get_tree+0x28/0xc0
	 vfs_kern_mount.part.0+0x71/0xb0
	 btrfs_mount+0x10d/0x380
	 legacy_get_tree+0x30/0x50
	 vfs_get_tree+0x28/0xc0
	 path_mount+0x433/0xc10
	 __x64_sys_mount+0xe3/0x120
	 do_syscall_64+0x33/0x40
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #1 (sb_internal#2){.+.+}-{0:0}:
	 start_transaction+0x444/0x700
	 insert_balance_item.isra.0+0x37/0x320
	 btrfs_balance+0x354/0xf40
	 btrfs_ioctl_balance+0x2cf/0x380
	 __x64_sys_ioctl+0x83/0xb0
	 do_syscall_64+0x33/0x40
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #0 (&fs_info->balance_mutex){+.+.}-{3:3}:
	 __lock_acquire+0x1120/0x1e10
	 lock_acquire+0x116/0x370
	 __mutex_lock+0x7e/0x7b0
	 btrfs_recover_balance+0x2f0/0x340
	 open_ctree+0x1095/0x1726
	 btrfs_mount_root.cold+0x12/0xea
	 legacy_get_tree+0x30/0x50
	 vfs_get_tree+0x28/0xc0
	 vfs_kern_mount.part.0+0x71/0xb0
	 btrfs_mount+0x10d/0x380
	 legacy_get_tree+0x30/0x50
	 vfs_get_tree+0x28/0xc0
	 path_mount+0x433/0xc10
	 __x64_sys_mount+0xe3/0x120
	 do_syscall_64+0x33/0x40
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  other info that might help us debug this:

  Chain exists of:
    &fs_info->balance_mutex --> sb_internal#2 --> btrfs-root-00

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(btrfs-root-00);
				 lock(sb_internal#2);
				 lock(btrfs-root-00);
    lock(&fs_info->balance_mutex);

   *** DEADLOCK ***

  2 locks held by mount/779:
   #0: ffffa0e60dc040e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380
   #1: ffffa0e60ee31da8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x100

  stack backtrace:
  CPU: 0 PID: 779 Comm: mount Not tainted 5.10.0-rc6+ #217
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  Call Trace:
   dump_stack+0x8b/0xb0
   check_noncircular+0xcf/0xf0
   ? trace_call_bpf+0x139/0x260
   __lock_acquire+0x1120/0x1e10
   lock_acquire+0x116/0x370
   ? btrfs_recover_balance+0x2f0/0x340
   __mutex_lock+0x7e/0x7b0
   ? btrfs_recover_balance+0x2f0/0x340
   ? btrfs_recover_balance+0x2f0/0x340
   ? rcu_read_lock_sched_held+0x3f/0x80
   ? kmem_cache_alloc_trace+0x2c4/0x2f0
   ? btrfs_get_64+0x5e/0x100
   btrfs_recover_balance+0x2f0/0x340
   open_ctree+0x1095/0x1726
   btrfs_mount_root.cold+0x12/0xea
   ? rcu_read_lock_sched_held+0x3f/0x80
   legacy_get_tree+0x30/0x50
   vfs_get_tree+0x28/0xc0
   vfs_kern_mount.part.0+0x71/0xb0
   btrfs_mount+0x10d/0x380
   ? __kmalloc_track_caller+0x2f2/0x320
   legacy_get_tree+0x30/0x50
   vfs_get_tree+0x28/0xc0
   ? capable+0x3a/0x60
   path_mount+0x433/0xc10
   __x64_sys_mount+0xe3/0x120
   do_syscall_64+0x33/0x40
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This is straightforward to fix, simply release the path before we setup
the balance_ctl.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:54:53 +01:00
Josef Bacik
5169a289fc btrfs: do not double free backref nodes on error
commit 49ecc679ab48b40ca799bf94b327d5284eac9e46 upstream.

Zygo reported the following KASAN splat:

  BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420
  Read of size 8 at addr ffff888112402950 by task btrfs/28836

  CPU: 0 PID: 28836 Comm: btrfs Tainted: G        W         5.10.0-e35f27394290-for-next+ #23
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  Call Trace:
   dump_stack+0xbc/0xf9
   ? btrfs_backref_cleanup_node+0x18a/0x420
   print_address_description.constprop.8+0x21/0x210
   ? record_print_text.cold.34+0x11/0x11
   ? btrfs_backref_cleanup_node+0x18a/0x420
   ? btrfs_backref_cleanup_node+0x18a/0x420
   kasan_report.cold.10+0x20/0x37
   ? btrfs_backref_cleanup_node+0x18a/0x420
   __asan_load8+0x69/0x90
   btrfs_backref_cleanup_node+0x18a/0x420
   btrfs_backref_release_cache+0x83/0x1b0
   relocate_block_group+0x394/0x780
   ? merge_reloc_roots+0x4a0/0x4a0
   btrfs_relocate_block_group+0x26e/0x4c0
   btrfs_relocate_chunk+0x52/0x120
   btrfs_balance+0xe2e/0x1900
   ? check_flags.part.50+0x6c/0x1e0
   ? btrfs_relocate_chunk+0x120/0x120
   ? kmem_cache_alloc_trace+0xa06/0xcb0
   ? _copy_from_user+0x83/0xc0
   btrfs_ioctl_balance+0x3a7/0x460
   btrfs_ioctl+0x24c8/0x4360
   ? __kasan_check_read+0x11/0x20
   ? check_chain_key+0x1f4/0x2f0
   ? __asan_loadN+0xf/0x20
   ? btrfs_ioctl_get_supported_features+0x30/0x30
   ? kvm_sched_clock_read+0x18/0x30
   ? check_chain_key+0x1f4/0x2f0
   ? lock_downgrade+0x3f0/0x3f0
   ? handle_mm_fault+0xad6/0x2150
   ? do_vfs_ioctl+0xfc/0x9d0
   ? ioctl_file_clone+0xe0/0xe0
   ? check_flags.part.50+0x6c/0x1e0
   ? check_flags.part.50+0x6c/0x1e0
   ? check_flags+0x26/0x30
   ? lock_is_held_type+0xc3/0xf0
   ? syscall_enter_from_user_mode+0x1b/0x60
   ? do_syscall_64+0x13/0x80
   ? rcu_read_lock_sched_held+0xa1/0xd0
   ? __kasan_check_read+0x11/0x20
   ? __fget_light+0xae/0x110
   __x64_sys_ioctl+0xc3/0x100
   do_syscall_64+0x37/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f4c4bdfe427

  Allocated by task 28836:
   kasan_save_stack+0x21/0x50
   __kasan_kmalloc.constprop.18+0xbe/0xd0
   kasan_kmalloc+0x9/0x10
   kmem_cache_alloc_trace+0x410/0xcb0
   btrfs_backref_alloc_node+0x46/0xf0
   btrfs_backref_add_tree_node+0x60d/0x11d0
   build_backref_tree+0xc5/0x700
   relocate_tree_blocks+0x2be/0xb90
   relocate_block_group+0x2eb/0x780
   btrfs_relocate_block_group+0x26e/0x4c0
   btrfs_relocate_chunk+0x52/0x120
   btrfs_balance+0xe2e/0x1900
   btrfs_ioctl_balance+0x3a7/0x460
   btrfs_ioctl+0x24c8/0x4360
   __x64_sys_ioctl+0xc3/0x100
   do_syscall_64+0x37/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  Freed by task 28836:
   kasan_save_stack+0x21/0x50
   kasan_set_track+0x20/0x30
   kasan_set_free_info+0x1f/0x30
   __kasan_slab_free+0xf3/0x140
   kasan_slab_free+0xe/0x10
   kfree+0xde/0x200
   btrfs_backref_error_cleanup+0x452/0x530
   build_backref_tree+0x1a5/0x700
   relocate_tree_blocks+0x2be/0xb90
   relocate_block_group+0x2eb/0x780
   btrfs_relocate_block_group+0x26e/0x4c0
   btrfs_relocate_chunk+0x52/0x120
   btrfs_balance+0xe2e/0x1900
   btrfs_ioctl_balance+0x3a7/0x460
   btrfs_ioctl+0x24c8/0x4360
   __x64_sys_ioctl+0xc3/0x100
   do_syscall_64+0x37/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This occurred because we freed our backref node in
btrfs_backref_error_cleanup(), but then tried to free it again in
btrfs_backref_release_cache().  This is because
btrfs_backref_release_cache() will cycle through all of the
cache->leaves nodes and free them up.  However
btrfs_backref_error_cleanup() freed the backref node with
btrfs_backref_free_node(), which simply kfree()d the backref node
without unlinking it from the cache.  Change this to a
btrfs_backref_drop_node(), which does the appropriate cleanup and
removes the node from the cache->leaves list, so when we go to free the
remaining cache we don't trip over items we've already dropped.

Fixes: 75bfb9aff4 ("Btrfs: cleanup error handling in build_backref_tree")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:54:52 +01:00
Josef Bacik
9e2fc8f10c btrfs: don't get an EINTR during drop_snapshot for reloc
commit 18d3bff411c8d46d40537483bdc0b61b33ce0371 upstream.

This was partially fixed by f3e3d9cc35 ("btrfs: avoid possible signal
interruption of btrfs_drop_snapshot() on relocation tree"), however it
missed a spot when we restart a trans handle because we need to end the
transaction.  The fix is the same, simply use btrfs_join_transaction()
instead of btrfs_start_transaction() when deleting reloc roots.

Fixes: f3e3d9cc35 ("btrfs: avoid possible signal interruption of btrfs_drop_snapshot() on relocation tree")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:54:52 +01:00
Greg Kroah-Hartman
88e2d5fd10 This is the 5.10.9 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmAHFpcACgkQONu9yGCS
 aT4Vhw/+JLscHnfK//hbS6Nx95MY95VMzy+p2ccADXRy3O/5nr0HwGKnXTKB4Bg+
 05S3Hv9ZU/XSszLWvgFQ0Z0peU241ASPz1uLTgtpziBT5plXa5eJULBZ+WknWMef
 dNKpvKPphpEbQ0yz6o/4sbNAdiI9BzyGCOicQ2dl9nY7R/JA9YHquUD7iHMnvbs+
 yxwwawNHVwszUT/fJT3iFzOAehHGAttHdf3z/bGPS1ogy2S7J5IluJgTAibd3P7G
 5o7OUUA5ujEtjBLIkA61fqeL2Qaci83Ff/8KEPEfF1JeLBbMHYcLHnz3RAwBaLZh
 nlM4smyTeekcnHIzyRGw16OmpoYwY3MQAt+UFLCzKhlnscB0UqCNkA9zQA9k/taw
 cy7/fe5hWFU9DRv4uTUT2H1tkP+pNQ5eIaejPHMtld5JlYXoDN4RyQq7sAyMQgBj
 CXADStYSR/f5sWWgRbRs1F7E0lrePsVpjOcqHXxbsS+52yN2CZSKazlOIJ9xArfM
 cTzzLUuYbMZoHjIDdMMkjA41VMmyJ+BKrqEgzu3LsJQs57o/ckjnQx4VV5YiHhci
 v35OL8oa9IZi8WQikB9bx2WZRWUChOGKwMNeeUwEFD4Zmye1OtyyHuzYQf9QSjRv
 zbf1owwsg3xnfkvLcfru8mNMgJkgG8RpuNNVPO8boWZ4pgPu2tk=
 =5K55
 -----END PGP SIGNATURE-----

Merge 5.10.9 into android12-5.10

Changes in 5.10.9
	btrfs: reloc: fix wrong file extent type check to avoid false ENOENT
	btrfs: prevent NULL pointer dereference in extent_io_tree_panic
	ALSA: hda/realtek: fix right sounds and mute/micmute LEDs for HP machines
	ALSA: doc: Fix reference to mixart.rst
	ASoC: AMD Renoir - add DMI entry for Lenovo ThinkPad X395
	ASoC: dapm: remove widget from dirty list on free
	x86/hyperv: check cpu mask after interrupt has been disabled
	drm/amdgpu: add green_sardine device id (v2)
	drm/amdgpu: fix DRM_INFO flood if display core is not supported (bug 210921)
	Revert "drm/amd/display: Fixed Intermittent blue screen on OLED panel"
	drm/amdgpu: add new device id for Renior
	drm/i915: Allow the sysadmin to override security mitigations
	drm/i915/gt: Limit VFE threads based on GT
	drm/i915/backlight: fix CPU mode backlight takeover on LPT
	drm/bridge: sii902x: Refactor init code into separate function
	dt-bindings: display: sii902x: Add supply bindings
	drm/bridge: sii902x: Enable I/O and core VCC supplies if present
	tracing/kprobes: Do the notrace functions check without kprobes on ftrace
	tools/bootconfig: Add tracing_on support to helper scripts
	ext4: use IS_ERR instead of IS_ERR_OR_NULL and set inode null when IS_ERR
	ext4: fix wrong list_splice in ext4_fc_cleanup
	ext4: fix bug for rename with RENAME_WHITEOUT
	cifs: check pointer before freeing
	cifs: fix interrupted close commands
	riscv: Drop a duplicated PAGE_KERNEL_EXEC
	riscv: return -ENOSYS for syscall -1
	riscv: Fixup CONFIG_GENERIC_TIME_VSYSCALL
	riscv: Fix KASAN memory mapping.
	mips: fix Section mismatch in reference
	mips: lib: uncached: fix non-standard usage of variable 'sp'
	MIPS: boot: Fix unaligned access with CONFIG_MIPS_RAW_APPENDED_DTB
	MIPS: Fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps
	MIPS: relocatable: fix possible boot hangup with KASLR enabled
	RDMA/ocrdma: Fix use after free in ocrdma_dealloc_ucontext_pd()
	ACPI: scan: Harden acpi_device_add() against device ID overflows
	xen/privcmd: allow fetching resource sizes
	compiler.h: Raise minimum version of GCC to 5.1 for arm64
	mm/vmalloc.c: fix potential memory leak
	mm/hugetlb: fix potential missing huge page size info
	mm/process_vm_access.c: include compat.h
	dm raid: fix discard limits for raid1
	dm snapshot: flush merged data before committing metadata
	dm integrity: fix flush with external metadata device
	dm integrity: fix the maximum number of arguments
	dm crypt: use GFP_ATOMIC when allocating crypto requests from softirq
	dm crypt: do not wait for backlogged crypto request completion in softirq
	dm crypt: do not call bio_endio() from the dm-crypt tasklet
	dm crypt: defer decryption to a tasklet if interrupts disabled
	stmmac: intel: change all EHL/TGL to auto detect phy addr
	r8152: Add Lenovo Powered USB-C Travel Hub
	btrfs: tree-checker: check if chunk item end overflows
	ext4: don't leak old mountpoint samples
	io_uring: don't take files/mm for a dead task
	io_uring: drop mm and files after task_work_run
	ARC: build: remove non-existing bootpImage from KBUILD_IMAGE
	ARC: build: add uImage.lzma to the top-level target
	ARC: build: add boot_targets to PHONY
	ARC: build: move symlink creation to arch/arc/Makefile to avoid race
	ARM: omap2: pmic-cpcap: fix maximum voltage to be consistent with defaults on xt875
	ath11k: fix crash caused by NULL rx_channel
	netfilter: ipset: fixes possible oops in mtype_resize
	ath11k: qmi: try to allocate a big block of DMA memory first
	btrfs: fix async discard stall
	btrfs: merge critical sections of discard lock in workfn
	btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan
	regulator: bd718x7: Add enable times
	ethernet: ucc_geth: fix definition and size of ucc_geth_tx_global_pram
	ARM: dts: ux500/golden: Set display max brightness
	habanalabs: adjust pci controller init to new firmware
	habanalabs/gaudi: retry loading TPC f/w on -EINTR
	habanalabs: register to pci shutdown callback
	staging: spmi: hisi-spmi-controller: Fix some error handling paths
	spi: altera: fix return value for altera_spi_txrx()
	habanalabs: Fix memleak in hl_device_reset
	hwmon: (pwm-fan) Ensure that calculation doesn't discard big period values
	lib/raid6: Let $(UNROLL) rules work with macOS userland
	kconfig: remove 'kvmconfig' and 'xenconfig' shorthands
	spi: fix the divide by 0 error when calculating xfer waiting time
	io_uring: drop file refs after task cancel
	bfq: Fix computation of shallow depth
	arch/arc: add copy_user_page() to <asm/page.h> to fix build error on ARC
	misdn: dsp: select CONFIG_BITREVERSE
	net: ethernet: fs_enet: Add missing MODULE_LICENSE
	selftests: fix the return value for UDP GRO test
	nvme-pci: mark Samsung PM1725a as IGNORE_DEV_SUBNQN
	nvme: avoid possible double fetch in handling CQE
	nvmet-rdma: Fix list_del corruption on queue establishment failure
	drm/amd/display: fix sysfs amdgpu_current_backlight_pwm NULL pointer issue
	drm/amdgpu: fix a GPU hang issue when remove device
	drm/amd/pm: fix the failure when change power profile for renoir
	drm/amdgpu: fix potential memory leak during navi12 deinitialization
	usb: typec: Fix copy paste error for NVIDIA alt-mode description
	iommu/vt-d: Fix lockdep splat in sva bind()/unbind()
	ACPI: scan: add stub acpi_create_platform_device() for !CONFIG_ACPI
	drm/msm: Call msm_init_vram before binding the gpu
	ARM: picoxcell: fix missing interrupt-parent properties
	poll: fix performance regression due to out-of-line __put_user()
	rcu-tasks: Move RCU-tasks initialization to before early_initcall()
	bpf: Simplify task_file_seq_get_next()
	bpf: Save correct stopping point in file seq iteration
	x86/sev-es: Fix SEV-ES OUT/IN immediate opcode vc handling
	cfg80211: select CONFIG_CRC32
	nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context
	iommu/vt-d: Update domain geometry in iommu_ops.at(de)tach_dev
	net/mlx5e: CT: Use per flow counter when CT flow accounting is enabled
	net/mlx5: Fix passing zero to 'PTR_ERR'
	net/mlx5: E-Switch, fix changing vf VLANID
	blk-mq-debugfs: Add decode for BLK_MQ_F_TAG_HCTX_SHARED
	mm: fix clear_refs_write locking
	mm: don't play games with pinned pages in clear_page_refs
	mm: don't put pinned pages into the swap cache
	perf intel-pt: Fix 'CPU too large' error
	dump_common_audit_data(): fix racy accesses to ->d_name
	ASoC: meson: axg-tdm-interface: fix loopback
	ASoC: meson: axg-tdmin: fix axg skew offset
	ASoC: Intel: fix error code cnl_set_dsp_D0()
	nvmet-rdma: Fix NULL deref when setting pi_enable and traddr INADDR_ANY
	nvme: don't intialize hwmon for discovery controllers
	nvme-tcp: fix possible data corruption with bio merges
	nvme-tcp: Fix warning with CONFIG_DEBUG_PREEMPT
	NFS4: Fix use-after-free in trace_event_raw_event_nfs4_set_lock
	pNFS: We want return-on-close to complete when evicting the inode
	pNFS: Mark layout for return if return-on-close was not sent
	pNFS: Stricter ordering of layoutget and layoutreturn
	NFS: Adjust fs_context error logging
	NFS/pNFS: Don't call pnfs_free_bucket_lseg() before removing the request
	NFS/pNFS: Don't leak DS commits in pnfs_generic_retry_commit()
	NFS/pNFS: Fix a leak of the layout 'plh_outstanding' counter
	NFS: nfs_delegation_find_inode_server must first reference the superblock
	NFS: nfs_igrab_and_active must first reference the superblock
	scsi: ufs: Fix possible power drain during system suspend
	ext4: fix superblock checksum failure when setting password salt
	RDMA/restrack: Don't treat as an error allocation ID wrapping
	RDMA/usnic: Fix memleak in find_free_vf_and_create_qp_grp
	bnxt_en: Improve stats context resource accounting with RDMA driver loaded.
	RDMA/mlx5: Fix wrong free of blue flame register on error
	IB/mlx5: Fix error unwinding when set_has_smi_cap fails
	umount(2): move the flag validity checks first
	dm zoned: select CONFIG_CRC32
	drm/i915/dsi: Use unconditional msleep for the panel_on_delay when there is no reset-deassert MIPI-sequence
	drm/i915/icl: Fix initing the DSI DSC power refcount during HW readout
	drm/i915/gt: Restore clear-residual mitigations for Ivybridge, Baytrail
	mm, slub: consider rest of partial list if acquire_slab() fails
	riscv: Trace irq on only interrupt is enabled
	iommu/vt-d: Fix unaligned addresses for intel_flush_svm_range_dev()
	net: sunrpc: interpret the return value of kstrtou32 correctly
	selftests: netfilter: Pass family parameter "-f" to conntrack tool
	dm: eliminate potential source of excessive kernel log noise
	ALSA: fireface: Fix integer overflow in transmit_midi_msg()
	ALSA: firewire-tascam: Fix integer overflow in midi_port_work()
	netfilter: conntrack: fix reading nf_conntrack_buckets
	netfilter: nf_nat: Fix memleak in nf_nat_init
	Linux 5.10.9

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I609e501511889081e03d2d18ee7e1be95406f396
2021-01-19 18:49:54 +01:00
Filipe Manana
29543864c8 btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan
[ Upstream commit cb13eea3b49055bd78e6ddf39defd6340f7379fc ]

If we remount a filesystem in RO mode while the qgroup rescan worker is
running, we can end up having it still running after the remount is done,
and at unmount time we may end up with an open transaction that ends up
never getting committed. If that happens we end up with several memory
leaks and can crash when hardware acceleration is unavailable for crc32c.
Possibly it can lead to other nasty surprises too, due to use-after-free
issues.

The following steps explain how the problem happens.

1) We have a filesystem mounted in RW mode and the qgroup rescan worker is
   running;

2) We remount the filesystem in RO mode, and never stop/pause the rescan
   worker, so after the remount the rescan worker is still running. The
   important detail here is that the rescan task is still running after
   the remount operation committed any ongoing transaction through its
   call to btrfs_commit_super();

3) The rescan is still running, and after the remount completed, the
   rescan worker started a transaction, after it finished iterating all
   leaves of the extent tree, to update the qgroup status item in the
   quotas tree. It does not commit the transaction, it only releases its
   handle on the transaction;

4) A filesystem unmount operation starts shortly after;

5) The unmount task, at close_ctree(), stops the transaction kthread,
   which had not had a chance to commit the open transaction since it was
   sleeping and the commit interval (default of 30 seconds) has not yet
   elapsed since the last time it committed a transaction;

6) So after stopping the transaction kthread we still have the transaction
   used to update the qgroup status item open. At close_ctree(), when the
   filesystem is in RO mode and no transaction abort happened (or the
   filesystem is in error mode), we do not expect to have any transaction
   open, so we do not call btrfs_commit_super();

7) We then proceed to destroy the work queues, free the roots and block
   groups, etc. After that we drop the last reference on the btree inode
   by calling iput() on it. Since there are dirty pages for the btree
   inode, corresponding to the COWed extent buffer for the quotas btree,
   btree_write_cache_pages() is invoked to flush those dirty pages. This
   results in creating a bio and submitting it, which makes us end up at
   btrfs_submit_metadata_bio();

8) At btrfs_submit_metadata_bio() we end up at the if-then-else branch
   that calls btrfs_wq_submit_bio(), because check_async_write() returned
   a value of 1. This value of 1 is because we did not have hardware
   acceleration available for crc32c, so BTRFS_FS_CSUM_IMPL_FAST was not
   set in fs_info->flags;

9) Then at btrfs_wq_submit_bio() we call btrfs_queue_work() against the
   workqueue at fs_info->workers, which was already freed before by the
   call to btrfs_stop_all_workers() at close_ctree(). This results in an
   invalid memory access due to a use-after-free, leading to a crash.

When this happens, before the crash there are several warnings triggered,
since we have reserved metadata space in a block group, the delayed refs
reservation, etc:

  ------------[ cut here ]------------
  WARNING: CPU: 4 PID: 1729896 at fs/btrfs/block-group.c:125 btrfs_put_block_group+0x63/0xa0 [btrfs]
  Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
  CPU: 4 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:btrfs_put_block_group+0x63/0xa0 [btrfs]
  Code: f0 01 00 00 48 39 c2 75 (...)
  RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206
  RAX: 0000000000000001 RBX: ffff947ed73e4000 RCX: ffff947ebc8b29c8
  RDX: 0000000000000001 RSI: ffffffffc0b150a0 RDI: ffff947ebc8b2800
  RBP: ffff947ebc8b2800 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110
  R13: ffff947ed73e4160 R14: ffff947ebc8b2988 R15: dead000000000100
  FS:  00007f15edfea840(0000) GS:ffff9481ad600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f37e2893320 CR3: 0000000138f68001 CR4: 00000000003706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   btrfs_free_block_groups+0x17f/0x2f0 [btrfs]
   close_ctree+0x2ba/0x2fa [btrfs]
   generic_shutdown_super+0x6c/0x100
   kill_anon_super+0x14/0x30
   btrfs_kill_super+0x12/0x20 [btrfs]
   deactivate_locked_super+0x31/0x70
   cleanup_mnt+0x100/0x160
   task_work_run+0x68/0xb0
   exit_to_user_mode_prepare+0x1bb/0x1c0
   syscall_exit_to_user_mode+0x4b/0x260
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f15ee221ee7
  Code: ff 0b 00 f7 d8 64 89 01 48 (...)
  RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
  RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
  RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
  R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
  R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
  irq event stamp: 0
  hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
  softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  ---[ end trace dd74718fef1ed5c6 ]---
  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-rsv.c:459 btrfs_release_global_block_rsv+0x70/0xc0 [btrfs]
  Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
  CPU: 2 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:btrfs_release_global_block_rsv+0x70/0xc0 [btrfs]
  Code: 48 83 bb b0 03 00 00 00 (...)
  RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206
  RAX: 000000000033c000 RBX: ffff947ed73e4000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffffffffc0b0d8c1 RDI: 00000000ffffffff
  RBP: ffff947ebc8b7000 R08: 0000000000000001 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110
  R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100
  FS:  00007f15edfea840(0000) GS:ffff9481aca00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000561a79f76e20 CR3: 0000000138f68006 CR4: 00000000003706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   btrfs_free_block_groups+0x24c/0x2f0 [btrfs]
   close_ctree+0x2ba/0x2fa [btrfs]
   generic_shutdown_super+0x6c/0x100
   kill_anon_super+0x14/0x30
   btrfs_kill_super+0x12/0x20 [btrfs]
   deactivate_locked_super+0x31/0x70
   cleanup_mnt+0x100/0x160
   task_work_run+0x68/0xb0
   exit_to_user_mode_prepare+0x1bb/0x1c0
   syscall_exit_to_user_mode+0x4b/0x260
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f15ee221ee7
  Code: ff 0b 00 f7 d8 64 89 01 (...)
  RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
  RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
  RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
  R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
  R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
  irq event stamp: 0
  hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
  softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  ---[ end trace dd74718fef1ed5c7 ]---
  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-group.c:3377 btrfs_free_block_groups+0x25d/0x2f0 [btrfs]
  Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
  CPU: 5 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:btrfs_free_block_groups+0x25d/0x2f0 [btrfs]
  Code: ad de 49 be 22 01 00 (...)
  RSP: 0018:ffffb270826bbde8 EFLAGS: 00010206
  RAX: ffff947ebeae1d08 RBX: ffff947ed73e4000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffff947e9d823ae8 RDI: 0000000000000246
  RBP: ffff947ebeae1d08 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ebeae1c00
  R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100
  FS:  00007f15edfea840(0000) GS:ffff9481ad200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f1475d98ea8 CR3: 0000000138f68005 CR4: 00000000003706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   close_ctree+0x2ba/0x2fa [btrfs]
   generic_shutdown_super+0x6c/0x100
   kill_anon_super+0x14/0x30
   btrfs_kill_super+0x12/0x20 [btrfs]
   deactivate_locked_super+0x31/0x70
   cleanup_mnt+0x100/0x160
   task_work_run+0x68/0xb0
   exit_to_user_mode_prepare+0x1bb/0x1c0
   syscall_exit_to_user_mode+0x4b/0x260
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f15ee221ee7
  Code: ff 0b 00 f7 d8 64 89 (...)
  RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
  RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
  RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
  R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
  R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
  irq event stamp: 0
  hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
  softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  ---[ end trace dd74718fef1ed5c8 ]---
  BTRFS info (device sdc): space_info 4 has 268238848 free, is not full
  BTRFS info (device sdc): space_info total=268435456, used=114688, pinned=0, reserved=16384, may_use=0, readonly=65536
  BTRFS info (device sdc): global_block_rsv: size 0 reserved 0
  BTRFS info (device sdc): trans_block_rsv: size 0 reserved 0
  BTRFS info (device sdc): chunk_block_rsv: size 0 reserved 0
  BTRFS info (device sdc): delayed_block_rsv: size 0 reserved 0
  BTRFS info (device sdc): delayed_refs_rsv: size 524288 reserved 0

And the crash, which only happens when we do not have crc32c hardware
acceleration, produces the following trace immediately after those
warnings:

  stack segment: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
  CPU: 2 PID: 1749129 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:btrfs_queue_work+0x36/0x190 [btrfs]
  Code: 54 55 53 48 89 f3 (...)
  RSP: 0018:ffffb27082443ae8 EFLAGS: 00010282
  RAX: 0000000000000004 RBX: ffff94810ee9ad90 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffff94810ee9ad90 RDI: ffff947ed8ee75a0
  RBP: a56b6b6b6b6b6b6b R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000007 R11: 0000000000000001 R12: ffff947fa9b435a8
  R13: ffff94810ee9ad90 R14: 0000000000000000 R15: ffff947e93dc0000
  FS:  00007f3cfe974840(0000) GS:ffff9481ac600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f1b42995a70 CR3: 0000000127638003 CR4: 00000000003706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   btrfs_wq_submit_bio+0xb3/0xd0 [btrfs]
   btrfs_submit_metadata_bio+0x44/0xc0 [btrfs]
   submit_one_bio+0x61/0x70 [btrfs]
   btree_write_cache_pages+0x414/0x450 [btrfs]
   ? kobject_put+0x9a/0x1d0
   ? trace_hardirqs_on+0x1b/0xf0
   ? _raw_spin_unlock_irqrestore+0x3c/0x60
   ? free_debug_processing+0x1e1/0x2b0
   do_writepages+0x43/0xe0
   ? lock_acquired+0x199/0x490
   __writeback_single_inode+0x59/0x650
   writeback_single_inode+0xaf/0x120
   write_inode_now+0x94/0xd0
   iput+0x187/0x2b0
   close_ctree+0x2c6/0x2fa [btrfs]
   generic_shutdown_super+0x6c/0x100
   kill_anon_super+0x14/0x30
   btrfs_kill_super+0x12/0x20 [btrfs]
   deactivate_locked_super+0x31/0x70
   cleanup_mnt+0x100/0x160
   task_work_run+0x68/0xb0
   exit_to_user_mode_prepare+0x1bb/0x1c0
   syscall_exit_to_user_mode+0x4b/0x260
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f3cfebabee7
  Code: ff 0b 00 f7 d8 64 89 01 (...)
  RSP: 002b:00007ffc9c9a05f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  RAX: 0000000000000000 RBX: 00007f3cfecd1264 RCX: 00007f3cfebabee7
  RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 0000562b6b478000
  RBP: 0000562b6b473a30 R08: 0000000000000000 R09: 00007f3cfec6cbe0
  R10: 0000562b6b479fe0 R11: 0000000000000246 R12: 0000000000000000
  R13: 0000562b6b478000 R14: 0000562b6b473b40 R15: 0000562b6b473c60
  Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
  ---[ end trace dd74718fef1ed5cc ]---

Finally when we remove the btrfs module (rmmod btrfs), there are several
warnings about objects that were allocated from our slabs but were never
freed, consequence of the transaction that was never committed and got
leaked:

  =============================================================================
  BUG btrfs_delayed_ref_head (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_ref_head on __kmem_cache_shutdown()
  -----------------------------------------------------------------------------

  INFO: Slab 0x0000000094c2ae56 objects=24 used=2 fp=0x000000002bfa2521 flags=0x17fffc000010200
  CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x8d/0xb5
   slab_err+0xb7/0xdc
   ? lock_acquired+0x199/0x490
   __kmem_cache_shutdown+0x1ac/0x3c0
   ? lock_release+0x20e/0x4c0
   kmem_cache_destroy+0x55/0x120
   btrfs_delayed_ref_exit+0x11/0x35 [btrfs]
   exit_btrfs_fs+0xa/0x59 [btrfs]
   __x64_sys_delete_module+0x194/0x260
   ? fpregs_assert_state_consistent+0x1e/0x40
   ? exit_to_user_mode_prepare+0x55/0x1c0
   ? trace_hardirqs_on+0x1b/0xf0
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f693e305897
  Code: 73 01 c3 48 8b 0d f9 f5 (...)
  RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
  RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
  RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
  RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
  R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
  R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
  INFO: Object 0x0000000050cbdd61 @offset=12104
  INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1894 cpu=6 pid=1729873
	__slab_alloc.isra.0+0x109/0x1c0
	kmem_cache_alloc+0x7bb/0x830
	btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs]
	btrfs_free_tree_block+0x128/0x360 [btrfs]
	__btrfs_cow_block+0x489/0x5f0 [btrfs]
	btrfs_cow_block+0xf7/0x220 [btrfs]
	btrfs_search_slot+0x62a/0xc40 [btrfs]
	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
	open_ctree+0x125a/0x18a0 [btrfs]
	btrfs_mount_root.cold+0x13/0xed [btrfs]
	legacy_get_tree+0x30/0x60
	vfs_get_tree+0x28/0xe0
	fc_mount+0xe/0x40
	vfs_kern_mount.part.0+0x71/0x90
	btrfs_mount+0x13b/0x3e0 [btrfs]
  INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=4292 cpu=2 pid=1729526
	kmem_cache_free+0x34c/0x3c0
	__btrfs_run_delayed_refs+0x1117/0x1290 [btrfs]
	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
	commit_cowonly_roots+0xfb/0x300 [btrfs]
	btrfs_commit_transaction+0x367/0xc40 [btrfs]
	sync_filesystem+0x74/0x90
	generic_shutdown_super+0x22/0x100
	kill_anon_super+0x14/0x30
	btrfs_kill_super+0x12/0x20 [btrfs]
	deactivate_locked_super+0x31/0x70
	cleanup_mnt+0x100/0x160
	task_work_run+0x68/0xb0
	exit_to_user_mode_prepare+0x1bb/0x1c0
	syscall_exit_to_user_mode+0x4b/0x260
	entry_SYSCALL_64_after_hwframe+0x44/0xa9
  INFO: Object 0x0000000086e9b0ff @offset=12776
  INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1900 cpu=6 pid=1729873
	__slab_alloc.isra.0+0x109/0x1c0
	kmem_cache_alloc+0x7bb/0x830
	btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs]
	btrfs_alloc_tree_block+0x2bf/0x360 [btrfs]
	alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
	__btrfs_cow_block+0x12d/0x5f0 [btrfs]
	btrfs_cow_block+0xf7/0x220 [btrfs]
	btrfs_search_slot+0x62a/0xc40 [btrfs]
	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
	open_ctree+0x125a/0x18a0 [btrfs]
	btrfs_mount_root.cold+0x13/0xed [btrfs]
	legacy_get_tree+0x30/0x60
	vfs_get_tree+0x28/0xe0
	fc_mount+0xe/0x40
	vfs_kern_mount.part.0+0x71/0x90
  INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=3141 cpu=6 pid=1729803
	kmem_cache_free+0x34c/0x3c0
	__btrfs_run_delayed_refs+0x1117/0x1290 [btrfs]
	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
	btrfs_write_dirty_block_groups+0x17d/0x3d0 [btrfs]
	commit_cowonly_roots+0x248/0x300 [btrfs]
	btrfs_commit_transaction+0x367/0xc40 [btrfs]
	close_ctree+0x113/0x2fa [btrfs]
	generic_shutdown_super+0x6c/0x100
	kill_anon_super+0x14/0x30
	btrfs_kill_super+0x12/0x20 [btrfs]
	deactivate_locked_super+0x31/0x70
	cleanup_mnt+0x100/0x160
	task_work_run+0x68/0xb0
	exit_to_user_mode_prepare+0x1bb/0x1c0
	syscall_exit_to_user_mode+0x4b/0x260
	entry_SYSCALL_64_after_hwframe+0x44/0xa9
  kmem_cache_destroy btrfs_delayed_ref_head: Slab cache still has objects
  CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x8d/0xb5
   kmem_cache_destroy+0x119/0x120
   btrfs_delayed_ref_exit+0x11/0x35 [btrfs]
   exit_btrfs_fs+0xa/0x59 [btrfs]
   __x64_sys_delete_module+0x194/0x260
   ? fpregs_assert_state_consistent+0x1e/0x40
   ? exit_to_user_mode_prepare+0x55/0x1c0
   ? trace_hardirqs_on+0x1b/0xf0
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f693e305897
  Code: 73 01 c3 48 8b 0d f9 f5 0b (...)
  RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
  RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
  RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
  RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
  R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
  R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
  =============================================================================
  BUG btrfs_delayed_tree_ref (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_tree_ref on __kmem_cache_shutdown()
  -----------------------------------------------------------------------------

  INFO: Slab 0x0000000011f78dc0 objects=37 used=2 fp=0x0000000032d55d91 flags=0x17fffc000010200
  CPU: 3 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x8d/0xb5
   slab_err+0xb7/0xdc
   ? lock_acquired+0x199/0x490
   __kmem_cache_shutdown+0x1ac/0x3c0
   ? lock_release+0x20e/0x4c0
   kmem_cache_destroy+0x55/0x120
   btrfs_delayed_ref_exit+0x1d/0x35 [btrfs]
   exit_btrfs_fs+0xa/0x59 [btrfs]
   __x64_sys_delete_module+0x194/0x260
   ? fpregs_assert_state_consistent+0x1e/0x40
   ? exit_to_user_mode_prepare+0x55/0x1c0
   ? trace_hardirqs_on+0x1b/0xf0
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f693e305897
  Code: 73 01 c3 48 8b 0d f9 f5 (...)
  RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
  RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
  RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
  RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
  R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
  R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
  INFO: Object 0x000000001a340018 @offset=4408
  INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1917 cpu=6 pid=1729873
	__slab_alloc.isra.0+0x109/0x1c0
	kmem_cache_alloc+0x7bb/0x830
	btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs]
	btrfs_free_tree_block+0x128/0x360 [btrfs]
	__btrfs_cow_block+0x489/0x5f0 [btrfs]
	btrfs_cow_block+0xf7/0x220 [btrfs]
	btrfs_search_slot+0x62a/0xc40 [btrfs]
	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
	open_ctree+0x125a/0x18a0 [btrfs]
	btrfs_mount_root.cold+0x13/0xed [btrfs]
	legacy_get_tree+0x30/0x60
	vfs_get_tree+0x28/0xe0
	fc_mount+0xe/0x40
	vfs_kern_mount.part.0+0x71/0x90
	btrfs_mount+0x13b/0x3e0 [btrfs]
  INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=4167 cpu=4 pid=1729795
	kmem_cache_free+0x34c/0x3c0
	__btrfs_run_delayed_refs+0x63d/0x1290 [btrfs]
	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
	btrfs_commit_transaction+0x60/0xc40 [btrfs]
	create_subvol+0x56a/0x990 [btrfs]
	btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
	__btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
	btrfs_ioctl_snap_create+0x58/0x80 [btrfs]
	btrfs_ioctl+0x1a92/0x36f0 [btrfs]
	__x64_sys_ioctl+0x83/0xb0
	do_syscall_64+0x33/0x80
	entry_SYSCALL_64_after_hwframe+0x44/0xa9
  INFO: Object 0x000000002b46292a @offset=13648
  INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1923 cpu=6 pid=1729873
	__slab_alloc.isra.0+0x109/0x1c0
	kmem_cache_alloc+0x7bb/0x830
	btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs]
	btrfs_alloc_tree_block+0x2bf/0x360 [btrfs]
	alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
	__btrfs_cow_block+0x12d/0x5f0 [btrfs]
	btrfs_cow_block+0xf7/0x220 [btrfs]
	btrfs_search_slot+0x62a/0xc40 [btrfs]
	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
	open_ctree+0x125a/0x18a0 [btrfs]
	btrfs_mount_root.cold+0x13/0xed [btrfs]
	legacy_get_tree+0x30/0x60
	vfs_get_tree+0x28/0xe0
	fc_mount+0xe/0x40
	vfs_kern_mount.part.0+0x71/0x90
  INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=3164 cpu=6 pid=1729803
	kmem_cache_free+0x34c/0x3c0
	__btrfs_run_delayed_refs+0x63d/0x1290 [btrfs]
	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
	commit_cowonly_roots+0xfb/0x300 [btrfs]
	btrfs_commit_transaction+0x367/0xc40 [btrfs]
	close_ctree+0x113/0x2fa [btrfs]
	generic_shutdown_super+0x6c/0x100
	kill_anon_super+0x14/0x30
	btrfs_kill_super+0x12/0x20 [btrfs]
	deactivate_locked_super+0x31/0x70
	cleanup_mnt+0x100/0x160
	task_work_run+0x68/0xb0
	exit_to_user_mode_prepare+0x1bb/0x1c0
	syscall_exit_to_user_mode+0x4b/0x260
	entry_SYSCALL_64_after_hwframe+0x44/0xa9
  kmem_cache_destroy btrfs_delayed_tree_ref: Slab cache still has objects
  CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x8d/0xb5
   kmem_cache_destroy+0x119/0x120
   btrfs_delayed_ref_exit+0x1d/0x35 [btrfs]
   exit_btrfs_fs+0xa/0x59 [btrfs]
   __x64_sys_delete_module+0x194/0x260
   ? fpregs_assert_state_consistent+0x1e/0x40
   ? exit_to_user_mode_prepare+0x55/0x1c0
   ? trace_hardirqs_on+0x1b/0xf0
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f693e305897
  Code: 73 01 c3 48 8b 0d f9 f5 (...)
  RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
  RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
  RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
  RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
  R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
  R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
  =============================================================================
  BUG btrfs_delayed_extent_op (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_extent_op on __kmem_cache_shutdown()
  -----------------------------------------------------------------------------

  INFO: Slab 0x00000000f145ce2f objects=22 used=1 fp=0x00000000af0f92cf flags=0x17fffc000010200
  CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x8d/0xb5
   slab_err+0xb7/0xdc
   ? lock_acquired+0x199/0x490
   __kmem_cache_shutdown+0x1ac/0x3c0
   ? __mutex_unlock_slowpath+0x45/0x2a0
   kmem_cache_destroy+0x55/0x120
   exit_btrfs_fs+0xa/0x59 [btrfs]
   __x64_sys_delete_module+0x194/0x260
   ? fpregs_assert_state_consistent+0x1e/0x40
   ? exit_to_user_mode_prepare+0x55/0x1c0
   ? trace_hardirqs_on+0x1b/0xf0
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f693e305897
  Code: 73 01 c3 48 8b 0d f9 f5 (...)
  RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
  RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
  RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
  RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
  R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
  R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
  INFO: Object 0x000000004cf95ea8 @offset=6264
  INFO: Allocated in btrfs_alloc_tree_block+0x1e0/0x360 [btrfs] age=1931 cpu=6 pid=1729873
	__slab_alloc.isra.0+0x109/0x1c0
	kmem_cache_alloc+0x7bb/0x830
	btrfs_alloc_tree_block+0x1e0/0x360 [btrfs]
	alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
	__btrfs_cow_block+0x12d/0x5f0 [btrfs]
	btrfs_cow_block+0xf7/0x220 [btrfs]
	btrfs_search_slot+0x62a/0xc40 [btrfs]
	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
	open_ctree+0x125a/0x18a0 [btrfs]
	btrfs_mount_root.cold+0x13/0xed [btrfs]
	legacy_get_tree+0x30/0x60
	vfs_get_tree+0x28/0xe0
	fc_mount+0xe/0x40
	vfs_kern_mount.part.0+0x71/0x90
	btrfs_mount+0x13b/0x3e0 [btrfs]
  INFO: Freed in __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs] age=3173 cpu=6 pid=1729803
	kmem_cache_free+0x34c/0x3c0
	__btrfs_run_delayed_refs+0xabd/0x1290 [btrfs]
	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
	commit_cowonly_roots+0xfb/0x300 [btrfs]
	btrfs_commit_transaction+0x367/0xc40 [btrfs]
	close_ctree+0x113/0x2fa [btrfs]
	generic_shutdown_super+0x6c/0x100
	kill_anon_super+0x14/0x30
	btrfs_kill_super+0x12/0x20 [btrfs]
	deactivate_locked_super+0x31/0x70
	cleanup_mnt+0x100/0x160
	task_work_run+0x68/0xb0
	exit_to_user_mode_prepare+0x1bb/0x1c0
	syscall_exit_to_user_mode+0x4b/0x260
	entry_SYSCALL_64_after_hwframe+0x44/0xa9
  kmem_cache_destroy btrfs_delayed_extent_op: Slab cache still has objects
  CPU: 3 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x8d/0xb5
   kmem_cache_destroy+0x119/0x120
   exit_btrfs_fs+0xa/0x59 [btrfs]
   __x64_sys_delete_module+0x194/0x260
   ? fpregs_assert_state_consistent+0x1e/0x40
   ? exit_to_user_mode_prepare+0x55/0x1c0
   ? trace_hardirqs_on+0x1b/0xf0
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f693e305897
  Code: 73 01 c3 48 8b 0d f9 (...)
  RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
  RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
  RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
  RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
  R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
  R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
  BTRFS: state leak: start 30408704 end 30425087 state 1 in tree 1 refs 1

Fix this issue by having the remount path stop the qgroup rescan worker
when we are remounting RO and teach the rescan worker to stop when a
remount is in progress. If later a remount in RW mode happens, we are
already resuming the qgroup rescan worker through the call to
btrfs_qgroup_rescan_resume(), so we do not need to worry about that.

Tested-by: Fabian Vogt <fvogt@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19 18:27:24 +01:00
Pavel Begunkov
f89d84b35a btrfs: merge critical sections of discard lock in workfn
[ Upstream commit 8fc058597a283e9a37720abb0e8d68e342b9387d ]

btrfs_discard_workfn() drops discard_ctl->lock just to take it again in
a moment in btrfs_discard_schedule_work(). Avoid that and also reuse
ktime.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19 18:27:24 +01:00
Pavel Begunkov
33061bd104 btrfs: fix async discard stall
[ Upstream commit ea9ed87c73e87e044b2c58d658eb4ba5216bc488 ]

Might happen that bg->discard_eligible_time was changed without
rescheduling, so btrfs_discard_workfn() wakes up earlier than that new
time, peek_discard_list() returns NULL, and all work halts and goes to
sleep without further rescheduling even there are block groups to
discard.

It happens pretty often, but not so visible from the userspace because
after some time it usually will be kicked off anyway by someone else
calling btrfs_discard_reschedule_work().

Fix it by continue rescheduling if block group discard lists are not
empty.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19 18:27:24 +01:00
Su Yue
41b5ec745c btrfs: tree-checker: check if chunk item end overflows
[ Upstream commit 347fb0cfc9bab5195c6701e62eda488310d7938f ]

While mounting a crafted image provided by user, kernel panics due to
the invalid chunk item whose end is less than start.

  [66.387422] loop: module loaded
  [66.389773] loop0: detected capacity change from 262144 to 0
  [66.427708] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 12 /dev/loop0 scanned by mount (613)
  [66.431061] BTRFS info (device loop0): disk space caching is enabled
  [66.431078] BTRFS info (device loop0): has skinny extents
  [66.437101] BTRFS error: insert state: end < start 29360127 37748736
  [66.437136] ------------[ cut here ]------------
  [66.437140] WARNING: CPU: 16 PID: 613 at fs/btrfs/extent_io.c:557 insert_state.cold+0x1a/0x46 [btrfs]
  [66.437369] CPU: 16 PID: 613 Comm: mount Tainted: G           O      5.11.0-rc1-custom #45
  [66.437374] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
  [66.437378] RIP: 0010:insert_state.cold+0x1a/0x46 [btrfs]
  [66.437420] RSP: 0018:ffff93e5414c3908 EFLAGS: 00010286
  [66.437427] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000
  [66.437431] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff
  [66.437434] RBP: ffff93e5414c3938 R08: 0000000000000001 R09: 0000000000000001
  [66.437438] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d72aa0
  [66.437441] R13: ffff8ec78bc71628 R14: 0000000000000000 R15: 0000000002400000
  [66.437447] FS:  00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000
  [66.437451] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [66.437455] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0
  [66.437460] PKRU: 55555554
  [66.437464] Call Trace:
  [66.437475]  set_extent_bit+0x652/0x740 [btrfs]
  [66.437539]  set_extent_bits_nowait+0x1d/0x20 [btrfs]
  [66.437576]  add_extent_mapping+0x1e0/0x2f0 [btrfs]
  [66.437621]  read_one_chunk+0x33c/0x420 [btrfs]
  [66.437674]  btrfs_read_chunk_tree+0x6a4/0x870 [btrfs]
  [66.437708]  ? kvm_sched_clock_read+0x18/0x40
  [66.437739]  open_ctree+0xb32/0x1734 [btrfs]
  [66.437781]  ? bdi_register_va+0x1b/0x20
  [66.437788]  ? super_setup_bdi_name+0x79/0xd0
  [66.437810]  btrfs_mount_root.cold+0x12/0xeb [btrfs]
  [66.437854]  ? __kmalloc_track_caller+0x217/0x3b0
  [66.437873]  legacy_get_tree+0x34/0x60
  [66.437880]  vfs_get_tree+0x2d/0xc0
  [66.437888]  vfs_kern_mount.part.0+0x78/0xc0
  [66.437897]  vfs_kern_mount+0x13/0x20
  [66.437902]  btrfs_mount+0x11f/0x3c0 [btrfs]
  [66.437940]  ? kfree+0x5ff/0x670
  [66.437944]  ? __kmalloc_track_caller+0x217/0x3b0
  [66.437962]  legacy_get_tree+0x34/0x60
  [66.437974]  vfs_get_tree+0x2d/0xc0
  [66.437983]  path_mount+0x48c/0xd30
  [66.437998]  __x64_sys_mount+0x108/0x140
  [66.438011]  do_syscall_64+0x38/0x50
  [66.438018]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [66.438023] RIP: 0033:0x7f0138827f6e
  [66.438033] RSP: 002b:00007ffecd79edf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
  [66.438040] RAX: ffffffffffffffda RBX: 00007f013894c264 RCX: 00007f0138827f6e
  [66.438044] RDX: 00005593a4a41360 RSI: 00005593a4a33690 RDI: 00005593a4a3a6c0
  [66.438047] RBP: 00005593a4a33440 R08: 0000000000000000 R09: 0000000000000001
  [66.438050] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
  [66.438054] R13: 00005593a4a3a6c0 R14: 00005593a4a41360 R15: 00005593a4a33440
  [66.438078] irq event stamp: 18169
  [66.438082] hardirqs last  enabled at (18175): [<ffffffffb81154bf>] console_unlock+0x4ff/0x5f0
  [66.438088] hardirqs last disabled at (18180): [<ffffffffb8115427>] console_unlock+0x467/0x5f0
  [66.438092] softirqs last  enabled at (16910): [<ffffffffb8a00fe2>] asm_call_irq_on_stack+0x12/0x20
  [66.438097] softirqs last disabled at (16905): [<ffffffffb8a00fe2>] asm_call_irq_on_stack+0x12/0x20
  [66.438103] ---[ end trace e114b111db64298b ]---
  [66.438107] BTRFS error: found node 12582912 29360127 on insert of 37748736 29360127
  [66.438127] BTRFS critical: panic in extent_io_tree_panic:679: locking error: extent tree was modified by another thread while locked (errno=-17 Object already exists)
  [66.441069] ------------[ cut here ]------------
  [66.441072] kernel BUG at fs/btrfs/extent_io.c:679!
  [66.442064] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  [66.443018] CPU: 16 PID: 613 Comm: mount Tainted: G        W  O      5.11.0-rc1-custom #45
  [66.444538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
  [66.446223] RIP: 0010:extent_io_tree_panic.isra.0+0x23/0x25 [btrfs]
  [66.450878] RSP: 0018:ffff93e5414c3948 EFLAGS: 00010246
  [66.451840] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000
  [66.453141] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff
  [66.454445] RBP: ffff93e5414c3948 R08: 0000000000000001 R09: 0000000000000001
  [66.455743] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d728c0
  [66.457055] R13: ffff8ec78bc71628 R14: ffff8ec782d72aa0 R15: 0000000002400000
  [66.458356] FS:  00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000
  [66.459841] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [66.460895] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0
  [66.462196] PKRU: 55555554
  [66.462692] Call Trace:
  [66.463139]  set_extent_bit.cold+0x30/0x98 [btrfs]
  [66.464049]  set_extent_bits_nowait+0x1d/0x20 [btrfs]
  [66.490466]  add_extent_mapping+0x1e0/0x2f0 [btrfs]
  [66.514097]  read_one_chunk+0x33c/0x420 [btrfs]
  [66.534976]  btrfs_read_chunk_tree+0x6a4/0x870 [btrfs]
  [66.555718]  ? kvm_sched_clock_read+0x18/0x40
  [66.575758]  open_ctree+0xb32/0x1734 [btrfs]
  [66.595272]  ? bdi_register_va+0x1b/0x20
  [66.614638]  ? super_setup_bdi_name+0x79/0xd0
  [66.633809]  btrfs_mount_root.cold+0x12/0xeb [btrfs]
  [66.652938]  ? __kmalloc_track_caller+0x217/0x3b0
  [66.671925]  legacy_get_tree+0x34/0x60
  [66.690300]  vfs_get_tree+0x2d/0xc0
  [66.708221]  vfs_kern_mount.part.0+0x78/0xc0
  [66.725808]  vfs_kern_mount+0x13/0x20
  [66.742730]  btrfs_mount+0x11f/0x3c0 [btrfs]
  [66.759350]  ? kfree+0x5ff/0x670
  [66.775441]  ? __kmalloc_track_caller+0x217/0x3b0
  [66.791750]  legacy_get_tree+0x34/0x60
  [66.807494]  vfs_get_tree+0x2d/0xc0
  [66.823349]  path_mount+0x48c/0xd30
  [66.838753]  __x64_sys_mount+0x108/0x140
  [66.854412]  do_syscall_64+0x38/0x50
  [66.869673]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [66.885093] RIP: 0033:0x7f0138827f6e
  [66.945613] RSP: 002b:00007ffecd79edf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
  [66.977214] RAX: ffffffffffffffda RBX: 00007f013894c264 RCX: 00007f0138827f6e
  [66.994266] RDX: 00005593a4a41360 RSI: 00005593a4a33690 RDI: 00005593a4a3a6c0
  [67.011544] RBP: 00005593a4a33440 R08: 0000000000000000 R09: 0000000000000001
  [67.028836] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
  [67.045812] R13: 00005593a4a3a6c0 R14: 00005593a4a41360 R15: 00005593a4a33440
  [67.216138] ---[ end trace e114b111db64298c ]---
  [67.237089] RIP: 0010:extent_io_tree_panic.isra.0+0x23/0x25 [btrfs]
  [67.325317] RSP: 0018:ffff93e5414c3948 EFLAGS: 00010246
  [67.347946] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000
  [67.371343] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff
  [67.394757] RBP: ffff93e5414c3948 R08: 0000000000000001 R09: 0000000000000001
  [67.418409] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d728c0
  [67.441906] R13: ffff8ec78bc71628 R14: ffff8ec782d72aa0 R15: 0000000002400000
  [67.465436] FS:  00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000
  [67.511660] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [67.535047] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0
  [67.558449] PKRU: 55555554
  [67.581146] note: mount[613] exited with preempt_count 2

The image has a chunk item which has a logical start 37748736 and length
18446744073701163008 (-8M). The calculated end 29360127 overflows.
EEXIST was caught by insert_state() because of the duplicate end and
extent_io_tree_panic() was called.

Add overflow check of chunk item end to tree checker so it can be
detected early at mount time.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208929
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19 18:27:22 +01:00
Su Yue
f37fba66a4 btrfs: prevent NULL pointer dereference in extent_io_tree_panic
commit 29b665cc51e8b602bf2a275734349494776e3dbc upstream.

Some extent io trees are initialized with NULL private member (e.g.
btrfs_device::alloc_state and btrfs_fs_info::excluded_extents).
Dereference of a NULL tree->private as inode pointer will cause panic.

Pass tree->fs_info as it's known to be valid in all cases.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208929
Fixes: 05912a3c04 ("btrfs: drop extent_io_ops::tree_fs_info callback")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:17 +01:00
Qu Wenruo
e883eb5d15 btrfs: reloc: fix wrong file extent type check to avoid false ENOENT
commit 50e31ef486afe60f128d42fb9620e2a63172c15c upstream.

[BUG]
There are several bug reports about recent kernel unable to relocate
certain data block groups.

Sometimes the error just goes away, but there is one reporter who can
reproduce it reliably.

The dmesg would look like:

  [438.260483] BTRFS info (device dm-10): balance: start -dvrange=34625344765952..34625344765953
  [438.269018] BTRFS info (device dm-10): relocating block group 34625344765952 flags data|raid1
  [450.439609] BTRFS info (device dm-10): found 167 extents, stage: move data extents
  [463.501781] BTRFS info (device dm-10): balance: ended with status: -2

[CAUSE]
The ENOENT error is returned from the following call chain:

  add_data_references()
  |- delete_v1_space_cache();
     |- if (!found)
	   return -ENOENT;

The variable @found is set to true if we find a data extent whose
disk bytenr matches parameter @data_bytes.

With extra debugging, the offending tree block looks like this:

  leaf bytenr = 42676709441536, data_bytenr = 34626327621632

                ctime 1567904822.739884119 (2019-09-08 03:07:02)
                mtime 0.0 (1970-01-01 01:00:00)
                otime 0.0 (1970-01-01 01:00:00)
        item 27 key (51933 EXTENT_DATA 0) itemoff 9854 itemsize 53
                generation 1517381 type 2 (prealloc)
                prealloc data disk byte 34626327621632 nr 262144 <<<
                prealloc data offset 0 nr 262144
        item 28 key (52262 ROOT_ITEM 0) itemoff 9415 itemsize 439
                generation 2618893 root_dirid 256 bytenr 42677048360960 level 3 refs 1
                lastsnap 2618893 byte_limit 0 bytes_used 5557338112 flags 0x0(none)
                uuid d0d4361f-d231-6d40-8901-fe506e4b2b53

Although item 27 has disk bytenr 34626327621632, which matches the
data_bytenr, its type is prealloc, not reg.
This makes the existing code skip that item, and return ENOENT.

[FIX]
The code is modified in commit 19b546d7a1 ("btrfs: relocation: Use
btrfs_find_all_leafs to locate data extent parent tree leaves"), before
that commit, we use something like

  "if (type == BTRFS_FILE_EXTENT_INLINE) continue;"

But in that offending commit, we use (type == BTRFS_FILE_EXTENT_REG),
ignoring BTRFS_FILE_EXTENT_PREALLOC.

Fix it by also checking BTRFS_FILE_EXTENT_PREALLOC.

Reported-by: Stéphane Lesimple <stephane_btrfs2@lesimple.fr>
Link: https://lore.kernel.org/linux-btrfs/505cabfa88575ed6dbe7cb922d8914fb@lesimple.fr
Fixes: 19b546d7a1 ("btrfs: relocation: Use btrfs_find_all_leafs to locate data extent parent tree leaves")
CC: stable@vger.kernel.org # 5.6+
Tested-By: Stéphane Lesimple <stephane_btrfs2@lesimple.fr>
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:17 +01:00
Greg Kroah-Hartman
f11e1751ad Merge 5.10.8 into android12-5.10
Changes in 5.10.8
	powerpc/32s: Fix RTAS machine check with VMAP stack
	io_uring: synchronise IOPOLL on task_submit fail
	io_uring: limit {io|sq}poll submit locking scope
	io_uring: patch up IOPOLL overflow_flush sync
	RDMA/hns: Avoid filling sl in high 3 bits of vlan_id
	iommu/arm-smmu-qcom: Initialize SCTLR of the bypass context
	drm/panfrost: Don't corrupt the queue mutex on open/close
	io_uring: Fix return value from alloc_fixed_file_ref_node
	scsi: ufs: Fix -Wsometimes-uninitialized warning
	btrfs: skip unnecessary searches for xattrs when logging an inode
	btrfs: fix deadlock when cloning inline extent and low on free metadata space
	btrfs: shrink delalloc pages instead of full inodes
	net: cdc_ncm: correct overhead in delayed_ndp_size
	net: hns3: fix incorrect handling of sctp6 rss tuple
	net: hns3: fix the number of queues actually used by ARQ
	net: hns3: fix a phy loopback fail issue
	net: stmmac: dwmac-sun8i: Fix probe error handling
	net: stmmac: dwmac-sun8i: Balance internal PHY resource references
	net: stmmac: dwmac-sun8i: Balance internal PHY power
	net: stmmac: dwmac-sun8i: Balance syscon (de)initialization
	net: vlan: avoid leaks on register_vlan_dev() failures
	net/sonic: Fix some resource leaks in error handling paths
	net: bareudp: add missing error handling for bareudp_link_config()
	ptp: ptp_ines: prevent build when HAS_IOMEM is not set
	net: ipv6: fib: flush exceptions when purging route
	tools: selftests: add test for changing routes with PTMU exceptions
	net: fix pmtu check in nopmtudisc mode
	net: ip: always refragment ip defragmented packets
	chtls: Fix hardware tid leak
	chtls: Remove invalid set_tcb call
	chtls: Fix panic when route to peer not configured
	chtls: Avoid unnecessary freeing of oreq pointer
	chtls: Replace skb_dequeue with skb_peek
	chtls: Added a check to avoid NULL pointer dereference
	chtls: Fix chtls resources release sequence
	octeontx2-af: fix memory leak of lmac and lmac->name
	nexthop: Fix off-by-one error in error path
	nexthop: Unlink nexthop group entry in error path
	nexthop: Bounce NHA_GATEWAY in FDB nexthop groups
	s390/qeth: fix deadlock during recovery
	s390/qeth: fix locking for discipline setup / removal
	s390/qeth: fix L2 header access in qeth_l3_osa_features_check()
	net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE
	net/mlx5: Use port_num 1 instead of 0 when delete a RoCE address
	net/mlx5e: ethtool, Fix restriction of autoneg with 56G
	net/mlx5e: In skb build skip setting mark in switchdev mode
	net/mlx5: Check if lag is supported before creating one
	scsi: lpfc: Fix variable 'vport' set but not used in lpfc_sli4_abts_err_handler()
	ionic: start queues before announcing link up
	HID: wacom: Fix memory leakage caused by kfifo_alloc
	fanotify: Fix sys_fanotify_mark() on native x86-32
	ARM: OMAP2+: omap_device: fix idling of devices during probe
	i2c: sprd: use a specific timeout to avoid system hang up issue
	dmaengine: dw-edma: Fix use after free in dw_edma_alloc_chunk()
	selftests/bpf: Clarify build error if no vmlinux
	can: tcan4x5x: fix bittiming const, use common bittiming from m_can driver
	can: m_can: m_can_class_unregister(): remove erroneous m_can_clk_stop()
	can: kvaser_pciefd: select CONFIG_CRC32
	spi: spi-geni-qcom: Fail new xfers if xfer/cancel/abort pending
	cpufreq: powernow-k8: pass policy rather than use cpufreq_cpu_get()
	spi: spi-geni-qcom: Fix geni_spi_isr() NULL dereference in timeout case
	spi: stm32: FIFO threshold level - fix align packet size
	i2c: i801: Fix the i2c-mux gpiod_lookup_table not being properly terminated
	i2c: mediatek: Fix apdma and i2c hand-shake timeout
	bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET
	interconnect: imx: Add a missing of_node_put after of_device_is_available
	interconnect: qcom: fix rpmh link failures
	dmaengine: mediatek: mtk-hsdma: Fix a resource leak in the error handling path of the probe function
	dmaengine: milbeaut-xdmac: Fix a resource leak in the error handling path of the probe function
	dmaengine: xilinx_dma: check dma_async_device_register return value
	dmaengine: xilinx_dma: fix incompatible param warning in _child_probe()
	dmaengine: xilinx_dma: fix mixed_enum_type coverity warning
	arm64: mm: Fix ARCH_LOW_ADDRESS_LIMIT when !CONFIG_ZONE_DMA
	qed: select CONFIG_CRC32
	phy: dp83640: select CONFIG_CRC32
	wil6210: select CONFIG_CRC32
	block: rsxx: select CONFIG_CRC32
	lightnvm: select CONFIG_CRC32
	zonefs: select CONFIG_CRC32
	iommu/vt-d: Fix misuse of ALIGN in qi_flush_piotlb()
	iommu/intel: Fix memleak in intel_irq_remapping_alloc
	bpftool: Fix compilation failure for net.o with older glibc
	nvme-tcp: Fix possible race of io_work and direct send
	net/mlx5e: Fix memleak in mlx5e_create_l2_table_groups
	net/mlx5e: Fix two double free cases
	regmap: debugfs: Fix a memory leak when calling regmap_attach_dev
	wan: ds26522: select CONFIG_BITREVERSE
	arm64: cpufeature: remove non-exist CONFIG_KVM_ARM_HOST
	regulator: qcom-rpmh-regulator: correct hfsmps515 definition
	net: mvpp2: disable force link UP during port init procedure
	drm/i915/dp: Track pm_qos per connector
	net: mvneta: fix error message when MTU too large for XDP
	selftests: fib_nexthops: Fix wrong mausezahn invocation
	KVM: arm64: Don't access PMCR_EL0 when no PMU is available
	xsk: Fix race in SKB mode transmit with shared cq
	xsk: Rollback reservation at NETDEV_TX_BUSY
	block/rnbd-clt: avoid module unload race with close confirmation
	can: isotp: isotp_getname(): fix kernel information leak
	block: fix use-after-free in disk_part_iter_next
	net: drop bogus skb with CHECKSUM_PARTIAL and offset beyond end of trimmed packet
	regmap: debugfs: Fix a reversed if statement in regmap_debugfs_init()
	drm/panfrost: Remove unused variables in panfrost_job_close()
	tools headers UAPI: Sync linux/fscrypt.h with the kernel sources
	Linux 5.10.8

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib8272ec9f47a3c3813509bcacece3b16137332e1
2021-01-19 09:33:21 +01:00
Josef Bacik
e3b5252b5c btrfs: shrink delalloc pages instead of full inodes
[ Upstream commit e076ab2a2ca70a0270232067cd49f76cd92efe64 ]

Commit 38d715f494 ("btrfs: use btrfs_start_delalloc_roots in
shrink_delalloc") cleaned up how we do delalloc shrinking by utilizing
some infrastructure we have in place to flush inodes that we use for
device replace and snapshot.  However this introduced a pretty serious
performance regression.  To reproduce the user untarred the source
tarball of Firefox (360MiB xz compressed/1.5GiB uncompressed), and would
see it take anywhere from 5 to 20 times as long to untar in 5.10
compared to 5.9. This was observed on fast devices (SSD and better) and
not on HDD.

The root cause is because before we would generally use the normal
writeback path to reclaim delalloc space, and for this we would provide
it with the number of pages we wanted to flush.  The referenced commit
changed this to flush that many inodes, which drastically increased the
amount of space we were flushing in certain cases, which severely
affected performance.

We cannot revert this patch unfortunately because of 3d45f221ce62
("btrfs: fix deadlock when cloning inline extent and low on free
metadata space") which requires the ability to skip flushing inodes that
are being cloned in certain scenarios, which means we need to keep using
our flushing infrastructure or risk re-introducing the deadlock.

Instead to fix this problem we can go back to providing
btrfs_start_delalloc_roots with a number of pages to flush, and then set
up a writeback_control and utilize sync_inode() to handle the flushing
for us.  This gives us the same behavior we had prior to the fix, while
still allowing us to avoid the deadlock that was fixed by Filipe.  I
redid the users original test and got the following results on one of
our test machines (256GiB of ram, 56 cores, 2TiB Intel NVMe drive)

  5.9		0m54.258s
  5.10		1m26.212s
  5.10+patch	0m38.800s

5.10+patch is significantly faster than plain 5.9 because of my patch
series "Change data reservations to use the ticketing infra" which
contained the patch that introduced the regression, but generally
improved the overall ENOSPC flushing mechanisms.

Additional testing on consumer-grade SSD (8GiB ram, 8 CPU) confirm
the results:

  5.10.5            4m00s
  5.10.5+patch      1m08s
  5.11-rc2	    5m14s
  5.11-rc2+patch    1m30s

Reported-by: René Rebe <rene@exactcode.de>
Fixes: 38d715f494 ("btrfs: use btrfs_start_delalloc_roots in shrink_delalloc")
CC: stable@vger.kernel.org # 5.10
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: David Sterba <dsterba@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add my test results ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-17 14:16:54 +01:00
Filipe Manana
17243f73ad btrfs: fix deadlock when cloning inline extent and low on free metadata space
[ Upstream commit 3d45f221ce627d13e2e6ef3274f06750c84a6542 ]

When cloning an inline extent there are cases where we can not just copy
the inline extent from the source range to the target range (e.g. when the
target range starts at an offset greater than zero). In such cases we copy
the inline extent's data into a page of the destination inode and then
dirty that page. However, after that we will need to start a transaction
for each processed extent and, if we are ever low on available metadata
space, we may need to flush existing delalloc for all dirty inodes in an
attempt to release metadata space - if that happens we may deadlock:

* the async reclaim task queued a delalloc work to flush delalloc for
  the destination inode of the clone operation;

* the task executing that delalloc work gets blocked waiting for the
  range with the dirty page to be unlocked, which is currently locked
  by the task doing the clone operation;

* the async reclaim task blocks waiting for the delalloc work to complete;

* the cloning task is waiting on the waitqueue of its reservation ticket
  while holding the range with the dirty page locked in the inode's
  io_tree;

* if metadata space is not released by some other task (like delalloc for
  some other inode completing for example), the clone task waits forever
  and as a consequence the delalloc work and async reclaim tasks will hang
  forever as well. Releasing more space on the other hand may require
  starting a transaction, which will hang as well when trying to reserve
  metadata space, resulting in a deadlock between all these tasks.

When this happens, traces like the following show up in dmesg/syslog:

  [87452.323003] INFO: task kworker/u16:11:1810830 blocked for more than 120 seconds.
  [87452.323644]       Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  [87452.324248] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [87452.324852] task:kworker/u16:11  state:D stack:    0 pid:1810830 ppid:     2 flags:0x00004000
  [87452.325520] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs]
  [87452.326136] Call Trace:
  [87452.326737]  __schedule+0x5d1/0xcf0
  [87452.327390]  schedule+0x45/0xe0
  [87452.328174]  lock_extent_bits+0x1e6/0x2d0 [btrfs]
  [87452.328894]  ? finish_wait+0x90/0x90
  [87452.329474]  btrfs_invalidatepage+0x32c/0x390 [btrfs]
  [87452.330133]  ? __mod_memcg_state+0x8e/0x160
  [87452.330738]  __extent_writepage+0x2d4/0x400 [btrfs]
  [87452.331405]  extent_write_cache_pages+0x2b2/0x500 [btrfs]
  [87452.332007]  ? lock_release+0x20e/0x4c0
  [87452.332557]  ? trace_hardirqs_on+0x1b/0xf0
  [87452.333127]  extent_writepages+0x43/0x90 [btrfs]
  [87452.333653]  ? lock_acquire+0x1a3/0x490
  [87452.334177]  do_writepages+0x43/0xe0
  [87452.334699]  ? __filemap_fdatawrite_range+0xa4/0x100
  [87452.335720]  __filemap_fdatawrite_range+0xc5/0x100
  [87452.336500]  btrfs_run_delalloc_work+0x17/0x40 [btrfs]
  [87452.337216]  btrfs_work_helper+0xf1/0x600 [btrfs]
  [87452.337838]  process_one_work+0x24e/0x5e0
  [87452.338437]  worker_thread+0x50/0x3b0
  [87452.339137]  ? process_one_work+0x5e0/0x5e0
  [87452.339884]  kthread+0x153/0x170
  [87452.340507]  ? kthread_mod_delayed_work+0xc0/0xc0
  [87452.341153]  ret_from_fork+0x22/0x30
  [87452.341806] INFO: task kworker/u16:1:2426217 blocked for more than 120 seconds.
  [87452.342487]       Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  [87452.343274] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [87452.344049] task:kworker/u16:1   state:D stack:    0 pid:2426217 ppid:     2 flags:0x00004000
  [87452.344974] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
  [87452.345655] Call Trace:
  [87452.346305]  __schedule+0x5d1/0xcf0
  [87452.346947]  ? kvm_clock_read+0x14/0x30
  [87452.347676]  ? wait_for_completion+0x81/0x110
  [87452.348389]  schedule+0x45/0xe0
  [87452.349077]  schedule_timeout+0x30c/0x580
  [87452.349718]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
  [87452.350340]  ? lock_acquire+0x1a3/0x490
  [87452.351006]  ? try_to_wake_up+0x7a/0xa20
  [87452.351541]  ? lock_release+0x20e/0x4c0
  [87452.352040]  ? lock_acquired+0x199/0x490
  [87452.352517]  ? wait_for_completion+0x81/0x110
  [87452.353000]  wait_for_completion+0xab/0x110
  [87452.353490]  start_delalloc_inodes+0x2af/0x390 [btrfs]
  [87452.353973]  btrfs_start_delalloc_roots+0x12d/0x250 [btrfs]
  [87452.354455]  flush_space+0x24f/0x660 [btrfs]
  [87452.355063]  btrfs_async_reclaim_metadata_space+0x1bb/0x480 [btrfs]
  [87452.355565]  process_one_work+0x24e/0x5e0
  [87452.356024]  worker_thread+0x20f/0x3b0
  [87452.356487]  ? process_one_work+0x5e0/0x5e0
  [87452.356973]  kthread+0x153/0x170
  [87452.357434]  ? kthread_mod_delayed_work+0xc0/0xc0
  [87452.357880]  ret_from_fork+0x22/0x30
  (...)
  < stack traces of several tasks waiting for the locks of the inodes of the
    clone operation >
  (...)
  [92867.444138] RSP: 002b:00007ffc3371bbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
  [92867.444624] RAX: ffffffffffffffda RBX: 00007ffc3371bea0 RCX: 00007f61efe73f97
  [92867.445116] RDX: 0000000000000000 RSI: 0000560fbd5d7a40 RDI: 0000560fbd5d8960
  [92867.445595] RBP: 00007ffc3371beb0 R08: 0000000000000001 R09: 0000000000000003
  [92867.446070] R10: 00007ffc3371b996 R11: 0000000000000246 R12: 0000000000000000
  [92867.446820] R13: 000000000000001f R14: 00007ffc3371bea0 R15: 00007ffc3371beb0
  [92867.447361] task:fsstress        state:D stack:    0 pid:2508238 ppid:2508153 flags:0x00004000
  [92867.447920] Call Trace:
  [92867.448435]  __schedule+0x5d1/0xcf0
  [92867.448934]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
  [92867.449423]  schedule+0x45/0xe0
  [92867.449916]  __reserve_bytes+0x4a4/0xb10 [btrfs]
  [92867.450576]  ? finish_wait+0x90/0x90
  [92867.451202]  btrfs_reserve_metadata_bytes+0x29/0x190 [btrfs]
  [92867.451815]  btrfs_block_rsv_add+0x1f/0x50 [btrfs]
  [92867.452412]  start_transaction+0x2d1/0x760 [btrfs]
  [92867.453216]  clone_copy_inline_extent+0x333/0x490 [btrfs]
  [92867.453848]  ? lock_release+0x20e/0x4c0
  [92867.454539]  ? btrfs_search_slot+0x9a7/0xc30 [btrfs]
  [92867.455218]  btrfs_clone+0x569/0x7e0 [btrfs]
  [92867.455952]  btrfs_clone_files+0xf6/0x150 [btrfs]
  [92867.456588]  btrfs_remap_file_range+0x324/0x3d0 [btrfs]
  [92867.457213]  do_clone_file_range+0xd4/0x1f0
  [92867.457828]  vfs_clone_file_range+0x4d/0x230
  [92867.458355]  ? lock_release+0x20e/0x4c0
  [92867.458890]  ioctl_file_clone+0x8f/0xc0
  [92867.459377]  do_vfs_ioctl+0x342/0x750
  [92867.459913]  __x64_sys_ioctl+0x62/0xb0
  [92867.460377]  do_syscall_64+0x33/0x80
  [92867.460842]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  (...)
  < stack traces of more tasks blocked on metadata reservation like the clone
    task above, because the async reclaim task has deadlocked >
  (...)

Another thing to notice is that the worker task that is deadlocked when
trying to flush the destination inode of the clone operation is at
btrfs_invalidatepage(). This is simply because the clone operation has a
destination offset greater than the i_size and we only update the i_size
of the destination file after cloning an extent (just like we do in the
buffered write path).

Since the async reclaim path uses btrfs_start_delalloc_roots() to trigger
the flushing of delalloc for all inodes that have delalloc, add a runtime
flag to an inode to signal it should not be flushed, and for inodes with
that flag set, start_delalloc_inodes() will simply skip them. When the
cloning code needs to dirty a page to copy an inline extent, set that flag
on the inode and then clear it when the clone operation finishes.

This could be sporadically triggered with test case generic/269 from
fstests, which exercises many fsstress processes running in parallel with
several dd processes filling up the entire filesystem.

CC: stable@vger.kernel.org # 5.9+
Fixes: 05a5a7621c ("Btrfs: implement full reflink support for inline extents")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-17 14:16:54 +01:00
Filipe Manana
8773816459 btrfs: skip unnecessary searches for xattrs when logging an inode
[ Upstream commit f2f121ab500d0457cc9c6f54269d21ffdf5bd304 ]

Every time we log an inode we lookup in the fs/subvol tree for xattrs and
if we have any, log them into the log tree. However it is very common to
have inodes without any xattrs, so doing the search wastes times, but more
importantly it adds contention on the fs/subvol tree locks, either making
the logging code block and wait for tree locks or making the logging code
making other concurrent operations block and wait.

The most typical use cases where xattrs are used are when capabilities or
ACLs are defined for an inode, or when SELinux is enabled.

This change makes the logging code detect when an inode does not have
xattrs and skip the xattrs search the next time the inode is logged,
unless the inode is evicted and loaded again or a xattr is added to the
inode. Therefore skipping the search for xattrs on inodes that don't ever
have xattrs and are fsynced with some frequency.

The following script that calls dbench was used to measure the impact of
this change on a VM with 8 CPUs, 16Gb of ram, using a raw NVMe device
directly (no intermediary filesystem on the host) and using a non-debug
kernel (default configuration on Debian distributions):

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdk
  MNT=/mnt/sdk
  MOUNT_OPTIONS="-o ssd"

  mkfs.btrfs -f -m single -d single $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  dbench -D $MNT -t 200 40

  umount $MNT

The results before this change:

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    5761605     0.172   312.057
 Close        4232452     0.002    10.927
 Rename        243937     1.406   277.344
 Unlink       1163456     0.631   298.402
 Deltree          160    11.581   221.107
 Mkdir             80     0.003     0.005
 Qpathinfo    5221410     0.065   122.309
 Qfileinfo     915432     0.001     3.333
 Qfsinfo       957555     0.003     3.992
 Sfileinfo     469244     0.023    20.494
 Find         2018865     0.448   123.659
 WriteX       2874851     0.049   118.529
 ReadX        9030579     0.004    21.654
 LockX          18754     0.003     4.423
 UnlockX        18754     0.002     0.331
 Flush         403792    10.944   359.494

Throughput 908.444 MB/sec  40 clients  40 procs  max_latency=359.500 ms

The results after this change:

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    6442521     0.159   230.693
 Close        4732357     0.002    10.972
 Rename        272809     1.293   227.398
 Unlink       1301059     0.563   218.500
 Deltree          160     7.796    54.887
 Mkdir             80     0.008     0.478
 Qpathinfo    5839452     0.047   124.330
 Qfileinfo    1023199     0.001     4.996
 Qfsinfo      1070760     0.003     5.709
 Sfileinfo     524790     0.033    21.765
 Find         2257658     0.314   125.611
 WriteX       3211520     0.040   232.135
 ReadX        10098969     0.004    25.340
 LockX          20974     0.003     1.569
 UnlockX        20974     0.002     3.475
 Flush         451553    10.287   331.037

Throughput 1011.77 MB/sec  40 clients  40 procs  max_latency=331.045 ms

+10.8% throughput, -8.2% max latency

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-17 14:16:53 +01:00
Greg Kroah-Hartman
7eadb0006a Merge 5.10.7 into android12-5.10
Changes in 5.10.7
	i40e: Fix Error I40E_AQ_RC_EINVAL when removing VFs
	iavf: fix double-release of rtnl_lock
	net/sched: sch_taprio: ensure to reset/destroy all child qdiscs
	net: mvpp2: Add TCAM entry to drop flow control pause frames
	net: mvpp2: prs: fix PPPoE with ipv6 packet parse
	net: systemport: set dev->max_mtu to UMAC_MAX_MTU_SIZE
	ethernet: ucc_geth: fix use-after-free in ucc_geth_remove()
	ethernet: ucc_geth: set dev->max_mtu to 1518
	ionic: account for vlan tag len in rx buffer len
	atm: idt77252: call pci_disable_device() on error path
	net: mvpp2: Fix GoP port 3 Networking Complex Control configurations
	net: stmmac: dwmac-meson8b: ignore the second clock input
	ibmvnic: fix login buffer memory leak
	ibmvnic: continue fatal error reset after passive init
	net: ethernet: mvneta: Fix error handling in mvneta_probe
	qede: fix offload for IPIP tunnel packets
	virtio_net: Fix recursive call to cpus_read_lock()
	net/ncsi: Use real net-device for response handler
	net: ethernet: Fix memleak in ethoc_probe
	net-sysfs: take the rtnl lock when storing xps_cpus
	net-sysfs: take the rtnl lock when accessing xps_cpus_map and num_tc
	net-sysfs: take the rtnl lock when storing xps_rxqs
	net-sysfs: take the rtnl lock when accessing xps_rxqs_map and num_tc
	net: ethernet: ti: cpts: fix ethtool output when no ptp_clock registered
	tun: fix return value when the number of iovs exceeds MAX_SKB_FRAGS
	e1000e: Only run S0ix flows if shutdown succeeded
	e1000e: bump up timeout to wait when ME un-configures ULP mode
	Revert "e1000e: disable s0ix entry and exit flows for ME systems"
	e1000e: Export S0ix flags to ethtool
	bnxt_en: Check TQM rings for maximum supported value.
	net: mvpp2: fix pkt coalescing int-threshold configuration
	bnxt_en: Fix AER recovery.
	ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()
	net: sched: prevent invalid Scell_log shift count
	net: hns: fix return value check in __lb_other_process()
	erspan: fix version 1 check in gre_parse_header()
	net: hdlc_ppp: Fix issues when mod_timer is called while timer is running
	bareudp: set NETIF_F_LLTX flag
	bareudp: Fix use of incorrect min_headroom size
	vhost_net: fix ubuf refcount incorrectly when sendmsg fails
	r8169: work around power-saving bug on some chip versions
	net: dsa: lantiq_gswip: Enable GSWIP_MII_CFG_EN also for internal PHYs
	net: dsa: lantiq_gswip: Fix GSWIP_MII_CFG(p) register access
	CDC-NCM: remove "connected" log message
	ibmvnic: fix: NULL pointer dereference.
	net: usb: qmi_wwan: add Quectel EM160R-GL
	selftests: mlxsw: Set headroom size of correct port
	stmmac: intel: Add PCI IDs for TGL-H platform
	selftests/vm: fix building protection keys test
	block: add debugfs stanza for QUEUE_FLAG_NOWAIT
	workqueue: Kick a worker based on the actual activation of delayed works
	scsi: ufs: Fix wrong print message in dev_err()
	scsi: ufs-pci: Fix restore from S4 for Intel controllers
	scsi: ufs-pci: Ensure UFS device is in PowerDown mode for suspend-to-disk ->poweroff()
	scsi: ufs-pci: Fix recovery from hibernate exit errors for Intel controllers
	scsi: ufs-pci: Enable UFSHCD_CAP_RPM_AUTOSUSPEND for Intel controllers
	scsi: block: Introduce BLK_MQ_REQ_PM
	scsi: ide: Do not set the RQF_PREEMPT flag for sense requests
	scsi: ide: Mark power management requests with RQF_PM instead of RQF_PREEMPT
	scsi: scsi_transport_spi: Set RQF_PM for domain validation commands
	scsi: core: Only process PM requests if rpm_status != RPM_ACTIVE
	local64.h: make <asm/local64.h> mandatory
	lib/genalloc: fix the overflow when size is too big
	depmod: handle the case of /sbin/depmod without /sbin in PATH
	scsi: ufs: Clear UAC for FFU and RPMB LUNs
	kbuild: don't hardcode depmod path
	Bluetooth: revert: hci_h5: close serdev device and free hu in h5_close
	scsi: block: Remove RQF_PREEMPT and BLK_MQ_REQ_PREEMPT
	scsi: block: Do not accept any requests while suspended
	crypto: ecdh - avoid buffer overflow in ecdh_set_secret()
	crypto: asym_tpm: correct zero out potential secrets
	powerpc: Handle .text.{hot,unlikely}.* in linker script
	Staging: comedi: Return -EFAULT if copy_to_user() fails
	staging: mt7621-dma: Fix a resource leak in an error handling path
	usb: gadget: enable super speed plus
	USB: cdc-acm: blacklist another IR Droid device
	USB: cdc-wdm: Fix use after free in service_outstanding_interrupt().
	usb: typec: intel_pmc_mux: Configure HPD first for HPD+IRQ request
	usb: dwc3: meson-g12a: disable clk on error handling path in probe
	usb: dwc3: gadget: Restart DWC3 gadget when enabling pullup
	usb: dwc3: gadget: Clear wait flag on dequeue
	usb: dwc3: ulpi: Use VStsDone to detect PHY regs access completion
	usb: dwc3: ulpi: Replace CPU-based busyloop with Protocol-based one
	usb: dwc3: ulpi: Fix USB2.0 HS/FS/LS PHY suspend regression
	usb: chipidea: ci_hdrc_imx: add missing put_device() call in usbmisc_get_init_data()
	USB: xhci: fix U1/U2 handling for hardware with XHCI_INTEL_HOST quirk set
	usb: usbip: vhci_hcd: protect shift size
	usb: uas: Add PNY USB Portable SSD to unusual_uas
	USB: serial: iuu_phoenix: fix DMA from stack
	USB: serial: option: add LongSung M5710 module support
	USB: serial: option: add Quectel EM160R-GL
	USB: yurex: fix control-URB timeout handling
	USB: usblp: fix DMA to stack
	ALSA: usb-audio: Fix UBSAN warnings for MIDI jacks
	usb: gadget: select CONFIG_CRC32
	USB: Gadget: dummy-hcd: Fix shift-out-of-bounds bug
	usb: gadget: f_uac2: reset wMaxPacketSize
	usb: gadget: function: printer: Fix a memory leak for interface descriptor
	usb: gadget: u_ether: Fix MTU size mismatch with RX packet size
	USB: gadget: legacy: fix return error code in acm_ms_bind()
	usb: gadget: Fix spinlock lockup on usb_function_deactivate
	usb: gadget: configfs: Preserve function ordering after bind failure
	usb: gadget: configfs: Fix use-after-free issue with udc_name
	USB: serial: keyspan_pda: remove unused variable
	hwmon: (amd_energy) fix allocation of hwmon_channel_info config
	mm: make wait_on_page_writeback() wait for multiple pending writebacks
	x86/mm: Fix leak of pmd ptlock
	KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte()
	KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE
	kvm: check tlbs_dirty directly
	KVM: x86/mmu: Ensure TDP MMU roots are freed after yield
	x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR
	x86/resctrl: Don't move a task to the same resource group
	blk-iocost: fix NULL iocg deref from racing against initialization
	ALSA: hda/via: Fix runtime PM for Clevo W35xSS
	ALSA: hda/conexant: add a new hda codec CX11970
	ALSA: hda/realtek - Fix speaker volume control on Lenovo C940
	ALSA: hda/realtek: Add mute LED quirk for more HP laptops
	ALSA: hda/realtek: Enable mute and micmute LED on HP EliteBook 850 G7
	ALSA: hda/realtek: Add two "Intel Reference board" SSID in the ALC256.
	iommu/vt-d: Move intel_iommu info from struct intel_svm to struct intel_svm_dev
	btrfs: qgroup: don't try to wait flushing if we're already holding a transaction
	btrfs: send: fix wrong file path when there is an inode with a pending rmdir
	Revert "device property: Keep secondary firmware node secondary by type"
	dmabuf: fix use-after-free of dmabuf's file->f_inode
	arm64: link with -z norelro for LLD or aarch64-elf
	drm/i915: clear the shadow batch
	drm/i915: clear the gpu reloc batch
	bcache: fix typo from SUUP to SUPP in features.h
	bcache: check unsupported feature sets for bcache register
	bcache: introduce BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE for large bucket
	net/mlx5e: Fix SWP offsets when vlan inserted by driver
	ARM: dts: OMAP3: disable AES on N950/N9
	netfilter: x_tables: Update remaining dereference to RCU
	netfilter: ipset: fix shift-out-of-bounds in htable_bits()
	netfilter: xt_RATEEST: reject non-null terminated string from userspace
	netfilter: nft_dynset: report EOPNOTSUPP on missing set feature
	dmaengine: idxd: off by one in cleanup code
	x86/mtrr: Correct the range check before performing MTRR type lookups
	KVM: x86: fix shift out of bounds reported by UBSAN
	xsk: Fix memory leak for failed bind
	rtlwifi: rise completion at the last step of firmware callback
	scsi: target: Fix XCOPY NAA identifier lookup
	Linux 5.10.7

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1a7c195af35831fe362b027fe013c0c7e4dc20ea
2021-01-13 10:29:42 +01:00
Filipe Manana
5e84c99055 btrfs: send: fix wrong file path when there is an inode with a pending rmdir
commit 0b3f407e6728d990ae1630a02c7b952c21c288d3 upstream.

When doing an incremental send, if we have a new inode that happens to
have the same number that an old directory inode had in the base snapshot
and that old directory has a pending rmdir operation, we end up computing
a wrong path for the new inode, causing the receiver to fail.

Example reproducer:

  $ cat test-send-rmdir.sh
  #!/bin/bash

  DEV=/dev/sdi
  MNT=/mnt/sdi

  mkfs.btrfs -f $DEV >/dev/null
  mount $DEV $MNT

  mkdir $MNT/dir
  touch $MNT/dir/file1
  touch $MNT/dir/file2
  touch $MNT/dir/file3

  # Filesystem looks like:
  #
  # .                                     (ino 256)
  # |----- dir/                           (ino 257)
  #         |----- file1                  (ino 258)
  #         |----- file2                  (ino 259)
  #         |----- file3                  (ino 260)
  #

  btrfs subvolume snapshot -r $MNT $MNT/snap1
  btrfs send -f /tmp/snap1.send $MNT/snap1

  # Now remove our directory and all its files.
  rm -fr $MNT/dir

  # Unmount the filesystem and mount it again. This is to ensure that
  # the next inode that is created ends up with the same inode number
  # that our directory "dir" had, 257, which is the first free "objectid"
  # available after mounting again the filesystem.
  umount $MNT
  mount $DEV $MNT

  # Now create a new file (it could be a directory as well).
  touch $MNT/newfile

  # Filesystem now looks like:
  #
  # .                                     (ino 256)
  # |----- newfile                        (ino 257)
  #

  btrfs subvolume snapshot -r $MNT $MNT/snap2
  btrfs send -f /tmp/snap2.send -p $MNT/snap1 $MNT/snap2

  # Now unmount the filesystem, create a new one, mount it and try to apply
  # both send streams to recreate both snapshots.
  umount $DEV

  mkfs.btrfs -f $DEV >/dev/null

  mount $DEV $MNT

  btrfs receive -f /tmp/snap1.send $MNT
  btrfs receive -f /tmp/snap2.send $MNT

  umount $MNT

When running the test, the receive operation for the incremental stream
fails:

  $ ./test-send-rmdir.sh
  Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
  At subvol /mnt/sdi/snap1
  Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
  At subvol /mnt/sdi/snap2
  At subvol snap1
  At snapshot snap2
  ERROR: chown o257-9-0 failed: No such file or directory

So fix this by tracking directories that have a pending rmdir by inode
number and generation number, instead of only inode number.

A test case for fstests follows soon.

Reported-by: Massimo B. <massimo.b@gmx.net>
Tested-by: Massimo B. <massimo.b@gmx.net>
Link: https://lore.kernel.org/linux-btrfs/6ae34776e85912960a253a8327068a892998e685.camel@gmx.net/
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:18:24 +01:00
Qu Wenruo
1888e5df84 btrfs: qgroup: don't try to wait flushing if we're already holding a transaction
commit ae5e070eaca9dbebde3459dd8f4c2756f8c097d0 upstream.

There is a chance of racing for qgroup flushing which may lead to
deadlock:

	Thread A		|	Thread B
   (not holding trans handle)	|  (holding a trans handle)
--------------------------------+--------------------------------
__btrfs_qgroup_reserve_meta()   | __btrfs_qgroup_reserve_meta()
|- try_flush_qgroup()		| |- try_flush_qgroup()
   |- QGROUP_FLUSHING bit set   |    |
   |				|    |- test_and_set_bit()
   |				|    |- wait_event()
   |- btrfs_join_transaction()	|
   |- btrfs_commit_transaction()|

			!!! DEAD LOCK !!!

Since thread A wants to commit transaction, but thread B is holding a
transaction handle, blocking the commit.
At the same time, thread B is waiting for thread A to finish its commit.

This is just a hot fix, and would lead to more EDQUOT when we're near
the qgroup limit.

The proper fix would be to make all metadata/data reservations happen
without holding a transaction handle.

CC: stable@vger.kernel.org # 5.9+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:18:24 +01:00
Greg Kroah-Hartman
19057a6a6b Merge 5.10.4 into android12-5.10
Changes in 5.10.4
	hwmon: (k10temp) Remove support for displaying voltage and current on Zen CPUs
	drm/gma500: fix double free of gma_connector
	iio: adc: at91_adc: add Kconfig dep on the OF symbol and remove of_match_ptr()
	drm/aspeed: Fix Kconfig warning & subsequent build errors
	drm/mcde: Fix handling of platform_get_irq() error
	drm/tve200: Fix handling of platform_get_irq() error
	arm64: dts: renesas: hihope-rzg2-ex: Drop rxc-skew-ps from ethernet-phy node
	arm64: dts: renesas: cat875: Remove rxc-skew-ps from ethernet-phy node
	soc: renesas: rmobile-sysc: Fix some leaks in rmobile_init_pm_domains()
	soc: mediatek: Check if power domains can be powered on at boot time
	arm64: dts: mediatek: mt8183: fix gce incorrect mbox-cells value
	arm64: dts: ipq6018: update the reserved-memory node
	arm64: dts: qcom: sc7180: Fix one forgotten interconnect reference
	soc: qcom: geni: More properly switch to DMA mode
	Revert "i2c: i2c-qcom-geni: Fix DMA transfer race"
	RDMA/bnxt_re: Set queue pair state when being queried
	rtc: pcf2127: fix pcf2127_nvmem_read/write() returns
	RDMA/bnxt_re: Fix entry size during SRQ create
	selinux: fix error initialization in inode_doinit_with_dentry()
	ARM: dts: aspeed-g6: Fix the GPIO memory size
	ARM: dts: aspeed: s2600wf: Fix VGA memory region location
	RDMA/core: Fix error return in _ib_modify_qp()
	RDMA/rxe: Compute PSN windows correctly
	x86/mm/ident_map: Check for errors from ident_pud_init()
	ARM: p2v: fix handling of LPAE translation in BE mode
	RDMA/rtrs-clt: Remove destroy_con_cq_qp in case route resolving failed
	RDMA/rtrs-clt: Missing error from rtrs_rdma_conn_established
	RDMA/rtrs-srv: Don't guard the whole __alloc_srv with srv_mutex
	x86/apic: Fix x2apic enablement without interrupt remapping
	ASoC: qcom: fix unsigned int bitwidth compared to less than zero
	sched/deadline: Fix sched_dl_global_validate()
	sched: Reenable interrupts in do_sched_yield()
	drm/amdgpu: fix incorrect enum type
	crypto: talitos - Endianess in current_desc_hdr()
	crypto: talitos - Fix return type of current_desc_hdr()
	crypto: inside-secure - Fix sizeof() mismatch
	ASoC: sun4i-i2s: Fix lrck_period computation for I2S justified mode
	drm/msm: Add missing stub definition
	ARM: dts: aspeed: tiogapass: Remove vuart
	drm/amdgpu: fix build_coefficients() argument
	powerpc/64: Set up a kernel stack for secondaries before cpu_restore()
	spi: img-spfi: fix reference leak in img_spfi_resume
	f2fs: call f2fs_get_meta_page_retry for nat page
	RDMA/mlx5: Fix corruption of reg_pages in mlx5_ib_rereg_user_mr()
	perf test: Use generic event for expand_libpfm_events()
	drm/msm/dp: DisplayPort PHY compliance tests fixup
	drm/msm/dsi_pll_7nm: restore VCO rate during restore_state
	drm/msm/dsi_pll_10nm: restore VCO rate during restore_state
	drm/msm/dpu: fix clock scaling on non-sc7180 board
	spi: spi-mem: fix reference leak in spi_mem_access_start
	scsi: aacraid: Improve compat_ioctl handlers
	pinctrl: core: Add missing #ifdef CONFIG_GPIOLIB
	ASoC: pcm: DRAIN support reactivation
	drm/bridge: tpd12s015: Fix irq registering in tpd12s015_probe
	crypto: arm64/poly1305-neon - reorder PAC authentication with SP update
	crypto: arm/aes-neonbs - fix usage of cbc(aes) fallback
	crypto: caam - fix printing on xts fallback allocation error path
	selinux: fix inode_doinit_with_dentry() LABEL_INVALID error handling
	nl80211/cfg80211: fix potential infinite loop
	spi: stm32: fix reference leak in stm32_spi_resume
	bpf: Fix tests for local_storage
	x86/mce: Correct the detection of invalid notifier priorities
	drm/edid: Fix uninitialized variable in drm_cvt_modes()
	ath11k: Initialize complete alpha2 for regulatory change
	ath11k: Fix number of rules in filtered ETSI regdomain
	ath11k: fix wmi init configuration
	brcmfmac: Fix memory leak for unpaired brcmf_{alloc/free}
	arm64: dts: exynos: Include common syscon restart/poweroff for Exynos7
	arm64: dts: exynos: Correct psci compatible used on Exynos7
	drm/panel: simple: Add flags to boe_nv133fhm_n61
	Bluetooth: Fix null pointer dereference in hci_event_packet()
	Bluetooth: Fix: LL PRivacy BLE device fails to connect
	Bluetooth: hci_h5: fix memory leak in h5_close
	spi: stm32-qspi: fix reference leak in stm32 qspi operations
	spi: spi-ti-qspi: fix reference leak in ti_qspi_setup
	spi: mt7621: fix missing clk_disable_unprepare() on error in mt7621_spi_probe
	spi: tegra20-slink: fix reference leak in slink ops of tegra20
	spi: tegra20-sflash: fix reference leak in tegra_sflash_resume
	spi: tegra114: fix reference leak in tegra spi ops
	spi: bcm63xx-hsspi: fix missing clk_disable_unprepare() on error in bcm63xx_hsspi_resume
	spi: imx: fix reference leak in two imx operations
	ASoC: qcom: common: Fix refcounting in qcom_snd_parse_of()
	ath11k: Handle errors if peer creation fails
	mwifiex: fix mwifiex_shutdown_sw() causing sw reset failure
	drm/msm/a6xx: Clear shadow on suspend
	drm/msm/a5xx: Clear shadow on suspend
	firmware: tegra: fix strncpy()/strncat() confusion
	drm/msm/dp: return correct connection status after suspend
	drm/msm/dp: skip checking LINK_STATUS_UPDATED bit
	drm/msm/dp: do not notify audio subsystem if sink doesn't support audio
	selftests/run_kselftest.sh: fix dry-run typo
	selftest/bpf: Add missed ip6ip6 test back
	ASoC: wm8994: Fix PM disable depth imbalance on error
	ASoC: wm8998: Fix PM disable depth imbalance on error
	spi: sprd: fix reference leak in sprd_spi_remove
	virtiofs fix leak in setup
	ASoC: arizona: Fix a wrong free in wm8997_probe
	RDMa/mthca: Work around -Wenum-conversion warning
	ASoC: SOF: Intel: fix Kconfig dependency for SND_INTEL_DSP_CONFIG
	arm64: dts: ti: k3-am65*/j721e*: Fix unit address format error for dss node
	MIPS: BCM47XX: fix kconfig dependency bug for BCM47XX_BCMA
	drm/amdgpu: fix compute queue priority if num_kcq is less than 4
	soc: ti: omap-prm: Do not check rstst bit on deassert if already deasserted
	crypto: Kconfig - CRYPTO_MANAGER_EXTRA_TESTS requires the manager
	crypto: qat - fix status check in qat_hal_put_rel_rd_xfer()
	firmware: arm_scmi: Fix missing destroy_workqueue()
	drm/udl: Fix missing error code in udl_handle_damage()
	staging: greybus: codecs: Fix reference counter leak in error handling
	staging: gasket: interrupt: fix the missed eventfd_ctx_put() in gasket_interrupt.c
	scripts: kernel-doc: Restore anonymous enum parsing
	drm/amdkfd: Put ACPI table after using it
	ionic: use mc sync for multicast filters
	ionic: flatten calls to ionic_lif_rx_mode
	ionic: change set_rx_mode from_ndo to can_sleep
	media: tm6000: Fix sizeof() mismatches
	media: platform: add missing put_device() call in mtk_jpeg_clk_init()
	media: mtk-vcodec: add missing put_device() call in mtk_vcodec_init_dec_pm()
	media: mtk-vcodec: add missing put_device() call in mtk_vcodec_release_dec_pm()
	media: mtk-vcodec: add missing put_device() call in mtk_vcodec_init_enc_pm()
	media: v4l2-fwnode: Return -EINVAL for invalid bus-type
	media: v4l2-fwnode: v4l2_fwnode_endpoint_parse caller must init vep argument
	media: ov5640: fix support of BT656 bus mode
	media: staging: rkisp1: cap: fix runtime PM imbalance on error
	media: cedrus: fix reference leak in cedrus_start_streaming
	media: platform: add missing put_device() call in mtk_jpeg_probe() and mtk_jpeg_remove()
	media: venus: core: change clk enable and disable order in resume and suspend
	media: venus: core: vote for video-mem path
	media: venus: core: vote with average bandwidth and peak bandwidth as zero
	RDMA/cma: Add missing error handling of listen_id
	ASoC: meson: fix COMPILE_TEST error
	spi: dw: fix build error by selecting MULTIPLEXER
	scsi: core: Fix VPD LUN ID designator priorities
	media: venus: put dummy vote on video-mem path after last session release
	media: solo6x10: fix missing snd_card_free in error handling case
	video: fbdev: atmel_lcdfb: fix return error code in atmel_lcdfb_of_init()
	mmc: sdhci: tegra: fix wrong unit with busy_timeout
	drm/omap: dmm_tiler: fix return error code in omap_dmm_probe()
	drm/meson: Free RDMA resources after tearing down DRM
	drm/meson: Unbind all connectors on module removal
	drm/meson: dw-hdmi: Register a callback to disable the regulator
	drm/meson: dw-hdmi: Ensure that clocks are enabled before touching the TOP registers
	ASoC: intel: SND_SOC_INTEL_KEEMBAY should depend on ARCH_KEEMBAY
	iommu/vt-d: include conditionally on CONFIG_INTEL_IOMMU_SVM
	Input: ads7846 - fix race that causes missing releases
	Input: ads7846 - fix integer overflow on Rt calculation
	Input: ads7846 - fix unaligned access on 7845
	bus: mhi: core: Remove double locking from mhi_driver_remove()
	bus: mhi: core: Fix null pointer access when parsing MHI configuration
	usb/max3421: fix return error code in max3421_probe()
	spi: mxs: fix reference leak in mxs_spi_probe
	selftests/bpf: Fix broken riscv build
	powerpc: Avoid broken GCC __attribute__((optimize))
	powerpc/feature: Fix CPU_FTRS_ALWAYS by removing CPU_FTRS_GENERIC_32
	ARM: dts: tacoma: Fix node vs reg mismatch for flash memory
	Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path"
	powerpc/powernv/sriov: fix unsigned int win compared to less than zero
	mfd: htc-i2cpld: Add the missed i2c_put_adapter() in htcpld_register_chip_i2c()
	mfd: MFD_SL28CPLD should depend on ARCH_LAYERSCAPE
	mfd: stmfx: Fix dev_err_probe() call in stmfx_chip_init()
	mfd: cpcap: Fix interrupt regression with regmap clear_ack
	EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId
	scsi: ufs: Avoid to call REQ_CLKS_OFF to CLKS_OFF
	scsi: ufs: Fix clkgating on/off
	rcu: Allow rcu_irq_enter_check_tick() from NMI
	rcu,ftrace: Fix ftrace recursion
	rcu/tree: Defer kvfree_rcu() allocation to a clean context
	crypto: crypto4xx - Replace bitwise OR with logical OR in crypto4xx_build_pd
	crypto: omap-aes - Fix PM disable depth imbalance in omap_aes_probe
	crypto: sun8i-ce - fix two error path's memory leak
	spi: fix resource leak for drivers without .remove callback
	drm/meson: dw-hdmi: Disable clocks on driver teardown
	drm/meson: dw-hdmi: Enable the iahb clock early enough
	PCI: Disable MSI for Pericom PCIe-USB adapter
	PCI: brcmstb: Initialize "tmp" before use
	soc: ti: knav_qmss: fix reference leak in knav_queue_probe
	soc: ti: Fix reference imbalance in knav_dma_probe
	drivers: soc: ti: knav_qmss_queue: Fix error return code in knav_queue_probe
	soc: qcom: initialize local variable
	arm64: dts: qcom: sm8250: correct compatible for sm8250-mtp
	arm64: dts: qcom: msm8916-samsung-a2015: Disable muic i2c pin bias
	Input: omap4-keypad - fix runtime PM error handling
	clk: meson: Kconfig: fix dependency for G12A
	staging: mfd: hi6421-spmi-pmic: fix error return code in hi6421_spmi_pmic_probe()
	ath11k: Fix the rx_filter flag setting for peer rssi stats
	RDMA/cxgb4: Validate the number of CQEs
	soundwire: Fix DEBUG_LOCKS_WARN_ON for uninitialized attribute
	pinctrl: sunxi: fix irq bank map for the Allwinner A100 pin controller
	memstick: fix a double-free bug in memstick_check
	ARM: dts: at91: sam9x60: add pincontrol for USB Host
	ARM: dts: at91: sama5d4_xplained: add pincontrol for USB Host
	ARM: dts: at91: sama5d3_xplained: add pincontrol for USB Host
	mmc: pxamci: Fix error return code in pxamci_probe
	brcmfmac: fix error return code in brcmf_cfg80211_connect()
	orinoco: Move context allocation after processing the skb
	qtnfmac: fix error return code in qtnf_pcie_probe()
	rsi: fix error return code in rsi_reset_card()
	cw1200: fix missing destroy_workqueue() on error in cw1200_init_common
	dmaengine: mv_xor_v2: Fix error return code in mv_xor_v2_probe()
	arm64: dts: qcom: sdm845: Limit ipa iommu streams
	leds: netxbig: add missing put_device() call in netxbig_leds_get_of_pdata()
	leds: lp50xx: Fix an error handling path in 'lp50xx_probe_dt()'
	leds: turris-omnia: check for LED_COLOR_ID_RGB instead LED_COLOR_ID_MULTI
	arm64: tegra: Fix DT binding for IO High Voltage entry
	RDMA/cma: Fix deadlock on &lock in rdma_cma_listen_on_all() error unwind
	soundwire: qcom: Fix build failure when slimbus is module
	drm/imx/dcss: fix rotations for Vivante tiled formats
	media: siano: fix memory leak of debugfs members in smsdvb_hotplug
	platform/x86: mlx-platform: Remove PSU EEPROM from default platform configuration
	platform/x86: mlx-platform: Remove PSU EEPROM from MSN274x platform configuration
	arm64: dts: qcom: sc7180: limit IPA iommu streams
	RDMA/hns: Only record vlan info for HIP08
	RDMA/hns: Fix missing fields in address vector
	RDMA/hns: Avoid setting loopback indicator when smac is same as dmac
	serial: 8250-mtk: Fix reference leak in mtk8250_probe
	samples: bpf: Fix lwt_len_hist reusing previous BPF map
	media: imx214: Fix stop streaming
	mips: cdmm: fix use-after-free in mips_cdmm_bus_discover
	media: max2175: fix max2175_set_csm_mode() error code
	slimbus: qcom-ngd-ctrl: Avoid sending power requests without QMI
	RDMA/core: Track device memory MRs
	drm/mediatek: Use correct aliases name for ovl
	HSI: omap_ssi: Don't jump to free ID in ssi_add_controller()
	ARM: dts: Remove non-existent i2c1 from 98dx3236
	arm64: dts: armada-3720-turris-mox: update ethernet-phy handle name
	power: supply: bq25890: Use the correct range for IILIM register
	arm64: dts: rockchip: Set dr_mode to "host" for OTG on rk3328-roc-cc
	power: supply: max17042_battery: Fix current_{avg,now} hiding with no current sense
	power: supply: axp288_charger: Fix HP Pavilion x2 10 DMI matching
	power: supply: bq24190_charger: fix reference leak
	genirq/irqdomain: Don't try to free an interrupt that has no mapping
	arm64: dts: ls1028a: fix ENETC PTP clock input
	arm64: dts: ls1028a: fix FlexSPI clock input
	arm64: dts: freescale: sl28: combine SPI MTD partitions
	phy: tegra: xusb: Fix usb_phy device driver field
	arm64: dts: qcom: c630: Polish i2c-hid devices
	arm64: dts: qcom: c630: Fix pinctrl pins properties
	PCI: Bounds-check command-line resource alignment requests
	PCI: Fix overflow in command-line resource alignment requests
	PCI: iproc: Fix out-of-bound array accesses
	PCI: iproc: Invalidate correct PAXB inbound windows
	arm64: dts: meson: fix spi-max-frequency on Khadas VIM2
	arm64: dts: meson-sm1: fix typo in opp table
	soc: amlogic: canvas: add missing put_device() call in meson_canvas_get()
	scsi: hisi_sas: Fix up probe error handling for v3 hw
	scsi: pm80xx: Do not sleep in atomic context
	spi: spi-fsl-dspi: Use max_native_cs instead of num_chipselect to set SPI_MCR
	ARM: dts: at91: at91sam9rl: fix ADC triggers
	RDMA/hns: Fix 0-length sge calculation error
	RDMA/hns: Bugfix for calculation of extended sge
	mailbox: arm_mhu_db: Fix mhu_db_shutdown by replacing kfree with devm_kfree
	soundwire: master: use pm_runtime_set_active() on add
	platform/x86: dell-smbios-base: Fix error return code in dell_smbios_init
	ASoC: Intel: Boards: tgl_max98373: update TDM slot_width
	media: max9271: Fix GPIO enable/disable
	media: rdacm20: Enable GPIO1 explicitly
	media: i2c: imx219: Selection compliance fixes
	ath11k: Don't cast ath11k_skb_cb to ieee80211_tx_info.control
	ath11k: Reset ath11k_skb_cb before setting new flags
	ath11k: Fix an error handling path
	ath10k: Fix the parsing error in service available event
	ath10k: Fix an error handling path
	ath10k: Release some resources in an error handling path
	SUNRPC: rpc_wake_up() should wake up tasks in the correct order
	NFSv4.2: condition READDIR's mask for security label based on LSM state
	SUNRPC: xprt_load_transport() needs to support the netid "rdma6"
	NFSv4: Fix the alignment of page data in the getdeviceinfo reply
	net: sunrpc: Fix 'snprintf' return value check in 'do_xprt_debugfs'
	lockd: don't use interval-based rebinding over TCP
	NFS: switch nfsiod to be an UNBOUND workqueue.
	selftests/seccomp: Update kernel config
	vfio-pci: Use io_remap_pfn_range() for PCI IO memory
	hwmon: (ina3221) Fix PM usage counter unbalance in ina3221_write_enable
	f2fs: fix double free of unicode map
	media: tvp5150: Fix wrong return value of tvp5150_parse_dt()
	media: saa7146: fix array overflow in vidioc_s_audio()
	powerpc/perf: Fix crash with is_sier_available when pmu is not set
	powerpc/64: Fix an EMIT_BUG_ENTRY in head_64.S
	powerpc/xmon: Fix build failure for 8xx
	powerpc/perf: Fix to update radix_scope_qual in power10
	powerpc/perf: Update the PMU group constraints for l2l3 events in power10
	powerpc/perf: Fix the PMU group constraints for threshold events in power10
	clocksource/drivers/orion: Add missing clk_disable_unprepare() on error path
	clocksource/drivers/cadence_ttc: Fix memory leak in ttc_setup_clockevent()
	clocksource/drivers/ingenic: Fix section mismatch
	clocksource/drivers/riscv: Make RISCV_TIMER depends on RISCV_SBI
	arm64: mte: fix prctl(PR_GET_TAGGED_ADDR_CTRL) if TCF0=NONE
	iio: hrtimer-trigger: Mark hrtimer to expire in hard interrupt context
	libbpf: Sanitise map names before pinning
	ARM: dts: at91: sam9x60ek: remove bypass property
	ARM: dts: at91: sama5d2: map securam as device
	scripts: kernel-doc: fix parsing function-like typedefs
	bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address()
	selftests/bpf: Fix invalid use of strncat in test_sockmap
	pinctrl: falcon: add missing put_device() call in pinctrl_falcon_probe()
	soc: rockchip: io-domain: Fix error return code in rockchip_iodomain_probe()
	arm64: dts: rockchip: Fix UART pull-ups on rk3328
	memstick: r592: Fix error return in r592_probe()
	MIPS: Don't round up kernel sections size for memblock_add()
	mt76: mt7663s: fix a possible ple quota underflow
	mt76: mt7915: set fops_sta_stats.owner to THIS_MODULE
	mt76: set fops_tx_stats.owner to THIS_MODULE
	mt76: dma: fix possible deadlock running mt76_dma_cleanup
	net/mlx5: Properly convey driver version to firmware
	mt76: fix memory leak if device probing fails
	mt76: fix tkip configuration for mt7615/7663 devices
	ASoC: jz4740-i2s: add missed checks for clk_get()
	ASoC: q6afe-clocks: Add missing parent clock rate
	dm ioctl: fix error return code in target_message
	ASoC: cros_ec_codec: fix uninitialized memory read
	ASoC: atmel: mchp-spdifrx needs COMMON_CLK
	ASoC: qcom: fix QDSP6 dependencies, attempt #3
	phy: mediatek: allow compile-testing the hdmi phy
	phy: renesas: rcar-gen3-usb2: disable runtime pm in case of failure
	memory: ti-emif-sram: only build for ARMv7
	memory: jz4780_nemc: Fix potential NULL dereference in jz4780_nemc_probe()
	drm/msm: a5xx: Make preemption reset case reentrant
	drm/msm: add IOMMU_SUPPORT dependency
	clocksource/drivers/arm_arch_timer: Use stable count reader in erratum sne
	clocksource/drivers/arm_arch_timer: Correct fault programming of CNTKCTL_EL1.EVNTI
	cpufreq: ap806: Add missing MODULE_DEVICE_TABLE
	cpufreq: highbank: Add missing MODULE_DEVICE_TABLE
	cpufreq: mediatek: Add missing MODULE_DEVICE_TABLE
	cpufreq: qcom: Add missing MODULE_DEVICE_TABLE
	cpufreq: st: Add missing MODULE_DEVICE_TABLE
	cpufreq: sun50i: Add missing MODULE_DEVICE_TABLE
	cpufreq: loongson1: Add missing MODULE_ALIAS
	cpufreq: scpi: Add missing MODULE_ALIAS
	cpufreq: vexpress-spc: Add missing MODULE_ALIAS
	cpufreq: imx: fix NVMEM_IMX_OCOTP dependency
	macintosh/adb-iop: Always wait for reply message from IOP
	macintosh/adb-iop: Send correct poll command
	staging: bcm2835: fix vchiq_mmal dependencies
	staging: greybus: audio: Fix possible leak free widgets in gbaudio_dapm_free_controls
	spi: dw: Fix error return code in dw_spi_bt1_probe()
	Bluetooth: btusb: Add the missed release_firmware() in btusb_mtk_setup_firmware()
	Bluetooth: btmtksdio: Add the missed release_firmware() in mtk_setup_firmware()
	Bluetooth: sco: Fix crash when using BT_SNDMTU/BT_RCVMTU option
	block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name
	block/rnbd: fix a null pointer dereference on dev->blk_symlink_name
	Bluetooth: btusb: Fix detection of some fake CSR controllers with a bcdDevice val of 0x0134
	platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on some HP x360 models
	adm8211: fix error return code in adm8211_probe()
	mtd: spi-nor: sst: fix BPn bits for the SST25VF064C
	mtd: spi-nor: ignore errors in spi_nor_unlock_all()
	mtd: spi-nor: atmel: remove global protection flag
	mtd: spi-nor: atmel: fix unlock_all() for AT25FS010/040
	arm64: dts: meson: g12b: odroid-n2: fix PHY deassert timing requirements
	arm64: dts: meson: fix PHY deassert timing requirements
	ARM: dts: meson: fix PHY deassert timing requirements
	arm64: dts: meson: g12a: x96-max: fix PHY deassert timing requirements
	arm64: dts: meson: g12b: w400: fix PHY deassert timing requirements
	clk: fsl-sai: fix memory leak
	scsi: qedi: Fix missing destroy_workqueue() on error in __qedi_probe
	scsi: pm80xx: Fix error return in pm8001_pci_probe()
	scsi: iscsi: Fix inappropriate use of put_device()
	seq_buf: Avoid type mismatch for seq_buf_init
	scsi: fnic: Fix error return code in fnic_probe()
	platform/x86: mlx-platform: Fix item counter assignment for MSN2700, MSN24xx systems
	platform/x86: mlx-platform: Fix item counter assignment for MSN2700/ComEx system
	ARM: 9030/1: entry: omit FP emulation for UND exceptions taken in kernel mode
	powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend ops
	powerpc/pseries/hibernation: remove redundant cacheinfo update
	powerpc/powermac: Fix low_sleep_handler with CONFIG_VMAP_STACK
	drm/mediatek: avoid dereferencing a null hdmi_phy on an error message
	ASoC: amd: change clk_get() to devm_clk_get() and add missed checks
	coresight: remove broken __exit annotations
	ASoC: max98390: Fix error codes in max98390_dsm_init()
	powerpc/mm: sanity_check_fault() should work for all, not only BOOK3S
	usb: ehci-omap: Fix PM disable depth umbalance in ehci_hcd_omap_probe
	usb: oxu210hp-hcd: Fix memory leak in oxu_create
	speakup: fix uninitialized flush_lock
	nfsd: Fix message level for normal termination
	NFSD: Fix 5 seconds delay when doing inter server copy
	nfs_common: need lock during iterate through the list
	x86/kprobes: Restore BTF if the single-stepping is cancelled
	scsi: qla2xxx: Fix FW initialization error on big endian machines
	scsi: qla2xxx: Fix N2N and NVMe connect retry failure
	platform/chrome: cros_ec_spi: Don't overwrite spi::mode
	misc: pci_endpoint_test: fix return value of error branch
	bus: fsl-mc: add back accidentally dropped error check
	bus: fsl-mc: fix error return code in fsl_mc_object_allocate()
	fsi: Aspeed: Add mutex to protect HW access
	s390/cio: fix use-after-free in ccw_device_destroy_console
	iwlwifi: dbg-tlv: fix old length in is_trig_data_contained()
	iwlwifi: mvm: hook up missing RX handlers
	erofs: avoid using generic_block_bmap
	clk: renesas: r8a779a0: Fix R and OSC clocks
	can: m_can: m_can_config_endisable(): remove double clearing of clock stop request bit
	powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
	powerpc/sstep: Cover new VSX instructions under CONFIG_VSX
	slimbus: qcom: fix potential NULL dereference in qcom_slim_prg_slew()
	ALSA: hda/hdmi: fix silent stream for first playback to DP
	RDMA/core: Do not indicate device ready when device enablement fails
	RDMA/uverbs: Fix incorrect variable type
	remoteproc/mediatek: change MT8192 CFG register base
	remoteproc/mtk_scp: surround DT device IDs with CONFIG_OF
	remoteproc: q6v5-mss: fix error handling in q6v5_pds_enable
	remoteproc: qcom: fix reference leak in adsp_start
	remoteproc: qcom: pas: fix error handling in adsp_pds_enable
	remoteproc: k3-dsp: Fix return value check in k3_dsp_rproc_of_get_memories()
	remoteproc: qcom: Fix potential NULL dereference in adsp_init_mmio()
	remoteproc/mediatek: unprepare clk if scp_before_load fails
	clk: qcom: gcc-sc7180: Use floor ops for sdcc clks
	clk: tegra: Fix duplicated SE clock entry
	mtd: rawnand: gpmi: fix reference count leak in gpmi ops
	mtd: rawnand: meson: Fix a resource leak in init
	mtd: rawnand: gpmi: Fix the random DMA timeout issue
	samples/bpf: Fix possible hang in xdpsock with multiple threads
	fs: Handle I_DONTCACHE in iput_final() instead of generic_drop_inode()
	extcon: max77693: Fix modalias string
	crypto: atmel-i2c - select CONFIG_BITREVERSE
	mac80211: don't set set TDLS STA bandwidth wider than possible
	mac80211: fix a mistake check for rx_stats update
	ASoC: wm_adsp: remove "ctl" from list on error in wm_adsp_create_control()
	irqchip/alpine-msi: Fix freeing of interrupts on allocation error path
	irqchip/ti-sci-inta: Fix printing of inta id on probe success
	irqchip/ti-sci-intr: Fix freeing of irqs
	dmaengine: ti: k3-udma: Correct normal channel offset when uchan_cnt is not 0
	RDMA/hns: Limit the length of data copied between kernel and userspace
	RDMA/hns: Normalization the judgment of some features
	RDMA/hns: Do shift on traffic class when using RoCEv2
	gpiolib: irq hooks: fix recursion in gpiochip_irq_unmask
	ath11k: Fix incorrect tlvs in scan start command
	irqchip/qcom-pdc: Fix phantom irq when changing between rising/falling
	watchdog: armada_37xx: Add missing dependency on HAS_IOMEM
	watchdog: sirfsoc: Add missing dependency on HAS_IOMEM
	watchdog: sprd: remove watchdog disable from resume fail path
	watchdog: sprd: check busy bit before new loading rather than after that
	watchdog: Fix potential dereferencing of null pointer
	ubifs: Fix error return code in ubifs_init_authentication()
	um: Monitor error events in IRQ controller
	um: tty: Fix handling of close in tty lines
	um: chan_xterm: Fix fd leak
	sunrpc: fix xs_read_xdr_buf for partial pages receive
	RDMA/mlx5: Fix MR cache memory leak
	RDMA/cma: Don't overwrite sgid_attr after device is released
	nfc: s3fwrn5: Release the nfc firmware
	drm: mxsfb: Silence -EPROBE_DEFER while waiting for bridge
	powerpc/perf: Fix Threshold Event Counter Multiplier width for P10
	powerpc/ps3: use dma_mapping_error()
	perf test: Fix metric parsing test
	drm/amdgpu: fix regression in vbios reservation handling on headless
	mm/gup: reorganize internal_get_user_pages_fast()
	mm/gup: prevent gup_fast from racing with COW during fork
	mm/gup: combine put_compound_head() and unpin_user_page()
	mm: memcg/slab: fix return of child memcg objcg for root memcg
	mm: memcg/slab: fix use after free in obj_cgroup_charge
	mm/rmap: always do TTU_IGNORE_ACCESS
	sparc: fix handling of page table constructor failure
	mm/vmalloc: Fix unlock order in s_stop()
	mm/vmalloc.c: fix kasan shadow poisoning size
	mm,memory_failure: always pin the page in madvise_inject_error
	hugetlb: fix an error code in hugetlb_reserve_pages()
	mm: don't wake kswapd prematurely when watermark boosting is disabled
	proc: fix lookup in /proc/net subdirectories after setns(2)
	checkpatch: fix unescaped left brace
	s390/test_unwind: fix CALL_ON_STACK tests
	lan743x: fix rx_napi_poll/interrupt ping-pong
	ice, xsk: clear the status bits for the next_to_use descriptor
	i40e, xsk: clear the status bits for the next_to_use descriptor
	net: dsa: qca: ar9331: fix sleeping function called from invalid context bug
	dpaa2-eth: fix the size of the mapped SGT buffer
	net: bcmgenet: Fix a resource leak in an error handling path in the probe functin
	net: mscc: ocelot: Fix a resource leak in the error handling path of the probe function
	net: allwinner: Fix some resources leak in the error handling path of the probe and in the remove function
	block/rnbd-clt: Get rid of warning regarding size argument in strlcpy
	block/rnbd-clt: Fix possible memleak
	NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read()
	net: korina: fix return value
	devlink: use _BITUL() macro instead of BIT() in the UAPI header
	libnvdimm/label: Return -ENXIO for no slot in __blk_label_update
	powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug
	watchdog: qcom: Avoid context switch in restart handler
	watchdog: coh901327: add COMMON_CLK dependency
	clk: ti: Fix memleak in ti_fapll_synth_setup
	pwm: zx: Add missing cleanup in error path
	pwm: lp3943: Dynamically allocate PWM chip base
	pwm: imx27: Fix overflow for bigger periods
	pwm: sun4i: Remove erroneous else branch
	io_uring: cancel only requests of current task
	tools build: Add missing libcap to test-all.bin target
	perf record: Fix memory leak when using '--user-regs=?' to list registers
	qlcnic: Fix error code in probe
	nfp: move indirect block cleanup to flower app stop callback
	vdpa/mlx5: Use write memory barrier after updating CQ index
	virtio_ring: Cut and paste bugs in vring_create_virtqueue_packed()
	virtio_net: Fix error code in probe()
	virtio_ring: Fix two use after free bugs
	vhost scsi: fix error return code in vhost_scsi_set_endpoint()
	epoll: check for events when removing a timed out thread from the wait queue
	clk: bcm: dvp: Add MODULE_DEVICE_TABLE()
	clk: at91: sama7g5: fix compilation error
	clk: at91: sam9x60: remove atmel,osc-bypass support
	clk: s2mps11: Fix a resource leak in error handling paths in the probe function
	clk: sunxi-ng: Make sure divider tables have sentinel
	clk: vc5: Use "idt,voltage-microvolt" instead of "idt,voltage-microvolts"
	kconfig: fix return value of do_error_if()
	powerpc/boot: Fix build of dts/fsl
	powerpc/smp: Add __init to init_big_cores()
	ARM: 9044/1: vfp: use undef hook for VFP support detection
	ARM: 9036/1: uncompress: Fix dbgadtb size parameter name
	perf probe: Fix memory leak when synthesizing SDT probes
	io_uring: fix racy IOPOLL flush overflow
	io_uring: cancel reqs shouldn't kill overflow list
	Smack: Handle io_uring kernel thread privileges
	proc mountinfo: make splice available again
	io_uring: fix io_cqring_events()'s noflush
	io_uring: fix racy IOPOLL completions
	io_uring: always let io_iopoll_complete() complete polled io
	vfio/pci: Move dummy_resources_list init in vfio_pci_probe()
	vfio/pci/nvlink2: Do not attempt NPU2 setup on POWER8NVL NPU
	media: gspca: Fix memory leak in probe
	io_uring: fix io_wqe->work_list corruption
	io_uring: fix 0-iov read buffer select
	io_uring: hold uring_lock while completing failed polled io in io_wq_submit_work()
	io_uring: fix ignoring xa_store errors
	io_uring: fix double io_uring free
	io_uring: make ctx cancel on exit targeted to actual ctx
	media: sunxi-cir: ensure IR is handled when it is continuous
	media: netup_unidvb: Don't leak SPI master in probe error path
	media: ipu3-cio2: Remove traces of returned buffers
	media: ipu3-cio2: Return actual subdev format
	media: ipu3-cio2: Serialise access to pad format
	media: ipu3-cio2: Validate mbus format in setting subdev format
	media: ipu3-cio2: Make the field on subdev format V4L2_FIELD_NONE
	Input: cyapa_gen6 - fix out-of-bounds stack access
	ALSA: hda/ca0132 - Change Input Source enum strings.
	ACPI: NFIT: Fix input validation of bus-family
	PM: ACPI: PCI: Drop acpi_pm_set_bridge_wakeup()
	Revert "ACPI / resources: Use AE_CTRL_TERMINATE to terminate resources walks"
	ACPI: PNP: compare the string length in the matching_id()
	ALSA: hda: Fix regressions on clear and reconfig sysfs
	ALSA: hda/ca0132 - Fix AE-5 rear headphone pincfg.
	ALSA: hda/realtek: make bass spk volume adjustable on a yoga laptop
	ALSA: hda/realtek - Enable headset mic of ASUS X430UN with ALC256
	ALSA: hda/realtek - Enable headset mic of ASUS Q524UQK with ALC255
	ALSA: hda/realtek - Add supported for more Lenovo ALC285 Headset Button
	ALSA: pcm: oss: Fix a few more UBSAN fixes
	ALSA/hda: apply jack fixup for the Acer Veriton N4640G/N6640G/N2510G
	ALSA: hda/realtek: Add quirk for MSI-GP73
	ALSA: hda/realtek: Apply jack fixup for Quanta NL3
	ALSA: hda/realtek: Remove dummy lineout on Acer TravelMate P648/P658
	ALSA: hda/realtek - Supported Dell fixed type headset
	ALSA: usb-audio: Add VID to support native DSD reproduction on FiiO devices
	ALSA: usb-audio: Disable sample read check if firmware doesn't give back
	ALSA: usb-audio: Add alias entry for ASUS PRIME TRX40 PRO-S
	ALSA: core: memalloc: add page alignment for iram
	s390/smp: perform initial CPU reset also for SMT siblings
	s390/kexec_file: fix diag308 subcode when loading crash kernel
	s390/idle: add missing mt_cycles calculation
	s390/idle: fix accounting with machine checks
	s390/dasd: fix hanging device offline processing
	s390/dasd: prevent inconsistent LCU device data
	s390/dasd: fix list corruption of pavgroup group list
	s390/dasd: fix list corruption of lcu list
	binder: add flag to clear buffer on txn complete
	ASoC: cx2072x: Fix doubly definitions of Playback and Capture streams
	ASoC: AMD Renoir - add DMI table to avoid the ACP mic probe (broken BIOS)
	ASoC: AMD Raven/Renoir - fix the PCI probe (PCI revision)
	staging: comedi: mf6x4: Fix AI end-of-conversion detection
	z3fold: simplify freeing slots
	z3fold: stricter locking and more careful reclaim
	perf/x86/intel: Add event constraint for CYCLE_ACTIVITY.STALLS_MEM_ANY
	perf/x86/intel: Fix rtm_abort_event encoding on Ice Lake
	perf/x86/intel/lbr: Fix the return type of get_lbr_cycles()
	powerpc/perf: Exclude kernel samples while counting events in user space.
	cpufreq: intel_pstate: Use most recent guaranteed performance values
	crypto: ecdh - avoid unaligned accesses in ecdh_set_secret()
	crypto: arm/aes-ce - work around Cortex-A57/A72 silion errata
	m68k: Fix WARNING splat in pmac_zilog driver
	Documentation: seqlock: s/LOCKTYPE/LOCKNAME/g
	EDAC/i10nm: Use readl() to access MMIO registers
	EDAC/amd64: Fix PCI component registration
	cpuset: fix race between hotplug work and later CPU offline
	dyndbg: fix use before null check
	USB: serial: mos7720: fix parallel-port state restore
	USB: serial: digi_acceleport: fix write-wakeup deadlocks
	USB: serial: keyspan_pda: fix dropped unthrottle interrupts
	USB: serial: keyspan_pda: fix write deadlock
	USB: serial: keyspan_pda: fix stalled writes
	USB: serial: keyspan_pda: fix write-wakeup use-after-free
	USB: serial: keyspan_pda: fix tx-unthrottle use-after-free
	USB: serial: keyspan_pda: fix write unthrottling
	btrfs: do not shorten unpin len for caching block groups
	btrfs: update last_byte_to_unpin in switch_commit_roots
	btrfs: fix race when defragmenting leads to unnecessary IO
	ext4: fix an IS_ERR() vs NULL check
	ext4: fix a memory leak of ext4_free_data
	ext4: fix deadlock with fs freezing and EA inodes
	ext4: don't remount read-only with errors=continue on reboot
	RISC-V: Fix usage of memblock_enforce_memory_limit
	arm64: dts: ti: k3-am65: mark dss as dma-coherent
	arm64: dts: marvell: keep SMMU disabled by default for Armada 7040 and 8040
	KVM: arm64: Introduce handling of AArch32 TTBCR2 traps
	KVM: x86: reinstate vendor-agnostic check on SPEC_CTRL cpuid bits
	KVM: SVM: Remove the call to sev_platform_status() during setup
	iommu/arm-smmu: Allow implementation specific write_s2cr
	iommu/arm-smmu-qcom: Read back stream mappings
	iommu/arm-smmu-qcom: Implement S2CR quirk
	ARM: dts: pandaboard: fix pinmux for gpio user button of Pandaboard ES
	ARM: dts: at91: sama5d2: fix CAN message ram offset and size
	ARM: tegra: Populate OPP table for Tegra20 Ventana
	xprtrdma: Fix XDRBUF_SPARSE_PAGES support
	powerpc/32: Fix vmap stack - Properly set r1 before activating MMU on syscall too
	powerpc: Fix incorrect stw{, ux, u, x} instructions in __set_pte_at
	powerpc/rtas: Fix typo of ibm,open-errinjct in RTAS filter
	powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()
	powerpc/feature: Add CPU_FTR_NOEXECUTE to G2_LE
	powerpc/xmon: Change printk() to pr_cont()
	powerpc/8xx: Fix early debug when SMC1 is relocated
	powerpc/mm: Fix verification of MMU_FTR_TYPE_44x
	powerpc/powernv/npu: Do not attempt NPU2 setup on POWER8NVL NPU
	powerpc/powernv/memtrace: Don't leak kernel memory to user space
	powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrently
	ovl: make ioctl() safe
	ima: Don't modify file descriptor mode on the fly
	um: Remove use of asprinf in umid.c
	um: Fix time-travel mode
	ceph: fix race in concurrent __ceph_remove_cap invocations
	SMB3: avoid confusing warning message on mount to Azure
	SMB3.1.1: remove confusing mount warning when no SPNEGO info on negprot rsp
	SMB3.1.1: do not log warning message if server doesn't populate salt
	ubifs: wbuf: Don't leak kernel memory to flash
	jffs2: Fix GC exit abnormally
	jffs2: Fix ignoring mounting options problem during remounting
	fsnotify: generalize handle_inode_event()
	inotify: convert to handle_inode_event() interface
	fsnotify: fix events reported to watching parent and child
	jfs: Fix array index bounds check in dbAdjTree
	drm/panfrost: Fix job timeout handling
	drm/panfrost: Move the GPU reset bits outside the timeout handler
	platform/x86: mlx-platform: remove an unused variable
	drm/amdgpu: only set DP subconnector type on DP and eDP connectors
	drm/amd/display: Fix memory leaks in S3 resume
	drm/dp_aux_dev: check aux_dev before use in drm_dp_aux_dev_get_by_minor()
	drm/i915: Fix mismatch between misplaced vma check and vma insert
	iio: ad_sigma_delta: Don't put SPI transfer buffer on the stack
	spi: pxa2xx: Fix use-after-free on unbind
	spi: spi-sh: Fix use-after-free on unbind
	spi: atmel-quadspi: Fix use-after-free on unbind
	spi: spi-mtk-nor: Don't leak SPI master in probe error path
	spi: ar934x: Don't leak SPI master in probe error path
	spi: davinci: Fix use-after-free on unbind
	spi: fsl: fix use of spisel_boot signal on MPC8309
	spi: gpio: Don't leak SPI master in probe error path
	spi: mxic: Don't leak SPI master in probe error path
	spi: npcm-fiu: Disable clock in probe error path
	spi: pic32: Don't leak DMA channels in probe error path
	spi: rb4xx: Don't leak SPI master in probe error path
	spi: rpc-if: Fix use-after-free on unbind
	spi: sc18is602: Don't leak SPI master in probe error path
	spi: spi-geni-qcom: Fix use-after-free on unbind
	spi: spi-qcom-qspi: Fix use-after-free on unbind
	spi: st-ssc4: Fix unbalanced pm_runtime_disable() in probe error path
	spi: synquacer: Disable clock in probe error path
	spi: mt7621: Disable clock in probe error path
	spi: mt7621: Don't leak SPI master in probe error path
	spi: atmel-quadspi: Disable clock in probe error path
	spi: atmel-quadspi: Fix AHB memory accesses
	soc: qcom: smp2p: Safely acquire spinlock without IRQs
	mtd: spinand: Fix OOB read
	mtd: parser: cmdline: Fix parsing of part-names with colons
	mtd: core: Fix refcounting for unpartitioned MTDs
	mtd: rawnand: qcom: Fix DMA sync on FLASH_STATUS register read
	mtd: rawnand: meson: fix meson_nfc_dma_buffer_release() arguments
	scsi: qla2xxx: Fix crash during driver load on big endian machines
	scsi: lpfc: Fix invalid sleeping context in lpfc_sli4_nvmet_alloc()
	scsi: lpfc: Fix scheduling call while in softirq context in lpfc_unreg_rpi
	scsi: lpfc: Re-fix use after free in lpfc_rq_buf_free()
	openat2: reject RESOLVE_BENEATH|RESOLVE_IN_ROOT
	iio: buffer: Fix demux update
	iio: adc: rockchip_saradc: fix missing clk_disable_unprepare() on error in rockchip_saradc_resume
	iio: imu: st_lsm6dsx: fix edge-trigger interrupts
	iio:light:rpr0521: Fix timestamp alignment and prevent data leak.
	iio:light:st_uvis25: Fix timestamp alignment and prevent data leak.
	iio:magnetometer:mag3110: Fix alignment and data leak issues.
	iio:pressure:mpl3115: Force alignment of buffer
	iio:imu:bmi160: Fix too large a buffer.
	iio:imu:bmi160: Fix alignment and data leak issues
	iio:adc:ti-ads124s08: Fix buffer being too long.
	iio:adc:ti-ads124s08: Fix alignment and data leak issues.
	md/cluster: block reshape with remote resync job
	md/cluster: fix deadlock when node is doing resync job
	pinctrl: sunxi: Always call chained_irq_{enter, exit} in sunxi_pinctrl_irq_handler
	clk: ingenic: Fix divider calculation with div tables
	clk: mvebu: a3700: fix the XTAL MODE pin to MPP1_9
	clk: tegra: Do not return 0 on failure
	counter: microchip-tcb-capture: Fix CMR value check
	device-dax/core: Fix memory leak when rmmod dax.ko
	dma-buf/dma-resv: Respect num_fences when initializing the shared fence list.
	driver: core: Fix list corruption after device_del()
	xen-blkback: set ring->xenblkd to NULL after kthread_stop()
	xen/xenbus: Allow watches discard events before queueing
	xen/xenbus: Add 'will_handle' callback support in xenbus_watch_path()
	xen/xenbus/xen_bus_type: Support will_handle watch callback
	xen/xenbus: Count pending messages for each watch
	xenbus/xenbus_backend: Disallow pending watch messages
	memory: jz4780_nemc: Fix an error pointer vs NULL check in probe()
	memory: renesas-rpc-if: Fix a node reference leak in rpcif_probe()
	memory: renesas-rpc-if: Return correct value to the caller of rpcif_manual_xfer()
	memory: renesas-rpc-if: Fix unbalanced pm_runtime_enable in rpcif_{enable,disable}_rpm
	libnvdimm/namespace: Fix reaping of invalidated block-window-namespace labels
	platform/x86: intel-vbtn: Allow switch events on Acer Switch Alpha 12
	tracing: Disable ftrace selftests when any tracer is running
	mt76: add back the SUPPORTS_REORDERING_BUFFER flag
	of: fix linker-section match-table corruption
	PCI: Fix pci_slot_release() NULL pointer dereference
	regulator: axp20x: Fix DLDO2 voltage control register mask for AXP22x
	remoteproc: sysmon: Ensure remote notification ordering
	thermal/drivers/cpufreq_cooling: Update cpufreq_state only if state has changed
	rtc: ep93xx: Fix NULL pointer dereference in ep93xx_rtc_read_time
	Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"
	null_blk: Fix zone size initialization
	null_blk: Fail zone append to conventional zones
	drm/edid: fix objtool warning in drm_cvt_modes()
	x86/CPU/AMD: Save AMD NodeId as cpu_die_id
	Linux 5.10.4

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I25209e79d8b9faf5382087955a29b7404bdefe38
2020-12-30 12:47:03 +01:00
Filipe Manana
8f4bf6eea3 btrfs: fix race when defragmenting leads to unnecessary IO
commit 7f458a3873ae94efe1f37c8b96c97e7298769e98 upstream.

When defragmenting we skip ranges that have holes or inline extents, so that
we don't do unnecessary IO and waste space. We do this check when calling
should_defrag_range() at btrfs_defrag_file(). However we do it without
holding the inode's lock. The reason we do it like this is to avoid
blocking other tasks for too long, that possibly want to operate on other
file ranges, since after the call to should_defrag_range() and before
locking the inode, we trigger a synchronous page cache readahead. However
before we were able to lock the inode, some other task might have punched
a hole in our range, or we may now have an inline extent there, in which
case we should not set the range for defrag anymore since that would cause
unnecessary IO and make us waste space (i.e. allocating extents to contain
zeros for a hole).

So after we locked the inode and the range in the iotree, check again if
we have holes or an inline extent, and if we do, just skip the range.

I hit this while testing my next patch that fixes races when updating an
inode's number of bytes (subject "btrfs: update the number of bytes used
by an inode atomically"), and it depends on this change in order to work
correctly. Alternatively I could rework that other patch to detect holes
and flag their range with the 'new delalloc' bit, but this itself fixes
an efficiency problem due a race that from a functional point of view is
not harmful (it could be triggered with btrfs/062 from fstests).

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30 11:54:13 +01:00
Josef Bacik
5c5bc5738b btrfs: update last_byte_to_unpin in switch_commit_roots
commit 27d56e62e4748c2135650c260024e9904b8c1a0a upstream.

While writing an explanation for the need of the commit_root_sem for
btrfs_prepare_extent_commit, I realized we have a slight hole that could
result in leaked space if we have to do the old style caching.  Consider
the following scenario

 commit root
 +----+----+----+----+----+----+----+
 |\\\\|    |\\\\|\\\\|    |\\\\|\\\\|
 +----+----+----+----+----+----+----+
 0    1    2    3    4    5    6    7

 new commit root
 +----+----+----+----+----+----+----+
 |    |    |    |\\\\|    |    |\\\\|
 +----+----+----+----+----+----+----+
 0    1    2    3    4    5    6    7

Prior to this patch, we run btrfs_prepare_extent_commit, which updates
the last_byte_to_unpin, and then we subsequently run
switch_commit_roots.  In this example lets assume that
caching_ctl->progress == 1 at btrfs_prepare_extent_commit() time, which
means that cache->last_byte_to_unpin == 1.  Then we go and do the
switch_commit_roots(), but in the meantime the caching thread has made
some more progress, because we drop the commit_root_sem and re-acquired
it.  Now caching_ctl->progress == 3.  We swap out the commit root and
carry on to unpin.

The race can happen like:

  1) The caching thread was running using the old commit root when it
     found the extent for [2, 3);

  2) Then it released the commit_root_sem because it was in the last
     item of a leaf and the semaphore was contended, and set ->progress
     to 3 (value of 'last'), as the last extent item in the current leaf
     was for the extent for range [2, 3);

  3) Next time it gets the commit_root_sem, will start using the new
     commit root and search for a key with offset 3, so it never finds
     the hole for [2, 3).

  So the caching thread never saw [2, 3) as free space in any of the
  commit roots, and by the time finish_extent_commit() was called for
  the range [0, 3), ->last_byte_to_unpin was 1, so it only returned the
  subrange [0, 1) to the free space cache, skipping [2, 3).

In the unpin code we have last_byte_to_unpin == 1, so we unpin [0,1),
but do not unpin [2,3).  However because caching_ctl->progress == 3 we
do not see the newly freed section of [2,3), and thus do not add it to
our free space cache.  This results in us missing a chunk of free space
in memory (on disk too, unless we have a power failure before writing
the free space cache to disk).

Fix this by making sure the ->last_byte_to_unpin is set at the same time
that we swap the commit roots, this ensures that we will always be
consistent.

CC: stable@vger.kernel.org # 5.8+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ update changelog with Filipe's review comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30 11:54:13 +01:00
Josef Bacik
56d1654dc2 btrfs: do not shorten unpin len for caching block groups
commit 9076dbd5ee837c3882fc42891c14cecd0354a849 upstream.

While fixing up our ->last_byte_to_unpin locking I noticed that we will
shorten len based on ->last_byte_to_unpin if we're caching when we're
adding back the free space.  This is correct for the free space, as we
cannot unpin more than ->last_byte_to_unpin, however we use len to
adjust the ->bytes_pinned counters and such, which need to track the
actual pinned usage.  This could result in
WARN_ON(space_info->bytes_pinned) triggering at unmount time.

Fix this by using a local variable for the amount to add to free space
cache, and leave len untouched in this case.

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30 11:54:13 +01:00
Greg Kroah-Hartman
b19651bfcc Merge c84e1efae0 ("Merge tag 'asm-generic-fixes-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic") into android-mainline
Steps on the way to 5.10-rc5

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I644783003a83186a34cdbb753aa492f4350f49ee
2020-11-28 08:23:09 +01:00
Linus Torvalds
a17a3ca55e for-5.10-rc5-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl/BGBQACgkQxWXV+ddt
 WDtRYQ/+IUGjJ4l6MyL3PwLgTVabKsSpm2R3y3M/0tVJ0FIhDXYbkpjB2CkpIdcH
 jUvHnRL4H59hwG6rwlXU2oC298FNbbrLGIU+c9DR50RuyQCDGnT02XvxwfDIEFzp
 WLNZ/CAhRkm6boj//70lV26BpeXT59KMYwNixfCNTXq9Ir0qYHHCGg4cEBQLS++2
 JUU8XVTLURIYiFOLbwmABI7V43OgDhdORIr+qnR8xjCUyhusZsjVVbvIdW3BDi/S
 wK7NJsfuqgsF0zD9URJjpwTFiJL9SvBLWR8JnM9NiLW3ZbkGBL+efL6mdWuH7534
 gruJRS2zYPMO2/Kpjy/31CWLap3PUSD3i8cKF+uo3liojWuSUhN8kfvcNxJVd5se
 NkEK+4zOjsDIVbv7gcjThSv4KTnOUO/XfN9TWUMuduaMBmGQNaQut1FpGV98utiK
 yW6x8xqcR4SI+lqY6ILqCK+qUHf19BLSsuyzZdTIontKKRA9F9hY8a4XTZuzTWml
 BGYmFGP640vOo8C9GjrQfpAwa7CB/DnF/cg1AAmuZ8vrEm9zYjmauFKK8ZPcveA3
 KGrnmIlYjhAIX16oRbfwOgj9D2xa1loBzJyHQHByvCMXGVFBnqRTRANRHFrdQWJB
 qh9+J4EJcUXPE9WGHxAW/g9vpFkV7IRABHs7aUB8zApxI9nGA0Q=
 =kcxn
 -----END PGP SIGNATURE-----

Merge tag 'for-5.10-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few fixes for various warnings that accumulated over past two weeks:

   - tree-checker: add missing return values for some errors

   - lockdep fixes
      - when reading qgroup config and starting quota rescan
      - reverse order of quota ioctl lock and VFS freeze lock

   - avoid accessing potentially stale fs info during device scan,
     reported by syzbot

   - add scope NOFS protection around qgroup relation changes

   - check for running transaction before flushing qgroups

   - fix tracking of new delalloc ranges for some cases"

* tag 'for-5.10-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix lockdep splat when enabling and disabling qgroups
  btrfs: do nofs allocations when adding and removing qgroup relations
  btrfs: fix lockdep splat when reading qgroup config on mount
  btrfs: tree-checker: add missing returns after data_ref alignment checks
  btrfs: don't access possibly stale fs_info data for printing duplicate device
  btrfs: tree-checker: add missing return after error in root_item
  btrfs: qgroup: don't commit transaction when we already hold the handle
  btrfs: fix missing delalloc new bit for new delalloc ranges
2020-11-27 12:42:13 -08:00
Filipe Manana
a855fbe692 btrfs: fix lockdep splat when enabling and disabling qgroups
When running test case btrfs/017 from fstests, lockdep reported the
following splat:

  [ 1297.067385] ======================================================
  [ 1297.067708] WARNING: possible circular locking dependency detected
  [ 1297.068022] 5.10.0-rc4-btrfs-next-73 #1 Not tainted
  [ 1297.068322] ------------------------------------------------------
  [ 1297.068629] btrfs/189080 is trying to acquire lock:
  [ 1297.068929] ffff9f2725731690 (sb_internal#2){.+.+}-{0:0}, at: btrfs_quota_enable+0xaf/0xa70 [btrfs]
  [ 1297.069274]
		 but task is already holding lock:
  [ 1297.069868] ffff9f2702b61a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0xa70 [btrfs]
  [ 1297.070219]
		 which lock already depends on the new lock.

  [ 1297.071131]
		 the existing dependency chain (in reverse order) is:
  [ 1297.071721]
		 -> #1 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}:
  [ 1297.072375]        lock_acquire+0xd8/0x490
  [ 1297.072710]        __mutex_lock+0xa3/0xb30
  [ 1297.073061]        btrfs_qgroup_inherit+0x59/0x6a0 [btrfs]
  [ 1297.073421]        create_subvol+0x194/0x990 [btrfs]
  [ 1297.073780]        btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
  [ 1297.074133]        __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
  [ 1297.074498]        btrfs_ioctl_snap_create+0x58/0x80 [btrfs]
  [ 1297.074872]        btrfs_ioctl+0x1a90/0x36f0 [btrfs]
  [ 1297.075245]        __x64_sys_ioctl+0x83/0xb0
  [ 1297.075617]        do_syscall_64+0x33/0x80
  [ 1297.075993]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 1297.076380]
		 -> #0 (sb_internal#2){.+.+}-{0:0}:
  [ 1297.077166]        check_prev_add+0x91/0xc60
  [ 1297.077572]        __lock_acquire+0x1740/0x3110
  [ 1297.077984]        lock_acquire+0xd8/0x490
  [ 1297.078411]        start_transaction+0x3c5/0x760 [btrfs]
  [ 1297.078853]        btrfs_quota_enable+0xaf/0xa70 [btrfs]
  [ 1297.079323]        btrfs_ioctl+0x2c60/0x36f0 [btrfs]
  [ 1297.079789]        __x64_sys_ioctl+0x83/0xb0
  [ 1297.080232]        do_syscall_64+0x33/0x80
  [ 1297.080680]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 1297.081139]
		 other info that might help us debug this:

  [ 1297.082536]  Possible unsafe locking scenario:

  [ 1297.083510]        CPU0                    CPU1
  [ 1297.084005]        ----                    ----
  [ 1297.084500]   lock(&fs_info->qgroup_ioctl_lock);
  [ 1297.084994]                                lock(sb_internal#2);
  [ 1297.085485]                                lock(&fs_info->qgroup_ioctl_lock);
  [ 1297.085974]   lock(sb_internal#2);
  [ 1297.086454]
		  *** DEADLOCK ***
  [ 1297.087880] 3 locks held by btrfs/189080:
  [ 1297.088324]  #0: ffff9f2725731470 (sb_writers#14){.+.+}-{0:0}, at: btrfs_ioctl+0xa73/0x36f0 [btrfs]
  [ 1297.088799]  #1: ffff9f2702b60cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1f4d/0x36f0 [btrfs]
  [ 1297.089284]  #2: ffff9f2702b61a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0xa70 [btrfs]
  [ 1297.089771]
		 stack backtrace:
  [ 1297.090662] CPU: 5 PID: 189080 Comm: btrfs Not tainted 5.10.0-rc4-btrfs-next-73 #1
  [ 1297.091132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  [ 1297.092123] Call Trace:
  [ 1297.092629]  dump_stack+0x8d/0xb5
  [ 1297.093115]  check_noncircular+0xff/0x110
  [ 1297.093596]  check_prev_add+0x91/0xc60
  [ 1297.094076]  ? kvm_clock_read+0x14/0x30
  [ 1297.094553]  ? kvm_sched_clock_read+0x5/0x10
  [ 1297.095029]  __lock_acquire+0x1740/0x3110
  [ 1297.095510]  lock_acquire+0xd8/0x490
  [ 1297.095993]  ? btrfs_quota_enable+0xaf/0xa70 [btrfs]
  [ 1297.096476]  start_transaction+0x3c5/0x760 [btrfs]
  [ 1297.096962]  ? btrfs_quota_enable+0xaf/0xa70 [btrfs]
  [ 1297.097451]  btrfs_quota_enable+0xaf/0xa70 [btrfs]
  [ 1297.097941]  ? btrfs_ioctl+0x1f4d/0x36f0 [btrfs]
  [ 1297.098429]  btrfs_ioctl+0x2c60/0x36f0 [btrfs]
  [ 1297.098904]  ? do_user_addr_fault+0x20c/0x430
  [ 1297.099382]  ? kvm_clock_read+0x14/0x30
  [ 1297.099854]  ? kvm_sched_clock_read+0x5/0x10
  [ 1297.100328]  ? sched_clock+0x5/0x10
  [ 1297.100801]  ? sched_clock_cpu+0x12/0x180
  [ 1297.101272]  ? __x64_sys_ioctl+0x83/0xb0
  [ 1297.101739]  __x64_sys_ioctl+0x83/0xb0
  [ 1297.102207]  do_syscall_64+0x33/0x80
  [ 1297.102673]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 1297.103148] RIP: 0033:0x7f773ff65d87

This is because during the quota enable ioctl we lock first the mutex
qgroup_ioctl_lock and then start a transaction, and starting a transaction
acquires a fs freeze semaphore (at the VFS level). However, every other
code path, except for the quota disable ioctl path, we do the opposite:
we start a transaction and then lock the mutex.

So fix this by making the quota enable and disable paths to start the
transaction without having the mutex locked, and then, after starting the
transaction, lock the mutex and check if some other task already enabled
or disabled the quotas, bailing with success if that was the case.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-23 21:16:43 +01:00
Filipe Manana
7aa6d35984 btrfs: do nofs allocations when adding and removing qgroup relations
When adding or removing a qgroup relation we are doing a GFP_KERNEL
allocation which is not safe because we are holding a transaction
handle open and that can make us deadlock if the allocator needs to
recurse into the filesystem. So just surround those calls with a
nofs context.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-23 21:16:40 +01:00
Filipe Manana
3d05cad3c3 btrfs: fix lockdep splat when reading qgroup config on mount
Lockdep reported the following splat when running test btrfs/190 from
fstests:

  [ 9482.126098] ======================================================
  [ 9482.126184] WARNING: possible circular locking dependency detected
  [ 9482.126281] 5.10.0-rc4-btrfs-next-73 #1 Not tainted
  [ 9482.126365] ------------------------------------------------------
  [ 9482.126456] mount/24187 is trying to acquire lock:
  [ 9482.126534] ffffa0c869a7dac0 (&fs_info->qgroup_rescan_lock){+.+.}-{3:3}, at: qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.126647]
		 but task is already holding lock:
  [ 9482.126777] ffffa0c892ebd3a0 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.126886]
		 which lock already depends on the new lock.

  [ 9482.127078]
		 the existing dependency chain (in reverse order) is:
  [ 9482.127213]
		 -> #1 (btrfs-quota-00){++++}-{3:3}:
  [ 9482.127366]        lock_acquire+0xd8/0x490
  [ 9482.127436]        down_read_nested+0x45/0x220
  [ 9482.127528]        __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.127613]        btrfs_read_lock_root_node+0x41/0x130 [btrfs]
  [ 9482.127702]        btrfs_search_slot+0x514/0xc30 [btrfs]
  [ 9482.127788]        update_qgroup_status_item+0x72/0x140 [btrfs]
  [ 9482.127877]        btrfs_qgroup_rescan_worker+0xde/0x680 [btrfs]
  [ 9482.127964]        btrfs_work_helper+0xf1/0x600 [btrfs]
  [ 9482.128039]        process_one_work+0x24e/0x5e0
  [ 9482.128110]        worker_thread+0x50/0x3b0
  [ 9482.128181]        kthread+0x153/0x170
  [ 9482.128256]        ret_from_fork+0x22/0x30
  [ 9482.128327]
		 -> #0 (&fs_info->qgroup_rescan_lock){+.+.}-{3:3}:
  [ 9482.128464]        check_prev_add+0x91/0xc60
  [ 9482.128551]        __lock_acquire+0x1740/0x3110
  [ 9482.128623]        lock_acquire+0xd8/0x490
  [ 9482.130029]        __mutex_lock+0xa3/0xb30
  [ 9482.130590]        qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.131577]        btrfs_read_qgroup_config+0x43a/0x550 [btrfs]
  [ 9482.132175]        open_ctree+0x1228/0x18a0 [btrfs]
  [ 9482.132756]        btrfs_mount_root.cold+0x13/0xed [btrfs]
  [ 9482.133325]        legacy_get_tree+0x30/0x60
  [ 9482.133866]        vfs_get_tree+0x28/0xe0
  [ 9482.134392]        fc_mount+0xe/0x40
  [ 9482.134908]        vfs_kern_mount.part.0+0x71/0x90
  [ 9482.135428]        btrfs_mount+0x13b/0x3e0 [btrfs]
  [ 9482.135942]        legacy_get_tree+0x30/0x60
  [ 9482.136444]        vfs_get_tree+0x28/0xe0
  [ 9482.136949]        path_mount+0x2d7/0xa70
  [ 9482.137438]        do_mount+0x75/0x90
  [ 9482.137923]        __x64_sys_mount+0x8e/0xd0
  [ 9482.138400]        do_syscall_64+0x33/0x80
  [ 9482.138873]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 9482.139346]
		 other info that might help us debug this:

  [ 9482.140735]  Possible unsafe locking scenario:

  [ 9482.141594]        CPU0                    CPU1
  [ 9482.142011]        ----                    ----
  [ 9482.142411]   lock(btrfs-quota-00);
  [ 9482.142806]                                lock(&fs_info->qgroup_rescan_lock);
  [ 9482.143216]                                lock(btrfs-quota-00);
  [ 9482.143629]   lock(&fs_info->qgroup_rescan_lock);
  [ 9482.144056]
		  *** DEADLOCK ***

  [ 9482.145242] 2 locks held by mount/24187:
  [ 9482.145637]  #0: ffffa0c8411c40e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xb9/0x400
  [ 9482.146061]  #1: ffffa0c892ebd3a0 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.146509]
		 stack backtrace:
  [ 9482.147350] CPU: 1 PID: 24187 Comm: mount Not tainted 5.10.0-rc4-btrfs-next-73 #1
  [ 9482.147788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  [ 9482.148709] Call Trace:
  [ 9482.149169]  dump_stack+0x8d/0xb5
  [ 9482.149628]  check_noncircular+0xff/0x110
  [ 9482.150090]  check_prev_add+0x91/0xc60
  [ 9482.150561]  ? kvm_clock_read+0x14/0x30
  [ 9482.151017]  ? kvm_sched_clock_read+0x5/0x10
  [ 9482.151470]  __lock_acquire+0x1740/0x3110
  [ 9482.151941]  ? __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.152402]  lock_acquire+0xd8/0x490
  [ 9482.152887]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.153354]  __mutex_lock+0xa3/0xb30
  [ 9482.153826]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.154301]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.154768]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.155226]  qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.155690]  btrfs_read_qgroup_config+0x43a/0x550 [btrfs]
  [ 9482.156160]  open_ctree+0x1228/0x18a0 [btrfs]
  [ 9482.156643]  btrfs_mount_root.cold+0x13/0xed [btrfs]
  [ 9482.157108]  ? rcu_read_lock_sched_held+0x5d/0x90
  [ 9482.157567]  ? kfree+0x31f/0x3e0
  [ 9482.158030]  legacy_get_tree+0x30/0x60
  [ 9482.158489]  vfs_get_tree+0x28/0xe0
  [ 9482.158947]  fc_mount+0xe/0x40
  [ 9482.159403]  vfs_kern_mount.part.0+0x71/0x90
  [ 9482.159875]  btrfs_mount+0x13b/0x3e0 [btrfs]
  [ 9482.160335]  ? rcu_read_lock_sched_held+0x5d/0x90
  [ 9482.160805]  ? kfree+0x31f/0x3e0
  [ 9482.161260]  ? legacy_get_tree+0x30/0x60
  [ 9482.161714]  legacy_get_tree+0x30/0x60
  [ 9482.162166]  vfs_get_tree+0x28/0xe0
  [ 9482.162616]  path_mount+0x2d7/0xa70
  [ 9482.163070]  do_mount+0x75/0x90
  [ 9482.163525]  __x64_sys_mount+0x8e/0xd0
  [ 9482.163986]  do_syscall_64+0x33/0x80
  [ 9482.164437]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 9482.164902] RIP: 0033:0x7f51e907caaa

This happens because at btrfs_read_qgroup_config() we can call
qgroup_rescan_init() while holding a read lock on a quota btree leaf,
acquired by the previous call to btrfs_search_slot_for_read(), and
qgroup_rescan_init() acquires the mutex qgroup_rescan_lock.

A qgroup rescan worker does the opposite: it acquires the mutex
qgroup_rescan_lock, at btrfs_qgroup_rescan_worker(), and then tries to
update the qgroup status item in the quota btree through the call to
update_qgroup_status_item(). This inversion of locking order
between the qgroup_rescan_lock mutex and quota btree locks causes the
splat.

Fix this simply by releasing and freeing the path before calling
qgroup_rescan_init() at btrfs_read_qgroup_config().

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-23 21:16:30 +01:00
David Sterba
6d06b0ad94 btrfs: tree-checker: add missing returns after data_ref alignment checks
There are sectorsize alignment checks that are reported but then
check_extent_data_ref continues. This was not intended, wrong alignment
is not a minor problem and we should return with error.

CC: stable@vger.kernel.org # 5.4+
Fixes: 0785a9aacf ("btrfs: tree-checker: Add EXTENT_DATA_REF check")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-23 21:16:21 +01:00
Johannes Thumshirn
0697d9a610 btrfs: don't access possibly stale fs_info data for printing duplicate device
Syzbot reported a possible use-after-free when printing a duplicate device
warning device_list_add().

At this point it can happen that a btrfs_device::fs_info is not correctly
setup yet, so we're accessing stale data, when printing the warning
message using the btrfs_printk() wrappers.

  ==================================================================
  BUG: KASAN: use-after-free in btrfs_printk+0x3eb/0x435 fs/btrfs/super.c:245
  Read of size 8 at addr ffff8880878e06a8 by task syz-executor225/7068

  CPU: 1 PID: 7068 Comm: syz-executor225 Not tainted 5.9.0-rc5-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x1d6/0x29e lib/dump_stack.c:118
   print_address_description+0x66/0x620 mm/kasan/report.c:383
   __kasan_report mm/kasan/report.c:513 [inline]
   kasan_report+0x132/0x1d0 mm/kasan/report.c:530
   btrfs_printk+0x3eb/0x435 fs/btrfs/super.c:245
   device_list_add+0x1a88/0x1d60 fs/btrfs/volumes.c:943
   btrfs_scan_one_device+0x196/0x490 fs/btrfs/volumes.c:1359
   btrfs_mount_root+0x48f/0xb60 fs/btrfs/super.c:1634
   legacy_get_tree+0xea/0x180 fs/fs_context.c:592
   vfs_get_tree+0x88/0x270 fs/super.c:1547
   fc_mount fs/namespace.c:978 [inline]
   vfs_kern_mount+0xc9/0x160 fs/namespace.c:1008
   btrfs_mount+0x33c/0xae0 fs/btrfs/super.c:1732
   legacy_get_tree+0xea/0x180 fs/fs_context.c:592
   vfs_get_tree+0x88/0x270 fs/super.c:1547
   do_new_mount fs/namespace.c:2875 [inline]
   path_mount+0x179d/0x29e0 fs/namespace.c:3192
   do_mount fs/namespace.c:3205 [inline]
   __do_sys_mount fs/namespace.c:3413 [inline]
   __se_sys_mount+0x126/0x180 fs/namespace.c:3390
   do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x44840a
  RSP: 002b:00007ffedfffd608 EFLAGS: 00000293 ORIG_RAX: 00000000000000a5
  RAX: ffffffffffffffda RBX: 00007ffedfffd670 RCX: 000000000044840a
  RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffedfffd630
  RBP: 00007ffedfffd630 R08: 00007ffedfffd670 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000001a
  R13: 0000000000000004 R14: 0000000000000003 R15: 0000000000000003

  Allocated by task 6945:
   kasan_save_stack mm/kasan/common.c:48 [inline]
   kasan_set_track mm/kasan/common.c:56 [inline]
   __kasan_kmalloc+0x100/0x130 mm/kasan/common.c:461
   kmalloc_node include/linux/slab.h:577 [inline]
   kvmalloc_node+0x81/0x110 mm/util.c:574
   kvmalloc include/linux/mm.h:757 [inline]
   kvzalloc include/linux/mm.h:765 [inline]
   btrfs_mount_root+0xd0/0xb60 fs/btrfs/super.c:1613
   legacy_get_tree+0xea/0x180 fs/fs_context.c:592
   vfs_get_tree+0x88/0x270 fs/super.c:1547
   fc_mount fs/namespace.c:978 [inline]
   vfs_kern_mount+0xc9/0x160 fs/namespace.c:1008
   btrfs_mount+0x33c/0xae0 fs/btrfs/super.c:1732
   legacy_get_tree+0xea/0x180 fs/fs_context.c:592
   vfs_get_tree+0x88/0x270 fs/super.c:1547
   do_new_mount fs/namespace.c:2875 [inline]
   path_mount+0x179d/0x29e0 fs/namespace.c:3192
   do_mount fs/namespace.c:3205 [inline]
   __do_sys_mount fs/namespace.c:3413 [inline]
   __se_sys_mount+0x126/0x180 fs/namespace.c:3390
   do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  Freed by task 6945:
   kasan_save_stack mm/kasan/common.c:48 [inline]
   kasan_set_track+0x3d/0x70 mm/kasan/common.c:56
   kasan_set_free_info+0x17/0x30 mm/kasan/generic.c:355
   __kasan_slab_free+0xdd/0x110 mm/kasan/common.c:422
   __cache_free mm/slab.c:3418 [inline]
   kfree+0x113/0x200 mm/slab.c:3756
   deactivate_locked_super+0xa7/0xf0 fs/super.c:335
   btrfs_mount_root+0x72b/0xb60 fs/btrfs/super.c:1678
   legacy_get_tree+0xea/0x180 fs/fs_context.c:592
   vfs_get_tree+0x88/0x270 fs/super.c:1547
   fc_mount fs/namespace.c:978 [inline]
   vfs_kern_mount+0xc9/0x160 fs/namespace.c:1008
   btrfs_mount+0x33c/0xae0 fs/btrfs/super.c:1732
   legacy_get_tree+0xea/0x180 fs/fs_context.c:592
   vfs_get_tree+0x88/0x270 fs/super.c:1547
   do_new_mount fs/namespace.c:2875 [inline]
   path_mount+0x179d/0x29e0 fs/namespace.c:3192
   do_mount fs/namespace.c:3205 [inline]
   __do_sys_mount fs/namespace.c:3413 [inline]
   __se_sys_mount+0x126/0x180 fs/namespace.c:3390
   do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  The buggy address belongs to the object at ffff8880878e0000
   which belongs to the cache kmalloc-16k of size 16384
  The buggy address is located 1704 bytes inside of
   16384-byte region [ffff8880878e0000, ffff8880878e4000)
  The buggy address belongs to the page:
  page:0000000060704f30 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x878e0
  head:0000000060704f30 order:3 compound_mapcount:0 compound_pincount:0
  flags: 0xfffe0000010200(slab|head)
  raw: 00fffe0000010200 ffffea00028e9a08 ffffea00021e3608 ffff8880aa440b00
  raw: 0000000000000000 ffff8880878e0000 0000000100000001 0000000000000000
  page dumped because: kasan: bad access detected

  Memory state around the buggy address:
   ffff8880878e0580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ffff8880878e0600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  >ffff8880878e0680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
				    ^
   ffff8880878e0700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ffff8880878e0780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ==================================================================

The syzkaller reproducer for this use-after-free crafts a filesystem image
and loop mounts it twice in a loop. The mount will fail as the crafted
image has an invalid chunk tree. When this happens btrfs_mount_root() will
call deactivate_locked_super(), which then cleans up fs_info and
fs_info::sb. If a second thread now adds the same block-device to the
filesystem, it will get detected as a duplicate device and
device_list_add() will reject the duplicate and print a warning. But as
the fs_info pointer passed in is non-NULL this will result in a
use-after-free.

Instead of printing possibly uninitialized or already freed memory in
btrfs_printk(), explicitly pass in a NULL fs_info so the printing of the
device name will be skipped altogether.

There was a slightly different approach discussed in
https://lore.kernel.org/linux-btrfs/20200114060920.4527-1-anand.jain@oracle.com/t/#u

Link: https://lore.kernel.org/linux-btrfs/000000000000c9e14b05afcc41ba@google.com
Reported-by: syzbot+582e66e5edf36a22c7b0@syzkaller.appspotmail.com
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-23 21:16:12 +01:00
Daniel Xu
1a49a97df6 btrfs: tree-checker: add missing return after error in root_item
There's a missing return statement after an error is found in the
root_item, this can cause further problems when a crafted image triggers
the error.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=210181
Fixes: 259ee7754b ("btrfs: tree-checker: Add ROOT_ITEM check")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-13 22:18:10 +01:00
Qu Wenruo
6f23277a49 btrfs: qgroup: don't commit transaction when we already hold the handle
[BUG]
When running the following script, btrfs will trigger an ASSERT():

  #/bin/bash
  mkfs.btrfs -f $dev
  mount $dev $mnt
  xfs_io -f -c "pwrite 0 1G" $mnt/file
  sync
  btrfs quota enable $mnt
  btrfs quota rescan -w $mnt

  # Manually set the limit below current usage
  btrfs qgroup limit 512M $mnt $mnt

  # Crash happens
  touch $mnt/file

The dmesg looks like this:

  assertion failed: refcount_read(&trans->use_count) == 1, in fs/btrfs/transaction.c:2022
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.h:3230!
  invalid opcode: 0000 [#1] SMP PTI
  RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
   btrfs_commit_transaction.cold+0x11/0x5d [btrfs]
   try_flush_qgroup+0x67/0x100 [btrfs]
   __btrfs_qgroup_reserve_meta+0x3a/0x60 [btrfs]
   btrfs_delayed_update_inode+0xaa/0x350 [btrfs]
   btrfs_update_inode+0x9d/0x110 [btrfs]
   btrfs_dirty_inode+0x5d/0xd0 [btrfs]
   touch_atime+0xb5/0x100
   iterate_dir+0xf1/0x1b0
   __x64_sys_getdents64+0x78/0x110
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7fb5afe588db

[CAUSE]
In try_flush_qgroup(), we assume we don't hold a transaction handle at
all.  This is true for data reservation and mostly true for metadata.
Since data space reservation always happens before we start a
transaction, and for most metadata operation we reserve space in
start_transaction().

But there is an exception, btrfs_delayed_inode_reserve_metadata().
It holds a transaction handle, while still trying to reserve extra
metadata space.

When we hit EDQUOT inside btrfs_delayed_inode_reserve_metadata(), we
will join current transaction and commit, while we still have
transaction handle from qgroup code.

[FIX]
Let's check current->journal before we join the transaction.

If current->journal is unset or BTRFS_SEND_TRANS_STUB, it means
we are not holding a transaction, thus are able to join and then commit
transaction.

If current->journal is a valid transaction handle, we avoid committing
transaction and just end it

This is less effective than committing current transaction, as it won't
free metadata reserved space, but we may still free some data space
before new data writes.

Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1178634
Fixes: c53e965360 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-13 22:17:57 +01:00
Filipe Manana
c334730988 btrfs: fix missing delalloc new bit for new delalloc ranges
When doing a buffered write, through one of the write family syscalls, we
look for ranges which currently don't have allocated extents and set the
'delalloc new' bit on them, so that we can report a correct number of used
blocks to the stat(2) syscall until delalloc is flushed and ordered extents
complete.

However there are a few other places where we can do a buffered write
against a range that is mapped to a hole (no extent allocated) and where
we do not set the 'new delalloc' bit. Those places are:

- Doing a memory mapped write against a hole;

- Cloning an inline extent into a hole starting at file offset 0;

- Calling btrfs_cont_expand() when the i_size of the file is not aligned
  to the sector size and is located in a hole. For example when cloning
  to a destination offset beyond EOF.

So after such cases, until the corresponding delalloc range is flushed and
the respective ordered extents complete, we can report an incorrect number
of blocks used through the stat(2) syscall.

In some cases we can end up reporting 0 used blocks to stat(2), which is a
particular bad value to report as it may mislead tools to think a file is
completely sparse when its i_size is not zero, making them skip reading
any data, an undesired consequence for tools such as archivers and other
backup tools, as reported a long time ago in the following thread (and
other past threads):

  https://lists.gnu.org/archive/html/bug-tar/2016-07/msg00001.html

Example reproducer:

  $ cat reproducer.sh
  #!/bin/bash

  MNT=/mnt/sdi
  DEV=/dev/sdi

  mkfs.btrfs -f $DEV > /dev/null
  # mkfs.xfs -f $DEV > /dev/null
  # mkfs.ext4 -F $DEV > /dev/null
  # mkfs.f2fs -f $DEV > /dev/null
  mount $DEV $MNT

  xfs_io -f -c "truncate 64K"   \
      -c "mmap -w 0 64K"        \
      -c "mwrite -S 0xab 0 64K" \
      -c "munmap"               \
      $MNT/foo

  blocks_used=$(stat -c %b $MNT/foo)
  echo "blocks used: $blocks_used"

  if [ $blocks_used -eq 0 ]; then
      echo "ERROR: blocks used is 0"
  fi

  umount $DEV

  $ ./reproducer.sh
  blocks used: 0
  ERROR: blocks used is 0

So move the logic that decides to set the 'delalloc bit' bit into the
function btrfs_set_extent_delalloc(), since that is what we use for all
those missing cases as well as for the cases that currently work well.

This change is also preparatory work for an upcoming patch that fixes
other problems related to tracking and reporting the number of bytes used
by an inode.

CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-13 22:15:59 +01:00
Greg Kroah-Hartman
7f6480e40c Merge eccc876724 ("Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs") into android-mainline
Steps on the way to 5.10-rc4

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I9e0fa89c0f6f306fe802ae95c8d01d9ba558e111
2020-11-12 08:02:21 +01:00
Linus Torvalds
e2f0c565ec for-5.10-rc3-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl+qtyAACgkQxWXV+ddt
 WDusYg/+P1SroRe2n33Pi6v9w47Luqjfi5qMfdrV/ex7g9bP/7dVPvAa0YJr7CpA
 UHpvcgWMFK2e29oOoeEoYXukHQ4BKtC6F5L0MPgVJocDT6xsAWM3v98VZn69Olcu
 TcUNkxUVbj7OBbQDaINV8dXnTbQWNOsfNlYXH+nPgWqrjSDbPoLkXIEVfZ9CTBeZ
 P/qEshkTXqvx1Pux/uRcKrMSu+lSFICKlLvky0b9gRpg4usVlF8jlGQrvJHQvqnP
 lECR1cb7/nf2PQ+HdPpgigD24bddiiORoyGW68Q1zZHgs+kGfL6p4M3WF04WINrV
 Taiv7WVZ6qHiEB+LDxlOx2cy0Z6YzFOaGSASz+Hh64hvOezBOVGvCF1U9U/Dp+MC
 n6QjUiw6c0rIjbdoxpTfQETCdIt/l3qXfOVEr9Zjr2KEbasLsZXdGSf3ydr81Uff
 94CwrXp2wq429zu4mdCfOwihF/288+VrN8XRfkSy5RFQ5hHVnZBFQO4KbRIQ4i5X
 ZIjHQPX0jA/XN/jpUde/RJL5AyLz20n0o9I3frjXwSs+rvU3f0wD//fxmXlRUshM
 hsFXFKO0VdaFtoywVIf7VK/fDsKQhiq+9Yg48A8ylpk+W7meMjeDYuYMIEhMQX3m
 S1OMG1Qf27pWXD6KEzpaqzI4SrYBOGVDsX8qxMxRws7n55koVJc=
 =3FTR
 -----END PGP SIGNATURE-----

Merge tag 'for-5.10-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A handful of minor fixes and updates:

   - handle missing device replace item on mount (syzbot report)

   - fix space reservation calculation when finishing relocation

   - fix memory leak on error path in ref-verify (debugging feature)

   - fix potential overflow during defrag on 32bit arches

   - minor code update to silence smatch warning

   - minor error message updates"

* tag 'for-5.10-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: ref-verify: fix memory leak in btrfs_ref_tree_mod
  btrfs: dev-replace: fail mount if we don't have replace item with target device
  btrfs: scrub: update message regarding read-only status
  btrfs: clean up NULL checks in qgroup_unreserve_range()
  btrfs: fix min reserved size calculation in merge_reloc_root
  btrfs: print the block rsv type when we fail our reservation
  btrfs: fix potential overflow in cluster_pages_for_defrag on 32bit arch
2020-11-10 10:07:15 -08:00
Dinghao Liu
468600c6ec btrfs: ref-verify: fix memory leak in btrfs_ref_tree_mod
There is one error handling path that does not free ref, which may cause
a minor memory leak.

CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-05 13:03:39 +01:00
Anand Jain
cf89af146b btrfs: dev-replace: fail mount if we don't have replace item with target device
If there is a device BTRFS_DEV_REPLACE_DEVID without the device replace
item, then it means the filesystem is inconsistent state. This is either
corruption or a crafted image.  Fail the mount as this needs a closer
look what is actually wrong.

As of now if BTRFS_DEV_REPLACE_DEVID is present without the replace
item, in __btrfs_free_extra_devids() we determine that there is an
extra device, and free those extra devices but continue to mount the
device.
However, we were wrong in keeping tack of the rw_devices so the syzbot
testcase failed:

  WARNING: CPU: 1 PID: 3612 at fs/btrfs/volumes.c:1166 close_fs_devices.part.0+0x607/0x800 fs/btrfs/volumes.c:1166
  Kernel panic - not syncing: panic_on_warn set ...
  CPU: 1 PID: 3612 Comm: syz-executor.2 Not tainted 5.9.0-rc4-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x198/0x1fd lib/dump_stack.c:118
   panic+0x347/0x7c0 kernel/panic.c:231
   __warn.cold+0x20/0x46 kernel/panic.c:600
   report_bug+0x1bd/0x210 lib/bug.c:198
   handle_bug+0x38/0x90 arch/x86/kernel/traps.c:234
   exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:254
   asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:536
  RIP: 0010:close_fs_devices.part.0+0x607/0x800 fs/btrfs/volumes.c:1166
  RSP: 0018:ffffc900091777e0 EFLAGS: 00010246
  RAX: 0000000000040000 RBX: ffffffffffffffff RCX: ffffc9000c8b7000
  RDX: 0000000000040000 RSI: ffffffff83097f47 RDI: 0000000000000007
  RBP: dffffc0000000000 R08: 0000000000000001 R09: ffff8880988a187f
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff88809593a130
  R13: ffff88809593a1ec R14: ffff8880988a1908 R15: ffff88809593a050
   close_fs_devices fs/btrfs/volumes.c:1193 [inline]
   btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179
   open_ctree+0x4984/0x4a2d fs/btrfs/disk-io.c:3434
   btrfs_fill_super fs/btrfs/super.c:1316 [inline]
   btrfs_mount_root.cold+0x14/0x165 fs/btrfs/super.c:1672

The fix here is, when we determine that there isn't a replace item
then fail the mount if there is a replace target device (devid 0).

CC: stable@vger.kernel.org # 4.19+
Reported-by: syzbot+4cfe71a4da060be47502@syzkaller.appspotmail.com
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-05 13:03:31 +01:00
David Sterba
a4852cf268 btrfs: scrub: update message regarding read-only status
Based on user feedback update the message printed when scrub fails to
start due to write requirements. To make a distinction add a device id
to the messages.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-05 13:02:58 +01:00
Dan Carpenter
f07728d541 btrfs: clean up NULL checks in qgroup_unreserve_range()
Smatch complains that this code dereferences "entry" before checking
whether it's NULL on the next line.  Fortunately, rb_entry() will never
return NULL so it doesn't cause a problem.  We can clean up the NULL
checking a bit to silence the warning and make the code more clear.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-05 13:02:20 +01:00
Josef Bacik
fca3a45d08 btrfs: fix min reserved size calculation in merge_reloc_root
The minimum reserve size was adjusted to take into account the height of
the tree we are merging, however we can have a root with a level == 0.
What we want is root_level + 1 to get the number of nodes we may have to
cow.  This fixes the enospc_debug warning pops with btrfs/101.

Nikolay: this fixes failures on btrfs/060 btrfs/062 btrfs/063 and
btrfs/195 That I was seeing, the call trace was:

  [ 3680.515564] ------------[ cut here ]------------
  [ 3680.515566] BTRFS: block rsv returned -28
  [ 3680.515585] WARNING: CPU: 2 PID: 8339 at fs/btrfs/block-rsv.c:521 btrfs_use_block_rsv+0x162/0x180
  [ 3680.515587] Modules linked in:
  [ 3680.515591] CPU: 2 PID: 8339 Comm: btrfs Tainted: G        W         5.9.0-rc8-default #95
  [ 3680.515593] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
  [ 3680.515595] RIP: 0010:btrfs_use_block_rsv+0x162/0x180
  [ 3680.515600] RSP: 0018:ffffa01ac9753910 EFLAGS: 00010282
  [ 3680.515602] RAX: 0000000000000000 RBX: ffff984b34200000 RCX: 0000000000000027
  [ 3680.515604] RDX: 0000000000000027 RSI: 0000000000000000 RDI: ffff984b3bd19e28
  [ 3680.515606] RBP: 0000000000004000 R08: ffff984b3bd19e20 R09: 0000000000000001
  [ 3680.515608] R10: 0000000000000004 R11: 0000000000000046 R12: ffff984b264fdc00
  [ 3680.515609] R13: ffff984b13149000 R14: 00000000ffffffe4 R15: ffff984b34200000
  [ 3680.515613] FS:  00007f4e2912b8c0(0000) GS:ffff984b3bd00000(0000) knlGS:0000000000000000
  [ 3680.515615] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 3680.515617] CR2: 00007fab87122150 CR3: 0000000118e42000 CR4: 00000000000006e0
  [ 3680.515620] Call Trace:
  [ 3680.515627]  btrfs_alloc_tree_block+0x8b/0x340
  [ 3680.515633]  ? __lock_acquire+0x51a/0xac0
  [ 3680.515646]  alloc_tree_block_no_bg_flush+0x4f/0x60
  [ 3680.515651]  __btrfs_cow_block+0x14e/0x7e0
  [ 3680.515662]  btrfs_cow_block+0x144/0x2c0
  [ 3680.515670]  merge_reloc_root+0x4d4/0x610
  [ 3680.515675]  ? btrfs_lookup_fs_root+0x78/0x90
  [ 3680.515686]  merge_reloc_roots+0xee/0x280
  [ 3680.515695]  relocate_block_group+0x2ce/0x5e0
  [ 3680.515704]  btrfs_relocate_block_group+0x16e/0x310
  [ 3680.515711]  btrfs_relocate_chunk+0x38/0xf0
  [ 3680.515716]  btrfs_shrink_device+0x200/0x560
  [ 3680.515728]  btrfs_rm_device+0x1ae/0x6a6
  [ 3680.515744]  ? _copy_from_user+0x6e/0xb0
  [ 3680.515750]  btrfs_ioctl+0x1afe/0x28c0
  [ 3680.515755]  ? find_held_lock+0x2b/0x80
  [ 3680.515760]  ? do_user_addr_fault+0x1f8/0x418
  [ 3680.515773]  ? __x64_sys_ioctl+0x77/0xb0
  [ 3680.515775]  __x64_sys_ioctl+0x77/0xb0
  [ 3680.515781]  do_syscall_64+0x31/0x70
  [ 3680.515785]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported-by: Nikolay Borisov <nborisov@suse.com>
Fixes: 44d354abf3 ("btrfs: relocation: review the call sites which can be interrupted by signal")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-05 13:02:07 +01:00
Josef Bacik
e38fdb7167 btrfs: print the block rsv type when we fail our reservation
To help with debugging, print the type of the block rsv when we fail to
use our target block rsv in btrfs_use_block_rsv.

This now produces:

 [  544.672035] BTRFS: block rsv 1 returned -28

which is still cryptic without consulting the enum in block-rsv.h but I
guess it's better than nothing.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add note from Nikolay ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-05 13:02:05 +01:00
Matthew Wilcox (Oracle)
a1fbc6750e btrfs: fix potential overflow in cluster_pages_for_defrag on 32bit arch
On 32-bit systems, this shift will overflow for files larger than 4GB as
start_index is unsigned long while the calls to btrfs_delalloc_*_space
expect u64.

CC: stable@vger.kernel.org # 4.4+
Fixes: df480633b8 ("btrfs: extent-tree: Switch to new delalloc space reserve and release")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Sterba <dsterba@suse.com>
[ define the variable instead of repeating the shift ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-05 13:01:42 +01:00
Greg Kroah-Hartman
b0748cf3e1 Merge c2dc4c073f ("Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost") into android-mainline
Steps on the way to 5.10-rc2

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I686e55205b113f69b9ea8a22c56d86a572e8c603
2020-11-02 11:59:56 +01:00
Linus Torvalds
f5d808567a for-5.10-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl+cNwcACgkQxWXV+ddt
 WDth+g//esuxzGaUenIuMgnT8ofnte4I9Kst8ShBAl1Asglq1p1WBJfYtHbMMqeE
 CmJ32hGs6JBoaB/Wdta41F840BxRbaOCyo10oB6jx5QV2rCh99PvPhwmsmaUGQHG
 02umRPRJcILReB53LwJvLQTQMVfuUqfBV2TyyLHrY8R8pIGsG61p1d3cgg+NWtKQ
 c3RC2eH/uIeQTDaZX0ZOpa6TBPOs+MNDiF5d3UxvpiBXWum3yijdXEfhpfiOom4A
 eCH+lj+iQQ4EtoKjXi0q7ziU1eAKWkQ3A4rMo9fr7iQkQIVkvZc2d9WALsF+znXi
 f2ochi3msemX19I5g0RQ1s5XExCKBSbr6v934BDlpAZ8Pc4IFz9rkLZwlMYp/SVJ
 9aYZOG9Rm0P3DaiYPvKZBcxsTRvxXlXlVWfMCGUB4KKaGyawJgSZxnegxP8nWb2C
 +VrvVw2NJsoioQTX+2OqUc2FCuDib7ehjH80q9IXLlozfoKA4Lfj1G6qCUEsuffI
 NbW5Ndkaza/qw3mOTEn/sU9+kzr1P8CVtSFWcI/GJqp/kisTAYtyfU/GWD9JLi8s
 uaHGAZXdCEVNJ2opgnLiW0ZPNMm4oSernn8JckhsUUGUJwJ4c1pGEHSzIHEF+d9E
 HBmjQN6qcW4DDUb1rSQEBsFoKXeebp7P6usNNrUqau1wWKBworA=
 =bR/o
 -----END PGP SIGNATURE-----

Merge tag 'for-5.10-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - lockdep fixes:
     - drop path locks before manipulating sysfs objects or qgroups
     - preliminary fixes before tree locks get switched to rwsem
     - use annotated seqlock

 - build warning fixes (printk format)

 - fix relocation vs fallocate race

 - tree checker properly validates number of stripes and parity

 - readahead vs device replace fixes

 - iomap dio fix for unnecessary buffered io fallback

* tag 'for-5.10-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: convert data_seqcount to seqcount_mutex_t
  btrfs: don't fallback to buffered read if we don't need to
  btrfs: add a helper to read the tree_root commit root for backref lookup
  btrfs: drop the path before adding qgroup items when enabling qgroups
  btrfs: fix readahead hang and use-after-free after removing a device
  btrfs: fix use-after-free on readahead extent after failure to create it
  btrfs: tree-checker: validate number of chunk stripes and parity
  btrfs: tree-checker: fix incorrect printk format
  btrfs: drop the path before adding block group sysfs files
  btrfs: fix relocation failure due to race with fallocate
2020-10-30 13:29:49 -07:00
Davidlohr Bueso
d5c8238849 btrfs: convert data_seqcount to seqcount_mutex_t
By doing so we can associate the sequence counter to the chunk_mutex
for lockdep purposes (compiled-out otherwise), the mutex is otherwise
used on the write side.
Also avoid explicitly disabling preemption around the write region as it
will now be done automatically by the seqcount machinery based on the
lock type.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-27 15:11:51 +01:00
Johannes Thumshirn
0425e7badb btrfs: don't fallback to buffered read if we don't need to
Since we switched to the iomap infrastructure in b5ff9f1a96e8f ("btrfs:
switch to iomap for direct IO") we're calling generic_file_buffered_read()
directly and not via generic_file_read_iter() anymore.

If the read could read everything there is no need to bother calling
generic_file_buffered_read(), like it is handled in
generic_file_read_iter().

If we call generic_file_buffered_read() in this case we can hit a
situation where we do an invalid readahead and cause this UBSAN splat
in fstest generic/091:

  run fstests generic/091 at 2020-10-21 10:52:32
  ================================================================================
  UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
  shift exponent 64 is too large for 64-bit type 'long unsigned int'
  CPU: 0 PID: 656 Comm: fsx Not tainted 5.9.0-rc7+ #821
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
  Call Trace:
   __dump_stack lib/dump_stack.c:77
   dump_stack+0x57/0x70 lib/dump_stack.c:118
   ubsan_epilogue+0x5/0x40 lib/ubsan.c:148
   __ubsan_handle_shift_out_of_bounds.cold+0x61/0xe9 lib/ubsan.c:395
   __roundup_pow_of_two ./include/linux/log2.h:57
   get_init_ra_size mm/readahead.c:318
   ondemand_readahead.cold+0x16/0x2c mm/readahead.c:530
   generic_file_buffered_read+0x3ac/0x840 mm/filemap.c:2199
   call_read_iter ./include/linux/fs.h:1876
   new_sync_read+0x102/0x180 fs/read_write.c:415
   vfs_read+0x11c/0x1a0 fs/read_write.c:481
   ksys_read+0x4f/0xc0 fs/read_write.c:615
   do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9 arch/x86/entry/entry_64.S:118
  RIP: 0033:0x7fe87fee992e
  RSP: 002b:00007ffe01605278 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
  RAX: ffffffffffffffda RBX: 000000000004f000 RCX: 00007fe87fee992e
  RDX: 0000000000004000 RSI: 0000000001677000 RDI: 0000000000000003
  RBP: 000000000004f000 R08: 0000000000004000 R09: 000000000004f000
  R10: 0000000000053000 R11: 0000000000000246 R12: 0000000000004000
  R13: 0000000000000000 R14: 000000000007a120 R15: 0000000000000000
  ================================================================================
  BTRFS info (device nullb0): has skinny extents
  BTRFS info (device nullb0): ZONED mode enabled, zone size 268435456 B
  BTRFS info (device nullb0): enabling ssd optimizations

Fixes: f85781fb50 ("btrfs: switch to iomap for direct IO")
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-27 15:11:37 +01:00
Josef Bacik
49d11bead7 btrfs: add a helper to read the tree_root commit root for backref lookup
I got the following lockdep splat with tree locks converted to rwsem
patches on btrfs/104:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.9.0+ #102 Not tainted
  ------------------------------------------------------
  btrfs-cleaner/903 is trying to acquire lock:
  ffff8e7fab6ffe30 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170

  but task is already holding lock:
  ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #3 (&fs_info->commit_root_sem){++++}-{3:3}:
	 down_read+0x40/0x130
	 caching_thread+0x53/0x5a0
	 btrfs_work_helper+0xfa/0x520
	 process_one_work+0x238/0x540
	 worker_thread+0x55/0x3c0
	 kthread+0x13a/0x150
	 ret_from_fork+0x1f/0x30

  -> #2 (&caching_ctl->mutex){+.+.}-{3:3}:
	 __mutex_lock+0x7e/0x7b0
	 btrfs_cache_block_group+0x1e0/0x510
	 find_free_extent+0xb6e/0x12f0
	 btrfs_reserve_extent+0xb3/0x1b0
	 btrfs_alloc_tree_block+0xb1/0x330
	 alloc_tree_block_no_bg_flush+0x4f/0x60
	 __btrfs_cow_block+0x11d/0x580
	 btrfs_cow_block+0x10c/0x220
	 commit_cowonly_roots+0x47/0x2e0
	 btrfs_commit_transaction+0x595/0xbd0
	 sync_filesystem+0x74/0x90
	 generic_shutdown_super+0x22/0x100
	 kill_anon_super+0x14/0x30
	 btrfs_kill_super+0x12/0x20
	 deactivate_locked_super+0x36/0xa0
	 cleanup_mnt+0x12d/0x190
	 task_work_run+0x5c/0xa0
	 exit_to_user_mode_prepare+0x1df/0x200
	 syscall_exit_to_user_mode+0x54/0x280
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #1 (&space_info->groups_sem){++++}-{3:3}:
	 down_read+0x40/0x130
	 find_free_extent+0x2ed/0x12f0
	 btrfs_reserve_extent+0xb3/0x1b0
	 btrfs_alloc_tree_block+0xb1/0x330
	 alloc_tree_block_no_bg_flush+0x4f/0x60
	 __btrfs_cow_block+0x11d/0x580
	 btrfs_cow_block+0x10c/0x220
	 commit_cowonly_roots+0x47/0x2e0
	 btrfs_commit_transaction+0x595/0xbd0
	 sync_filesystem+0x74/0x90
	 generic_shutdown_super+0x22/0x100
	 kill_anon_super+0x14/0x30
	 btrfs_kill_super+0x12/0x20
	 deactivate_locked_super+0x36/0xa0
	 cleanup_mnt+0x12d/0x190
	 task_work_run+0x5c/0xa0
	 exit_to_user_mode_prepare+0x1df/0x200
	 syscall_exit_to_user_mode+0x54/0x280
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #0 (btrfs-root-00){++++}-{3:3}:
	 __lock_acquire+0x1167/0x2150
	 lock_acquire+0xb9/0x3d0
	 down_read_nested+0x43/0x130
	 __btrfs_tree_read_lock+0x32/0x170
	 __btrfs_read_lock_root_node+0x3a/0x50
	 btrfs_search_slot+0x614/0x9d0
	 btrfs_find_root+0x35/0x1b0
	 btrfs_read_tree_root+0x61/0x120
	 btrfs_get_root_ref+0x14b/0x600
	 find_parent_nodes+0x3e6/0x1b30
	 btrfs_find_all_roots_safe+0xb4/0x130
	 btrfs_find_all_roots+0x60/0x80
	 btrfs_qgroup_trace_extent_post+0x27/0x40
	 btrfs_add_delayed_data_ref+0x3fd/0x460
	 btrfs_free_extent+0x42/0x100
	 __btrfs_mod_ref+0x1d7/0x2f0
	 walk_up_proc+0x11c/0x400
	 walk_up_tree+0xf0/0x180
	 btrfs_drop_snapshot+0x1c7/0x780
	 btrfs_clean_one_deleted_snapshot+0xfb/0x110
	 cleaner_kthread+0xd4/0x140
	 kthread+0x13a/0x150
	 ret_from_fork+0x1f/0x30

  other info that might help us debug this:

  Chain exists of:
    btrfs-root-00 --> &caching_ctl->mutex --> &fs_info->commit_root_sem

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(&fs_info->commit_root_sem);
				 lock(&caching_ctl->mutex);
				 lock(&fs_info->commit_root_sem);
    lock(btrfs-root-00);

   *** DEADLOCK ***

  3 locks held by btrfs-cleaner/903:
   #0: ffff8e7fab628838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
   #1: ffff8e7faadac640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
   #2: ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80

  stack backtrace:
  CPU: 0 PID: 903 Comm: btrfs-cleaner Not tainted 5.9.0+ #102
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
  Call Trace:
   dump_stack+0x8b/0xb0
   check_noncircular+0xcf/0xf0
   __lock_acquire+0x1167/0x2150
   ? __bfs+0x42/0x210
   lock_acquire+0xb9/0x3d0
   ? __btrfs_tree_read_lock+0x32/0x170
   down_read_nested+0x43/0x130
   ? __btrfs_tree_read_lock+0x32/0x170
   __btrfs_tree_read_lock+0x32/0x170
   __btrfs_read_lock_root_node+0x3a/0x50
   btrfs_search_slot+0x614/0x9d0
   ? find_held_lock+0x2b/0x80
   btrfs_find_root+0x35/0x1b0
   ? do_raw_spin_unlock+0x4b/0xa0
   btrfs_read_tree_root+0x61/0x120
   btrfs_get_root_ref+0x14b/0x600
   find_parent_nodes+0x3e6/0x1b30
   btrfs_find_all_roots_safe+0xb4/0x130
   btrfs_find_all_roots+0x60/0x80
   btrfs_qgroup_trace_extent_post+0x27/0x40
   btrfs_add_delayed_data_ref+0x3fd/0x460
   btrfs_free_extent+0x42/0x100
   __btrfs_mod_ref+0x1d7/0x2f0
   walk_up_proc+0x11c/0x400
   walk_up_tree+0xf0/0x180
   btrfs_drop_snapshot+0x1c7/0x780
   ? btrfs_clean_one_deleted_snapshot+0x73/0x110
   btrfs_clean_one_deleted_snapshot+0xfb/0x110
   cleaner_kthread+0xd4/0x140
   ? btrfs_alloc_root+0x50/0x50
   kthread+0x13a/0x150
   ? kthread_create_worker_on_cpu+0x40/0x40
   ret_from_fork+0x1f/0x30
  BTRFS info (device sdb): disk space caching is enabled
  BTRFS info (device sdb): has skinny extents

This happens because qgroups does a backref lookup when we create a
delayed ref.  From here it may have to look up a root from an indirect
ref, which does a normal lookup on the tree_root, which takes the read
lock on the tree_root nodes.

To fix this we need to add a variant for looking up roots that searches
the commit root of the tree_root.  Then when we do the backref search
using the commit root we are sure to not take any locks on the tree_root
nodes.  This gets rid of the lockdep splat when running btrfs/104.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-26 15:04:57 +01:00
Josef Bacik
5223cc60b4 btrfs: drop the path before adding qgroup items when enabling qgroups
When enabling qgroups we walk the tree_root and then add a qgroup item
for every root that we have.  This creates a lock dependency on the
tree_root and qgroup_root, which results in the following lockdep splat
(with tree locks using rwsem), eg. in tests btrfs/017 or btrfs/022:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.9.0-default+ #1299 Not tainted
  ------------------------------------------------------
  btrfs/24552 is trying to acquire lock:
  ffff9142dfc5f630 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]

  but task is already holding lock:
  ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (btrfs-root-00){++++}-{3:3}:
	 __lock_acquire+0x3fb/0x730
	 lock_acquire.part.0+0x6a/0x130
	 down_read_nested+0x46/0x130
	 __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
	 __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
	 btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
	 btrfs_search_slot+0xc3/0x9f0 [btrfs]
	 btrfs_insert_item+0x6e/0x140 [btrfs]
	 btrfs_create_tree+0x1cb/0x240 [btrfs]
	 btrfs_quota_enable+0xcd/0x790 [btrfs]
	 btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
	 __x64_sys_ioctl+0x83/0xa0
	 do_syscall_64+0x2d/0x70
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #0 (btrfs-quota-00){++++}-{3:3}:
	 check_prev_add+0x91/0xc30
	 validate_chain+0x491/0x750
	 __lock_acquire+0x3fb/0x730
	 lock_acquire.part.0+0x6a/0x130
	 down_read_nested+0x46/0x130
	 __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
	 __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
	 btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
	 btrfs_search_slot+0xc3/0x9f0 [btrfs]
	 btrfs_insert_empty_items+0x58/0xa0 [btrfs]
	 add_qgroup_item.part.0+0x72/0x210 [btrfs]
	 btrfs_quota_enable+0x3bb/0x790 [btrfs]
	 btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
	 __x64_sys_ioctl+0x83/0xa0
	 do_syscall_64+0x2d/0x70
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  other info that might help us debug this:

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(btrfs-root-00);
				 lock(btrfs-quota-00);
				 lock(btrfs-root-00);
    lock(btrfs-quota-00);

   *** DEADLOCK ***

  5 locks held by btrfs/24552:
   #0: ffff9142df431478 (sb_writers#10){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0xa0
   #1: ffff9142f9b10cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl_quota_ctl+0x7b/0xe0 [btrfs]
   #2: ffff9142f9b11a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0x790 [btrfs]
   #3: ffff9142df431698 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x406/0x510 [btrfs]
   #4: ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]

  stack backtrace:
  CPU: 1 PID: 24552 Comm: btrfs Not tainted 5.9.0-default+ #1299
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
  Call Trace:
   dump_stack+0x77/0x97
   check_noncircular+0xf3/0x110
   check_prev_add+0x91/0xc30
   validate_chain+0x491/0x750
   __lock_acquire+0x3fb/0x730
   lock_acquire.part.0+0x6a/0x130
   ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
   ? lock_acquire+0xc4/0x140
   ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
   down_read_nested+0x46/0x130
   ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
   __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
   ? btrfs_root_node+0xd9/0x200 [btrfs]
   __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
   btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
   btrfs_search_slot+0xc3/0x9f0 [btrfs]
   btrfs_insert_empty_items+0x58/0xa0 [btrfs]
   add_qgroup_item.part.0+0x72/0x210 [btrfs]
   btrfs_quota_enable+0x3bb/0x790 [btrfs]
   btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
   __x64_sys_ioctl+0x83/0xa0
   do_syscall_64+0x2d/0x70
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by dropping the path whenever we find a root item, add the
qgroup item, and then re-lookup the root item we found and continue
processing roots.

Reported-by: David Sterba <dsterba@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-26 15:04:57 +01:00
Filipe Manana
66d204a16c btrfs: fix readahead hang and use-after-free after removing a device
Very sporadically I had test case btrfs/069 from fstests hanging (for
years, it is not a recent regression), with the following traces in
dmesg/syslog:

  [162301.160628] BTRFS info (device sdc): dev_replace from /dev/sdd (devid 2) to /dev/sdg started
  [162301.181196] BTRFS info (device sdc): scrub: finished on devid 4 with status: 0
  [162301.287162] BTRFS info (device sdc): dev_replace from /dev/sdd (devid 2) to /dev/sdg finished
  [162513.513792] INFO: task btrfs-transacti:1356167 blocked for more than 120 seconds.
  [162513.514318]       Not tainted 5.9.0-rc6-btrfs-next-69 #1
  [162513.514522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [162513.514747] task:btrfs-transacti state:D stack:    0 pid:1356167 ppid:     2 flags:0x00004000
  [162513.514751] Call Trace:
  [162513.514761]  __schedule+0x5ce/0xd00
  [162513.514765]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
  [162513.514771]  schedule+0x46/0xf0
  [162513.514844]  wait_current_trans+0xde/0x140 [btrfs]
  [162513.514850]  ? finish_wait+0x90/0x90
  [162513.514864]  start_transaction+0x37c/0x5f0 [btrfs]
  [162513.514879]  transaction_kthread+0xa4/0x170 [btrfs]
  [162513.514891]  ? btrfs_cleanup_transaction+0x660/0x660 [btrfs]
  [162513.514894]  kthread+0x153/0x170
  [162513.514897]  ? kthread_stop+0x2c0/0x2c0
  [162513.514902]  ret_from_fork+0x22/0x30
  [162513.514916] INFO: task fsstress:1356184 blocked for more than 120 seconds.
  [162513.515192]       Not tainted 5.9.0-rc6-btrfs-next-69 #1
  [162513.515431] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [162513.515680] task:fsstress        state:D stack:    0 pid:1356184 ppid:1356177 flags:0x00004000
  [162513.515682] Call Trace:
  [162513.515688]  __schedule+0x5ce/0xd00
  [162513.515691]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
  [162513.515697]  schedule+0x46/0xf0
  [162513.515712]  wait_current_trans+0xde/0x140 [btrfs]
  [162513.515716]  ? finish_wait+0x90/0x90
  [162513.515729]  start_transaction+0x37c/0x5f0 [btrfs]
  [162513.515743]  btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs]
  [162513.515753]  btrfs_sync_fs+0x61/0x1c0 [btrfs]
  [162513.515758]  ? __ia32_sys_fdatasync+0x20/0x20
  [162513.515761]  iterate_supers+0x87/0xf0
  [162513.515765]  ksys_sync+0x60/0xb0
  [162513.515768]  __do_sys_sync+0xa/0x10
  [162513.515771]  do_syscall_64+0x33/0x80
  [162513.515774]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [162513.515781] RIP: 0033:0x7f5238f50bd7
  [162513.515782] Code: Bad RIP value.
  [162513.515784] RSP: 002b:00007fff67b978e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2
  [162513.515786] RAX: ffffffffffffffda RBX: 000055b1fad2c560 RCX: 00007f5238f50bd7
  [162513.515788] RDX: 00000000ffffffff RSI: 000000000daf0e74 RDI: 000000000000003a
  [162513.515789] RBP: 0000000000000032 R08: 000000000000000a R09: 00007f5239019be0
  [162513.515791] R10: fffffffffffff24f R11: 0000000000000206 R12: 000000000000003a
  [162513.515792] R13: 00007fff67b97950 R14: 00007fff67b97906 R15: 000055b1fad1a340
  [162513.515804] INFO: task fsstress:1356185 blocked for more than 120 seconds.
  [162513.516064]       Not tainted 5.9.0-rc6-btrfs-next-69 #1
  [162513.516329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [162513.516617] task:fsstress        state:D stack:    0 pid:1356185 ppid:1356177 flags:0x00000000
  [162513.516620] Call Trace:
  [162513.516625]  __schedule+0x5ce/0xd00
  [162513.516628]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
  [162513.516634]  schedule+0x46/0xf0
  [162513.516647]  wait_current_trans+0xde/0x140 [btrfs]
  [162513.516650]  ? finish_wait+0x90/0x90
  [162513.516662]  start_transaction+0x4d7/0x5f0 [btrfs]
  [162513.516679]  btrfs_setxattr_trans+0x3c/0x100 [btrfs]
  [162513.516686]  __vfs_setxattr+0x66/0x80
  [162513.516691]  __vfs_setxattr_noperm+0x70/0x200
  [162513.516697]  vfs_setxattr+0x6b/0x120
  [162513.516703]  setxattr+0x125/0x240
  [162513.516709]  ? lock_acquire+0xb1/0x480
  [162513.516712]  ? mnt_want_write+0x20/0x50
  [162513.516721]  ? rcu_read_lock_any_held+0x8e/0xb0
  [162513.516723]  ? preempt_count_add+0x49/0xa0
  [162513.516725]  ? __sb_start_write+0x19b/0x290
  [162513.516727]  ? preempt_count_add+0x49/0xa0
  [162513.516732]  path_setxattr+0xba/0xd0
  [162513.516739]  __x64_sys_setxattr+0x27/0x30
  [162513.516741]  do_syscall_64+0x33/0x80
  [162513.516743]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [162513.516745] RIP: 0033:0x7f5238f56d5a
  [162513.516746] Code: Bad RIP value.
  [162513.516748] RSP: 002b:00007fff67b97868 EFLAGS: 00000202 ORIG_RAX: 00000000000000bc
  [162513.516750] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f5238f56d5a
  [162513.516751] RDX: 000055b1fbb0d5a0 RSI: 00007fff67b978a0 RDI: 000055b1fbb0d470
  [162513.516753] RBP: 000055b1fbb0d5a0 R08: 0000000000000001 R09: 00007fff67b97700
  [162513.516754] R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000004
  [162513.516756] R13: 0000000000000024 R14: 0000000000000001 R15: 00007fff67b978a0
  [162513.516767] INFO: task fsstress:1356196 blocked for more than 120 seconds.
  [162513.517064]       Not tainted 5.9.0-rc6-btrfs-next-69 #1
  [162513.517365] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [162513.517763] task:fsstress        state:D stack:    0 pid:1356196 ppid:1356177 flags:0x00004000
  [162513.517780] Call Trace:
  [162513.517786]  __schedule+0x5ce/0xd00
  [162513.517789]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
  [162513.517796]  schedule+0x46/0xf0
  [162513.517810]  wait_current_trans+0xde/0x140 [btrfs]
  [162513.517814]  ? finish_wait+0x90/0x90
  [162513.517829]  start_transaction+0x37c/0x5f0 [btrfs]
  [162513.517845]  btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs]
  [162513.517857]  btrfs_sync_fs+0x61/0x1c0 [btrfs]
  [162513.517862]  ? __ia32_sys_fdatasync+0x20/0x20
  [162513.517865]  iterate_supers+0x87/0xf0
  [162513.517869]  ksys_sync+0x60/0xb0
  [162513.517872]  __do_sys_sync+0xa/0x10
  [162513.517875]  do_syscall_64+0x33/0x80
  [162513.517878]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [162513.517881] RIP: 0033:0x7f5238f50bd7
  [162513.517883] Code: Bad RIP value.
  [162513.517885] RSP: 002b:00007fff67b978e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2
  [162513.517887] RAX: ffffffffffffffda RBX: 000055b1fad2c560 RCX: 00007f5238f50bd7
  [162513.517889] RDX: 0000000000000000 RSI: 000000007660add2 RDI: 0000000000000053
  [162513.517891] RBP: 0000000000000032 R08: 0000000000000067 R09: 00007f5239019be0
  [162513.517893] R10: fffffffffffff24f R11: 0000000000000206 R12: 0000000000000053
  [162513.517895] R13: 00007fff67b97950 R14: 00007fff67b97906 R15: 000055b1fad1a340
  [162513.517908] INFO: task fsstress:1356197 blocked for more than 120 seconds.
  [162513.518298]       Not tainted 5.9.0-rc6-btrfs-next-69 #1
  [162513.518672] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [162513.519157] task:fsstress        state:D stack:    0 pid:1356197 ppid:1356177 flags:0x00000000
  [162513.519160] Call Trace:
  [162513.519165]  __schedule+0x5ce/0xd00
  [162513.519168]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
  [162513.519174]  schedule+0x46/0xf0
  [162513.519190]  wait_current_trans+0xde/0x140 [btrfs]
  [162513.519193]  ? finish_wait+0x90/0x90
  [162513.519206]  start_transaction+0x4d7/0x5f0 [btrfs]
  [162513.519222]  btrfs_create+0x57/0x200 [btrfs]
  [162513.519230]  lookup_open+0x522/0x650
  [162513.519246]  path_openat+0x2b8/0xa50
  [162513.519270]  do_filp_open+0x91/0x100
  [162513.519275]  ? find_held_lock+0x32/0x90
  [162513.519280]  ? lock_acquired+0x33b/0x470
  [162513.519285]  ? do_raw_spin_unlock+0x4b/0xc0
  [162513.519287]  ? _raw_spin_unlock+0x29/0x40
  [162513.519295]  do_sys_openat2+0x20d/0x2d0
  [162513.519300]  do_sys_open+0x44/0x80
  [162513.519304]  do_syscall_64+0x33/0x80
  [162513.519307]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [162513.519309] RIP: 0033:0x7f5238f4a903
  [162513.519310] Code: Bad RIP value.
  [162513.519312] RSP: 002b:00007fff67b97758 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
  [162513.519314] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f5238f4a903
  [162513.519316] RDX: 0000000000000000 RSI: 00000000000001b6 RDI: 000055b1fbb0d470
  [162513.519317] RBP: 00007fff67b978c0 R08: 0000000000000001 R09: 0000000000000002
  [162513.519319] R10: 00007fff67b974f7 R11: 0000000000000246 R12: 0000000000000013
  [162513.519320] R13: 00000000000001b6 R14: 00007fff67b97906 R15: 000055b1fad1c620
  [162513.519332] INFO: task btrfs:1356211 blocked for more than 120 seconds.
  [162513.519727]       Not tainted 5.9.0-rc6-btrfs-next-69 #1
  [162513.520115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [162513.520508] task:btrfs           state:D stack:    0 pid:1356211 ppid:1356178 flags:0x00004002
  [162513.520511] Call Trace:
  [162513.520516]  __schedule+0x5ce/0xd00
  [162513.520519]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
  [162513.520525]  schedule+0x46/0xf0
  [162513.520544]  btrfs_scrub_pause+0x11f/0x180 [btrfs]
  [162513.520548]  ? finish_wait+0x90/0x90
  [162513.520562]  btrfs_commit_transaction+0x45a/0xc30 [btrfs]
  [162513.520574]  ? start_transaction+0xe0/0x5f0 [btrfs]
  [162513.520596]  btrfs_dev_replace_finishing+0x6d8/0x711 [btrfs]
  [162513.520619]  btrfs_dev_replace_by_ioctl.cold+0x1cc/0x1fd [btrfs]
  [162513.520639]  btrfs_ioctl+0x2a25/0x36f0 [btrfs]
  [162513.520643]  ? do_sigaction+0xf3/0x240
  [162513.520645]  ? find_held_lock+0x32/0x90
  [162513.520648]  ? do_sigaction+0xf3/0x240
  [162513.520651]  ? lock_acquired+0x33b/0x470
  [162513.520655]  ? _raw_spin_unlock_irq+0x24/0x50
  [162513.520657]  ? lockdep_hardirqs_on+0x7d/0x100
  [162513.520660]  ? _raw_spin_unlock_irq+0x35/0x50
  [162513.520662]  ? do_sigaction+0xf3/0x240
  [162513.520671]  ? __x64_sys_ioctl+0x83/0xb0
  [162513.520672]  __x64_sys_ioctl+0x83/0xb0
  [162513.520677]  do_syscall_64+0x33/0x80
  [162513.520679]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [162513.520681] RIP: 0033:0x7fc3cd307d87
  [162513.520682] Code: Bad RIP value.
  [162513.520684] RSP: 002b:00007ffe30a56bb8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  [162513.520686] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc3cd307d87
  [162513.520687] RDX: 00007ffe30a57a30 RSI: 00000000ca289435 RDI: 0000000000000003
  [162513.520689] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
  [162513.520690] R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000000003
  [162513.520692] R13: 0000557323a212e0 R14: 00007ffe30a5a520 R15: 0000000000000001
  [162513.520703]
		  Showing all locks held in the system:
  [162513.520712] 1 lock held by khungtaskd/54:
  [162513.520713]  #0: ffffffffb40a91a0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x15/0x197
  [162513.520728] 1 lock held by in:imklog/596:
  [162513.520729]  #0: ffff8f3f0d781400 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4d/0x60
  [162513.520782] 1 lock held by btrfs-transacti/1356167:
  [162513.520784]  #0: ffff8f3d810cc848 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0x4a/0x170 [btrfs]
  [162513.520798] 1 lock held by btrfs/1356190:
  [162513.520800]  #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0x60
  [162513.520805] 1 lock held by fsstress/1356184:
  [162513.520806]  #0: ffff8f3d576440e8 (&type->s_umount_key#62){++++}-{3:3}, at: iterate_supers+0x6f/0xf0
  [162513.520811] 3 locks held by fsstress/1356185:
  [162513.520812]  #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write+0x20/0x50
  [162513.520815]  #1: ffff8f3d80a650b8 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: vfs_setxattr+0x50/0x120
  [162513.520820]  #2: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs]
  [162513.520833] 1 lock held by fsstress/1356196:
  [162513.520834]  #0: ffff8f3d576440e8 (&type->s_umount_key#62){++++}-{3:3}, at: iterate_supers+0x6f/0xf0
  [162513.520838] 3 locks held by fsstress/1356197:
  [162513.520839]  #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write+0x20/0x50
  [162513.520843]  #1: ffff8f3d506465e8 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: path_openat+0x2a7/0xa50
  [162513.520846]  #2: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs]
  [162513.520858] 2 locks held by btrfs/1356211:
  [162513.520859]  #0: ffff8f3d810cde30 (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+.}-{3:3}, at: btrfs_dev_replace_finishing+0x52/0x711 [btrfs]
  [162513.520877]  #1: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs]

This was weird because the stack traces show that a transaction commit,
triggered by a device replace operation, is blocking trying to pause any
running scrubs but there are no stack traces of blocked tasks doing a
scrub.

After poking around with drgn, I noticed there was a scrub task that was
constantly running and blocking for shorts periods of time:

  >>> t = find_task(prog, 1356190)
  >>> prog.stack_trace(t)
  #0  __schedule+0x5ce/0xcfc
  #1  schedule+0x46/0xe4
  #2  schedule_timeout+0x1df/0x475
  #3  btrfs_reada_wait+0xda/0x132
  #4  scrub_stripe+0x2a8/0x112f
  #5  scrub_chunk+0xcd/0x134
  #6  scrub_enumerate_chunks+0x29e/0x5ee
  #7  btrfs_scrub_dev+0x2d5/0x91b
  #8  btrfs_ioctl+0x7f5/0x36e7
  #9  __x64_sys_ioctl+0x83/0xb0
  #10 do_syscall_64+0x33/0x77
  #11 entry_SYSCALL_64+0x7c/0x156

Which corresponds to:

int btrfs_reada_wait(void *handle)
{
    struct reada_control *rc = handle;
    struct btrfs_fs_info *fs_info = rc->fs_info;

    while (atomic_read(&rc->elems)) {
        if (!atomic_read(&fs_info->reada_works_cnt))
            reada_start_machine(fs_info);
        wait_event_timeout(rc->wait, atomic_read(&rc->elems) == 0,
                          (HZ + 9) / 10);
    }
(...)

So the counter "rc->elems" was set to 1 and never decreased to 0, causing
the scrub task to loop forever in that function. Then I used the following
script for drgn to check the readahead requests:

  $ cat dump_reada.py
  import sys
  import drgn
  from drgn import NULL, Object, cast, container_of, execscript, \
      reinterpret, sizeof
  from drgn.helpers.linux import *

  mnt_path = b"/home/fdmanana/btrfs-tests/scratch_1"

  mnt = None
  for mnt in for_each_mount(prog, dst = mnt_path):
      pass

  if mnt is None:
      sys.stderr.write(f'Error: mount point {mnt_path} not found\n')
      sys.exit(1)

  fs_info = cast('struct btrfs_fs_info *', mnt.mnt.mnt_sb.s_fs_info)

  def dump_re(re):
      nzones = re.nzones.value_()
      print(f're at {hex(re.value_())}')
      print(f'\t logical {re.logical.value_()}')
      print(f'\t refcnt {re.refcnt.value_()}')
      print(f'\t nzones {nzones}')
      for i in range(nzones):
          dev = re.zones[i].device
          name = dev.name.str.string_()
          print(f'\t\t dev id {dev.devid.value_()} name {name}')
      print()

  for _, e in radix_tree_for_each(fs_info.reada_tree):
      re = cast('struct reada_extent *', e)
      dump_re(re)

  $ drgn dump_reada.py
  re at 0xffff8f3da9d25ad8
          logical 38928384
          refcnt 1
          nzones 1
                 dev id 0 name b'/dev/sdd'
  $

So there was one readahead extent with a single zone corresponding to the
source device of that last device replace operation logged in dmesg/syslog.
Also the ID of that zone's device was 0 which is a special value set in
the source device of a device replace operation when the operation finishes
(constant BTRFS_DEV_REPLACE_DEVID set at btrfs_dev_replace_finishing()),
confirming again that device /dev/sdd was the source of a device replace
operation.

Normally there should be as many zones in the readahead extent as there are
devices, and I wasn't expecting the extent to be in a block group with a
'single' profile, so I went and confirmed with the following drgn script
that there weren't any single profile block groups:

  $ cat dump_block_groups.py
  import sys
  import drgn
  from drgn import NULL, Object, cast, container_of, execscript, \
      reinterpret, sizeof
  from drgn.helpers.linux import *

  mnt_path = b"/home/fdmanana/btrfs-tests/scratch_1"

  mnt = None
  for mnt in for_each_mount(prog, dst = mnt_path):
      pass

  if mnt is None:
      sys.stderr.write(f'Error: mount point {mnt_path} not found\n')
      sys.exit(1)

  fs_info = cast('struct btrfs_fs_info *', mnt.mnt.mnt_sb.s_fs_info)

  BTRFS_BLOCK_GROUP_DATA = (1 << 0)
  BTRFS_BLOCK_GROUP_SYSTEM = (1 << 1)
  BTRFS_BLOCK_GROUP_METADATA = (1 << 2)
  BTRFS_BLOCK_GROUP_RAID0 = (1 << 3)
  BTRFS_BLOCK_GROUP_RAID1 = (1 << 4)
  BTRFS_BLOCK_GROUP_DUP = (1 << 5)
  BTRFS_BLOCK_GROUP_RAID10 = (1 << 6)
  BTRFS_BLOCK_GROUP_RAID5 = (1 << 7)
  BTRFS_BLOCK_GROUP_RAID6 = (1 << 8)
  BTRFS_BLOCK_GROUP_RAID1C3 = (1 << 9)
  BTRFS_BLOCK_GROUP_RAID1C4 = (1 << 10)

  def bg_flags_string(bg):
      flags = bg.flags.value_()
      ret = ''
      if flags & BTRFS_BLOCK_GROUP_DATA:
          ret = 'data'
      if flags & BTRFS_BLOCK_GROUP_METADATA:
          if len(ret) > 0:
              ret += '|'
          ret += 'meta'
      if flags & BTRFS_BLOCK_GROUP_SYSTEM:
          if len(ret) > 0:
              ret += '|'
          ret += 'system'
      if flags & BTRFS_BLOCK_GROUP_RAID0:
          ret += ' raid0'
      elif flags & BTRFS_BLOCK_GROUP_RAID1:
          ret += ' raid1'
      elif flags & BTRFS_BLOCK_GROUP_DUP:
          ret += ' dup'
      elif flags & BTRFS_BLOCK_GROUP_RAID10:
          ret += ' raid10'
      elif flags & BTRFS_BLOCK_GROUP_RAID5:
          ret += ' raid5'
      elif flags & BTRFS_BLOCK_GROUP_RAID6:
          ret += ' raid6'
      elif flags & BTRFS_BLOCK_GROUP_RAID1C3:
          ret += ' raid1c3'
      elif flags & BTRFS_BLOCK_GROUP_RAID1C4:
          ret += ' raid1c4'
      else:
          ret += ' single'

      return ret

  def dump_bg(bg):
      print()
      print(f'block group at {hex(bg.value_())}')
      print(f'\t start {bg.start.value_()} length {bg.length.value_()}')
      print(f'\t flags {bg.flags.value_()} - {bg_flags_string(bg)}')

  bg_root = fs_info.block_group_cache_tree.address_of_()
  for bg in rbtree_inorder_for_each_entry('struct btrfs_block_group', bg_root, 'cache_node'):
      dump_bg(bg)

  $ drgn dump_block_groups.py

  block group at 0xffff8f3d673b0400
         start 22020096 length 16777216
         flags 258 - system raid6

  block group at 0xffff8f3d53ddb400
         start 38797312 length 536870912
         flags 260 - meta raid6

  block group at 0xffff8f3d5f4d9c00
         start 575668224 length 2147483648
         flags 257 - data raid6

  block group at 0xffff8f3d08189000
         start 2723151872 length 67108864
         flags 258 - system raid6

  block group at 0xffff8f3db70ff000
         start 2790260736 length 1073741824
         flags 260 - meta raid6

  block group at 0xffff8f3d5f4dd800
         start 3864002560 length 67108864
         flags 258 - system raid6

  block group at 0xffff8f3d67037000
         start 3931111424 length 2147483648
         flags 257 - data raid6
  $

So there were only 2 reasons left for having a readahead extent with a
single zone: reada_find_zone(), called when creating a readahead extent,
returned NULL either because we failed to find the corresponding block
group or because a memory allocation failed. With some additional and
custom tracing I figured out that on every further ocurrence of the
problem the block group had just been deleted when we were looping to
create the zones for the readahead extent (at reada_find_extent()), so we
ended up with only one zone in the readahead extent, corresponding to a
device that ends up getting replaced.

So after figuring that out it became obvious why the hang happens:

1) Task A starts a scrub on any device of the filesystem, except for
   device /dev/sdd;

2) Task B starts a device replace with /dev/sdd as the source device;

3) Task A calls btrfs_reada_add() from scrub_stripe() and it is currently
   starting to scrub a stripe from block group X. This call to
   btrfs_reada_add() is the one for the extent tree. When btrfs_reada_add()
   calls reada_add_block(), it passes the logical address of the extent
   tree's root node as its 'logical' argument - a value of 38928384;

4) Task A then enters reada_find_extent(), called from reada_add_block().
   It finds there isn't any existing readahead extent for the logical
   address 38928384, so it proceeds to the path of creating a new one.

   It calls btrfs_map_block() to find out which stripes exist for the block
   group X. On the first iteration of the for loop that iterates over the
   stripes, it finds the stripe for device /dev/sdd, so it creates one
   zone for that device and adds it to the readahead extent. Before getting
   into the second iteration of the loop, the cleanup kthread deletes block
   group X because it was empty. So in the iterations for the remaining
   stripes it does not add more zones to the readahead extent, because the
   calls to reada_find_zone() returned NULL because they couldn't find
   block group X anymore.

   As a result the new readahead extent has a single zone, corresponding to
   the device /dev/sdd;

4) Before task A returns to btrfs_reada_add() and queues the readahead job
   for the readahead work queue, task B finishes the device replace and at
   btrfs_dev_replace_finishing() swaps the device /dev/sdd with the new
   device /dev/sdg;

5) Task A returns to reada_add_block(), which increments the counter
   "->elems" of the reada_control structure allocated at btrfs_reada_add().

   Then it returns back to btrfs_reada_add() and calls
   reada_start_machine(). This queues a job in the readahead work queue to
   run the function reada_start_machine_worker(), which calls
   __reada_start_machine().

   At __reada_start_machine() we take the device list mutex and for each
   device found in the current device list, we call
   reada_start_machine_dev() to start the readahead work. However at this
   point the device /dev/sdd was already freed and is not in the device
   list anymore.

   This means the corresponding readahead for the extent at 38928384 is
   never started, and therefore the "->elems" counter of the reada_control
   structure allocated at btrfs_reada_add() never goes down to 0, causing
   the call to btrfs_reada_wait(), done by the scrub task, to wait forever.

Note that the readahead request can be made either after the device replace
started or before it started, however in pratice it is very unlikely that a
device replace is able to start after a readahead request is made and is
able to complete before the readahead request completes - maybe only on a
very small and nearly empty filesystem.

This hang however is not the only problem we can have with readahead and
device removals. When the readahead extent has other zones other than the
one corresponding to the device that is being removed (either by a device
replace or a device remove operation), we risk having a use-after-free on
the device when dropping the last reference of the readahead extent.

For example if we create a readahead extent with two zones, one for the
device /dev/sdd and one for the device /dev/sde:

1) Before the readahead worker starts, the device /dev/sdd is removed,
   and the corresponding btrfs_device structure is freed. However the
   readahead extent still has the zone pointing to the device structure;

2) When the readahead worker starts, it only finds device /dev/sde in the
   current device list of the filesystem;

3) It starts the readahead work, at reada_start_machine_dev(), using the
   device /dev/sde;

4) Then when it finishes reading the extent from device /dev/sde, it calls
   __readahead_hook() which ends up dropping the last reference on the
   readahead extent through the last call to reada_extent_put();

5) At reada_extent_put() it iterates over each zone of the readahead extent
   and attempts to delete an element from the device's 'reada_extents'
   radix tree, resulting in a use-after-free, as the device pointer of the
   zone for /dev/sdd is now stale. We can also access the device after
   dropping the last reference of a zone, through reada_zone_release(),
   also called by reada_extent_put().

And a device remove suffers the same problem, however since it shrinks the
device size down to zero before removing the device, it is very unlikely to
still have readahead requests not completed by the time we free the device,
the only possibility is if the device has a very little space allocated.

While the hang problem is exclusive to scrub, since it is currently the
only user of btrfs_reada_add() and btrfs_reada_wait(), the use-after-free
problem affects any path that triggers readhead, which includes
btree_readahead_hook() and __readahead_hook() (a readahead worker can
trigger readahed for the children of a node) for example - any path that
ends up calling reada_add_block() can trigger the use-after-free after a
device is removed.

So fix this by waiting for any readahead requests for a device to complete
before removing a device, ensuring that while waiting for existing ones no
new ones can be made.

This problem has been around for a very long time - the readahead code was
added in 2011, device remove exists since 2008 and device replace was
introduced in 2013, hard to pick a specific commit for a git Fixes tag.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-26 15:03:59 +01:00
Filipe Manana
83bc1560e0 btrfs: fix use-after-free on readahead extent after failure to create it
If we fail to find suitable zones for a new readahead extent, we end up
leaving a stale pointer in the global readahead extents radix tree
(fs_info->reada_tree), which can trigger the following trace later on:

  [13367.696354] BUG: kernel NULL pointer dereference, address: 00000000000000b0
  [13367.696802] #PF: supervisor read access in kernel mode
  [13367.697249] #PF: error_code(0x0000) - not-present page
  [13367.697721] PGD 0 P4D 0
  [13367.698171] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
  [13367.698632] CPU: 6 PID: 851214 Comm: btrfs Tainted: G        W         5.9.0-rc6-btrfs-next-69 #1
  [13367.699100] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  [13367.700069] RIP: 0010:__lock_acquire+0x20a/0x3970
  [13367.700562] Code: ff 1f 0f b7 c0 48 0f (...)
  [13367.701609] RSP: 0018:ffffb14448f57790 EFLAGS: 00010046
  [13367.702140] RAX: 0000000000000000 RBX: 29b935140c15e8cf RCX: 0000000000000000
  [13367.702698] RDX: 0000000000000002 RSI: ffffffffb3d66bd0 RDI: 0000000000000046
  [13367.703240] RBP: ffff8a52ba8ac040 R08: 00000c2866ad9288 R09: 0000000000000001
  [13367.703783] R10: 0000000000000001 R11: 00000000b66d9b53 R12: ffff8a52ba8ac9b0
  [13367.704330] R13: 0000000000000000 R14: ffff8a532b6333e8 R15: 0000000000000000
  [13367.704880] FS:  00007fe1df6b5700(0000) GS:ffff8a5376600000(0000) knlGS:0000000000000000
  [13367.705438] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [13367.705995] CR2: 00000000000000b0 CR3: 000000022cca8004 CR4: 00000000003706e0
  [13367.706565] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [13367.707127] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [13367.707686] Call Trace:
  [13367.708246]  ? ___slab_alloc+0x395/0x740
  [13367.708820]  ? reada_add_block+0xae/0xee0 [btrfs]
  [13367.709383]  lock_acquire+0xb1/0x480
  [13367.709955]  ? reada_add_block+0xe0/0xee0 [btrfs]
  [13367.710537]  ? reada_add_block+0xae/0xee0 [btrfs]
  [13367.711097]  ? rcu_read_lock_sched_held+0x5d/0x90
  [13367.711659]  ? kmem_cache_alloc_trace+0x8d2/0x990
  [13367.712221]  ? lock_acquired+0x33b/0x470
  [13367.712784]  _raw_spin_lock+0x34/0x80
  [13367.713356]  ? reada_add_block+0xe0/0xee0 [btrfs]
  [13367.713966]  reada_add_block+0xe0/0xee0 [btrfs]
  [13367.714529]  ? btrfs_root_node+0x15/0x1f0 [btrfs]
  [13367.715077]  btrfs_reada_add+0x117/0x170 [btrfs]
  [13367.715620]  scrub_stripe+0x21e/0x10d0 [btrfs]
  [13367.716141]  ? kvm_sched_clock_read+0x5/0x10
  [13367.716657]  ? __lock_acquire+0x41e/0x3970
  [13367.717184]  ? scrub_chunk+0x60/0x140 [btrfs]
  [13367.717697]  ? find_held_lock+0x32/0x90
  [13367.718254]  ? scrub_chunk+0x60/0x140 [btrfs]
  [13367.718773]  ? lock_acquired+0x33b/0x470
  [13367.719278]  ? scrub_chunk+0xcd/0x140 [btrfs]
  [13367.719786]  scrub_chunk+0xcd/0x140 [btrfs]
  [13367.720291]  scrub_enumerate_chunks+0x270/0x5c0 [btrfs]
  [13367.720787]  ? finish_wait+0x90/0x90
  [13367.721281]  btrfs_scrub_dev+0x1ee/0x620 [btrfs]
  [13367.721762]  ? rcu_read_lock_any_held+0x8e/0xb0
  [13367.722235]  ? preempt_count_add+0x49/0xa0
  [13367.722710]  ? __sb_start_write+0x19b/0x290
  [13367.723192]  btrfs_ioctl+0x7f5/0x36f0 [btrfs]
  [13367.723660]  ? __fget_files+0x101/0x1d0
  [13367.724118]  ? find_held_lock+0x32/0x90
  [13367.724559]  ? __fget_files+0x101/0x1d0
  [13367.724982]  ? __x64_sys_ioctl+0x83/0xb0
  [13367.725399]  __x64_sys_ioctl+0x83/0xb0
  [13367.725802]  do_syscall_64+0x33/0x80
  [13367.726188]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [13367.726574] RIP: 0033:0x7fe1df7add87
  [13367.726948] Code: 00 00 00 48 8b 05 09 91 (...)
  [13367.727763] RSP: 002b:00007fe1df6b4d48 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [13367.728179] RAX: ffffffffffffffda RBX: 000055ce1fb596a0 RCX: 00007fe1df7add87
  [13367.728604] RDX: 000055ce1fb596a0 RSI: 00000000c400941b RDI: 0000000000000003
  [13367.729021] RBP: 0000000000000000 R08: 00007fe1df6b5700 R09: 0000000000000000
  [13367.729431] R10: 00007fe1df6b5700 R11: 0000000000000246 R12: 00007ffd922b07de
  [13367.729842] R13: 00007ffd922b07df R14: 00007fe1df6b4e40 R15: 0000000000802000
  [13367.730275] Modules linked in: btrfs blake2b_generic xor (...)
  [13367.732638] CR2: 00000000000000b0
  [13367.733166] ---[ end trace d298b6805556acd9 ]---

What happens is the following:

1) At reada_find_extent() we don't find any existing readahead extent for
   the metadata extent starting at logical address X;

2) So we proceed to create a new one. We then call btrfs_map_block() to get
   information about which stripes contain extent X;

3) After that we iterate over the stripes and create only one zone for the
   readahead extent - only one because reada_find_zone() returned NULL for
   all iterations except for one, either because a memory allocation failed
   or it couldn't find the block group of the extent (it may have just been
   deleted);

4) We then add the new readahead extent to the readahead extents radix
   tree at fs_info->reada_tree;

5) Then we iterate over each zone of the new readahead extent, and find
   that the device used for that zone no longer exists, because it was
   removed or it was the source device of a device replace operation.
   Since this left 'have_zone' set to 0, after finishing the loop we jump
   to the 'error' label, call kfree() on the new readahead extent and
   return without removing it from the radix tree at fs_info->reada_tree;

6) Any future call to reada_find_extent() for the logical address X will
   find the stale pointer in the readahead extents radix tree, increment
   its reference counter, which can trigger the use-after-free right
   away or return it to the caller reada_add_block() that results in the
   use-after-free of the example trace above.

So fix this by making sure we delete the readahead extent from the radix
tree if we fail to setup zones for it (when 'have_zone = 0').

Fixes: 3194502118 ("btrfs: reada: bypass adding extent when all zone failed")
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-26 15:03:59 +01:00
Daniel Xu
85d07fbe09 btrfs: tree-checker: validate number of chunk stripes and parity
If there's no parity and num_stripes < ncopies, a crafted image can
trigger a division by zero in calc_stripe_length().

The image was generated through fuzzing.

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209587
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-26 15:03:48 +01:00
Pujin Shi
cad69d1396 btrfs: tree-checker: fix incorrect printk format
This patch addresses a compile warning:

fs/btrfs/extent-tree.c: In function '__btrfs_free_extent':
fs/btrfs/extent-tree.c:3187:4: warning: format '%lu' expects argument of type 'long unsigned int', but argument 8 has type 'unsigned int' [-Wformat=]

Fixes: 1c2a07f598 ("btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent()")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Pujin Shi <shipujin.t@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-26 15:02:30 +01:00
Josef Bacik
7837fa8870 btrfs: drop the path before adding block group sysfs files
Dave reported a problem with my rwsem conversion patch where we got the
following lockdep splat:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.9.0-default+ #1297 Not tainted
  ------------------------------------------------------
  kswapd0/76 is trying to acquire lock:
  ffff9d5d25df2530 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]

  but task is already holding lock:
  ffffffffa40cbba0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #4 (fs_reclaim){+.+.}-{0:0}:
	 __lock_acquire+0x582/0xac0
	 lock_acquire+0xca/0x430
	 fs_reclaim_acquire.part.0+0x25/0x30
	 kmem_cache_alloc+0x30/0x9c0
	 alloc_inode+0x81/0x90
	 iget_locked+0xcd/0x1a0
	 kernfs_get_inode+0x1b/0x130
	 kernfs_get_tree+0x136/0x210
	 sysfs_get_tree+0x1a/0x50
	 vfs_get_tree+0x1d/0xb0
	 path_mount+0x70f/0xa80
	 do_mount+0x75/0x90
	 __x64_sys_mount+0x8e/0xd0
	 do_syscall_64+0x2d/0x70
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #3 (kernfs_mutex){+.+.}-{3:3}:
	 __lock_acquire+0x582/0xac0
	 lock_acquire+0xca/0x430
	 __mutex_lock+0xa0/0xaf0
	 kernfs_add_one+0x23/0x150
	 kernfs_create_dir_ns+0x58/0x80
	 sysfs_create_dir_ns+0x70/0xd0
	 kobject_add_internal+0xbb/0x2d0
	 kobject_add+0x7a/0xd0
	 btrfs_sysfs_add_block_group_type+0x141/0x1d0 [btrfs]
	 btrfs_read_block_groups+0x1f1/0x8c0 [btrfs]
	 open_ctree+0x981/0x1108 [btrfs]
	 btrfs_mount_root.cold+0xe/0xb0 [btrfs]
	 legacy_get_tree+0x2d/0x60
	 vfs_get_tree+0x1d/0xb0
	 fc_mount+0xe/0x40
	 vfs_kern_mount.part.0+0x71/0x90
	 btrfs_mount+0x13b/0x3e0 [btrfs]
	 legacy_get_tree+0x2d/0x60
	 vfs_get_tree+0x1d/0xb0
	 path_mount+0x70f/0xa80
	 do_mount+0x75/0x90
	 __x64_sys_mount+0x8e/0xd0
	 do_syscall_64+0x2d/0x70
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #2 (btrfs-extent-00){++++}-{3:3}:
	 __lock_acquire+0x582/0xac0
	 lock_acquire+0xca/0x430
	 down_read_nested+0x45/0x220
	 __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
	 __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
	 btrfs_search_slot+0x6d4/0xfd0 [btrfs]
	 check_committed_ref+0x69/0x200 [btrfs]
	 btrfs_cross_ref_exist+0x65/0xb0 [btrfs]
	 run_delalloc_nocow+0x446/0x9b0 [btrfs]
	 btrfs_run_delalloc_range+0x61/0x6a0 [btrfs]
	 writepage_delalloc+0xae/0x160 [btrfs]
	 __extent_writepage+0x262/0x420 [btrfs]
	 extent_write_cache_pages+0x2b6/0x510 [btrfs]
	 extent_writepages+0x43/0x90 [btrfs]
	 do_writepages+0x40/0xe0
	 __writeback_single_inode+0x62/0x610
	 writeback_sb_inodes+0x20f/0x500
	 wb_writeback+0xef/0x4a0
	 wb_do_writeback+0x49/0x2e0
	 wb_workfn+0x81/0x340
	 process_one_work+0x233/0x5d0
	 worker_thread+0x50/0x3b0
	 kthread+0x137/0x150
	 ret_from_fork+0x1f/0x30

  -> #1 (btrfs-fs-00){++++}-{3:3}:
	 __lock_acquire+0x582/0xac0
	 lock_acquire+0xca/0x430
	 down_read_nested+0x45/0x220
	 __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
	 __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
	 btrfs_search_slot+0x6d4/0xfd0 [btrfs]
	 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
	 __btrfs_update_delayed_inode+0x93/0x2c0 [btrfs]
	 __btrfs_commit_inode_delayed_items+0x7de/0x850 [btrfs]
	 __btrfs_run_delayed_items+0x8e/0x140 [btrfs]
	 btrfs_commit_transaction+0x367/0xbc0 [btrfs]
	 btrfs_mksubvol+0x2db/0x470 [btrfs]
	 btrfs_mksnapshot+0x7b/0xb0 [btrfs]
	 __btrfs_ioctl_snap_create+0x16f/0x1a0 [btrfs]
	 btrfs_ioctl_snap_create_v2+0xb0/0xf0 [btrfs]
	 btrfs_ioctl+0xd0b/0x2690 [btrfs]
	 __x64_sys_ioctl+0x6f/0xa0
	 do_syscall_64+0x2d/0x70
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
	 check_prev_add+0x91/0xc60
	 validate_chain+0xa6e/0x2a20
	 __lock_acquire+0x582/0xac0
	 lock_acquire+0xca/0x430
	 __mutex_lock+0xa0/0xaf0
	 __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
	 btrfs_evict_inode+0x3cc/0x560 [btrfs]
	 evict+0xd6/0x1c0
	 dispose_list+0x48/0x70
	 prune_icache_sb+0x54/0x80
	 super_cache_scan+0x121/0x1a0
	 do_shrink_slab+0x16d/0x3b0
	 shrink_slab+0xb1/0x2e0
	 shrink_node+0x230/0x6a0
	 balance_pgdat+0x325/0x750
	 kswapd+0x206/0x4d0
	 kthread+0x137/0x150
	 ret_from_fork+0x1f/0x30

  other info that might help us debug this:

  Chain exists of:
    &delayed_node->mutex --> kernfs_mutex --> fs_reclaim

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(fs_reclaim);
				 lock(kernfs_mutex);
				 lock(fs_reclaim);
    lock(&delayed_node->mutex);

   *** DEADLOCK ***

  3 locks held by kswapd0/76:
   #0: ffffffffa40cbba0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
   #1: ffffffffa40b8b58 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x54/0x2e0
   #2: ffff9d5d322390e8 (&type->s_umount_key#26){++++}-{3:3}, at: trylock_super+0x16/0x50

  stack backtrace:
  CPU: 2 PID: 76 Comm: kswapd0 Not tainted 5.9.0-default+ #1297
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
  Call Trace:
   dump_stack+0x77/0x97
   check_noncircular+0xff/0x110
   ? save_trace+0x50/0x470
   check_prev_add+0x91/0xc60
   validate_chain+0xa6e/0x2a20
   ? save_trace+0x50/0x470
   __lock_acquire+0x582/0xac0
   lock_acquire+0xca/0x430
   ? __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
   __mutex_lock+0xa0/0xaf0
   ? __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
   ? __lock_acquire+0x582/0xac0
   ? __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
   ? btrfs_evict_inode+0x30b/0x560 [btrfs]
   ? __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
   __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
   btrfs_evict_inode+0x3cc/0x560 [btrfs]
   evict+0xd6/0x1c0
   dispose_list+0x48/0x70
   prune_icache_sb+0x54/0x80
   super_cache_scan+0x121/0x1a0
   do_shrink_slab+0x16d/0x3b0
   shrink_slab+0xb1/0x2e0
   shrink_node+0x230/0x6a0
   balance_pgdat+0x325/0x750
   kswapd+0x206/0x4d0
   ? finish_wait+0x90/0x90
   ? balance_pgdat+0x750/0x750
   kthread+0x137/0x150
   ? kthread_mod_delayed_work+0xc0/0xc0
   ret_from_fork+0x1f/0x30

This happens because we are still holding the path open when we start
adding the sysfs files for the block groups, which creates a dependency
on fs_reclaim via the tree lock.  Fix this by dropping the path before
we start doing anything with sysfs.

Reported-by: David Sterba <dsterba@suse.com>
CC: stable@vger.kernel.org # 5.8+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-26 15:01:34 +01:00
Greg Kroah-Hartman
c62fe038fe Merge 3ad11d7ac8 ("Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block") into android-mainline
Steps on the way to 5.10-rc1

Resolves conflicts in:
	include/linux/blk-crypto.h

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4012850c2e4b804d9e87e90b8e03a3b9ce21b5e7
2020-10-24 17:29:43 +02:00
Greg Kroah-Hartman
aec7085396 Merge 647412daeb ("Merge tag 'mmc-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc") into android-mainline
Steps on the way to 5.10-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7ef0b81d31025e256bd85102013ed3ad038a54b1
2020-10-23 11:59:44 +02:00
Filipe Manana
1afc708dca btrfs: fix relocation failure due to race with fallocate
When doing a fallocate() we have a short time window, after reserving an
extent and before starting a transaction, where if relocation for the block
group containing the reserved extent happens, we can end up missing the
extent in the data relocation inode causing relocation to fail later.

This only happens when we don't pass a transaction to the internal
fallocate function __btrfs_prealloc_file_range(), which is for all the
cases where fallocate() is called from user space (the internal use cases
include space cache extent allocation and relocation).

When the race triggers the relocation failure, it produces a trace like
the following:

  [200611.995995] ------------[ cut here ]------------
  [200611.997084] BTRFS: Transaction aborted (error -2)
  [200611.998208] WARNING: CPU: 3 PID: 235845 at fs/btrfs/ctree.c:1074 __btrfs_cow_block+0x3a0/0x5b0 [btrfs]
  [200611.999042] Modules linked in: dm_thin_pool dm_persistent_data (...)
  [200612.003287] CPU: 3 PID: 235845 Comm: btrfs Not tainted 5.9.0-rc6-btrfs-next-69 #1
  [200612.004442] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  [200612.006186] RIP: 0010:__btrfs_cow_block+0x3a0/0x5b0 [btrfs]
  [200612.007110] Code: 1b 00 00 02 72 2a 83 f8 fb 0f 84 b8 01 (...)
  [200612.007341] BTRFS warning (device sdb): Skipping commit of aborted transaction.
  [200612.008959] RSP: 0018:ffffaee38550f918 EFLAGS: 00010286
  [200612.009672] BTRFS: error (device sdb) in cleanup_transaction:1901: errno=-30 Readonly filesystem
  [200612.010428] RAX: 0000000000000000 RBX: ffff9174d96f4000 RCX: 0000000000000000
  [200612.011078] BTRFS info (device sdb): forced readonly
  [200612.011862] RDX: 0000000000000001 RSI: ffffffffa8161978 RDI: 00000000ffffffff
  [200612.013215] RBP: ffff9172569a0f80 R08: 0000000000000000 R09: 0000000000000000
  [200612.014263] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9174b8403b88
  [200612.015203] R13: ffff9174b8400a88 R14: ffff9174c90f1000 R15: ffff9174a5a60e08
  [200612.016182] FS:  00007fa55cf878c0(0000) GS:ffff9174ece00000(0000) knlGS:0000000000000000
  [200612.017174] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [200612.018418] CR2: 00007f8fb8048148 CR3: 0000000428a46003 CR4: 00000000003706e0
  [200612.019510] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [200612.020648] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [200612.021520] Call Trace:
  [200612.022434]  btrfs_cow_block+0x10b/0x250 [btrfs]
  [200612.023407]  do_relocation+0x54e/0x7b0 [btrfs]
  [200612.024343]  ? do_raw_spin_unlock+0x4b/0xc0
  [200612.025280]  ? _raw_spin_unlock+0x29/0x40
  [200612.026200]  relocate_tree_blocks+0x3bc/0x6d0 [btrfs]
  [200612.027088]  relocate_block_group+0x2f3/0x600 [btrfs]
  [200612.027961]  btrfs_relocate_block_group+0x15e/0x340 [btrfs]
  [200612.028896]  btrfs_relocate_chunk+0x38/0x110 [btrfs]
  [200612.029772]  btrfs_balance+0xb22/0x1790 [btrfs]
  [200612.030601]  ? btrfs_ioctl_balance+0x253/0x380 [btrfs]
  [200612.031414]  btrfs_ioctl_balance+0x2cf/0x380 [btrfs]
  [200612.032279]  btrfs_ioctl+0x620/0x36f0 [btrfs]
  [200612.033077]  ? _raw_spin_unlock+0x29/0x40
  [200612.033948]  ? handle_mm_fault+0x116d/0x1ca0
  [200612.034749]  ? up_read+0x18/0x240
  [200612.035542]  ? __x64_sys_ioctl+0x83/0xb0
  [200612.036244]  __x64_sys_ioctl+0x83/0xb0
  [200612.037269]  do_syscall_64+0x33/0x80
  [200612.038190]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [200612.038976] RIP: 0033:0x7fa55d07ed87
  [200612.040127] Code: 00 00 00 48 8b 05 09 91 0c 00 64 c7 00 26 (...)
  [200612.041669] RSP: 002b:00007ffd5ebf03e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
  [200612.042437] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fa55d07ed87
  [200612.043511] RDX: 00007ffd5ebf0470 RSI: 00000000c4009420 RDI: 0000000000000003
  [200612.044250] RBP: 0000000000000003 R08: 000055d8362642a0 R09: 00007fa55d148be0
  [200612.044963] R10: fffffffffffff52e R11: 0000000000000206 R12: 00007ffd5ebf1614
  [200612.045683] R13: 00007ffd5ebf0470 R14: 0000000000000002 R15: 00007ffd5ebf0470
  [200612.046361] irq event stamp: 0
  [200612.047040] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  [200612.047725] hardirqs last disabled at (0): [<ffffffffa6eb5ab3>] copy_process+0x823/0x1bc0
  [200612.048387] softirqs last  enabled at (0): [<ffffffffa6eb5ab3>] copy_process+0x823/0x1bc0
  [200612.049024] softirqs last disabled at (0): [<0000000000000000>] 0x0
  [200612.049722] ---[ end trace 49006c6876e65227 ]---

The race happens like this:

1) Task A starts an fallocate() (plain or zero range) and it calls
   __btrfs_prealloc_file_range() with the 'trans' parameter set to NULL;

2) Task A calls btrfs_reserve_extent() and gets an extent that belongs to
   block group X;

3) Before task A gets into btrfs_replace_file_extents(), through the call
   to insert_prealloc_file_extent(), task B starts relocation of block
   group X;

4) Task B enters btrfs_relocate_block_group() and it sets block group X to
   RO mode;

5) Task B enters relocate_block_group(), it calls prepare_to_relocate()
   whichs joins/starts a transaction and then commits the transaction;

6) Task B then starts scanning the extent tree looking for extents that
   belong to block group X - it does not find yet the extent reserved by
   task A, since that extent was not yet added to the extent tree, as its
   delayed reference was not even yet created at this point;

7) The data relocation inode ends up not having the extent reserved by
   task A associated to it;

8) Task A then starts a transaction through btrfs_replace_file_extents(),
   inserts a file extent item in the subvolume tree pointing to the
   reserved extent and creates a delayed reference for it;

9) Task A finishes and returns success to user space;

10) Later on, while relocation is still in progress, the leaf where task A
    inserted the new file extent item is COWed, so we end up at
    __btrfs_cow_block(), which calls btrfs_reloc_cow_block(), and that in
    turn calls relocation.c:replace_file_extents();

11) At relocation.c:replace_file_extents() we iterate over all the items in
    the leaf and find the file extent item pointing to the extent that was
    allocated by task A, and then call relocation.c:get_new_location(), to
    find the new location for the extent;

12) However relocation.c:get_new_location() fails, returning -ENOENT,
    because it couldn't find a corresponding file extent item associated
    with the data relocation inode. This is because the extent was not seen
    in the extent tree at step 6). The -ENOENT error is propagated to
    __btrfs_cow_block(), which aborts the transaction.

So fix this simply by decrementing the block group's number of reservations
after calling insert_prealloc_file_extent(), as relocation waits for that
counter to go down to zero before calling prepare_to_relocate() and start
looking for extents in the extent tree.

This issue only started to happen recently as of commit 8fccebfa53
("btrfs: fix metadata reservation for fallocate that leads to transaction
aborts"), because now we can reserve an extent before starting/joining a
transaction, and previously we always did it after that, so relocation
ended up waiting for a concurrent fallocate() to finish because before
searching for the extents of the block group, it starts/joins a transaction
and then commits it (at prepare_to_relocate()), which made it wait for the
fallocate task to complete first.

Fixes: 8fccebfa53 ("btrfs: fix metadata reservation for fallocate that leads to transaction aborts")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-16 16:01:56 +02:00
Linus Torvalds
3ad11d7ac8 block-5.10-2020-10-12
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+EWUgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpnoxEADCVSNBRkpV0OVkOEC3wf8EGhXhk01Jnjtl
 u5Mg2V55hcgJ0thQxBV/V28XyqmsEBrmAVi0Yf8Vr9Qbq4Ze08Wae4ChS4rEOyh1
 jTcGYWx5aJB3ChLvV/HI0nWQ3bkj03mMrL3SW8rhhf5DTyKHsVeTenpx42Qu/FKf
 fRzi09FSr3Pjd0B+EX6gunwJnlyXQC5Fa4AA0GhnXJzAznANXxHkkcXu8a6Yw75x
 e28CfhIBliORsK8sRHLoUnPpeTe1vtxCBhBMsE+gJAj9ZUOWMzvNFIPP4FvfawDy
 6cCQo2m1azJ/IdZZCDjFUWyjh+wxdKMp+NNryEcoV+VlqIoc3n98rFwrSL+GIq5Z
 WVwEwq+AcwoMCsD29Lu1ytL2PQ/RVqcJP5UheMrbL4vzefNfJFumQVZLIcX0k943
 8dFL2QHL+H/hM9Dx5y5rjeiWkAlq75v4xPKVjh/DHb4nehddCqn/+DD5HDhNANHf
 c1kmmEuYhvLpIaC4DHjE6DwLh8TPKahJjwsGuBOTr7D93NUQD+OOWsIhX6mNISIl
 FFhP8cd0/ZZVV//9j+q+5B4BaJsT+ZtwmrelKFnPdwPSnh+3iu8zPRRWO+8P8fRC
 YvddxuJAmE6BLmsAYrdz6Xb/wqfyV44cEiyivF0oBQfnhbtnXwDnkDWSfJD1bvCm
 ZwfpDh2+Tg==
 =LzyE
 -----END PGP SIGNATURE-----

Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - Series of merge handling cleanups (Baolin, Christoph)

 - Series of blk-throttle fixes and cleanups (Baolin)

 - Series cleaning up BDI, seperating the block device from the
   backing_dev_info (Christoph)

 - Removal of bdget() as a generic API (Christoph)

 - Removal of blkdev_get() as a generic API (Christoph)

 - Cleanup of is-partition checks (Christoph)

 - Series reworking disk revalidation (Christoph)

 - Series cleaning up bio flags (Christoph)

 - bio crypt fixes (Eric)

 - IO stats inflight tweak (Gabriel)

 - blk-mq tags fixes (Hannes)

 - Buffer invalidation fixes (Jan)

 - Allow soft limits for zone append (Johannes)

 - Shared tag set improvements (John, Kashyap)

 - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

 - DM no-wait support (Mike, Konstantin)

 - Request allocation improvements (Ming)

 - Allow md/dm/bcache to use IO stat helpers (Song)

 - Series improving blk-iocost (Tejun)

 - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
   Xianting, Yang, Yufen, yangerkun)

* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
  block: fix uapi blkzoned.h comments
  blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
  blk-mq: get rid of the dead flush handle code path
  block: get rid of unnecessary local variable
  block: fix comment and add lockdep assert
  blk-mq: use helper function to test hw stopped
  block: use helper function to test queue register
  block: remove redundant mq check
  block: invoke blk_mq_exit_sched no matter whether have .exit_sched
  percpu_ref: don't refer to ref->data if it isn't allocated
  block: ratelimit handle_bad_sector() message
  blk-throttle: Re-use the throtl_set_slice_end()
  blk-throttle: Open code __throtl_de/enqueue_tg()
  blk-throttle: Move service tree validation out of the throtl_rb_first()
  blk-throttle: Move the list operation after list validation
  blk-throttle: Fix IO hang for a corner case
  blk-throttle: Avoid tracking latency if low limit is invalid
  blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
  blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
  block: Remove redundant 'return' statement
  ...
2020-10-13 12:12:44 -07:00
Nikolay Borisov
1fd4033dd0 btrfs: rename BTRFS_INODE_ORDERED_DATA_CLOSE flag
Commit 8d875f95da ("btrfs: disable strict file flushes for
renames and truncates") eliminated the notion of ordered operations and
instead BTRFS_INODE_ORDERED_DATA_CLOSE only remained as a flag
indicating that a file's content should be synced to disk in case a
file is truncated and any writes happen to it concurrently. In fact
this intendend behavior was broken until it was fixed in
f6dc45c7a9 ("Btrfs: fix filemap_flush call in btrfs_file_release").

All things considered let's give the flag a more descriptive name. Also
slightly reword comments.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:18:00 +02:00
Madhuparna Bhowmik
8d1a7aae89 btrfs: annotate device name rcu_string with __rcu
This patch fixes the following sparse errors in
fs/btrfs/super.c in function btrfs_show_devname()

  fs/btrfs/super.c: error: incompatible types in comparison expression (different address spaces):
  fs/btrfs/super.c:    struct rcu_string [noderef] <asn:4> *
  fs/btrfs/super.c:    struct rcu_string *

The error was because of the following line in function btrfs_show_devname():

  if (first_dev)
	 seq_escape(m, rcu_str_deref(first_dev->name), " \t\n\\");

Annotating the btrfs_device::name member with __rcu fixes the sparse
error.

Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik04@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:17:59 +02:00
Anand Jain
96c2e067ed btrfs: skip devices without magic signature when mounting
Many things can happen after the device is scanned and before the device
is mounted.  One such thing is losing the BTRFS_MAGIC on the device.
If it happens we still won't free that device from the memory and cause
the userland confusion.

For example: As the BTRFS_IOC_DEV_INFO still carries the device path
which does not have the BTRFS_MAGIC, 'btrfs fi show' still lists
device which does not belong to the filesystem anymore:

  $ mkfs.btrfs -fq -draid1 -mraid1 /dev/sda /dev/sdb
  $ wipefs -a /dev/sdb
  # /dev/sdb does not contain magic signature
  $ mount -o degraded /dev/sda /btrfs
  $ btrfs fi show -m
  Label: none  uuid: 470ec6fb-646b-4464-b3cb-df1b26c527bd
	  Total devices 2 FS bytes used 128.00KiB
	  devid    1 size 3.00GiB used 571.19MiB path /dev/sda
	  devid    2 size 3.00GiB used 571.19MiB path /dev/sdb

We need to distinguish the missing signature and invalid superblock, so
add a specific error code ENODATA for that. This also fixes failure of
fstest btrfs/198.

CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:17:59 +02:00
Josef Bacik
572c83acdc btrfs: cleanup cow block on error
In fstest btrfs/064 a transaction abort in __btrfs_cow_block could lead
to a system lockup. It gets stuck trying to write back inodes, and the
write back thread was trying to lock an extent buffer:

  $ cat /proc/2143497/stack
  [<0>] __btrfs_tree_lock+0x108/0x250
  [<0>] lock_extent_buffer_for_io+0x35e/0x3a0
  [<0>] btree_write_cache_pages+0x15a/0x3b0
  [<0>] do_writepages+0x28/0xb0
  [<0>] __writeback_single_inode+0x54/0x5c0
  [<0>] writeback_sb_inodes+0x1e8/0x510
  [<0>] wb_writeback+0xcc/0x440
  [<0>] wb_workfn+0xd7/0x650
  [<0>] process_one_work+0x236/0x560
  [<0>] worker_thread+0x55/0x3c0
  [<0>] kthread+0x13a/0x150
  [<0>] ret_from_fork+0x1f/0x30

This is because we got an error while COWing a block, specifically here

        if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state)) {
                ret = btrfs_reloc_cow_block(trans, root, buf, cow);
                if (ret) {
                        btrfs_abort_transaction(trans, ret);
                        return ret;
                }
        }

  [16402.241552] BTRFS: Transaction aborted (error -2)
  [16402.242362] WARNING: CPU: 1 PID: 2563188 at fs/btrfs/ctree.c:1074 __btrfs_cow_block+0x376/0x540
  [16402.249469] CPU: 1 PID: 2563188 Comm: fsstress Not tainted 5.9.0-rc6+ #8
  [16402.249936] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  [16402.250525] RIP: 0010:__btrfs_cow_block+0x376/0x540
  [16402.252417] RSP: 0018:ffff9cca40e578b0 EFLAGS: 00010282
  [16402.252787] RAX: 0000000000000025 RBX: 0000000000000002 RCX: ffff9132bbd19388
  [16402.253278] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9132bbd19380
  [16402.254063] RBP: ffff9132b41a49c0 R08: 0000000000000000 R09: 0000000000000000
  [16402.254887] R10: 0000000000000000 R11: ffff91324758b080 R12: ffff91326ef17ce0
  [16402.255694] R13: ffff91325fc0f000 R14: ffff91326ef176b0 R15: ffff9132815e2000
  [16402.256321] FS:  00007f542c6d7b80(0000) GS:ffff9132bbd00000(0000) knlGS:0000000000000000
  [16402.256973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [16402.257374] CR2: 00007f127b83f250 CR3: 0000000133480002 CR4: 0000000000370ee0
  [16402.257867] Call Trace:
  [16402.258072]  btrfs_cow_block+0x109/0x230
  [16402.258356]  btrfs_search_slot+0x530/0x9d0
  [16402.258655]  btrfs_lookup_file_extent+0x37/0x40
  [16402.259155]  __btrfs_drop_extents+0x13c/0xd60
  [16402.259628]  ? btrfs_block_rsv_migrate+0x4f/0xb0
  [16402.259949]  btrfs_replace_file_extents+0x190/0x820
  [16402.260873]  btrfs_clone+0x9ae/0xc00
  [16402.261139]  btrfs_extent_same_range+0x66/0x90
  [16402.261771]  btrfs_remap_file_range+0x353/0x3b1
  [16402.262333]  vfs_dedupe_file_range_one.part.0+0xd5/0x140
  [16402.262821]  vfs_dedupe_file_range+0x189/0x220
  [16402.263150]  do_vfs_ioctl+0x552/0x700
  [16402.263662]  __x64_sys_ioctl+0x62/0xb0
  [16402.264023]  do_syscall_64+0x33/0x40
  [16402.264364]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [16402.264862] RIP: 0033:0x7f542c7d15cb
  [16402.266901] RSP: 002b:00007ffd35944ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [16402.267627] RAX: ffffffffffffffda RBX: 00000000009d1968 RCX: 00007f542c7d15cb
  [16402.268298] RDX: 00000000009d2490 RSI: 00000000c0189436 RDI: 0000000000000003
  [16402.268958] RBP: 00000000009d2520 R08: 0000000000000036 R09: 00000000009d2e64
  [16402.269726] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
  [16402.270659] R13: 000000000001f000 R14: 00000000009d1970 R15: 00000000009d2e80
  [16402.271498] irq event stamp: 0
  [16402.271846] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  [16402.272497] hardirqs last disabled at (0): [<ffffffff910dbf59>] copy_process+0x6b9/0x1ba0
  [16402.273343] softirqs last  enabled at (0): [<ffffffff910dbf59>] copy_process+0x6b9/0x1ba0
  [16402.273905] softirqs last disabled at (0): [<0000000000000000>] 0x0
  [16402.274338] ---[ end trace 737874a5a41a8236 ]---
  [16402.274669] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry
  [16402.276179] BTRFS info (device dm-9): forced readonly
  [16402.277046] BTRFS: error (device dm-9) in btrfs_replace_file_extents:2723: errno=-2 No such entry
  [16402.278744] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry
  [16402.279968] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry
  [16402.280582] BTRFS info (device dm-9): balance: ended with status: -30

The problem here is that as soon as we allocate the new block it is
locked and marked dirty in the btree inode.  This means that we could
attempt to writeback this block and need to lock the extent buffer.
However we're not unlocking it here and thus we deadlock.

Fix this by unlocking the cow block if we have any errors inside of
__btrfs_cow_block, and also free it so we do not leak it.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:17:59 +02:00
Goldwyn Rodrigues
e3c57805f8 btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK
Since we now perform direct reads using i_rwsem, we can remove this
inode flag used to co-ordinate unlocked reads.

The truncate call takes i_rwsem. This means it is correctly synchronized
with concurrent direct reads.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <jth@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:17:59 +02:00
Josef Bacik
92e26df43b btrfs: return error if we're unable to read device stats
I noticed when fixing device stats for seed devices that we simply threw
away the return value from btrfs_search_slot().  This is because we may
not have stat items, but we could very well get an error, and thus miss
reporting the error up the chain.

Fix this by returning ret if it's an actual error, and then stop trying
to init the rest of the devices stats and return the error up the chain.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:17:58 +02:00
Josef Bacik
124604eb50 btrfs: init device stats for seed devices
We recently started recording device stats across the fleet, and noticed
a large increase in messages such as this

  BTRFS warning (device dm-0): get dev_stats failed, not yet valid

on our tiers that use seed devices for their root devices.  This is
because we do not initialize the device stats for any seed devices if we
have a sprout device and mount using that sprout device.  The basic
steps for reproducing are:

  $ mkfs seed device
  $ mount seed device
  # fill seed device
  $ umount seed device
  $ btrfstune -S 1 seed device
  $ mount seed device
  $ btrfs device add -f sprout device /mnt/wherever
  $ umount /mnt/wherever
  $ mount sprout device /mnt/wherever
  $ btrfs device stats /mnt/wherever

This will fail with the above message in dmesg.

Fix this by iterating over the fs_devices->seed if they exist in
btrfs_init_dev_stats.  This fixed the problem and properly reports the
stats for both devices.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ rename to btrfs_device_init_dev_stats ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:17:58 +02:00
Nikolay Borisov
905eb88bce btrfs: remove struct extent_io_ops
It's no longer used just remove the function and any related code which
was initialising it for inodes. No functional changes.

Removing 8 bytes from extent_io_tree in turn reduces size of other
structures where it is embedded, notably btrfs_inode where it reduces
size by 24 bytes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:25 +02:00
Nikolay Borisov
1b36294a6c btrfs: call submit_bio_hook directly for metadata pages
No need to go through a function pointer indirection simply call
submit_bio_hook directly by exporting and renaming the helper to
btrfs_submit_metadata_bio. This makes the code more readable and should
result in somewhat faster code due to no longer paying the price for
specualtive attack mitigations that come with indirect function calls.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:25 +02:00
Nikolay Borisov
908930f3ed btrfs: stop calling submit_bio_hook for data inodes
Instead export and rename the function to btrfs_submit_data_bio and
call it directly in submit_one_bio. This avoids paying the cost for
speculative attacks mitigations and improves code readability.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:24 +02:00
Nikolay Borisov
be17b3afc4 btrfs: don't opencode is_data_inode in end_bio_extent_readpage
Use the is_data_inode helper.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:24 +02:00
Nikolay Borisov
cd0537449c btrfs: call submit_bio_hook directly in submit_one_bio
BTRFS has 2 inode types (for the purposes of the code in submit_one_bio)
- ordinary data inodes (including the freespace inode) and the btree
inode. Both of these implement submit_bio_hook so btrfsic_submit_bio can
never be called from submit_one_bio so just remove it.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:24 +02:00
Nikolay Borisov
1f03d9cfda btrfs: remove extent_io_ops::readpage_end_io_hook
It's no longer used so let's remove it.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:24 +02:00
Nikolay Borisov
9a446d6a9f btrfs: replace readpage_end_io_hook with direct calls
Don't call readpage_end_io_hook for the btree inode.  Instead of relying
on indirect calls to implement metadata buffer validation simply check
if the inode whose page we are processing equals the btree inode. If it
does call the necessary function.

This is an improvement in 2 directions:

1. We aren't paying the penalty of indirect calls in a post-speculation
   attacks world.

2. The function is now named more explicitly so it's obvious what's
   going on

This is in preparation to removing struct extent_io_ops altogether.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:24 +02:00
Filipe Manana
9c2b4e0347 btrfs: send, recompute reference path after orphanization of a directory
During an incremental send, when an inode has multiple new references we
might end up emitting rename operations for orphanizations that have a
source path that is no longer valid due to a previous orphanization of
some directory inode. This causes the receiver to fail since it tries
to rename a path that does not exists.

Example reproducer:

  $ cat reproducer.sh
  #!/bin/bash

  mkfs.btrfs -f /dev/sdi >/dev/null
  mount /dev/sdi /mnt/sdi

  touch /mnt/sdi/f1
  touch /mnt/sdi/f2
  mkdir /mnt/sdi/d1
  mkdir /mnt/sdi/d1/d2

  # Filesystem looks like:
  #
  # .                           (ino 256)
  # |----- f1                   (ino 257)
  # |----- f2                   (ino 258)
  # |----- d1/                  (ino 259)
  #        |----- d2/           (ino 260)

  btrfs subvolume snapshot -r /mnt/sdi /mnt/sdi/snap1
  btrfs send -f /tmp/snap1.send /mnt/sdi/snap1

  # Now do a series of changes such that:
  #
  # *) inode 258 has one new hardlink and the previous name changed
  #
  # *) both names conflict with the old names of two other inodes:
  #
  #    1) the new name "d1" conflicts with the old name of inode 259,
  #       under directory inode 256 (root)
  #
  #    2) the new name "d2" conflicts with the old name of inode 260
  #       under directory inode 259
  #
  # *) inodes 259 and 260 now have the old names of inode 258
  #
  # *) inode 257 is now located under inode 260 - an inode with a number
  #    smaller than the inode (258) for which we created a second hard
  #    link and swapped its names with inodes 259 and 260
  #
  ln /mnt/sdi/f2 /mnt/sdi/d1/f2_link
  mv /mnt/sdi/f1 /mnt/sdi/d1/d2/f1

  # Swap d1 and f2.
  mv /mnt/sdi/d1 /mnt/sdi/tmp
  mv /mnt/sdi/f2 /mnt/sdi/d1
  mv /mnt/sdi/tmp /mnt/sdi/f2

  # Swap d2 and f2_link
  mv /mnt/sdi/f2/d2 /mnt/sdi/tmp
  mv /mnt/sdi/f2/f2_link /mnt/sdi/f2/d2
  mv /mnt/sdi/tmp /mnt/sdi/f2/f2_link

  # Filesystem now looks like:
  #
  # .                                (ino 256)
  # |----- d1                        (ino 258)
  # |----- f2/                       (ino 259)
  #        |----- f2_link/           (ino 260)
  #        |       |----- f1         (ino 257)
  #        |
  #        |----- d2                 (ino 258)

  btrfs subvolume snapshot -r /mnt/sdi /mnt/sdi/snap2
  btrfs send -f /tmp/snap2.send -p /mnt/sdi/snap1 /mnt/sdi/snap2

  mkfs.btrfs -f /dev/sdj >/dev/null
  mount /dev/sdj /mnt/sdj

  btrfs receive -f /tmp/snap1.send /mnt/sdj
  btrfs receive -f /tmp/snap2.send /mnt/sdj

  umount /mnt/sdi
  umount /mnt/sdj

When executed the receive of the incremental stream fails:

  $ ./reproducer.sh
  Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
  At subvol /mnt/sdi/snap1
  Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
  At subvol /mnt/sdi/snap2
  At subvol snap1
  At snapshot snap2
  ERROR: rename d1/d2 -> o260-6-0 failed: No such file or directory

This happens because:

1) When processing inode 257 we end up computing the name for inode 259
   because it is an ancestor in the send snapshot, and at that point it
   still has its old name, "d1", from the parent snapshot because inode
   259 was not yet processed. We then cache that name, which is valid
   until we start processing inode 259 (or set the progress to 260 after
   processing its references);

2) Later we start processing inode 258 and collecting all its new
   references into the list sctx->new_refs. The first reference in the
   list happens to be the reference for name "d1" while the reference for
   name "d2" is next (the last element of the list).
   We compute the full path "d1/d2" for this second reference and store
   it in the reference (its ->full_path member). The path used for the
   new parent directory was "d1" and not "f2" because inode 259, the
   new parent, was not yet processed;

3) When we start processing the new references at process_recorded_refs()
   we start with the first reference in the list, for the new name "d1".
   Because there is a conflicting inode that was not yet processed, which
   is directory inode 259, we orphanize it, renaming it from "d1" to
   "o259-6-0";

4) Then we start processing the new reference for name "d2", and we
   realize it conflicts with the reference of inode 260 in the parent
   snapshot. So we issue an orphanization operation for inode 260 by
   emitting a rename operation with a destination path of "o260-6-0"
   and a source path of "d1/d2" - this source path is the value we
   stored in the reference earlier at step 2), corresponding to the
   ->full_path member of the reference, however that path is no longer
   valid due to the orphanization of the directory inode 259 in step 3).
   This makes the receiver fail since the path does not exists, it should
   have been "o259-6-0/d2".

Fix this by recomputing the full path of a reference before emitting an
orphanization if we previously orphanized any directory, since that
directory could be a parent in the new path. This is a rare scenario so
keeping it simple and not checking if that previously orphanized directory
is in fact an ancestor of the inode we are trying to orphanize.

A test case for fstests follows soon.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:23 +02:00
Filipe Manana
98272bb77b btrfs: send, orphanize first all conflicting inodes when processing references
When doing an incremental send it is possible that when processing the new
references for an inode we end up issuing rename or link operations that
have an invalid path, which contains the orphanized name of a directory
before we actually orphanized it, causing the receiver to fail.

The following reproducer triggers such scenario:

  $ cat reproducer.sh
  #!/bin/bash

  mkfs.btrfs -f /dev/sdi >/dev/null
  mount /dev/sdi /mnt/sdi

  touch /mnt/sdi/a
  touch /mnt/sdi/b
  mkdir /mnt/sdi/testdir
  # We want "a" to have a lower inode number then "testdir" (257 vs 259).
  mv /mnt/sdi/a /mnt/sdi/testdir/a

  # Filesystem looks like:
  #
  # .                           (ino 256)
  # |----- testdir/             (ino 259)
  # |          |----- a         (ino 257)
  # |
  # |----- b                    (ino 258)

  btrfs subvolume snapshot -r /mnt/sdi /mnt/sdi/snap1
  btrfs send -f /tmp/snap1.send /mnt/sdi/snap1

  # Now rename 259 to "testdir_2", then change the name of 257 to
  # "testdir" and make it a direct descendant of the root inode (256).
  # Also create a new link for inode 257 with the old name of inode 258.
  # By swapping the names and location of several inodes and create a
  # nasty dependency chain of rename and link operations.
  mv /mnt/sdi/testdir/a /mnt/sdi/a2
  touch /mnt/sdi/testdir/a
  mv /mnt/sdi/b /mnt/sdi/b2
  ln /mnt/sdi/a2 /mnt/sdi/b
  mv /mnt/sdi/testdir /mnt/sdi/testdir_2
  mv /mnt/sdi/a2 /mnt/sdi/testdir

  # Filesystem now looks like:
  #
  # .                            (ino 256)
  # |----- testdir_2/            (ino 259)
  # |          |----- a          (ino 260)
  # |
  # |----- testdir               (ino 257)
  # |----- b                     (ino 257)
  # |----- b2                    (ino 258)

  btrfs subvolume snapshot -r /mnt/sdi /mnt/sdi/snap2
  btrfs send -f /tmp/snap2.send -p /mnt/sdi/snap1 /mnt/sdi/snap2

  mkfs.btrfs -f /dev/sdj >/dev/null
  mount /dev/sdj /mnt/sdj

  btrfs receive -f /tmp/snap1.send /mnt/sdj
  btrfs receive -f /tmp/snap2.send /mnt/sdj

  umount /mnt/sdi
  umount /mnt/sdj

When running the reproducer, the receive of the incremental send stream
fails:

  $ ./reproducer.sh
  Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
  At subvol /mnt/sdi/snap1
  Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
  At subvol /mnt/sdi/snap2
  At subvol snap1
  At snapshot snap2
  ERROR: link b -> o259-6-0/a failed: No such file or directory

The problem happens because of the following:

1) Before we start iterating the list of new references for inode 257,
   we generate its current path and store it at @valid_path, done at
   the very beginning of process_recorded_refs(). The generated path
   is "o259-6-0/a", containing the orphanized name for inode 259;

2) Then we iterate over the list of new references, which has the
   references "b" and "testdir" in that specific order;

3) We process reference "b" first, because it is in the list before
   reference "testdir". We then issue a link operation to create
   the new reference "b" using a target path corresponding to the
   content at @valid_path, which corresponds to "o259-6-0/a".
   However we haven't yet orphanized inode 259, its name is still
   "testdir", and not "o259-6-0". The orphanization of 259 did not
   happen yet because we will process the reference named "testdir"
   for inode 257 only in the next iteration of the loop that goes
   over the list of new references.

Fix the issue by having a preliminar iteration over all the new references
at process_recorded_refs(). This iteration is responsible only for doing
the orphanization of other inodes that have and old reference that
conflicts with one of the new references of the inode we are currently
processing. The emission of rename and link operations happen now in the
next iteration of the new references.

A test case for fstests will follow soon.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:23 +02:00
Qu Wenruo
1465af12e2 btrfs: tree-checker: fix false alert caused by legacy btrfs root item
Commit 259ee7754b ("btrfs: tree-checker: Add ROOT_ITEM check")
introduced btrfs root item size check, however btrfs root item has two
versions, the legacy one which just ends before generation_v2 member, is
smaller than current btrfs root item size.

This caused btrfs kernel to reject valid but old tree root leaves.

Fix this problem by also allowing legacy root item, since kernel can
already handle them pretty well and upgrade to newer root item format
when needed.

Reported-by: Martin Steigerwald <martin@lichtvoll.de>
Fixes: 259ee7754b ("btrfs: tree-checker: Add ROOT_ITEM check")
CC: stable@vger.kernel.org # 5.4+
Tested-By: Martin Steigerwald <martin@lichtvoll.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:23 +02:00
David Sterba
e97659cefe btrfs: use unaligned helpers for stack and header set/get helpers
In the definitions generated by BTRFS_SETGET_HEADER_FUNCS there's direct
pointer assignment but we should use the helpers for unaligned access
for clarity. It hasn't been a problem so far because of the natural
alignment.

Similarly for BTRFS_SETGET_STACK_FUNCS, that usually get a structure
from stack that has an aligned start but some members may not be aligned
due to packing. This as well hasn't caused problems so far.

Move the put/get_unaligned_le8 stubs to ctree.h so we can use them.

Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:23 +02:00
David Sterba
6994ca367c btrfs: free-space-cache: use unaligned helpers to access data
The free space inode stores the tracking data, checksums etc, using the
io_ctl structure and moving the pointers. The data are generally aligned
to at least 4 bytes (u32 for CRC) so it's not completely unaligned but
for clarity we should use the proper helpers whenever a struct is
initialized from io_ctl->cur pointer.

Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:23 +02:00
David Sterba
e2f896b318 btrfs: send: use helpers for unaligned access to header members
The header is mapped onto the send buffer and thus its members may be
potentially unaligned so use the helpers instead of directly assigning
the pointers. This has worked so far but let's use the helpers to make
that clear.

Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:23 +02:00
Qu Wenruo
2c53a14dd3 btrfs: use own btree inode io_tree owner id
Btree inode is special compared to all other inode extent io_trees,
although it has a btrfs inode, it doesn't have the track_uptodate bit at
all.

This means a lot of things like extent locking doesn't even need to be
applied to btree io tree.

Since it's so special, adds a new owner value for it to make debuging a
little easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:22 +02:00
Johannes Thumshirn
6b613cc97f btrfs: reschedule when cloning lots of extents
We have several occurrences of a soft lockup from fstest's generic/175
testcase, which look more or less like this one:

  watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [xfs_io:10030]
  Kernel panic - not syncing: softlockup: hung tasks
  CPU: 0 PID: 10030 Comm: xfs_io Tainted: G             L    5.9.0-rc5+ #768
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
  Call Trace:
   <IRQ>
   dump_stack+0x77/0xa0
   panic+0xfa/0x2cb
   watchdog_timer_fn.cold+0x85/0xa5
   ? lockup_detector_update_enable+0x50/0x50
   __hrtimer_run_queues+0x99/0x4c0
   ? recalibrate_cpu_khz+0x10/0x10
   hrtimer_run_queues+0x9f/0xb0
   update_process_times+0x28/0x80
   tick_handle_periodic+0x1b/0x60
   __sysvec_apic_timer_interrupt+0x76/0x210
   asm_call_on_stack+0x12/0x20
   </IRQ>
   sysvec_apic_timer_interrupt+0x7f/0x90
   asm_sysvec_apic_timer_interrupt+0x12/0x20
  RIP: 0010:btrfs_tree_unlock+0x91/0x1a0 [btrfs]
  RSP: 0018:ffffc90007123a58 EFLAGS: 00000282
  RAX: ffff8881cea2fbe0 RBX: ffff8881cea2fbe0 RCX: 0000000000000000
  RDX: ffff8881d23fd200 RSI: ffffffff82045220 RDI: ffff8881cea2fba0
  RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000032
  R10: 0000160000000000 R11: 0000000000001000 R12: 0000000000001000
  R13: ffff8882357fd5b0 R14: ffff88816fa76e70 R15: ffff8881cea2fad0
   ? btrfs_tree_unlock+0x15b/0x1a0 [btrfs]
   btrfs_release_path+0x67/0x80 [btrfs]
   btrfs_insert_replace_extent+0x177/0x2c0 [btrfs]
   btrfs_replace_file_extents+0x472/0x7c0 [btrfs]
   btrfs_clone+0x9ba/0xbd0 [btrfs]
   btrfs_clone_files.isra.0+0xeb/0x140 [btrfs]
   ? file_update_time+0xcd/0x120
   btrfs_remap_file_range+0x322/0x3b0 [btrfs]
   do_clone_file_range+0xb7/0x1e0
   vfs_clone_file_range+0x30/0xa0
   ioctl_file_clone+0x8a/0xc0
   do_vfs_ioctl+0x5b2/0x6f0
   __x64_sys_ioctl+0x37/0xa0
   do_syscall_64+0x33/0x40
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f87977fc247
  RSP: 002b:00007ffd51a2f6d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f87977fc247
  RDX: 00007ffd51a2f710 RSI: 000000004020940d RDI: 0000000000000003
  RBP: 0000000000000004 R08: 00007ffd51a79080 R09: 0000000000000000
  R10: 00005621f11352f2 R11: 0000000000000206 R12: 0000000000000000
  R13: 0000000000000000 R14: 00005621f128b958 R15: 0000000080000000
  Kernel Offset: disabled
  ---[ end Kernel panic - not syncing: softlockup: hung tasks ]---

All of these lockup reports have the call chain btrfs_clone_files() ->
btrfs_clone() in common. btrfs_clone_files() calls btrfs_clone() with
both source and destination extents locked and loops over the source
extent to create the clones.

Conditionally reschedule in the btrfs_clone() loop, to give some time back
to other processes.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:22 +02:00
Denis Efremov
bae12df966 btrfs: use kvcalloc for allocation in btrfs_ioctl_send()
Replace kvzalloc() call with kvcalloc() that also checks the size
internally. There's a standalone overflow check in the function so we
can return invalid parameter combination.  Use array_size() helper to
compute the memory size for clone_sources_tmp.

Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Denis Efremov <efremov@linux.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:22 +02:00
Denis Efremov
8eb2fd0015 btrfs: use kvzalloc() to allocate clone_roots in btrfs_ioctl_send()
btrfs_ioctl_send() used open-coded kvzalloc implementation earlier.
The code was accidentally replaced with kzalloc() call [1]. Restore
the original code by using kvzalloc() to allocate sctx->clone_roots.

[1] https://patchwork.kernel.org/patch/9757891/#20529627

Fixes: 818e010bf9 ("btrfs: replace opencoded kvzalloc with the helper")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Denis Efremov <efremov@linux.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:22 +02:00
Nikolay Borisov
c0a4360305 btrfs: remove inode argument from btrfs_start_ordered_extent
The passed in ordered_extent struct is always well-formed and contains
the inode making the explicit argument redundant.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:22 +02:00
Nikolay Borisov
510f85edf1 btrfs: remove inode argument from add_pending_csums
It's used to reference the csum root which can be done from the trans
handle as well. Simplify the signature and while at it also remove the
noinline attribute as the function uses only at most 16 bytes of stack
space.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:21 +02:00
Nikolay Borisov
3c38c877fc btrfs: sink inode argument in insert_ordered_extent_file_extent
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:21 +02:00
Nikolay Borisov
71fe0a55da btrfs: switch btrfs_remove_ordered_extent to btrfs_inode
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:21 +02:00
Nikolay Borisov
633cc816f7 btrfs: clean BTRFS_I usage in btrfs_destroy_inode
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:21 +02:00