Sync up with android12-5.10 for the following commits:
90732d426e UPSTREAM: mm/compaction: correct deferral logic for proactive compaction
d0a88ae479 ANDROID: Enable GKI Dr. No Enforcement
e6a59da61e ANDROID: media: v4l2-core: Fix deadlock in vendor hook
fdb1cfe2d3 FROMGIT: dma_buf: remove dmabuf sysfs teardown before release
1dc0dd2573 ANDROID: update mtk symbol list
a669748346 UPSTREAM: mfd: syscon: Free the allocated name field of struct regmap_config
cffe67a351 ANDROID: Give UIC cmd timeout a larger value
3bca0b5344 ANDROID: binder: retry security_secid_to_secctx()
9fdfeda4c9 ANDROID: update new gki symbol for mtk
1dd167be9f ANDROID: mm, kasan: fix for "integrate page_alloc init with HW_TAGS"
f215850ea0 UPSTREAM: kasan: fix conflict with page poisoning
a3da917661 ANDROID: GKI: Export two more mm symbols for GKI
0497b9601b ANDROID: Update symbol list for mtk
0f27e1d317 FROMLIST: kfence: skip all GFP_ZONEMASK allocations
8478d8dc53 FROMLIST: kfence: move the size check to the beginning of __kfence_alloc()
ddabf14bea ANDROID: ABI: update allowed list for galaxy
11cec52238 FROMGIT: f2fs: add sysfs nodes to get GC info for each GC mode
fdc46110cb ANDROID: abi_gki_aarch64_qcom: Add android_debug_for_each_module
0bb433e014 ANDROID: debug_symbols: Add android_debug_for_each_module
a0429aa1d0 ANDROID: ABI: Update ABI for symbol list updates
c68e5ca9f8 ANDROID: GKI: Update symbols to symbol list
aac5a77959 ANDROID: Update symbol list for mtk
42fc1faf44 UPSTREAM: block, bfq: set next_rq to waker_bfqq->next_rq in waker injection
e37cc8a09f UPSTREAM: mm/mremap: hold the rmap lock in write mode when moving page table entries.
9b136eab76 ANDROID: pstore/ram: Add backward compatibility for ramoops reserved region
df97297651 ANDROID: Update symbol list for mtk
4322bb07e8 ANDROID: vendor_hooks: Modify the function name
007213b1d0 BACKPORT: FROMLIST: kasan: add memzero int for unaligned size at DEBUG
7acbce0bf4 BACKPORT: FROMLIST: mm: move helper to check slub_debug_enabled
d6eeae59b5 ANDROID: ABI: initial update allowed list for galaxy
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6d28f1be3cbd8a1fed86d2d647e6c2bb55221b0a
This effectively locks down OWNERS approval to a small group to guard
the code base against unintentional breakages.
Bug: 194314089
Signed-off-by: Matthias Maennich <maennich@google.com>
Change-Id: Ifd1ea97639a622320ea83f901f6451e2e52b38d4
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmD22M4ACgkQONu9yGCS
aT6MbxAAmAmm5sj9fMOk4ijcRz+CbpTZZdSS5VT3BbJJrTrZ1qk9KO0/uAhY1SXe
uuP8Q7BD2llWQ6hQiJj7vZnXKiJsBrngH//4QFa72TjXxa1ENPcw/UceLxMgT6e1
0r74mcikr4f2A6kFoAexyteHfzm0D+8ZfgzJtTKPbBPqCqRIaZFO84xyTclvNJ6h
0Xx2dSNWDU1dXvUe+43ggaZUFNe4SHGupgsc3GSsxPkyKTzrXMGsRh4m3p94t4py
WQULkq/57JaeJwQcxWMOPqLIHF/IWtZSfr+YHx+q1zvThK9uUsAd3e8B8r6FJswV
xmDIDkCw3VIRxTiYhYKFqiDZOanlxYOtWXCXAyI+YV/JIT4NGHapg91qEq5PR9Ti
tkSmAbwaON9LzshzWuKjDHMDJO6x/i3YJmaXuVgBciD56xIvCMKiiardpey2574o
M5m8fLcXQXQBPY/WrusAn312/PanUxgstn6EJgInq0M4GoMj1aUb+W8luQfqq0G5
gYNjzTCEQErTITAyha9SLQvBVH3snfHpKoDL669HK6woq2GH2K48YIKxCam5trlH
9P9LCXBhSHe19/r2qLotcMW7XmuSsu77FetZtxFJQeWVqRokmyHzZnTbeoQAxhpa
xGkvnbnyLettCCtm/JAmwnASqIYnUxuvgj7zcPb5PGnhwk1rfXU=
=b0QD
-----END PGP SIGNATURE-----
Merge 5.10.52 into android12-5.10-lts
Changes in 5.10.52
certs: add 'x509_revocation_list' to gitignore
cifs: handle reconnect of tcon when there is no cached dfs referral
KVM: mmio: Fix use-after-free Read in kvm_vm_ioctl_unregister_coalesced_mmio
KVM: x86: Use guest MAXPHYADDR from CPUID.0x8000_0008 iff TDP is enabled
KVM: x86/mmu: Do not apply HPA (memory encryption) mask to GPAs
KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA
KVM: X86: Disable hardware breakpoints unconditionally before kvm_x86->run()
scsi: core: Fix bad pointer dereference when ehandler kthread is invalid
scsi: zfcp: Report port fc_security as unknown early during remote cable pull
tracing: Do not reference char * as a string in histograms
drm/i915/gtt: drop the page table optimisation
drm/i915/gt: Fix -EDEADLK handling regression
cgroup: verify that source is a string
fbmem: Do not delete the mode that is still in use
drm/dp_mst: Do not set proposed vcpi directly
drm/dp_mst: Avoid to mess up payload table by ports in stale topology
drm/dp_mst: Add missing drm parameters to recently added call to drm_dbg_kms()
drm/ingenic: Fix non-OSD mode
drm/ingenic: Switch IPU plane to type OVERLAY
Revert "drm/ast: Remove reference to struct drm_device.pdev"
net: bridge: multicast: fix PIM hello router port marking race
net: bridge: multicast: fix MRD advertisement router port marking race
leds: tlc591xx: fix return value check in tlc591xx_probe()
ASoC: Intel: sof_sdw: add mutual exclusion between PCH DMIC and RT715
dmaengine: fsl-qdma: check dma_set_mask return value
scsi: arcmsr: Fix the wrong CDB payload report to IOP
srcu: Fix broken node geometry after early ssp init
rcu: Reject RCU_LOCKDEP_WARN() false positives
tty: serial: fsl_lpuart: fix the potential risk of division or modulo by zero
serial: fsl_lpuart: disable DMA for console and fix sysrq
misc/libmasm/module: Fix two use after free in ibmasm_init_one
misc: alcor_pci: fix null-ptr-deref when there is no PCI bridge
ASoC: intel/boards: add missing MODULE_DEVICE_TABLE
partitions: msdos: fix one-byte get_unaligned()
iio: gyro: fxa21002c: Balance runtime pm + use pm_runtime_resume_and_get().
iio: magn: bmc150: Balance runtime pm + use pm_runtime_resume_and_get()
ALSA: usx2y: Avoid camelCase
ALSA: usx2y: Don't call free_pages_exact() with NULL address
Revert "ALSA: bebob/oxfw: fix Kconfig entry for Mackie d.2 Pro"
usb: common: usb-conn-gpio: fix NULL pointer dereference of charger
w1: ds2438: fixing bug that would always get page0
scsi: arcmsr: Fix doorbell status being updated late on ARC-1886
scsi: hisi_sas: Propagate errors in interrupt_init_v1_hw()
scsi: lpfc: Fix "Unexpected timeout" error in direct attach topology
scsi: lpfc: Fix crash when lpfc_sli4_hba_setup() fails to initialize the SGLs
scsi: core: Cap scsi_host cmd_per_lun at can_queue
ALSA: ac97: fix PM reference leak in ac97_bus_remove()
tty: serial: 8250: serial_cs: Fix a memory leak in error handling path
scsi: mpt3sas: Fix deadlock while cancelling the running firmware event
scsi: core: Fixup calling convention for scsi_mode_sense()
scsi: scsi_dh_alua: Check for negative result value
fs/jfs: Fix missing error code in lmLogInit()
scsi: megaraid_sas: Fix resource leak in case of probe failure
scsi: megaraid_sas: Early detection of VD deletion through RaidMap update
scsi: megaraid_sas: Handle missing interrupts while re-enabling IRQs
scsi: iscsi: Add iscsi_cls_conn refcount helpers
scsi: iscsi: Fix conn use after free during resets
scsi: iscsi: Fix shost->max_id use
scsi: qedi: Fix null ref during abort handling
scsi: qedi: Fix race during abort timeouts
scsi: qedi: Fix TMF session block/unblock use
scsi: qedi: Fix cleanup session block/unblock use
mfd: da9052/stmpe: Add and modify MODULE_DEVICE_TABLE
mfd: cpcap: Fix cpcap dmamask not set warnings
ASoC: img: Fix PM reference leak in img_i2s_in_probe()
fsi: Add missing MODULE_DEVICE_TABLE
serial: tty: uartlite: fix console setup
s390/sclp_vt220: fix console name to match device
s390: disable SSP when needed
selftests: timers: rtcpie: skip test if default RTC device does not exist
ALSA: sb: Fix potential double-free of CSP mixer elements
powerpc/ps3: Add dma_mask to ps3_dma_region
iommu/arm-smmu: Fix arm_smmu_device refcount leak when arm_smmu_rpm_get fails
iommu/arm-smmu: Fix arm_smmu_device refcount leak in address translation
ASoC: soc-pcm: fix the return value in dpcm_apply_symmetry()
gpio: zynq: Check return value of pm_runtime_get_sync
gpio: zynq: Check return value of irq_get_irq_data
scsi: storvsc: Correctly handle multiple flags in srb_status
ALSA: ppc: fix error return code in snd_pmac_probe()
selftests/powerpc: Fix "no_handler" EBB selftest
gpio: pca953x: Add support for the On Semi pca9655
powerpc/mm/book3s64: Fix possible build error
ASoC: soc-core: Fix the error return code in snd_soc_of_parse_audio_routing()
habanalabs/gaudi: set the correct cpu_id on MME2_QM failure
habanalabs: remove node from list before freeing the node
s390/processor: always inline stap() and __load_psw_mask()
s390/ipl_parm: fix program check new psw handling
s390/mem_detect: fix diag260() program check new psw handling
s390/mem_detect: fix tprot() program check new psw handling
Input: hideep - fix the uninitialized use in hideep_nvm_unlock()
ALSA: bebob: add support for ToneWeal FW66
ALSA: usb-audio: scarlett2: Fix 18i8 Gen 2 PCM Input count
ALSA: usb-audio: scarlett2: Fix data_mutex lock
ALSA: usb-audio: scarlett2: Fix scarlett2_*_ctl_put() return values
usb: gadget: f_hid: fix endianness issue with descriptors
usb: gadget: hid: fix error return code in hid_bind()
powerpc/boot: Fixup device-tree on little endian
ASoC: Intel: kbl_da7219_max98357a: shrink platform_id below 20 characters
backlight: lm3630a: Fix return code of .update_status() callback
ALSA: hda: Add IRQ check for platform_get_irq()
ALSA: usb-audio: scarlett2: Fix 6i6 Gen 2 line out descriptions
ALSA: firewire-motu: fix detection for S/PDIF source on optical interface in v2 protocol
leds: turris-omnia: add missing MODULE_DEVICE_TABLE
staging: rtl8723bs: fix macro value for 2.4Ghz only device
intel_th: Wait until port is in reset before programming it
i2c: core: Disable client irq on reboot/shutdown
phy: intel: Fix for warnings due to EMMC clock 175Mhz change in FIP
lib/decompress_unlz4.c: correctly handle zero-padding around initrds.
kcov: add __no_sanitize_coverage to fix noinstr for all architectures
power: supply: sc27xx: Add missing MODULE_DEVICE_TABLE
power: supply: sc2731_charger: Add missing MODULE_DEVICE_TABLE
pwm: spear: Don't modify HW state in .remove callback
PCI: ftpci100: Rename macro name collision
power: supply: ab8500: Avoid NULL pointers
PCI: hv: Fix a race condition when removing the device
power: supply: max17042: Do not enforce (incorrect) interrupt trigger type
power: reset: gpio-poweroff: add missing MODULE_DEVICE_TABLE
ARM: 9087/1: kprobes: test-thumb: fix for LLVM_IAS=1
PCI/P2PDMA: Avoid pci_get_slot(), which may sleep
NFSv4: Fix delegation return in cases where we have to retry
PCI: pciehp: Ignore Link Down/Up caused by DPC
watchdog: Fix possible use-after-free in wdt_startup()
watchdog: sc520_wdt: Fix possible use-after-free in wdt_turnoff()
watchdog: Fix possible use-after-free by calling del_timer_sync()
watchdog: imx_sc_wdt: fix pretimeout
watchdog: iTCO_wdt: Account for rebooting on second timeout
x86/fpu: Return proper error codes from user access functions
remoteproc: core: Fix cdev remove and rproc del
PCI: tegra: Add missing MODULE_DEVICE_TABLE
orangefs: fix orangefs df output.
ceph: remove bogus checks and WARN_ONs from ceph_set_page_dirty
drm/gma500: Add the missed drm_gem_object_put() in psb_user_framebuffer_create()
NFS: nfs_find_open_context() may only select open files
power: supply: charger-manager: add missing MODULE_DEVICE_TABLE
power: supply: ab8500: add missing MODULE_DEVICE_TABLE
drm/amdkfd: fix sysfs kobj leak
pwm: img: Fix PM reference leak in img_pwm_enable()
pwm: tegra: Don't modify HW state in .remove callback
ACPI: AMBA: Fix resource name in /proc/iomem
ACPI: video: Add quirk for the Dell Vostro 3350
PCI: rockchip: Register IRQ handlers after device and data are ready
virtio-blk: Fix memory leak among suspend/resume procedure
virtio_net: Fix error handling in virtnet_restore()
virtio_console: Assure used length from device is limited
f2fs: atgc: fix to set default age threshold
NFSD: Fix TP_printk() format specifier in nfsd_clid_class
x86/signal: Detect and prevent an alternate signal stack overflow
f2fs: add MODULE_SOFTDEP to ensure crc32 is included in the initramfs
f2fs: compress: fix to disallow temp extension
remoteproc: k3-r5: Fix an error message
PCI/sysfs: Fix dsm_label_utf16s_to_utf8s() buffer overrun
power: supply: rt5033_battery: Fix device tree enumeration
NFSv4: Initialise connection to the server in nfs4_alloc_client()
NFSv4: Fix an Oops in pnfs_mark_request_commit() when doing O_DIRECT
misc: alcor_pci: fix inverted branch condition
um: fix error return code in slip_open()
um: fix error return code in winch_tramp()
ubifs: Fix off-by-one error
ubifs: journal: Fix error return code in ubifs_jnl_write_inode()
watchdog: aspeed: fix hardware timeout calculation
watchdog: jz4740: Fix return value check in jz4740_wdt_probe()
SUNRPC: prevent port reuse on transports which don't request it.
nfs: fix acl memory leak of posix_acl_create()
ubifs: Set/Clear I_LINKABLE under i_lock for whiteout inode
PCI: iproc: Fix multi-MSI base vector number allocation
PCI: iproc: Support multi-MSI only on uniprocessor kernel
f2fs: fix to avoid adding tab before doc section
x86/fpu: Fix copy_xstate_to_kernel() gap handling
x86/fpu: Limit xstate copy size in xstateregs_set()
PCI: intel-gw: Fix INTx enable
pwm: imx1: Don't disable clocks at device remove time
PCI: tegra194: Fix tegra_pcie_ep_raise_msi_irq() ill-defined shift
vdpa/mlx5: Fix umem sizes assignments on VQ create
vdpa/mlx5: Fix possible failure in umem size calculation
virtio_net: move tx vq operation under tx queue lock
nvme-tcp: can't set sk_user_data without write_lock
nfsd: Reduce contention for the nfsd_file nf_rwsem
ALSA: isa: Fix error return code in snd_cmi8330_probe()
vdpa/mlx5: Clear vq ready indication upon device reset
NFSv4/pnfs: Fix the layout barrier update
NFSv4/pnfs: Fix layoutget behaviour after invalidation
NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple times
hexagon: handle {,SOFT}IRQENTRY_TEXT in linker script
hexagon: use common DISCARDS macro
ARM: dts: gemini-rut1xx: remove duplicate ethernet node
reset: RESET_BRCMSTB_RESCAL should depend on ARCH_BRCMSTB
reset: RESET_INTEL_GW should depend on X86
reset: a10sr: add missing of_match_table reference
ARM: exynos: add missing of_node_put for loop iteration
ARM: dts: exynos: fix PWM LED max brightness on Odroid XU/XU3
ARM: dts: exynos: fix PWM LED max brightness on Odroid HC1
ARM: dts: exynos: fix PWM LED max brightness on Odroid XU4
memory: stm32-fmc2-ebi: add missing of_node_put for loop iteration
memory: atmel-ebi: add missing of_node_put for loop iteration
reset: brcmstb: Add missing MODULE_DEVICE_TABLE
memory: pl353: Fix error return code in pl353_smc_probe()
ARM: dts: sun8i: h3: orangepi-plus: Fix ethernet phy-mode
rtc: fix snprintf() checking in is_rtc_hctosys()
arm64: dts: renesas: v3msk: Fix memory size
ARM: dts: r8a7779, marzen: Fix DU clock names
arm64: dts: ti: j7200-main: Enable USB2 PHY RX sensitivity workaround
arm64: dts: renesas: Add missing opp-suspend properties
arm64: dts: renesas: r8a7796[01]: Fix OPP table entry voltages
ARM: dts: stm32: Connect PHY IRQ line on DH STM32MP1 SoM
ARM: dts: stm32: Rework LAN8710Ai PHY reset on DHCOM SoM
arm64: dts: qcom: trogdor: Add no-hpd to DSI bridge node
firmware: tegra: Fix error return code in tegra210_bpmp_init()
firmware: arm_scmi: Reset Rx buffer to max size during async commands
dt-bindings: i2c: at91: fix example for scl-gpios
ARM: dts: BCM5301X: Fixup SPI binding
reset: bail if try_module_get() fails
arm64: dts: renesas: r8a779a0: Drop power-domains property from GIC node
arm64: dts: ti: k3-j721e-main: Fix external refclk input to SERDES
memory: fsl_ifc: fix leak of IO mapping on probe failure
memory: fsl_ifc: fix leak of private memory on probe failure
arm64: dts: allwinner: a64-sopine-baseboard: change RGMII mode to TXID
ARM: dts: dra7: Fix duplicate USB4 target module node
ARM: dts: am335x: align ti,pindir-d0-out-d1-in property with dt-shema
ARM: dts: am437x: align ti,pindir-d0-out-d1-in property with dt-shema
thermal/drivers/sprd: Add missing MODULE_DEVICE_TABLE
ARM: dts: imx6q-dhcom: Fix ethernet reset time properties
ARM: dts: imx6q-dhcom: Fix ethernet plugin detection problems
ARM: dts: imx6q-dhcom: Add gpios pinctrl for i2c bus recovery
thermal/drivers/rcar_gen3_thermal: Fix coefficient calculations
firmware: turris-mox-rwtm: fix reply status decoding function
firmware: turris-mox-rwtm: report failures better
firmware: turris-mox-rwtm: fail probing when firmware does not support hwrng
firmware: turris-mox-rwtm: show message about HWRNG registration
arm64: dts: rockchip: Re-add regulator-boot-on, regulator-always-on for vdd_gpu on rk3399-roc-pc
arm64: dts: rockchip: Re-add regulator-always-on for vcc_sdio for rk3399-roc-pc
scsi: be2iscsi: Fix an error handling path in beiscsi_dev_probe()
sched/uclamp: Ignore max aggregation if rq is idle
jump_label: Fix jump_label_text_reserved() vs __init
static_call: Fix static_call_text_reserved() vs __init
mips: always link byteswap helpers into decompressor
mips: disable branch profiling in boot/decompress.o
MIPS: vdso: Invalid GIC access through VDSO
scsi: scsi_dh_alua: Fix signedness bug in alua_rtpg()
seq_file: disallow extremely large seq buffer allocations
Linux 5.10.52
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic1b04661728db8b0e060ca6935783e15a22210da
[ Upstream commit 1b1774998b2dec837a57d729d1a22e5eb2d6d206 ]
A simplification of get_unaligned() clashes with callers that pass
in a character pointer, causing a harmless warning like:
block/partitions/msdos.c: In function 'msdos_partition':
include/asm-generic/unaligned.h:13:22: warning: 'packed' attribute ignored for field of type 'u8' {aka 'unsigned char'} [-Wattributes]
Remove the SYS_IND() macro with the get_unaligned() call
and just use the ->ind field directly.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmD1LZUACgkQONu9yGCS
aT7gERAArjYemBSkD4/nq5HvxoVu7ueEyqI2orJCyB6b5npPrBZjlKna3SuYuNUF
CmcX5Y2Ynxd3gWJvYFEJdAAQAEtKAzPdPv0QJ+KiLSP2bEZ4Q5grEJXfzcgxcB4L
fUfaCZnUwjoII5xzW1+U2zCJ7YtGx5hZySLKMxSEc0IzawDlfMx4HdwlohXzczsA
Zq3/sTJcW30PWSp6MSuMOH//lPh7sAoCnksAv4Yb8MZYZjC8JNnKFn+IwRUGWEMZ
sFtNbq51sMgGq4TjYBIdO6wBElP4dgWJhYc4cO0667cDvgp6iod/bKlOLJSiwIVX
uPWkyPihH9ZUvNVY0TxjjnxS1rnM0QhH+cEXNGn+SE7KaNmzsI2MaR+DVA2yWcFr
9edqTq5x2CJJ0R/oXHP4nYFtsvlV/QcirlrF0OZHYwz84b16f37Ac7tHOQpr3kNO
N29AW0l5XmpxbfHgo1Iaoi02seouLC47vRkvjTpS9mValGUC0ciXTb97CK4FremE
34rskxIRWBU8HECYioFOeHTAi0+xb/9tOj87BnB5CJ28CD2Md27TTdorsISnqPb0
/ER89QfVtlJLi17wGB0rlAm8fDF3Cy6BnA57QIql1z1NbJGc1cxenEAFiIOnbQ5G
t6SLs3mgqpQizsKUFYimCWa04ZzY4Bg8H9bAI+M+9w7J6yujQzI=
=dGMa
-----END PGP SIGNATURE-----
Merge 5.10.51 into android12-5.10-lts
Changes in 5.10.51
drm/mxsfb: Don't select DRM_KMS_FB_HELPER
drm/zte: Don't select DRM_KMS_FB_HELPER
drm/ast: Fixed CVE for DP501
drm/amd/display: fix HDCP reset sequence on reinitialize
drm/amd/amdgpu/sriov disable all ip hw status by default
drm/vc4: fix argument ordering in vc4_crtc_get_margins()
drm/bridge: nwl-dsi: Force a full modeset when crtc_state->active is changed to be true
net: pch_gbe: Use proper accessors to BE data in pch_ptp_match()
drm/amd/display: fix use_max_lb flag for 420 pixel formats
clk: renesas: rcar-usb2-clock-sel: Fix error handling in .probe()
hugetlb: clear huge pte during flush function on mips platform
atm: iphase: fix possible use-after-free in ia_module_exit()
mISDN: fix possible use-after-free in HFC_cleanup()
atm: nicstar: Fix possible use-after-free in nicstar_cleanup()
net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
drm/mediatek: Fix PM reference leak in mtk_crtc_ddp_hw_init()
net: mdio: ipq8064: add regmap config to disable REGCACHE
drm/bridge: lt9611: Add missing MODULE_DEVICE_TABLE
reiserfs: add check for invalid 1st journal block
drm/virtio: Fix double free on probe failure
net: mdio: provide shim implementation of devm_of_mdiobus_register
net/sched: cls_api: increase max_reclassify_loop
pinctrl: equilibrium: Add missing MODULE_DEVICE_TABLE
drm/scheduler: Fix hang when sched_entity released
drm/sched: Avoid data corruptions
udf: Fix NULL pointer dereference in udf_symlink function
drm/vc4: Fix clock source for VEC PixelValve on BCM2711
drm/vc4: hdmi: Fix PM reference leak in vc4_hdmi_encoder_pre_crtc_co()
e100: handle eeprom as little endian
igb: handle vlan types with checker enabled
igb: fix assignment on big endian machines
drm/bridge: cdns: Fix PM reference leak in cdns_dsi_transfer()
clk: renesas: r8a77995: Add ZA2 clock
net/mlx5e: IPsec/rep_tc: Fix rep_tc_update_skb drops IPsec packet
net/mlx5: Fix lag port remapping logic
drm: rockchip: add missing registers for RK3188
drm: rockchip: add missing registers for RK3066
net: stmmac: the XPCS obscures a potential "PHY not found" error
RDMA/rtrs: Change MAX_SESS_QUEUE_DEPTH
clk: tegra: Fix refcounting of gate clocks
clk: tegra: Ensure that PLLU configuration is applied properly
drm: bridge: cdns-mhdp8546: Fix PM reference leak in
virtio-net: Add validation for used length
ipv6: use prandom_u32() for ID generation
MIPS: cpu-probe: Fix FPU detection on Ingenic JZ4760(B)
MIPS: ingenic: Select CPU_SUPPORTS_CPUFREQ && MIPS_EXTERNAL_TIMER
drm/amd/display: Avoid HDCP over-read and corruption
drm/amdgpu: remove unsafe optimization to drop preamble ib
net: tcp better handling of reordering then loss cases
RDMA/cxgb4: Fix missing error code in create_qp()
dm space maps: don't reset space map allocation cursor when committing
dm writecache: don't split bios when overwriting contiguous cache content
dm: Fix dm_accept_partial_bio() relative to zone management commands
net: bridge: mrp: Update ring transitions.
pinctrl: mcp23s08: fix race condition in irq handler
ice: set the value of global config lock timeout longer
ice: fix clang warning regarding deadcode.DeadStores
virtio_net: Remove BUG() to avoid machine dead
net: mscc: ocelot: check return value after calling platform_get_resource()
net: bcmgenet: check return value after calling platform_get_resource()
net: mvpp2: check return value after calling platform_get_resource()
net: micrel: check return value after calling platform_get_resource()
net: moxa: Use devm_platform_get_and_ioremap_resource()
drm/amd/display: Fix DCN 3.01 DSCCLK validation
drm/amd/display: Update scaling settings on modeset
drm/amd/display: Release MST resources on switch from MST to SST
drm/amd/display: Set DISPCLK_MAX_ERRDET_CYCLES to 7
drm/amd/display: Fix off-by-one error in DML
net: phy: realtek: add delay to fix RXC generation issue
selftests: Clean forgotten resources as part of cleanup()
net: sgi: ioc3-eth: check return value after calling platform_get_resource()
drm/amdkfd: use allowed domain for vmbo validation
fjes: check return value after calling platform_get_resource()
selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC
r8169: avoid link-up interrupt issue on RTL8106e if user enables ASPM
drm/amd/display: Verify Gamma & Degamma LUT sizes in amdgpu_dm_atomic_check
xfrm: Fix error reporting in xfrm_state_construct.
dm writecache: commit just one block, not a full page
wlcore/wl12xx: Fix wl12xx get_mac error if device is in ELP
wl1251: Fix possible buffer overflow in wl1251_cmd_scan
cw1200: add missing MODULE_DEVICE_TABLE
drm/amdkfd: fix circular locking on get_wave_state
drm/amdkfd: Fix circular lock in nocpsch path
bpf: Fix up register-based shifts in interpreter to silence KUBSAN
ice: fix incorrect payload indicator on PTYPE
ice: mark PTYPE 2 as reserved
mt76: mt7615: fix fixed-rate tx status reporting
net: fix mistake path for netdev_features_strings
net: ipa: Add missing of_node_put() in ipa_firmware_load()
net: sched: fix error return code in tcf_del_walker()
io_uring: fix false WARN_ONCE
drm/amdgpu: fix bad address translation for sienna_cichlid
drm/amdkfd: Walk through list with dqm lock hold
mt76: mt7915: fix IEEE80211_HE_PHY_CAP7_MAX_NC for station mode
rtl8xxxu: Fix device info for RTL8192EU devices
MIPS: add PMD table accounting into MIPS'pmd_alloc_one
net: fec: add ndo_select_queue to fix TX bandwidth fluctuations
atm: nicstar: use 'dma_free_coherent' instead of 'kfree'
atm: nicstar: register the interrupt handler in the right place
vsock: notify server to shutdown when client has pending signal
RDMA/rxe: Don't overwrite errno from ib_umem_get()
iwlwifi: mvm: don't change band on bound PHY contexts
iwlwifi: mvm: fix error print when session protection ends
iwlwifi: pcie: free IML DMA memory allocation
iwlwifi: pcie: fix context info freeing
sfc: avoid double pci_remove of VFs
sfc: error code if SRIOV cannot be disabled
wireless: wext-spy: Fix out-of-bounds warning
cfg80211: fix default HE tx bitrate mask in 2G band
mac80211: consider per-CPU statistics if present
mac80211_hwsim: add concurrent channels scanning support over virtio
IB/isert: Align target max I/O size to initiator size
media, bpf: Do not copy more entries than user space requested
net: ip: avoid OOM kills with large UDP sends over loopback
RDMA/cma: Fix rdma_resolve_route() memory leak
Bluetooth: btusb: Fixed too many in-token issue for Mediatek Chip.
Bluetooth: Fix the HCI to MGMT status conversion table
Bluetooth: Fix alt settings for incoming SCO with transparent coding format
Bluetooth: Shutdown controller after workqueues are flushed or cancelled
Bluetooth: btusb: Add a new QCA_ROME device (0cf3:e500)
Bluetooth: L2CAP: Fix invalid access if ECRED Reconfigure fails
Bluetooth: L2CAP: Fix invalid access on ECRED Connection response
Bluetooth: btusb: Add support USB ALT 3 for WBS
Bluetooth: mgmt: Fix the command returns garbage parameter value
Bluetooth: btusb: fix bt fiwmare downloading failure issue for qca btsoc.
sched/fair: Ensure _sum and _avg values stay consistent
bpf: Fix false positive kmemleak report in bpf_ringbuf_area_alloc()
flow_offload: action should not be NULL when it is referenced
sctp: validate from_addr_param return
sctp: add size validation when walking chunks
MIPS: loongsoon64: Reserve memory below starting pfn to prevent Oops
MIPS: set mips32r5 for virt extensions
selftests/resctrl: Fix incorrect parsing of option "-t"
MIPS: MT extensions are not available on MIPS32r1
ath11k: unlock on error path in ath11k_mac_op_add_interface()
arm64: dts: rockchip: add rk3328 dwc3 usb controller node
arm64: dts: rockchip: Enable USB3 for rk3328 Rock64
loop: fix I/O error on fsync() in detached loop devices
mm,hwpoison: return -EBUSY when migration fails
io_uring: simplify io_remove_personalities()
io_uring: Convert personality_idr to XArray
io_uring: convert io_buffer_idr to XArray
scsi: iscsi: Fix race condition between login and sync thread
scsi: iscsi: Fix iSCSI cls conn state
powerpc/mm: Fix lockup on kernel exec fault
powerpc/barrier: Avoid collision with clang's __lwsync macro
powerpc/powernv/vas: Release reference to tgid during window close
drm/amdgpu: Update NV SIMD-per-CU to 2
drm/amdgpu: enable sdma0 tmz for Raven/Renoir(V2)
drm/radeon: Add the missed drm_gem_object_put() in radeon_user_framebuffer_create()
drm/radeon: Call radeon_suspend_kms() in radeon_pci_shutdown() for Loongson64
drm/vc4: txp: Properly set the possible_crtcs mask
drm/vc4: crtc: Skip the TXP
drm/vc4: hdmi: Prevent clock unbalance
drm/dp: Handle zeroed port counts in drm_dp_read_downstream_info()
drm/rockchip: dsi: remove extra component_del() call
drm/amd/display: fix incorrrect valid irq check
pinctrl/amd: Add device HID for new AMD GPIO controller
drm/amd/display: Reject non-zero src_y and src_x for video planes
drm/tegra: Don't set allow_fb_modifiers explicitly
drm/msm/mdp4: Fix modifier support enabling
drm/arm/malidp: Always list modifiers
drm/nouveau: Don't set allow_fb_modifiers explicitly
drm/i915/display: Do not zero past infoframes.vsc
mmc: sdhci-acpi: Disable write protect detection on Toshiba Encore 2 WT8-B
mmc: sdhci: Fix warning message when accessing RPMB in HS400 mode
mmc: core: clear flags before allowing to retune
mmc: core: Allow UHS-I voltage switch for SDSC cards if supported
ata: ahci_sunxi: Disable DIPM
arm64: tlb: fix the TTL value of tlb_get_level
cpu/hotplug: Cure the cpusets trainwreck
clocksource/arm_arch_timer: Improve Allwinner A64 timer workaround
fpga: stratix10-soc: Add missing fpga_mgr_free() call
ASoC: tegra: Set driver_name=tegra for all machine drivers
i40e: fix PTP on 5Gb links
qemu_fw_cfg: Make fw_cfg_rev_attr a proper kobj_attribute
ipmi/watchdog: Stop watchdog timer when the current action is 'none'
thermal/drivers/int340x/processor_thermal: Fix tcc setting
ubifs: Fix races between xattr_{set|get} and listxattr operations
power: supply: ab8500: Fix an old bug
mfd: syscon: Free the allocated name field of struct regmap_config
nvmem: core: add a missing of_node_put
lkdtm/bugs: XFAIL UNALIGNED_LOAD_STORE_WRITE
selftests/lkdtm: Fix expected text for CR4 pinning
extcon: intel-mrfld: Sync hardware and software state on init
seq_buf: Fix overflow in seq_buf_putmem_hex()
rq-qos: fix missed wake-ups in rq_qos_throttle try two
tracing: Simplify & fix saved_tgids logic
tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT
ipack/carriers/tpci200: Fix a double free in tpci200_pci_probe
coresight: Propagate symlink failure
coresight: tmc-etf: Fix global-out-of-bounds in tmc_update_etf_buffer()
dm zoned: check zone capacity
dm writecache: flush origin device when writing and cache is full
dm btree remove: assign new_root only when removal succeeds
PCI: Leave Apple Thunderbolt controllers on for s2idle or standby
PCI: aardvark: Fix checking for PIO Non-posted Request
PCI: aardvark: Implement workaround for the readback value of VEND_ID
media: subdev: disallow ioctl for saa6588/davinci
media: dtv5100: fix control-request directions
media: zr364xx: fix memory leak in zr364xx_start_readpipe
media: gspca/sq905: fix control-request direction
media: gspca/sunplus: fix zero-length control requests
media: uvcvideo: Fix pixel format change for Elgato Cam Link 4K
io_uring: fix clear IORING_SETUP_R_DISABLED in wrong function
dm writecache: write at least 4k when committing
pinctrl: mcp23s08: Fix missing unlock on error in mcp23s08_irq()
drm/ast: Remove reference to struct drm_device.pdev
jfs: fix GPF in diFree
smackfs: restrict bytes count in smk_set_cipso()
ext4: fix memory leak in ext4_fill_super
f2fs: fix to avoid racing on fsync_entry_slab by multi filesystem instances
Linux 5.10.51
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Icb10fed733a0050848ecc23db13ae3d134895acd
commit 11c7aa0ddea8611007768d3e6b58d45dc60a19e1 upstream.
Commit 545fbd0775 ("rq-qos: fix missed wake-ups in rq_qos_throttle")
tried to fix a problem that a process could be sleeping in rq_qos_wait()
without anyone to wake it up. However the fix is not complete and the
following can still happen:
CPU1 (waiter1) CPU2 (waiter2) CPU3 (waker)
rq_qos_wait() rq_qos_wait()
acquire_inflight_cb() -> fails
acquire_inflight_cb() -> fails
completes IOs, inflight
decreased
prepare_to_wait_exclusive()
prepare_to_wait_exclusive()
has_sleeper = !wq_has_single_sleeper() -> true as there are two sleepers
has_sleeper = !wq_has_single_sleeper() -> true
io_schedule() io_schedule()
Deadlock as now there's nobody to wakeup the two waiters. The logic
automatically blocking when there are already sleepers is really subtle
and the only way to make it work reliably is that we check whether there
are some waiters in the queue when adding ourselves there. That way, we
are guaranteed that at least the first process to enter the wait queue
will recheck the waiting condition before going to sleep and thus
guarantee forward progress.
Fixes: 545fbd0775 ("rq-qos: fix missed wake-ups in rq_qos_throttle")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210607112613.25344-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit c5089591c3ba ("block, bfq: detect wakers and
unconditionally inject their I/O"), when the in-service bfq_queue, say
Q, is temporarily empty, BFQ checks whether there are I/O requests to
inject (also) from the waker bfq_queue for Q. To this goal, the value
pointed by bfqq->waker_bfqq->next_rq must be controlled. However, the
current implementation mistakenly looks at bfqq->next_rq, which
instead points to the next request of the currently served queue.
This mistake evidently causes losses of throughput in scenarios with
waker bfq_queues.
This commit corrects this mistake.
Fixes: c5089591c3ba ("block, bfq: detect wakers and unconditionally inject their I/O")
Signed-off-by: Jia Cheng Hu <jia.jiachenghu@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bug: 193744965
Change-Id: I2b31773e1f67e4828e33a32acfa29b59809be7e0
(cherry picked from commit d4fc3640ff361a09e359867e0bca898abd2b7ecb)
Signed-off-by: Ed Tsai <ed.tsai@mediatek.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmDu+1UACgkQONu9yGCS
aT7jQRAAuLDi7ejk3JUameYFMzVXGAUE6yPs392/lWJzey7IBf+2uLqz4FzqqUHp
U1GkEKJVaCacEfi0+rpi7BxNFljUdZdg/F/P68ARtAWPvwqAeJ4QIh5u3A682UUO
1M5h6e5/oY9F4kQIb5Kot04avqOeR6lTqrkA8jeP5h43ngyLWuS2d+5oOGmbCukS
UgEaCC6CiKjcN51UUTj/fXMQ0X4IDHP5pD8rWwH0IvK0i7gduvk744un8LVB6aW1
rNV88C3BEFFtkPQh2XySnXM5Ok8kYlhFoTDsqlpeAX7pA8hiUPYBoRzTg0MJtPZn
N1L/Yqhvxmn5xs9HAw7mDOo8E8NWXzsT5FvZVaBeiCgtdKmcPszylXqmSt1oiOb0
/EmkCWmlbG/3qWql24+LU4XP36iVPx32HQxAgg2XbnlNU5o0E1y2F98p6p/3JSWX
NAjHtmg/MxueFQ+w8bDzhO8YzYn1dIU3V3qaXRvtpODrmaSYW+bwCyPtSjXe3/vL
604zb3dOg9+tD/gKqfRb/UPMu24nNll8M/gnSRci05/thmIxwtYudPwoLNSejDqr
e+a8vejISfIyp41XrpYQbUeKs1WOA+A7vgx6CZrT791afiT+6UgC/ecQfg1NFxhs
8ayWpocaIszxyXxVGro1rfwZeQmTlbTCZ5wVdpn9sDPZfI7epts=
=FCrA
-----END PGP SIGNATURE-----
Merge 5.10.50 into android12-5.10-lts
Changes in 5.10.50
Bluetooth: hci_qca: fix potential GPF
Bluetooth: btqca: Don't modify firmware contents in-place
Bluetooth: Remove spurious error message
ALSA: usb-audio: fix rate on Ozone Z90 USB headset
ALSA: usb-audio: Fix OOB access at proc output
ALSA: firewire-motu: fix stream format for MOTU 8pre FireWire
ALSA: usb-audio: scarlett2: Fix wrong resume call
ALSA: intel8x0: Fix breakage at ac97 clock measurement
ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 450 G8
ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 445 G8
ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 630 G8
ALSA: hda/realtek: Add another ALC236 variant support
ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook x360 830 G8
ALSA: hda/realtek: Improve fixup for HP Spectre x360 15-df0xxx
ALSA: hda/realtek: Fix bass speaker DAC mapping for Asus UM431D
ALSA: hda/realtek: Apply LED fixup for HP Dragonfly G1, too
ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 830 G8 Notebook PC
media: dvb-usb: fix wrong definition
Input: usbtouchscreen - fix control-request directions
net: can: ems_usb: fix use-after-free in ems_usb_disconnect()
usb: gadget: eem: fix echo command packet response issue
usb: renesas-xhci: Fix handling of unknown ROM state
USB: cdc-acm: blacklist Heimann USB Appset device
usb: dwc3: Fix debugfs creation flow
usb: typec: Add the missed altmode_id_remove() in typec_register_altmode()
xhci: solve a double free problem while doing s4
gfs2: Fix underflow in gfs2_page_mkwrite
gfs2: Fix error handling in init_statfs
ntfs: fix validity check for file name attribute
selftests/lkdtm: Avoid needing explicit sub-shell
copy_page_to_iter(): fix ITER_DISCARD case
iov_iter_fault_in_readable() should do nothing in xarray case
Input: joydev - prevent use of not validated data in JSIOCSBTNMAP ioctl
crypto: nx - Fix memcpy() over-reading in nonce
crypto: ccp - Annotate SEV Firmware file names
arm_pmu: Fix write counter incorrect in ARMv7 big-endian mode
ARM: dts: ux500: Fix LED probing
ARM: dts: at91: sama5d4: fix pinctrl muxing
btrfs: send: fix invalid path for unlink operations after parent orphanization
btrfs: compression: don't try to compress if we don't have enough pages
btrfs: clear defrag status of a root if starting transaction fails
ext4: cleanup in-core orphan list if ext4_truncate() failed to get a transaction handle
ext4: fix kernel infoleak via ext4_extent_header
ext4: fix overflow in ext4_iomap_alloc()
ext4: return error code when ext4_fill_flex_info() fails
ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit
ext4: remove check for zero nr_to_scan in ext4_es_scan()
ext4: fix avefreec in find_group_orlov
ext4: use ext4_grp_locked_error in mb_find_extent
can: bcm: delay release of struct bcm_op after synchronize_rcu()
can: gw: synchronize rcu operations before removing gw job entry
can: isotp: isotp_release(): omit unintended hrtimer restart on socket release
can: j1939: j1939_sk_init(): set SOCK_RCU_FREE to call sk_destruct() after RCU is done
can: peak_pciefd: pucan_handle_status(): fix a potential starvation issue in TX path
mac80211: remove iwlwifi specific workaround that broke sta NDP tx
SUNRPC: Fix the batch tasks count wraparound.
SUNRPC: Should wake up the privileged task firstly.
bus: mhi: Wait for M2 state during system resume
mm/gup: fix try_grab_compound_head() race with split_huge_page()
perf/smmuv3: Don't trample existing events with global filter
KVM: nVMX: Handle split-lock #AC exceptions that happen in L2
KVM: PPC: Book3S HV: Workaround high stack usage with clang
KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs
KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in nested NPT walk
s390/cio: dont call css_wait_for_slow_path() inside a lock
s390: mm: Fix secure storage access exception handling
f2fs: Prevent swap file in LFS mode
clk: agilex/stratix10/n5x: fix how the bypass_reg is handled
clk: agilex/stratix10: remove noc_clk
clk: agilex/stratix10: fix bypass representation
rtc: stm32: Fix unbalanced clk_disable_unprepare() on probe error path
iio: frequency: adf4350: disable reg and clk on error in adf4350_probe()
iio: light: tcs3472: do not free unallocated IRQ
iio: ltr501: mark register holding upper 8 bits of ALS_DATA{0,1} and PS_DATA as volatile, too
iio: ltr501: ltr559: fix initialization of LTR501_ALS_CONTR
iio: ltr501: ltr501_read_ps(): add missing endianness conversion
iio: accel: bma180: Fix BMA25x bandwidth register values
serial: mvebu-uart: fix calculation of clock divisor
serial: sh-sci: Stop dmaengine transfer in sci_stop_tx()
serial_cs: Add Option International GSM-Ready 56K/ISDN modem
serial_cs: remove wrong GLOBETROTTER.cis entry
ath9k: Fix kernel NULL pointer dereference during ath_reset_internal()
ssb: sdio: Don't overwrite const buffer if block_write fails
rsi: Assign beacon rate settings to the correct rate_info descriptor field
rsi: fix AP mode with WPA failure due to encrypted EAPOL
tracing/histograms: Fix parsing of "sym-offset" modifier
tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing
seq_buf: Make trace_seq_putmem_hex() support data longer than 8
powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()
loop: Fix missing discard support when using LOOP_CONFIGURE
evm: Execute evm_inode_init_security() only when an HMAC key is loaded
evm: Refuse EVM_ALLOW_METADATA_WRITES only if an HMAC key is loaded
fuse: Fix crash in fuse_dentry_automount() error path
fuse: Fix crash if superblock of submount gets killed early
fuse: Fix infinite loop in sget_fc()
fuse: ignore PG_workingset after stealing
fuse: check connected before queueing on fpq->io
fuse: reject internal errno
thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure
spi: Make of_register_spi_device also set the fwnode
Add a reference to ucounts for each cred
staging: media: rkvdec: fix pm_runtime_get_sync() usage count
media: marvel-ccic: fix some issues when getting pm_runtime
media: mdk-mdp: fix pm_runtime_get_sync() usage count
media: s5p: fix pm_runtime_get_sync() usage count
media: am437x: fix pm_runtime_get_sync() usage count
media: sh_vou: fix pm_runtime_get_sync() usage count
media: mtk-vcodec: fix PM runtime get logic
media: s5p-jpeg: fix pm_runtime_get_sync() usage count
media: sunxi: fix pm_runtime_get_sync() usage count
media: sti/bdisp: fix pm_runtime_get_sync() usage count
media: exynos4-is: fix pm_runtime_get_sync() usage count
media: exynos-gsc: fix pm_runtime_get_sync() usage count
spi: spi-loopback-test: Fix 'tx_buf' might be 'rx_buf'
spi: spi-topcliff-pch: Fix potential double free in pch_spi_process_messages()
spi: omap-100k: Fix the length judgment problem
regulator: uniphier: Add missing MODULE_DEVICE_TABLE
sched/core: Initialize the idle task with preemption disabled
hwrng: exynos - Fix runtime PM imbalance on error
crypto: nx - add missing MODULE_DEVICE_TABLE
media: sti: fix obj-$(config) targets
media: cpia2: fix memory leak in cpia2_usb_probe
media: cobalt: fix race condition in setting HPD
media: hevc: Fix dependent slice segment flags
media: pvrusb2: fix warning in pvr2_i2c_core_done
media: imx: imx7_mipi_csis: Fix logging of only error event counters
crypto: qat - check return code of qat_hal_rd_rel_reg()
crypto: qat - remove unused macro in FW loader
crypto: qce: skcipher: Fix incorrect sg count for dma transfers
arm64: perf: Convert snprintf to sysfs_emit
sched/fair: Fix ascii art by relpacing tabs
media: i2c: ov2659: Use clk_{prepare_enable,disable_unprepare}() to set xvclk on/off
media: bt878: do not schedule tasklet when it is not setup
media: em28xx: Fix possible memory leak of em28xx struct
media: hantro: Fix .buf_prepare
media: cedrus: Fix .buf_prepare
media: v4l2-core: Avoid the dangling pointer in v4l2_fh_release
media: bt8xx: Fix a missing check bug in bt878_probe
media: st-hva: Fix potential NULL pointer dereferences
crypto: hisilicon/sec - fixup 3des minimum key size declaration
Makefile: fix GDB warning with CONFIG_RELR
media: dvd_usb: memory leak in cinergyt2_fe_attach
memstick: rtsx_usb_ms: fix UAF
mmc: sdhci-sprd: use sdhci_sprd_writew
mmc: via-sdmmc: add a check against NULL pointer dereference
spi: meson-spicc: fix a wrong goto jump for avoiding memory leak.
spi: meson-spicc: fix memory leak in meson_spicc_probe
crypto: shash - avoid comparing pointers to exported functions under CFI
media: dvb_net: avoid speculation from net slot
media: siano: fix device register error path
media: imx-csi: Skip first few frames from a BT.656 source
hwmon: (max31790) Report correct current pwm duty cycles
hwmon: (max31790) Fix pwmX_enable attributes
drivers/perf: fix the missed ida_simple_remove() in ddr_perf_probe()
KVM: PPC: Book3S HV: Fix TLB management on SMT8 POWER9 and POWER10 processors
btrfs: fix error handling in __btrfs_update_delayed_inode
btrfs: abort transaction if we fail to update the delayed inode
btrfs: sysfs: fix format string for some discard stats
btrfs: don't clear page extent mapped if we're not invalidating the full page
btrfs: disable build on platforms having page size 256K
locking/lockdep: Fix the dep path printing for backwards BFS
lockding/lockdep: Avoid to find wrong lock dep path in check_irq_usage()
KVM: s390: get rid of register asm usage
regulator: mt6358: Fix vdram2 .vsel_mask
regulator: da9052: Ensure enough delay time for .set_voltage_time_sel
media: Fix Media Controller API config checks
ACPI: video: use native backlight for GA401/GA502/GA503
HID: do not use down_interruptible() when unbinding devices
EDAC/ti: Add missing MODULE_DEVICE_TABLE
ACPI: processor idle: Fix up C-state latency if not ordered
hv_utils: Fix passing zero to 'PTR_ERR' warning
lib: vsprintf: Fix handling of number field widths in vsscanf
Input: goodix - platform/x86: touchscreen_dmi - Move upside down quirks to touchscreen_dmi.c
platform/x86: touchscreen_dmi: Add an extra entry for the upside down Goodix touchscreen on Teclast X89 tablets
platform/x86: touchscreen_dmi: Add info for the Goodix GT912 panel of TM800A550L tablets
ACPI: EC: Make more Asus laptops use ECDT _GPE
block_dump: remove block_dump feature in mark_inode_dirty()
blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
blk-mq: clear stale request in tags->rq[] before freeing one request pool
fs: dlm: cancel work sync othercon
random32: Fix implicit truncation warning in prandom_seed_state()
open: don't silently ignore unknown O-flags in openat2()
drivers: hv: Fix missing error code in vmbus_connect()
fs: dlm: fix memory leak when fenced
ACPICA: Fix memory leak caused by _CID repair function
ACPI: bus: Call kobject_put() in acpi_init() error path
ACPI: resources: Add checks for ACPI IRQ override
block: fix race between adding/removing rq qos and normal IO
platform/x86: asus-nb-wmi: Revert "Drop duplicate DMI quirk structures"
platform/x86: asus-nb-wmi: Revert "add support for ASUS ROG Zephyrus G14 and G15"
platform/x86: toshiba_acpi: Fix missing error code in toshiba_acpi_setup_keyboard()
nvme-pci: fix var. type for increasing cq_head
nvmet-fc: do not check for invalid target port in nvmet_fc_handle_fcp_rqst()
EDAC/Intel: Do not load EDAC driver when running as a guest
PCI: hv: Add check for hyperv_initialized in init_hv_pci_drv()
cifs: improve fallocate emulation
ACPI: EC: trust DSDT GPE for certain HP laptop
clocksource: Retry clock read if long delays detected
clocksource: Check per-CPU clock synchronization when marked unstable
tpm_tis_spi: add missing SPI device ID entries
ACPI: tables: Add custom DSDT file as makefile prerequisite
HID: wacom: Correct base usage for capacitive ExpressKey status bits
cifs: fix missing spinlock around update to ses->status
mailbox: qcom: Use PLATFORM_DEVID_AUTO to register platform device
block: fix discard request merge
kthread_worker: fix return value when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
ia64: mca_drv: fix incorrect array size calculation
writeback, cgroup: increment isw_nr_in_flight before grabbing an inode
spi: Allow to have all native CSs in use along with GPIOs
spi: Avoid undefined behaviour when counting unused native CSs
media: venus: Rework error fail recover logic
media: s5p_cec: decrement usage count if disabled
media: hantro: do a PM resume earlier
crypto: ixp4xx - dma_unmap the correct address
crypto: ixp4xx - update IV after requests
crypto: ux500 - Fix error return code in hash_hw_final()
sata_highbank: fix deferred probing
pata_rb532_cf: fix deferred probing
media: I2C: change 'RST' to "RSET" to fix multiple build errors
sched/uclamp: Fix wrong implementation of cpu.uclamp.min
sched/uclamp: Fix locking around cpu_util_update_eff()
kbuild: Fix objtool dependency for 'OBJECT_FILES_NON_STANDARD_<obj> := n'
pata_octeon_cf: avoid WARN_ON() in ata_host_activate()
evm: fix writing <securityfs>/evm overflow
x86/elf: Use _BITUL() macro in UAPI headers
crypto: sa2ul - Fix leaks on failure paths with sa_dma_init()
crypto: sa2ul - Fix pm_runtime enable in sa_ul_probe()
crypto: ccp - Fix a resource leak in an error handling path
media: rc: i2c: Fix an error message
pata_ep93xx: fix deferred probing
locking/lockdep: Reduce LOCKDEP dependency list
media: rkvdec: Fix .buf_prepare
media: exynos4-is: Fix a use after free in isp_video_release
media: au0828: fix a NULL vs IS_ERR() check
media: tc358743: Fix error return code in tc358743_probe_of()
media: gspca/gl860: fix zero-length control requests
m68k: atari: Fix ATARI_KBD_CORE kconfig unmet dependency warning
media: siano: Fix out-of-bounds warnings in smscore_load_firmware_family2()
regulator: fan53880: Fix vsel_mask setting for FAN53880_BUCK
crypto: nitrox - fix unchecked variable in nitrox_register_interrupts
crypto: omap-sham - Fix PM reference leak in omap sham ops
crypto: x86/curve25519 - fix cpu feature checking logic in mod_exit
crypto: sm2 - remove unnecessary reset operations
crypto: sm2 - fix a memory leak in sm2
mmc: usdhi6rol0: fix error return code in usdhi6_probe()
arm64: consistently use reserved_pg_dir
arm64/mm: Fix ttbr0 values stored in struct thread_info for software-pan
media: subdev: remove VIDIOC_DQEVENT_TIME32 handling
media: s5p-g2d: Fix a memory leak on ctx->fh.m2m_ctx
hwmon: (lm70) Use device_get_match_data()
hwmon: (lm70) Revert "hwmon: (lm70) Add support for ACPI"
hwmon: (max31722) Remove non-standard ACPI device IDs
hwmon: (max31790) Fix fan speed reporting for fan7..12
KVM: nVMX: Sync all PGDs on nested transition with shadow paging
KVM: nVMX: Ensure 64-bit shift when checking VMFUNC bitmap
KVM: nVMX: Don't clobber nested MMU's A/D status on EPTP switch
KVM: x86/mmu: Fix return value in tdp_mmu_map_handle_target_level()
perf/arm-cmn: Fix invalid pointer when access dtc object sharing the same IRQ number
KVM: arm64: Don't zero the cycle count register when PMCR_EL0.P is set
regulator: hi655x: Fix pass wrong pointer to config.driver_data
btrfs: clear log tree recovering status if starting transaction fails
x86/sev: Make sure IRQs are disabled while GHCB is active
x86/sev: Split up runtime #VC handler for correct state tracking
sched/rt: Fix RT utilization tracking during policy change
sched/rt: Fix Deadline utilization tracking during policy change
sched/uclamp: Fix uclamp_tg_restrict()
lockdep: Fix wait-type for empty stack
lockdep/selftests: Fix selftests vs PROVE_RAW_LOCK_NESTING
spi: spi-sun6i: Fix chipselect/clock bug
crypto: nx - Fix RCU warning in nx842_OF_upd_status
psi: Fix race between psi_trigger_create/destroy
media: v4l2-async: Clean v4l2_async_notifier_add_fwnode_remote_subdev
media: video-mux: Skip dangling endpoints
PM / devfreq: Add missing error code in devfreq_add_device()
ACPI: PM / fan: Put fan device IDs into separate header file
block: avoid double io accounting for flush request
nvme-pci: look for StorageD3Enable on companion ACPI device instead
ACPI: sysfs: Fix a buffer overrun problem with description_show()
mark pstore-blk as broken
clocksource/drivers/timer-ti-dm: Save and restore timer TIOCP_CFG
extcon: extcon-max8997: Fix IRQ freeing at error path
ACPI: APEI: fix synchronous external aborts in user-mode
blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled()
blk-wbt: make sure throttle is enabled properly
ACPI: Use DEVICE_ATTR_<RW|RO|WO> macros
ACPI: bgrt: Fix CFI violation
cpufreq: Make cpufreq_online() call driver->offline() on errors
blk-mq: update hctx->dispatch_busy in case of real scheduler
ocfs2: fix snprintf() checking
dax: fix ENOMEM handling in grab_mapping_entry()
mm/debug_vm_pgtable/basic: add validation for dirtiness after write protect
mm/debug_vm_pgtable/basic: iterate over entire protection_map[]
mm/debug_vm_pgtable: ensure THP availability via has_transparent_hugepage()
swap: fix do_swap_page() race with swapoff
mm/shmem: fix shmem_swapin() race with swapoff
mm: memcg/slab: properly set up gfp flags for objcg pointer array
mm: page_alloc: refactor setup_per_zone_lowmem_reserve()
mm/page_alloc: fix counting of managed_pages
xfrm: xfrm_state_mtu should return at least 1280 for ipv6
drm/bridge/sii8620: fix dependency on extcon
drm/bridge: Fix the stop condition of drm_bridge_chain_pre_enable()
drm/amd/dc: Fix a missing check bug in dm_dp_mst_detect()
drm/ast: Fix missing conversions to managed API
video: fbdev: imxfb: Fix an error message
net: mvpp2: Put fwnode in error case during ->probe()
net: pch_gbe: Propagate error from devm_gpio_request_one()
pinctrl: renesas: r8a7796: Add missing bias for PRESET# pin
pinctrl: renesas: r8a77990: JTAG pins do not have pull-down capabilities
drm/vmwgfx: Mark a surface gpu-dirty after the SVGA3dCmdDXGenMips command
drm/vmwgfx: Fix cpu updates of coherent multisample surfaces
net: qrtr: ns: Fix error return code in qrtr_ns_init()
clk: meson: g12a: fix gp0 and hifi ranges
net: ftgmac100: add missing error return code in ftgmac100_probe()
drm: rockchip: set alpha_en to 0 if it is not used
drm/rockchip: cdn-dp-core: add missing clk_disable_unprepare() on error in cdn_dp_grf_write()
drm/rockchip: dsi: move all lane config except LCDC mux to bind()
drm/rockchip: lvds: Fix an error handling path
drm/rockchip: cdn-dp: fix sign extension on an int multiply for a u64 result
mptcp: fix pr_debug in mptcp_token_new_connect
mptcp: generate subflow hmac after mptcp_finish_join()
RDMA/srp: Fix a recently introduced memory leak
RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats
RDMA/rtrs: Do not reset hb_missed_max after re-connection
RDMA/rtrs-srv: Fix memory leak of unfreed rtrs_srv_stats object
RDMA/rtrs-srv: Fix memory leak when having multiple sessions
RDMA/rtrs-clt: Check if the queue_depth has changed during a reconnection
RDMA/rtrs-clt: Fix memory leak of not-freed sess->stats and stats->pcpu_stats
ehea: fix error return code in ehea_restart_qps()
clk: tegra30: Use 300MHz for video decoder by default
xfrm: remove the fragment check for ipv6 beet mode
net/sched: act_vlan: Fix modify to allow 0
RDMA/core: Sanitize WQ state received from the userspace
drm/pl111: depend on CONFIG_VEXPRESS_CONFIG
RDMA/rxe: Fix failure during driver load
drm/pl111: Actually fix CONFIG_VEXPRESS_CONFIG depends
drm/vc4: hdmi: Fix error path of hpd-gpios
clk: vc5: fix output disabling when enabling a FOD
drm: qxl: ensure surf.data is ininitialized
tools/bpftool: Fix error return code in do_batch()
ath10k: go to path err_unsupported when chip id is not supported
ath10k: add missing error return code in ath10k_pci_probe()
wireless: carl9170: fix LEDS build errors & warnings
ieee802154: hwsim: Fix possible memory leak in hwsim_subscribe_all_others
clk: imx8mq: remove SYS PLL 1/2 clock gates
wcn36xx: Move hal_buf allocation to devm_kmalloc in probe
ssb: Fix error return code in ssb_bus_scan()
brcmfmac: fix setting of station info chains bitmask
brcmfmac: correctly report average RSSI in station info
brcmfmac: Fix a double-free in brcmf_sdio_bus_reset
brcmsmac: mac80211_if: Fix a resource leak in an error handling path
cw1200: Revert unnecessary patches that fix unreal use-after-free bugs
ath11k: Fix an error handling path in ath11k_core_fetch_board_data_api_n()
ath10k: Fix an error code in ath10k_add_interface()
ath11k: send beacon template after vdev_start/restart during csa
netlabel: Fix memory leak in netlbl_mgmt_add_common
RDMA/mlx5: Don't add slave port to unaffiliated list
netfilter: nft_exthdr: check for IPv6 packet before further processing
netfilter: nft_osf: check for TCP packet before further processing
netfilter: nft_tproxy: restrict support to TCP and UDP transport protocols
RDMA/rxe: Fix qp reference counting for atomic ops
selftests/bpf: Whitelist test_progs.h from .gitignore
xsk: Fix missing validation for skb and unaligned mode
xsk: Fix broken Tx ring validation
bpf: Fix libelf endian handling in resolv_btfids
RDMA/rtrs-srv: Set minimal max_send_wr and max_recv_wr
samples/bpf: Fix Segmentation fault for xdp_redirect command
samples/bpf: Fix the error return code of xdp_redirect's main()
mt76: fix possible NULL pointer dereference in mt76_tx
mt76: mt7615: fix NULL pointer dereference in tx_prepare_skb()
net: ethernet: aeroflex: fix UAF in greth_of_remove
net: ethernet: ezchip: fix UAF in nps_enet_remove
net: ethernet: ezchip: fix error handling
vrf: do not push non-ND strict packets with a source LLA through packet taps again
net: sched: add barrier to ensure correct ordering for lockless qdisc
tls: prevent oversized sendfile() hangs by ignoring MSG_MORE
netfilter: nf_tables_offload: check FLOW_DISSECTOR_KEY_BASIC in VLAN transfer logic
pkt_sched: sch_qfq: fix qfq_change_class() error path
xfrm: Fix xfrm offload fallback fail case
iwlwifi: increase PNVM load timeout
rtw88: 8822c: fix lc calibration timing
vxlan: add missing rcu_read_lock() in neigh_reduce()
ip6_tunnel: fix GRE6 segmentation
net/ipv4: swap flow ports when validating source
net: ti: am65-cpsw-nuss: Fix crash when changing number of TX queues
tc-testing: fix list handling
ieee802154: hwsim: Fix memory leak in hwsim_add_one
ieee802154: hwsim: avoid possible crash in hwsim_del_edge_nl()
bpf: Fix null ptr deref with mixed tail calls and subprogs
drm/msm: Fix error return code in msm_drm_init()
drm/msm/dpu: Fix error return code in dpu_mdss_init()
mac80211: remove iwlwifi specific workaround NDPs of null_response
net: bcmgenet: Fix attaching to PYH failed on RPi 4B
ipv6: exthdrs: do not blindly use init_net
can: j1939: j1939_sk_setsockopt(): prevent allocation of j1939 filter for optlen == 0
bpf: Do not change gso_size during bpf_skb_change_proto()
i40e: Fix error handling in i40e_vsi_open
i40e: Fix autoneg disabling for non-10GBaseT links
i40e: Fix missing rtnl locking when setting up pf switch
Revert "ibmvnic: remove duplicate napi_schedule call in open function"
ibmvnic: set ltb->buff to NULL after freeing
ibmvnic: free tx_pool if tso_pool alloc fails
RDMA/cma: Protect RMW with qp_mutex
net: macsec: fix the length used to copy the key for offloading
net: phy: mscc: fix macsec key length
net: atlantic: fix the macsec key length
ipv6: fix out-of-bound access in ip6_parse_tlv()
e1000e: Check the PCIm state
net: dsa: sja1105: fix NULL pointer dereference in sja1105_reload_cbs()
bpfilter: Specify the log level for the kmsg message
RDMA/cma: Fix incorrect Packet Lifetime calculation
gve: Fix swapped vars when fetching max queues
Revert "be2net: disable bh with spin_lock in be_process_mcc"
Bluetooth: mgmt: Fix slab-out-of-bounds in tlv_data_is_valid
Bluetooth: Fix not sending Set Extended Scan Response
Bluetooth: Fix Set Extended (Scan Response) Data
Bluetooth: Fix handling of HCI_LE_Advertising_Set_Terminated event
clk: actions: Fix UART clock dividers on Owl S500 SoC
clk: actions: Fix SD clocks factor table on Owl S500 SoC
clk: actions: Fix bisp_factor_table based clocks on Owl S500 SoC
clk: actions: Fix AHPPREDIV-H-AHB clock chain on Owl S500 SoC
clk: qcom: clk-alpha-pll: fix CAL_L write in alpha_pll_fabia_prepare
clk: si5341: Wait for DEVICE_READY on startup
clk: si5341: Avoid divide errors due to bogus register contents
clk: si5341: Check for input clock presence and PLL lock on startup
clk: si5341: Update initialization magic
writeback: fix obtain a reference to a freeing memcg css
net: lwtunnel: handle MTU calculation in forwading
net: sched: fix warning in tcindex_alloc_perfect_hash
net: tipc: fix FB_MTU eat two pages
RDMA/mlx5: Don't access NULL-cleared mpi pointer
RDMA/core: Always release restrack object
MIPS: Fix PKMAP with 32-bit MIPS huge page support
staging: fbtft: Rectify GPIO handling
staging: fbtft: Don't spam logs when probe is deferred
ASoC: rt5682: Disable irq on shutdown
rcu: Invoke rcu_spawn_core_kthreads() from rcu_spawn_gp_kthread()
serial: fsl_lpuart: don't modify arbitrary data on lpuart32
serial: fsl_lpuart: remove RTSCTS handling from get_mctrl()
serial: 8250_omap: fix a timeout loop condition
tty: nozomi: Fix a resource leak in an error handling function
mwifiex: re-fix for unaligned accesses
iio: adis_buffer: do not return ints in irq handlers
iio: adis16400: do not return ints in irq handlers
iio: adis16475: do not return ints in irq handlers
iio: accel: bma180: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: accel: bma220: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: accel: hid: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: accel: kxcjk-1013: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: accel: mxc4005: Fix overread of data and alignment issue.
iio: accel: stk8312: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: accel: stk8ba50: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: adc: ti-ads1015: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: adc: vf610: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: gyro: bmg160: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: humidity: am2315: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: prox: srf08: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: prox: pulsed-light: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: prox: as3935: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: magn: hmc5843: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: magn: bmc150: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: light: isl29125: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: light: tcs3414: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: light: tcs3472: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: chemical: atlas: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: cros_ec_sensors: Fix alignment of buffer in iio_push_to_buffers_with_timestamp()
iio: potentiostat: lmp91000: Fix alignment of buffer in iio_push_to_buffers_with_timestamp()
ASoC: rk3328: fix missing clk_disable_unprepare() on error in rk3328_platform_probe()
ASoC: hisilicon: fix missing clk_disable_unprepare() on error in hi6210_i2s_startup()
backlight: lm3630a_bl: Put fwnode in error case during ->probe()
ASoC: rsnd: tidyup loop on rsnd_adg_clk_query()
Input: hil_kbd - fix error return code in hil_dev_connect()
perf scripting python: Fix tuple_set_u64()
mtd: partitions: redboot: seek fis-index-block in the right node
mtd: rawnand: arasan: Ensure proper configuration for the asserted target
staging: mmal-vchiq: Fix incorrect static vchiq_instance.
char: pcmcia: error out if 'num_bytes_read' is greater than 4 in set_protocol()
firmware: stratix10-svc: Fix a resource leak in an error handling path
tty: nozomi: Fix the error handling path of 'nozomi_card_init()'
leds: class: The -ENOTSUPP should never be seen by user space
leds: lm3532: select regmap I2C API
leds: lm36274: Put fwnode in error case during ->probe()
leds: lm3692x: Put fwnode in any case during ->probe()
leds: lm3697: Don't spam logs when probe is deferred
leds: lp50xx: Put fwnode in error case during ->probe()
scsi: FlashPoint: Rename si_flags field
scsi: iscsi: Flush block work before unblock
mfd: mp2629: Select MFD_CORE to fix build error
mfd: rn5t618: Fix IRQ trigger by changing it to level mode
fsi: core: Fix return of error values on failures
fsi: scom: Reset the FSI2PIB engine for any error
fsi: occ: Don't accept response from un-initialized OCC
fsi/sbefifo: Clean up correct FIFO when receiving reset request from SBE
fsi/sbefifo: Fix reset timeout
visorbus: fix error return code in visorchipset_init()
iommu/amd: Fix extended features logging
s390/irq: select HAVE_IRQ_EXIT_ON_IRQ_STACK
s390: enable HAVE_IOREMAP_PROT
s390: appldata depends on PROC_SYSCTL
selftests: splice: Adjust for handler fallback removal
iommu/dma: Fix IOVA reserve dma ranges
ASoC: max98373-sdw: use first_hw_init flag on resume
ASoC: rt1308-sdw: use first_hw_init flag on resume
ASoC: rt5682-sdw: use first_hw_init flag on resume
ASoC: rt700-sdw: use first_hw_init flag on resume
ASoC: rt711-sdw: use first_hw_init flag on resume
ASoC: rt715-sdw: use first_hw_init flag on resume
ASoC: rt5682: fix getting the wrong device id when the suspend_stress_test
ASoC: rt5682-sdw: set regcache_cache_only false before reading RT5682_DEVICE_ID
ASoC: mediatek: mtk-btcvsd: Fix an error handling path in 'mtk_btcvsd_snd_probe()'
usb: gadget: f_fs: Fix setting of device and driver data cross-references
usb: dwc2: Don't reset the core after setting turnaround time
eeprom: idt_89hpesx: Put fwnode in matching case during ->probe()
eeprom: idt_89hpesx: Restore printing the unsupported fwnode name
thunderbolt: Bond lanes only when dual_link_port != NULL in alloc_dev_default()
iio: adc: at91-sama5d2: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: adc: hx711: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: adc: mxs-lradc: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: adc: ti-ads8688: Fix alignment of buffer in iio_push_to_buffers_with_timestamp()
iio: magn: rm3100: Fix alignment of buffer in iio_push_to_buffers_with_timestamp()
iio: light: vcnl4000: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
ASoC: fsl_spdif: Fix error handler with pm_runtime_enable
staging: gdm724x: check for buffer overflow in gdm_lte_multi_sdu_pkt()
staging: gdm724x: check for overflow in gdm_lte_netif_rx()
staging: rtl8712: fix error handling in r871xu_drv_init
staging: rtl8712: fix memory leak in rtl871x_load_fw_cb
coresight: core: Fix use of uninitialized pointer
staging: mt7621-dts: fix pci address for PCI memory range
serial: 8250: Actually allow UPF_MAGIC_MULTIPLIER baud rates
iio: light: vcnl4035: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: prox: isl29501: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
ASoC: cs42l42: Correct definition of CS42L42_ADC_PDN_MASK
of: Fix truncation of memory sizes on 32-bit platforms
mtd: rawnand: marvell: add missing clk_disable_unprepare() on error in marvell_nfc_resume()
habanalabs: Fix an error handling path in 'hl_pci_probe()'
scsi: mpt3sas: Fix error return value in _scsih_expander_add()
soundwire: stream: Fix test for DP prepare complete
phy: uniphier-pcie: Fix updating phy parameters
phy: ti: dm816x: Fix the error handling path in 'dm816x_usb_phy_probe()
extcon: sm5502: Drop invalid register write in sm5502_reg_data
extcon: max8997: Add missing modalias string
powerpc/powernv: Fix machine check reporting of async store errors
ASoC: atmel-i2s: Fix usage of capture and playback at the same time
configfs: fix memleak in configfs_release_bin_file
ASoC: Intel: sof_sdw: add SOF_RT715_DAI_ID_FIX for AlderLake
ASoC: fsl_spdif: Fix unexpected interrupt after suspend
leds: as3645a: Fix error return code in as3645a_parse_node()
leds: ktd2692: Fix an error handling path
selftests/ftrace: fix event-no-pid on 1-core machine
serial: 8250: 8250_omap: Disable RX interrupt after DMA enable
serial: 8250: 8250_omap: Fix possible interrupt storm on K3 SoCs
powerpc: Offline CPU in stop_this_cpu()
powerpc/papr_scm: Properly handle UUID types and API
powerpc/64s: Fix copy-paste data exposure into newly created tasks
powerpc/papr_scm: Make 'perf_stats' invisible if perf-stats unavailable
ALSA: firewire-lib: Fix 'amdtp_domain_start()' when no AMDTP_OUT_STREAM stream is found
serial: mvebu-uart: do not allow changing baudrate when uartclk is not available
serial: mvebu-uart: correctly calculate minimal possible baudrate
arm64: dts: marvell: armada-37xx: Fix reg for standard variant of UART
vfio/pci: Handle concurrent vma faults
mm/pmem: avoid inserting hugepage PTE entry with fsdax if hugepage support is disabled
mm/huge_memory.c: remove dedicated macro HPAGE_CACHE_INDEX_MASK
mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled()
mm/huge_memory.c: don't discard hugepage if other processes are mapping it
mm/hugetlb: use helper huge_page_order and pages_per_huge_page
mm/hugetlb: remove redundant check in preparing and destroying gigantic page
hugetlb: remove prep_compound_huge_page cleanup
include/linux/huge_mm.h: remove extern keyword
mm/z3fold: fix potential memory leak in z3fold_destroy_pool()
mm/z3fold: use release_z3fold_page_locked() to release locked z3fold page
lib/math/rational.c: fix divide by zero
selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
selftests/vm/pkeys: refill shadow register after implicit kernel write
perf llvm: Return -ENOMEM when asprintf() fails
csky: fix syscache.c fallthrough warning
csky: syscache: Fixup duplicate cache flush
exfat: handle wrong stream entry size in exfat_readdir()
scsi: fc: Correct RHBA attributes length
scsi: target: cxgbit: Unmap DMA buffer before calling target_execute_cmd()
mailbox: qcom-ipcc: Fix IPCC mbox channel exhaustion
fscrypt: don't ignore minor_hash when hash is 0
fscrypt: fix derivation of SipHash keys on big endian CPUs
tpm: Replace WARN_ONCE() with dev_err_once() in tpm_tis_status()
erofs: fix error return code in erofs_read_superblock()
block: return the correct bvec when checking for gaps
io_uring: fix blocking inline submission
mmc: block: Disable CMDQ on the ioctl path
mmc: vub3000: fix control-request direction
media: exynos4-is: remove a now unused integer
scsi: core: Retry I/O for Notify (Enable Spinup) Required error
crypto: qce - fix error return code in qce_skcipher_async_req_handle()
s390: preempt: Fix preempt_count initialization
cred: add missing return error code when set_cred_ucounts() failed
iommu/dma: Fix compile warning in 32-bit builds
powerpc/preempt: Don't touch the idle task's preempt_count during hotplug
Linux 5.10.50
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iec4eab24ea8eb5a6d79739a1aec8432d93a8f82c
[ Upstream commit cb9516be7708a2a18ec0a19fe3a225b5b3bc92c7 ]
Commit 6e6fcbc27e ("blk-mq: support batching dispatch in case of io")
starts to support io batching submission by using hctx->dispatch_busy.
However, blk_mq_update_dispatch_busy() isn't changed to update hctx->dispatch_busy
in that commit, so fix the issue by updating hctx->dispatch_busy in case
of real scheduler.
Reported-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Fixes: 6e6fcbc27e ("blk-mq: support batching dispatch in case of io")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210625020248.1630497-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 76a8040817b4b9c69b53f9b326987fa891b4082a ]
After commit a79050434b ("blk-rq-qos: refactor out common elements of
blk-wbt"), if throttle was disabled by wbt_disable_default(), we could
not enable again, fix this by set enable_state back to
WBT_STATE_ON_DEFAULT.
Fixes: a79050434b ("blk-rq-qos: refactor out common elements of blk-wbt")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20210619093700.920393-3-yi.zhang@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d0903d61e9645c6330b94247b96dd873dfc11c8 ]
Now that we disable wbt by simply zero out rwb->wb_normal in
wbt_disable_default() when switch elevator to bfq, but it's not safe
because it will become false positive if we change queue depth. If it
become false positive between wbt_wait() and wbt_track() when submit
write request, it will lead to drop rqw->inflight to -1 in wbt_done(),
which will end up trigger IO hung. Fix this issue by introduce a new
state which mean the wbt was disabled.
Fixes: a79050434b ("blk-rq-qos: refactor out common elements of blk-wbt")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20210619093700.920393-2-yi.zhang@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 84da7acc3ba53af26f15c4b0ada446127b7a7836 ]
For flush request, rq->end_io() may be called two times, one is from
timeout handling(blk_mq_check_expired()), another is from normal
completion(__blk_mq_end_request()).
Move blk_account_io_flush() after flush_rq->ref drops to zero, so
io accounting can be done just once for flush request.
Fixes: b686631865 ("block: add iostat counters for flush requests")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210511152236.763464-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2705dfb2094777e405e065105e307074af8965c1 ]
ll_new_hw_segment() is reached only in case of single range discard
merge, and we don't have max discard segment size limit actually, so
it is wrong to run the following check:
if (req->nr_phys_segments + nr_phys_segs > blk_rq_get_max_segments(req))
it may be always false since req->nr_phys_segments is initialized as
one, and bio's segment count is still 1, blk_rq_get_max_segments(reg)
is 1 too.
Fix the issue by not doing the check and bypassing the calculation of
discard request's nr_phys_segments.
Based on analysis from Wang Shanker.
Cc: Christoph Hellwig <hch@lst.de>
Reported-by: Wang Shanker <shankerwangmiao@gmail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210628023312.1903255-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2cafe29a8d03f02a3d16193bdaae2f3e82a423f9 ]
Yi reported several kernel panics on:
[16687.001777] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
...
[16687.163549] pc : __rq_qos_track+0x38/0x60
or
[ 997.690455] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
...
[ 997.850347] pc : __rq_qos_done+0x2c/0x50
Turns out it is caused by race between adding rq qos(wbt) and normal IO
because rq_qos_add can be run when IO is being submitted, fix this issue
by freezing queue before adding/deleting rq qos to queue.
rq_qos_exit() needn't to freeze queue because it is called after queue
has been frozen.
iolatency calls rq_qos_add() during allocating queue, so freezing won't
add delay because queue usage refcount works at atomic mode at that
time.
iocost calls rq_qos_add() when writing cgroup attribute file, that is
fine to freeze queue at that time since we usually freeze queue when
storing to queue sysfs attribute, meantime iocost only exists on the
root cgroup.
wbt_init calls it in blk_register_queue() and queue sysfs attribute
store(queue_wb_lat_store() when write it 1st time in case of !BLK_WBT_MQ),
the following patch will speedup the queue freezing in wbt_init.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/r/20210609015822.103433-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bd63141d585bef14f4caf111f6d0e27fe2300ec6 ]
refcount_inc_not_zero() in bt_tags_iter() still may read one freed
request.
Fix the issue by the following approach:
1) hold a per-tags spinlock when reading ->rqs[tag] and calling
refcount_inc_not_zero in bt_tags_iter()
2) clearing stale request referred via ->rqs[tag] before freeing
request pool, the per-tags spinlock is held for clearing stale
->rq[tag]
So after we cleared stale requests, bt_tags_iter() won't observe
freed request any more, also the clearing will wait for pending
request reference.
The idea of clearing ->rqs[] is borrowed from John Garry's previous
patch and one recent David's patch.
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210511152236.763464-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e315dc07df009c3e29d6926871f62a30cfae394 ]
Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(), and
this way will prevent the request from being re-used when ->fn is
running. The approach is same as what we do during handling timeout.
Fix request use-after-free(UAF) related with completion race or queue
releasing:
- If one rq is referred before rq->q is frozen, then queue won't be
frozen before the request is released during iteration.
- If one rq is referred after rq->q is frozen, refcount_inc_not_zero()
will return false, and we won't iterate over this request.
However, still one request UAF not covered: refcount_inc_not_zero() may
read one freed request, and it will be handled in next patch.
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210511152236.763464-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This vendor hook let us initialize payload of the request.
Bug: 188749221
Change-Id: I51d6a3010ac0ab36066dbe1368158592832112b7
Signed-off-by: Yang Yang <yang.yang@vivo.com>
This vendor hook let us attach oem data as payload to the request.
The payload is used by oem driver for debugging purpose.
Bug: 188749221
Change-Id: Iac598bd9cce836dac0efe9198a3e7752928f351a
Signed-off-by: Yang Yang <yang.yang@vivo.com>
Commit "block: Do not accept any requests while suspended" broke the UFS
driver. In the upstream kernel this has been fixed by commit b294ff3e3449
("scsi: ufs: core: Enable power management for wlun"). Backporting that
commit or backporting the entire v5.14-rc1 UFS driver is too risky.
Hence revert the block layer patch that is incompatible with the v5.10
UFS driver power management code.
This reverts commit d55d15a332.
Bug: 193181075
Change-Id: Ic50d4e1df98d7ed393bf9797787225ae22e5d7a3
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Add ANDROID_OEM_DATA for implement of oem gki
Bug: 188749221
Change-Id: I1feba2334aa34e3bc46eb9d0217118485405beb4
Signed-off-by: Yang Yang <yang.yang@vivo.com>
Add ANDROID_OEM_DATA for implement of oem gki
Bug: 188749221
Change-Id: Ide8378a898de01a34d8ca3c34472844cd4ffa71c
Signed-off-by: Yang Yang <yang.yang@vivo.com>
While one or more requests with a certain I/O priority are pending, do not
dispatch lower priority requests. Dispatch lower priority requests anyway
after the "aging" time has expired.
This patch has been tested as follows:
modprobe scsi_debug ndelay=1000000 max_queue=16 &&
sd='' &&
while [ -z "$sd" ]; do
sd=/dev/$(basename /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/*)
done &&
echo $((100*1000)) > /sys/block/$sd/queue/iosched/aging_expire &&
cd /sys/fs/cgroup/blkio/ &&
echo $$ >cgroup.procs &&
echo restrict-to-be >blkio.prio.class &&
mkdir -p hipri &&
cd hipri &&
echo none-to-rt >blkio.prio.class &&
{ max-iops -a1 -d32 -j1 -e mq-deadline $sd >& ~/low-pri.txt & } &&
echo $$ >cgroup.procs &&
max-iops -a1 -d32 -j1 -e mq-deadline $sd >& ~/hi-pri.txt
Result:
* 11000 IOPS for the high-priority job
* 40 IOPS for the low-priority job
If the aging expiry time is changed from 100s into 0, the IOPS results change
into 6712 and 6796 IOPS.
The max-iops script is a script that runs fio with the following arguments:
--bs=4K --gtod_reduce=1 --ioengine=libaio --ioscheduler=${arg_e} --runtime=60
--norandommap --rw=read --thread --buffered=0 --numjobs=${arg_j}
--iodepth=${arg_d} --iodepth_batch_submit=${arg_a}
--iodepth_batch_complete=$((arg_d / 2)) --name=${positional_argument_1}
--filename=${positional_argument_1}
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Change-Id: I99a0674b018d096ec96bbfa3008eedcfda5013da
BUG: 187357408
(cherry picked from commit 40d5d42992b0de3ae7961735ea15eef5bd385ebf git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Maintain statistics per cgroup and export these to user space. These
statistics are essential for verifying whether the proper I/O priorities
have been assigned to requests. An example of the statistics data with
this patch applied:
$ cat /sys/fs/cgroup/io.stat
11:2 rbytes=0 wbytes=0 rios=3 wios=0 dbytes=0 dios=0 [NONE] dispatched=0 inserted=0 merged=171 [RT] dispatched=0 inserted=0 merged=0 [BE] dispatched=0 inserted=0 merged=0 [IDLE] dispatched=0 inserted=0 merged=0
8:32 rbytes=2142720 wbytes=0 rios=105 wios=0 dbytes=0 dios=0 [NONE] dispatched=0 inserted=0 merged=171 [RT] dispatched=0 inserted=0 merged=0 [BE] dispatched=0 inserted=0 merged=0 [IDLE] dispatched=0 inserted=0 merged=0
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I8d976c62ba2c0397cbb18076f3e61d5ab246cbcf
(cherry picked from commit f5dc926252cb31739809f7d27a8cbc9941b4d36d git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Track I/O statistics per I/O priority and export these statistics to
debugfs. These statistics help developers of the deadline scheduler.
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I8e91693dc1d015060737fa2fc15f5f2ebee2530c
(cherry picked from commit 9dc236caf2518c1e434be7a4f8fae60fb0be506a git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Maintain one dispatch list and one FIFO list per I/O priority class: RT, BE
and IDLE. Maintain statistics for each priority level. Split the debugfs
attributes per priority level as follows:
$ ls /sys/kernel/debug/block/.../sched/
async_depth dispatch2 read_next_rq write2_fifo_list
batching read0_fifo_list starved write_next_rq
dispatch0 read1_fifo_list write0_fifo_list
dispatch1 read2_fifo_list write1_fifo_list
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I60451cfdb416ad27601dc3ffb4eb307fa6ff783f
(cherry picked from commit 5b701a6e040ff8626ecf29ac06de9689efc00754 git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
When dispatching the first request of a batch, the deadline_move_request()
call clears .next_rq[] for the opposite data direction. .next_rq[] is not
restored when changing data direction. Fix this by not clearing .next_rq[]
and by keeping track of the data direction of a batch in a variable instead.
This patch is a micro-optimization because:
- The number of deadline_next_request() calls for the read direction is
halved.
- The number of times that deadline_next_request() returns NULL is reduced.
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I582e99603a5443d75cf2b18a5daa2c93b5c66de3
(cherry picked from commit ea0fd2a525436ab5b9ada0f1953b0c0a29357311 git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
For interactive workloads it is important that synchronous requests are
not delayed. Hence reserve 25% of scheduler tags for synchronous requests.
This patch still allows asynchronous requests to fill the hardware queues
since blk_mq_init_sched() makes sure that the number of scheduler requests
is the double of the hardware queue depth. From blk_mq_init_sched():
q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
BLKDEV_MAX_RQ);
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: Ib9cd753a39c8e5f5c45908001d69334130ef2067
(cherry picked from commit c970bc8292aaaf6f2d333d612e657df3a99f417c git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Define separate macros for integers and jiffies to improve readability.
Use sysfs_emit() and kstrtoint() instead of sprintf() and simple_strtol().
The former functions are the recommended functions.
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I4e0fd35124cd0319fcace0d1d5e3c113b60a213c
(cherry picked from commit d9baee13f8cf66a8fac9ec67fdb85ce419fcce3a git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Modern compilers complain if an out-of-range value is passed to a function
argument that has an enumeration type. Let the compiler detect out-of-range
data direction arguments instead of verifying the data_dir argument at
runtime.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I4ad8c106a86d17f3010e12e172702e77eca61e80
(cherry picked from commit d9baee13f8cf66a8fac9ec67fdb85ce419fcce3a git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Change "queue" into "sched" to make the function names reflect better the
purpose of these functions.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I30825b379146dbaef4ff3f85148b2e788667a77c
(cherry picked from commit a6e57fe5ab09c250fc741294e6321270a4364fec git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Make __dd_dispatch_request() easier to read by removing two local
variables.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I5567f7d02a2c628efb437058a1c103c7b123747a
(cherry picked from commit f005b6ff19d2a961a2c3ae9c5f49d48fda143469 git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Document the locking strategy by adding two lockdep_assert_held()
statements.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: Ie8cf0b0ae208c9cc87731a9c6d7df5e5e59332d5
(cherry picked from commit 91831ddfd7c6e3df9857526a76cfa88673ec0637 git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Make the code easier to read by adding more comments.
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: If62eb600614d2883d72ee3bd7e7859ae66b24512
(cherry picked from commit 16c3afdb127bbff7d3552e076e568281765674b7 git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Introduce an rq-qos policy that assigns an I/O priority to requests based
on blk-cgroup configuration settings. This policy has the following
advantages over the ioprio_set() system call:
- This policy is cgroup based so it has all the advantages of cgroups.
- While ioprio_set() does not affect page cache writeback I/O, this rq-qos
controller affects page cache writeback I/O for filesystems that support
assiociating a cgroup with writeback I/O. See also
Documentation/admin-guide/cgroup-v2.rst.
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: If51e608ad37ee7a3f57b507bb17900dcfcb263ed
(cherry picked from commit ee9d2a55c960f152b5710078bbe399a4c51eb0a9 git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
rq_qos_id_to_name() is only used in blk-mq-debugfs.c so move that function
into in blk-mq-debugfs.c.
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: If03083a13917bc2f88b6df7151e033a11ab1bc50
(cherry picked from commit f1a7f539c2720906fb10be0af3514b034e1a9fee git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Before adding more calls in this function, simplify the error path.
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I8568b87d1bebbd3841e42a79b7efe2d0a1bff2bc
(cherry picked from commit f1a7f539c2720906fb10be0af3514b034e1a9fee git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
These entries were consecutive at the time of their introduction but are no
longer consecutive. Make these again consecutive. Additionally, modify the
help text since it refers to blk-mq and since the legacy block layer has
been removed.
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I568383377a3244efba9748adf0a2e90bd7660bb2
(cherry picked from commit fdc250ea26e44066d690bbe65a03fab512af0699 git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Since commit 01e99aeca3 'blk-mq: insert passthrough request into
hctx->dispatch directly', passthrough request should not appear in
IO-scheduler any more, so blk_rq_is_passthrough checking in addon IO
schedulers is redundant.
(Notes: this patch passes generic IO load test with hdds under SAS
controller and hdds under AHCI controller but obviously not covers all.
Not sure if passthrough request can still escape into IO scheduler from
blk_mq_sched_insert_requests, which is used by blk_mq_flush_plug_list and
has lots of indirect callers.)
Signed-off-by: Lin Feng <linf@wangsu.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
BUG: 187357408
Change-Id: I97d85c38e584add44399295f3839994b694bc9ca
(cherry picked from commit 0856faaa220759a4fe4334f5c57a8661c94c14ce git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Currently when non-mq aware IO scheduler (BFQ, mq-deadline) is used for
a queue with multiple HW queues, the performance it rather bad. The
problem is that these IO schedulers use queue-wide locking and their
dispatch function does not respect the hctx it is passed in and returns
any request it finds appropriate. Thus locality of request access is
broken and dispatch from multiple CPUs just contends on IO scheduler
locks. For these IO schedulers there's little point in dispatching from
multiple CPUs. Instead dispatch always only from a single CPU to limit
contention.
Below is a comparison of dbench runs on XFS filesystem where the storage
is a raid card with 64 HW queues and to it attached a single rotating
disk. BFQ is used as IO scheduler:
clients MQ SQ MQ-Patched
Amean 1 39.12 (0.00%) 43.29 * -10.67%* 36.09 * 7.74%*
Amean 2 128.58 (0.00%) 101.30 * 21.22%* 96.14 * 25.23%*
Amean 4 577.42 (0.00%) 494.47 * 14.37%* 508.49 * 11.94%*
Amean 8 610.95 (0.00%) 363.86 * 40.44%* 362.12 * 40.73%*
Amean 16 391.78 (0.00%) 261.49 * 33.25%* 282.94 * 27.78%*
Amean 32 324.64 (0.00%) 267.71 * 17.54%* 233.00 * 28.23%*
Amean 64 295.04 (0.00%) 253.02 * 14.24%* 242.37 * 17.85%*
Amean 512 10281.61 (0.00%) 10211.16 * 0.69%* 10447.53 * -1.61%*
Numbers are times so lower is better. MQ is stock 5.10-rc6 kernel. SQ is
the same kernel with megaraid_sas.host_tagset_enable=0 so that the card
advertises just a single HW queue. MQ-Patched is a kernel with this
patch applied.
You can see multiple hardware queues heavily hurt performance in
combination with BFQ. The patch restores the performance.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
BUG: 187357408
Change-Id: I53645eb48cb308cd3af81a1c5e718a6abec6a1f9
(cherry picked from commit fa56cac78af68bd93734c290a0ffd0716e871dba git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
This reverts commit b445547ec1.
Since both mq-deadline and BFQ completely ignore hctx they are passed to
their dispatch function and dispatch whatever request they deem fit
checking whether any request for a particular hctx is queued is just
pointless since we'll very likely get a request from a different hctx
anyway. In the following commit we'll deal with lock contention in these
IO schedulers in presence of multiple HW queues in a different way.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Change-Id: Ibd7dbe69ae1799f2efce5788986e2f1aad88f66d
BUG: 187357408
(cherry picked from commit 2490aeca0081bb168e96fb7b1746d676be84369f git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
This reverts commit 59870a78d4.
Bring back the commit in 5.10.38 that broke the kabi.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic93df6ad7bb79dda947aedc31d23d69fcd97b7d7
Before we free request queue, clearing flush request reference in
tags->rqs[], so that potential UAF can be avoided.
Based on one patch written by David Jeffery.
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210511152236.763464-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Change-Id: I631dbb5138e427246d2e717576fc44727daaa286
Bug: 188199752
(cherry picked from commit 51d4673e57d2613152fdb2ccfe917643472bb218 git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
refcount_inc_not_zero() in bt_tags_iter() still may read one freed
request.
Fix the issue by the following approach:
1) hold a per-tags spinlock when reading ->rqs[tag] and calling
refcount_inc_not_zero in bt_tags_iter()
2) clearing stale request referred via ->rqs[tag] before freeing
request pool, the per-tags spinlock is held for clearing stale
->rq[tag]
So after we cleared stale requests, bt_tags_iter() won't observe
freed request any more, also the clearing will wait for pending
request reference.
The idea of clearing ->rqs[] is borrowed from John Garry's previous
patch and one recent David's patch.
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210511152236.763464-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Change-Id: I740ddf3b83ea04ce0349b1b8055ac8b9db1d0557
Bug: 188199752
(cherry picked from commit 33238eb62b7575350be110adff231f32584b20f7 git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(), and
this way will prevent the request from being re-used when ->fn is
running. The approach is same as what we do during handling timeout.
Fix request use-after-free(UAF) related with completion race or queue
releasing:
- If one rq is referred before rq->q is frozen, then queue won't be
frozen before the request is released during iteration.
- If one rq is referred after rq->q is frozen, refcount_inc_not_zero()
will return false, and we won't iterate over this request.
However, still one request UAF not covered: refcount_inc_not_zero() may
read one freed request, and it will be handled in next patch.
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210511152236.763464-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Change-Id: I3e092c0487989ddf389308dc0da325b90e0bf7d4
Bug: 188199752
(cherry picked from commit 91af4d7b8930d9fd8767aee826c3ff4c1eaeec02 git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
This reverts commit 54dbe2d2c1 as it
breaks the kernel abi at the moment. It will be restored at a later
point in time.
Bug: 161946584
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ida737ad962db2dc0ece0bd35ccb71e0db8e76fa2
Changes in 5.10.38
KEYS: trusted: Fix memory leak on object td
tpm: fix error return code in tpm2_get_cc_attrs_tbl()
tpm, tpm_tis: Extend locality handling to TPM2 in tpm_tis_gen_interrupt()
tpm, tpm_tis: Reserve locality in tpm_tis_resume()
KVM: x86/mmu: Remove the defunct update_pte() paging hook
KVM/VMX: Invoke NMI non-IST entry instead of IST entry
ACPI: PM: Add ACPI ID of Alder Lake Fan
PM: runtime: Fix unpaired parent child_count for force_resume
cpufreq: intel_pstate: Use HWP if enabled by platform firmware
kvm: Cap halt polling at kvm->max_halt_poll_ns
ath11k: fix thermal temperature read
fs: dlm: fix debugfs dump
fs: dlm: add errno handling to check callback
fs: dlm: check on minimum msglen size
fs: dlm: flush swork on shutdown
tipc: convert dest node's address to network order
ASoC: Intel: bytcr_rt5640: Enable jack-detect support on Asus T100TAF
net/mlx5e: Use net_prefetchw instead of prefetchw in MPWQE TX datapath
net: stmmac: Set FIFO sizes for ipq806x
ASoC: rsnd: core: Check convert rate in rsnd_hw_params
Bluetooth: Fix incorrect status handling in LE PHY UPDATE event
i2c: bail out early when RDWR parameters are wrong
ALSA: hdsp: don't disable if not enabled
ALSA: hdspm: don't disable if not enabled
ALSA: rme9652: don't disable if not enabled
ALSA: bebob: enable to deliver MIDI messages for multiple ports
Bluetooth: Set CONF_NOT_COMPLETE as l2cap_chan default
Bluetooth: initialize skb_queue_head at l2cap_chan_create()
net/sched: cls_flower: use ntohs for struct flow_dissector_key_ports
net: bridge: when suppression is enabled exclude RARP packets
Bluetooth: check for zapped sk before connecting
selftests/powerpc: Fix L1D flushing tests for Power10
powerpc/32: Statically initialise first emergency context
net: hns3: remediate a potential overflow risk of bd_num_list
net: hns3: add handling for xmit skb with recursive fraglist
ip6_vti: proper dev_{hold|put} in ndo_[un]init methods
ASoC: Intel: bytcr_rt5640: Add quirk for the Chuwi Hi8 tablet
ice: handle increasing Tx or Rx ring sizes
Bluetooth: btusb: Enable quirk boolean flag for Mediatek Chip.
ASoC: rt5670: Add a quirk for the Dell Venue 10 Pro 5055
i2c: Add I2C_AQ_NO_REP_START adapter quirk
MIPS: Loongson64: Use _CACHE_UNCACHED instead of _CACHE_UNCACHED_ACCELERATED
coresight: Do not scan for graph if none is present
IB/hfi1: Correct oversized ring allocation
mac80211: clear the beacon's CRC after channel switch
pinctrl: samsung: use 'int' for register masks in Exynos
rtw88: 8822c: add LC calibration for RTL8822C
mt76: mt7615: support loading EEPROM for MT7613BE
mt76: mt76x0: disable GTK offloading
mt76: mt7915: fix txpower init for TSSI off chips
fuse: invalidate attrs when page writeback completes
virtiofs: fix userns
cuse: prevent clone
iwlwifi: pcie: make cfg vs. trans_cfg more robust
powerpc/mm: Add cond_resched() while removing hpte mappings
ASoC: rsnd: call rsnd_ssi_master_clk_start() from rsnd_ssi_init()
Revert "iommu/amd: Fix performance counter initialization"
iommu/amd: Remove performance counter pre-initialization test
drm/amd/display: Force vsync flip when reconfiguring MPCC
selftests: Set CC to clang in lib.mk if LLVM is set
kconfig: nconf: stop endless search loops
ALSA: hda/realtek: Add quirk for Lenovo Ideapad S740
ASoC: Intel: sof_sdw: add quirk for new ADL-P Rvp
ALSA: hda/hdmi: fix race in handling acomp ELD notification at resume
sctp: Fix out-of-bounds warning in sctp_process_asconf_param()
flow_dissector: Fix out-of-bounds warning in __skb_flow_bpf_to_target()
powerpc/smp: Set numa node before updating mask
ASoC: rt286: Generalize support for ALC3263 codec
ethtool: ioctl: Fix out-of-bounds warning in store_link_ksettings_for_user()
net: sched: tapr: prevent cycle_time == 0 in parse_taprio_schedule
samples/bpf: Fix broken tracex1 due to kprobe argument change
powerpc/pseries: Stop calling printk in rtas_stop_self()
drm/amd/display: fixed divide by zero kernel crash during dsc enablement
drm/amd/display: add handling for hdcp2 rx id list validation
drm/amdgpu: Add mem sync flag for IB allocated by SA
mt76: mt7615: fix entering driver-own state on mt7663
crypto: ccp: Free SEV device if SEV init fails
wl3501_cs: Fix out-of-bounds warnings in wl3501_send_pkt
wl3501_cs: Fix out-of-bounds warnings in wl3501_mgmt_join
qtnfmac: Fix possible buffer overflow in qtnf_event_handle_external_auth
powerpc/iommu: Annotate nested lock for lockdep
iavf: remove duplicate free resources calls
net: ethernet: mtk_eth_soc: fix RX VLAN offload
selftests: mlxsw: Increase the tolerance of backlog buildup
selftests: mlxsw: Fix mausezahn invocation in ERSPAN scale test
kbuild: generate Module.symvers only when vmlinux exists
bnxt_en: Add PCI IDs for Hyper-V VF devices.
ia64: module: fix symbolizer crash on fdescr
watchdog: rename __touch_watchdog() to a better descriptive name
watchdog: explicitly update timestamp when reporting softlockup
watchdog/softlockup: remove logic that tried to prevent repeated reports
watchdog: fix barriers when printing backtraces from all CPUs
ASoC: rt286: Make RT286_SET_GPIO_* readable and writable
thermal: thermal_of: Fix error return code of thermal_of_populate_bind_params()
f2fs: move ioctl interface definitions to separated file
f2fs: fix compat F2FS_IOC_{MOVE,GARBAGE_COLLECT}_RANGE
f2fs: fix to allow migrating fully valid segment
f2fs: fix panic during f2fs_resize_fs()
f2fs: fix a redundant call to f2fs_balance_fs if an error occurs
remoteproc: qcom_q6v5_mss: Replace ioremap with memremap
remoteproc: qcom_q6v5_mss: Validate p_filesz in ELF loader
PCI: iproc: Fix return value of iproc_msi_irq_domain_alloc()
PCI: Release OF node in pci_scan_device()'s error path
ARM: 9064/1: hw_breakpoint: Do not directly check the event's overflow_handler hook
f2fs: fix to align to section for fallocate() on pinned file
f2fs: fix to update last i_size if fallocate partially succeeds
PCI: endpoint: Make *_get_first_free_bar() take into account 64 bit BAR
PCI: endpoint: Add helper API to get the 'next' unreserved BAR
PCI: endpoint: Make *_free_bar() to return error codes on failure
PCI: endpoint: Fix NULL pointer dereference for ->get_features()
f2fs: fix to avoid touching checkpointed data in get_victim()
f2fs: fix to cover __allocate_new_section() with curseg_lock
f2fs: Fix a hungtask problem in atomic write
f2fs: fix to avoid accessing invalid fio in f2fs_allocate_data_block()
rpmsg: qcom_glink_native: fix error return code of qcom_glink_rx_data()
NFS: nfs4_bitmask_adjust() must not change the server global bitmasks
NFS: Fix attribute bitmask in _nfs42_proc_fallocate()
NFSv4.2: Always flush out writes in nfs42_proc_fallocate()
NFS: Deal correctly with attribute generation counter overflow
PCI: endpoint: Fix missing destroy_workqueue()
pNFS/flexfiles: fix incorrect size check in decode_nfs_fh()
NFSv4.2 fix handling of sr_eof in SEEK's reply
SUNRPC: Move fault injection call sites
SUNRPC: Remove trace_xprt_transmit_queued
SUNRPC: Handle major timeout in xprt_adjust_timeout()
thermal/drivers/tsens: Fix missing put_device error
NFSv4.x: Don't return NFS4ERR_NOMATCHING_LAYOUT if we're unmounting
nfsd: ensure new clients break delegations
rtc: fsl-ftm-alarm: add MODULE_TABLE()
dmaengine: idxd: Fix potential null dereference on pointer status
dmaengine: idxd: fix dma device lifetime
dmaengine: idxd: fix cdev setup and free device lifetime issues
SUNRPC: fix ternary sign expansion bug in tracing
pwm: atmel: Fix duty cycle calculation in .get_state()
xprtrdma: Avoid Receive Queue wrapping
xprtrdma: Fix cwnd update ordering
xprtrdma: rpcrdma_mr_pop() already does list_del_init()
swiotlb: Fix the type of index
ceph: fix inode leak on getattr error in __fh_to_dentry
scsi: qla2xxx: Prevent PRLI in target mode
scsi: ufs: core: Do not put UFS power into LPM if link is broken
scsi: ufs: core: Cancel rpm_dev_flush_recheck_work during system suspend
scsi: ufs: core: Narrow down fast path in system suspend path
rtc: ds1307: Fix wday settings for rx8130
net: hns3: fix incorrect configuration for igu_egu_hw_err
net: hns3: initialize the message content in hclge_get_link_mode()
net: hns3: add check for HNS3_NIC_STATE_INITED in hns3_reset_notify_up_enet()
net: hns3: fix for vxlan gpe tx checksum bug
net: hns3: use netif_tx_disable to stop the transmit queue
net: hns3: disable phy loopback setting in hclge_mac_start_phy
sctp: do asoc update earlier in sctp_sf_do_dupcook_a
RISC-V: Fix error code returned by riscv_hartid_to_cpuid()
sunrpc: Fix misplaced barrier in call_decode
libbpf: Fix signed overflow in ringbuf_process_ring
block/rnbd-clt: Change queue_depth type in rnbd_clt_session to size_t
block/rnbd-clt: Check the return value of the function rtrs_clt_query
ethernet:enic: Fix a use after free bug in enic_hard_start_xmit
sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b
netfilter: xt_SECMARK: add new revision to fix structure layout
xsk: Fix for xp_aligned_validate_desc() when len == chunk_size
net: stmmac: Clear receive all(RA) bit when promiscuous mode is off
drm/radeon: Fix off-by-one power_state index heap overwrite
drm/radeon: Avoid power table parsing memory leaks
arm64: entry: factor irq triage logic into macros
arm64: entry: always set GIC_PRIO_PSR_I_SET during entry
khugepaged: fix wrong result value for trace_mm_collapse_huge_page_isolate()
mm/hugeltb: handle the error case in hugetlb_fix_reserve_counts()
mm/migrate.c: fix potential indeterminate pte entry in migrate_vma_insert_page()
ksm: fix potential missing rmap_item for stable_node
mm/gup: check every subpage of a compound page during isolation
mm/gup: return an error on migration failure
mm/gup: check for isolation errors
ethtool: fix missing NLM_F_MULTI flag when dumping
net: fix nla_strcmp to handle more then one trailing null character
smc: disallow TCP_ULP in smc_setsockopt()
netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check
netfilter: nftables: Fix a memleak from userdata error path in new objects
can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path
can: mcp251x: fix resume from sleep before interface was brought up
can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
sched: Fix out-of-bound access in uclamp
sched/fair: Fix unfairness caused by missing load decay
fs/proc/generic.c: fix incorrect pde_is_permanent check
kernel: kexec_file: fix error return code of kexec_calculate_store_digests()
kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources
kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources
netfilter: nftables: avoid overflows in nft_hash_buckets()
i40e: fix broken XDP support
i40e: Fix use-after-free in i40e_client_subtask()
i40e: fix the restart auto-negotiation after FEC modified
i40e: Fix PHY type identifiers for 2.5G and 5G adapters
mptcp: fix splat when closing unaccepted socket
f2fs: avoid unneeded data copy in f2fs_ioc_move_range()
ARC: entry: fix off-by-one error in syscall number validation
ARC: mm: PAE: use 40-bit physical page mask
ARC: mm: Use max_high_pfn as a HIGHMEM zone border
powerpc/64s: Fix crashes when toggling stf barrier
powerpc/64s: Fix crashes when toggling entry flush barrier
hfsplus: prevent corruption in shrinking truncate
squashfs: fix divide error in calculate_skip()
userfaultfd: release page in error path to avoid BUG_ON
kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled
mm/hugetlb: fix F_SEAL_FUTURE_WRITE
blk-iocost: fix weight updates of inner active iocgs
arm64: mte: initialize RGSR_EL1.SEED in __cpu_setup
arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()
btrfs: fix race leading to unpersisted data and metadata on fsync
drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are connected
drm/amd/display: Initialize attribute for hdcp_srm sysfs file
drm/i915: Avoid div-by-zero on gen2
kvm: exit halt polling on need_resched() as well
KVM: LAPIC: Accurately guarantee busy wait for timer to expire when using hv_timer
drm/msm/dp: initialize audio_comp when audio starts
KVM: x86: Cancel pvclock_gtod_work on module removal
KVM: x86: Prevent deadlock against tk_core.seq
dax: Add an enum for specifying dax wakup mode
dax: Add a wakeup mode parameter to put_unlocked_entry()
dax: Wake up all waiters after invalidating dax entry
xen/unpopulated-alloc: consolidate pgmap manipulation
xen/unpopulated-alloc: fix error return code in fill_list()
perf tools: Fix dynamic libbpf link
usb: dwc3: gadget: Free gadget structure only after freeing endpoints
iio: light: gp2ap002: Fix rumtime PM imbalance on error
iio: proximity: pulsedlight: Fix rumtime PM imbalance on error
iio: hid-sensors: select IIO_TRIGGERED_BUFFER under HID_SENSOR_IIO_TRIGGER
usb: fotg210-hcd: Fix an error message
hwmon: (occ) Fix poll rate limiting
usb: musb: Fix an error message
ACPI: scan: Fix a memory leak in an error handling path
kyber: fix out of bounds access when preempted
nvmet: add lba to sect conversion helpers
nvmet: fix inline bio check for bdev-ns
nvmet-rdma: Fix NULL deref when SEND is completed with error
f2fs: compress: fix to free compress page correctly
f2fs: compress: fix race condition of overwrite vs truncate
f2fs: compress: fix to assign cc.cluster_idx correctly
nbd: Fix NULL pointer in flush_workqueue
blk-mq: plug request for shared sbitmap
blk-mq: Swap two calls in blk_mq_exit_queue()
usb: dwc3: omap: improve extcon initialization
usb: dwc3: pci: Enable usb2-gadget-lpm-disable for Intel Merrifield
usb: xhci: Increase timeout for HC halt
usb: dwc2: Fix gadget DMA unmap direction
usb: core: hub: fix race condition about TRSMRCY of resume
usb: dwc3: gadget: Enable suspend events
usb: dwc3: gadget: Return success always for kick transfer in ep queue
usb: typec: ucsi: Retrieve all the PDOs instead of just the first 4
usb: typec: ucsi: Put fwnode in any case during ->probe()
xhci-pci: Allow host runtime PM as default for Intel Alder Lake xHCI
xhci: Do not use GFP_KERNEL in (potentially) atomic context
xhci: Add reset resume quirk for AMD xhci controller.
iio: gyro: mpu3050: Fix reported temperature value
iio: tsl2583: Fix division by a zero lux_val
cdc-wdm: untangle a circular dependency between callback and softint
xen/gntdev: fix gntdev_mmap() error exit path
KVM: x86: Emulate RDPID only if RDTSCP is supported
KVM: x86: Move RDPID emulation intercept to its own enum
KVM: nVMX: Always make an attempt to map eVMCS after migration
KVM: VMX: Do not advertise RDPID if ENABLE_RDTSCP control is unsupported
KVM: VMX: Disable preemption when probing user return MSRs
Revert "iommu/vt-d: Remove WO permissions on second-level paging entries"
Revert "iommu/vt-d: Preset Access/Dirty bits for IOVA over FL"
iommu/vt-d: Preset Access/Dirty bits for IOVA over FL
iommu/vt-d: Remove WO permissions on second-level paging entries
mm: fix struct page layout on 32-bit systems
MIPS: Reinstate platform `__div64_32' handler
MIPS: Avoid DIVU in `__div64_32' is result would be zero
MIPS: Avoid handcoded DIVU in `__div64_32' altogether
clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue
clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940
ARM: 9011/1: centralize phys-to-virt conversion of DT/ATAGS address
ARM: 9012/1: move device tree mapping out of linear region
ARM: 9020/1: mm: use correct section size macro to describe the FDT virtual address
ARM: 9027/1: head.S: explicitly map DT even if it lives in the first physical section
usb: typec: tcpm: Fix error while calculating PPS out values
kobject_uevent: remove warning in init_uevent_argv()
drm/i915/gt: Fix a double free in gen8_preallocate_top_level_pdp
drm/i915: Read C0DRB3/C1DRB3 as 16 bits again
drm/i915/overlay: Fix active retire callback alignment
drm/i915: Fix crash in auto_retire
clk: exynos7: Mark aclk_fsys1_200 as critical
media: rkvdec: Remove of_match_ptr()
i2c: mediatek: Fix send master code at more than 1MHz
dt-bindings: media: renesas,vin: Make resets optional on R-Car Gen1
dt-bindings: serial: 8250: Remove duplicated compatible strings
debugfs: Make debugfs_allow RO after init
ext4: fix debug format string warning
nvme: do not try to reconfigure APST when the controller is not live
ASoC: rsnd: check all BUSIF status when error
Linux 5.10.38
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia32e01283b488a38be48015c58a0e481f09aaf65
[ Upstream commit 630ef623ed26c18a457cdc070cf24014e50129c2 ]
If a tag set is shared across request queues (e.g. SCSI LUNs) then the
block layer core keeps track of the number of active request queues in
tags->active_queues. blk_mq_tag_busy() and blk_mq_tag_idle() update that
atomic counter if the hctx flag BLK_MQ_F_TAG_QUEUE_SHARED is set. Make
sure that blk_mq_exit_queue() calls blk_mq_tag_idle() before that flag is
cleared by blk_mq_del_queue_tag_set().
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Fixes: 0d2602ca30 ("blk-mq: improve support for shared tags maps")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210513171529.7977-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 03f26d8f11403295de445b6e4e0e57ac57755791 ]
In case of shared sbitmap, request won't be held in plug list any more
sine commit 32bc15afed ("blk-mq: Facilitate a shared sbitmap per
tagset"), this way makes request merge from flush plug list & batching
submission not possible, so cause performance regression.
Yanhui reports performance regression when running sequential IO
test(libaio, 16 jobs, 8 depth for each job) in VM, and the VM disk
is emulated with image stored on xfs/megaraid_sas.
Fix the issue by recovering original behavior to allow to hold request
in plug list.
Cc: Yanhui Ma <yama@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: kashyap.desai@broadcom.com
Fixes: 32bc15afed ("blk-mq: Facilitate a shared sbitmap per tagset")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210514022052.1047665-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit efed9a3337e341bd0989161b97453b52567bc59d ]
__blk_mq_sched_bio_merge() gets the ctx and hctx for the current CPU and
passes the hctx to ->bio_merge(). kyber_bio_merge() then gets the ctx
for the current CPU again and uses that to get the corresponding Kyber
context in the passed hctx. However, the thread may be preempted between
the two calls to blk_mq_get_ctx(), and the ctx returned the second time
may no longer correspond to the passed hctx. This "works" accidentally
most of the time, but it can cause us to read garbage if the second ctx
came from an hctx with more ctx's than the first one (i.e., if
ctx->index_hw[hctx->type] > hctx->nr_ctx).
This manifested as this UBSAN array index out of bounds error reported
by Jakub:
UBSAN: array-index-out-of-bounds in ../kernel/locking/qspinlock.c:130:9
index 13106 is out of range for type 'long unsigned int [128]'
Call Trace:
dump_stack+0xa4/0xe5
ubsan_epilogue+0x5/0x40
__ubsan_handle_out_of_bounds.cold.13+0x2a/0x34
queued_spin_lock_slowpath+0x476/0x480
do_raw_spin_lock+0x1c2/0x1d0
kyber_bio_merge+0x112/0x180
blk_mq_submit_bio+0x1f5/0x1100
submit_bio_noacct+0x7b0/0x870
submit_bio+0xc2/0x3a0
btrfs_map_bio+0x4f0/0x9d0
btrfs_submit_data_bio+0x24e/0x310
submit_one_bio+0x7f/0xb0
submit_extent_page+0xc4/0x440
__extent_writepage_io+0x2b8/0x5e0
__extent_writepage+0x28d/0x6e0
extent_write_cache_pages+0x4d7/0x7a0
extent_writepages+0xa2/0x110
do_writepages+0x8f/0x180
__writeback_single_inode+0x99/0x7f0
writeback_sb_inodes+0x34e/0x790
__writeback_inodes_wb+0x9e/0x120
wb_writeback+0x4d2/0x660
wb_workfn+0x64d/0xa10
process_one_work+0x53a/0xa80
worker_thread+0x69/0x5b0
kthread+0x20b/0x240
ret_from_fork+0x1f/0x30
Only Kyber uses the hctx, so fix it by passing the request_queue to
->bio_merge() instead. BFQ and mq-deadline just use that, and Kyber can
map the queues itself to avoid the mismatch.
Fixes: a6088845c2 ("block: kyber: make kyber more friendly with merging")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Link: https://lore.kernel.org/r/c7598605401a48d5cfeadebb678abd10af22b83f.1620691329.git.osandov@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit e9f4eee9a0023ba22db9560d4cc6ee63f933dae8 upstream.
When the weight of an active iocg is updated, weight_updated() is called
which in turn calls __propagate_weights() to update the active and inuse
weights so that the effective hierarchical weights are update accordingly.
The current implementation is incorrect for inner active nodes. For an
active leaf iocg, inuse can be any value between 1 and active and the
difference represents how much the iocg is donating. When weight is updated,
as long as inuse is clamped between 1 and the new weight, we're alright and
this is what __propagate_weights() currently implements.
However, that's not how an active inner node's inuse is set. An inner node's
inuse is solely determined by the ratio between the sums of inuse's and
active's of its children - ie. they're results of propagating the leaves'
active and inuse weights upwards. __propagate_weights() incorrectly applies
the same clamping as for a leaf when an active inner node's weight is
updated. Consider a hierarchy which looks like the following with saturating
workloads in AA and BB.
R
/ \
A B
| |
AA BB
1. For both A and B, active=100, inuse=100, hwa=0.5, hwi=0.5.
2. echo 200 > A/io.weight
3. __propagate_weights() update A's active to 200 and leave inuse at 100 as
it's already between 1 and the new active, making A:active=200,
A:inuse=100. As R's active_sum is updated along with A's active,
A:hwa=2/3, B:hwa=1/3. However, because the inuses didn't change, the
hwi's remain unchanged at 0.5.
4. The weight of A is now twice that of B but AA and BB still have the same
hwi of 0.5 and thus are doing the same amount of IOs.
Fix it by making __propgate_weights() always calculate the inuse of an
active inner iocg based on the ratio of child_inuse_sum to child_active_sum.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dan Schatzberg <dschatzberg@fb.com>
Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org # v5.4+
Link: https://lore.kernel.org/r/YJsxnLZV1MnBcqjj@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For flush request, rq->end_io() may be called two times, one is from
timeout handling(blk_mq_check_expired()), another is from normal
completion(__blk_mq_end_request()).
Move blk_account_io_flush() after flush_rq->ref drops to zero, so
io accounting can be done just once for flush request.
Fixes: b686631865 ("block: add iostat counters for flush requests")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210511152236.763464-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bug: 188199752
(cherry picked from commit 773cd5fb22e7c61c65c7528a3e1cb5bcbc1408ea git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Change-Id: I0cbcb35fce831daab99145d2401a6fdeb5d575b4
If a tag set is shared across request queues (e.g. SCSI LUNs) then the
block layer core keeps track of the number of active request queues in
tags->active_queues. blk_mq_tag_busy() and blk_mq_tag_idle() update that
atomic counter if the hctx flag BLK_MQ_F_TAG_QUEUE_SHARED is set. Make
sure that blk_mq_exit_queue() calls blk_mq_tag_idle() before that flag is
cleared by blk_mq_del_queue_tag_set().
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Fixes: 0d2602ca30 ("blk-mq: improve support for shared tags maps")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Bug: 181696921
Link: https://lore.kernel.org/linux-block/d1b1f123-9b47-8805-0b86-2fa9f6b19bb5@kernel.dk/T/#ma8a01d264c59074d3f9b69f833ffe408592cf837
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Change-Id: I3f9ca00fbefec97d1e3c9df85365be0265b8cec6
Revert most of commit 25a6f60b717d ("BACKPORT: bio: limit bio max size")
because it has been reported to cause data corruption and because it has
been reverted upstream. See also
https://lore.kernel.org/linux-block/1620571445.2k94orj8ee.none@localhost/T/#t
Cc: Changheun Lee <nanich.lee@samsung.com>
Cc: Jaegeuk Kim <jaegeuk@google.com>
Bug: 182716953
Change-Id: I79bf39d1d1a0c13cb30ff4b0f1da2b43e1c817f0
Signed-off-by: Bart Van Assche <bvanassche@google.com>
The inflight of partition 0 doesn't include inflight IOs to all
sub-partitions, since currently mq calculates inflight of specific
partition by simply camparing the value of the partition pointer.
Thus the following case is possible:
$ cat /sys/block/vda/inflight
0 0
$ cat /sys/block/vda/vda1/inflight
0 128
While single queue device (on a previous version, e.g. v3.10) has no
this issue:
$cat /sys/block/sda/sda3/inflight
0 33
$cat /sys/block/sda/inflight
0 33
Partition 0 should be specially handled since it represents the whole
disk. This issue is introduced since commit bf0ddaba65 ("blk-mq: fix
sysfs inflight counter").
Besides, this patch can also fix the inflight statistics of part 0 in
/proc/diskstats. Before this patch, the inflight statistics of part 0
doesn't include that of sub partitions. (I have marked the 'inflight'
field with asterisk.)
$cat /proc/diskstats
259 0 nvme0n1 45974469 0 367814768 6445794 1 0 1 0 *0* 111062 6445794 0 0 0 0 0 0
259 2 nvme0n1p1 45974058 0 367797952 6445727 0 0 0 0 *33* 111001 6445727 0 0 0 0 0 0
This is introduced since commit f299b7c7a9 ("blk-mq: provide internal
in-flight variant").
Fixes: bf0ddaba65 ("blk-mq: fix sysfs inflight counter")
Fixes: f299b7c7a9 ("blk-mq: provide internal in-flight variant")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[axboe: adapt for 5.11 partition change]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bug: 187355247
Change-Id: I378b2cb7312a5e47d5e2ec7301dc392e6e7336d0
(cherry picked from commit b0d97557ebfc9d5ba5f2939339a9fdd267abafeb)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
bio size can grow up to 4GB when muli-page bvec is enabled.
but sometimes it would lead to inefficient behaviors.
in case of large chunk direct I/O, - 32MB chunk read in user space -
all pages for 32MB would be merged to a bio structure if the pages
physical addresses are contiguous. it makes some delay to submit
until merge complete. bio max size should be limited to a proper size.
When 32MB chunk read with direct I/O option is coming from userspace,
kernel behavior is below now in do_direct_IO() loop. it's timeline.
| bio merge for 32MB. total 8,192 pages are merged.
| total elapsed time is over 2ms.
|------------------ ... ----------------------->|
| 8,192 pages merged a bio.
| at this time, first bio submit is done.
| 1 bio is split to 32 read request and issue.
|--------------->
|--------------->
|--------------->
......
|--------------->
|--------------->|
total 19ms elapsed to complete 32MB read done from device. |
If bio max size is limited with 1MB, behavior is changed below.
| bio merge for 1MB. 256 pages are merged for each bio.
| total 32 bio will be made.
| total elapsed time is over 2ms. it's same.
| but, first bio submit timing is fast. about 100us.
|--->|--->|--->|---> ... -->|--->|--->|--->|--->|
| 256 pages merged a bio.
| at this time, first bio submit is done.
| and 1 read request is issued for 1 bio.
|--------------->
|--------------->
|--------------->
......
|--------------->
|--------------->|
total 17ms elapsed to complete 32MB read done from device. |
As a result, read request issue timing is faster if bio max size is limited.
Current kernel behavior with multipage bvec, super large bio can be created.
And it lead to delay first I/O request issue.
Signed-off-by: Changheun Lee <nanich.lee@samsung.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210503095203.29076-1-nanich.lee@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bug: 182716953
(cherry picked from commit cd2c7545ae1beac3b6aae033c7f31193b3255946)
Change-Id: Ie3876daa495535dc7f856ed9a281e65d72a437c1
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Changes in 5.10.33
vhost-vdpa: protect concurrent access to vhost device iotlb
gpio: omap: Save and restore sysconfig
KEYS: trusted: Fix TPM reservation for seal/unseal
vdpa/mlx5: Set err = -ENOMEM in case dma_map_sg_attrs fails
pinctrl: lewisburg: Update number of pins in community
block: return -EBUSY when there are open partitions in blkdev_reread_part
pinctrl: core: Show pin numbers for the controllers with base = 0
arm64: dts: allwinner: Revert SD card CD GPIO for Pine64-LTS
bpf: Permits pointers on stack for helper calls
bpf: Allow variable-offset stack access
bpf: Refactor and streamline bounds check into helper
bpf: Tighten speculative pointer arithmetic mask
locking/qrwlock: Fix ordering in queued_write_lock_slowpath()
perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3
perf/x86/kvm: Fix Broadwell Xeon stepping in isolation_ucodes[]
perf auxtrace: Fix potential NULL pointer dereference
perf map: Fix error return code in maps__clone()
HID: google: add don USB id
HID: alps: fix error return code in alps_input_configured()
HID cp2112: fix support for multiple gpiochips
HID: wacom: Assign boolean values to a bool variable
soc: qcom: geni: shield geni_icc_get() for ACPI boot
dmaengine: xilinx: dpdma: Fix descriptor issuing on video group
dmaengine: xilinx: dpdma: Fix race condition in done IRQ
ARM: dts: Fix swapped mmc order for omap3
net: geneve: check skb is large enough for IPv4/IPv6 header
dmaengine: tegra20: Fix runtime PM imbalance on error
s390/entry: save the caller of psw_idle
arm64: kprobes: Restore local irqflag if kprobes is cancelled
xen-netback: Check for hotplug-status existence before watching
cavium/liquidio: Fix duplicate argument
kasan: fix hwasan build for gcc
csky: change a Kconfig symbol name to fix e1000 build error
ia64: fix discontig.c section mismatches
ia64: tools: remove duplicate definition of ia64_mf() on ia64
x86/crash: Fix crash_setup_memmap_entries() out-of-bounds access
net: hso: fix NULL-deref on disconnect regression
USB: CDC-ACM: fix poison/unpoison imbalance
Linux 5.10.33
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I638db3c919ad938eaaaac3d687175252edcd7990
[ Upstream commit 68e6582e8f2dc32fd2458b9926564faa1fb4560e ]
The switch to go through blkdev_get_by_dev means we now ignore the
return value from bdev_disk_changed in __blkdev_get. Add a manual
check to restore the old semantics.
Fixes: 4601b4b130de ("block: reopen the device in blkdev_reread_part")
Reported-by: Karel Zak <kzak@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210421160502.447418-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 5.10.31
interconnect: core: fix error return code of icc_link_destroy()
gfs2: Flag a withdraw if init_threads() fails
KVM: arm64: Hide system instruction access to Trace registers
KVM: arm64: Disable guest access to trace filter controls
drm/imx: imx-ldb: fix out of bounds array access warning
gfs2: report "already frozen/thawed" errors
ftrace: Check if pages were allocated before calling free_pages()
tools/kvm_stat: Add restart delay
drm/tegra: dc: Don't set PLL clock to 0Hz
gpu: host1x: Use different lock classes for each client
XArray: Fix splitting to non-zero orders
block: only update parent bi_status when bio fail
radix tree test suite: Register the main thread with the RCU library
idr test suite: Take RCU read lock in idr_find_test_1
idr test suite: Create anchor before launching throbber
null_blk: fix command timeout completion handling
io_uring: don't mark S_ISBLK async work as unbounded
riscv,entry: fix misaligned base for excp_vect_table
block: don't ignore REQ_NOWAIT for direct IO
netfilter: x_tables: fix compat match/target pad out-of-bound write
perf map: Tighten snprintf() string precision to pass gcc check on some 32-bit arches
net: sfp: relax bitrate-derived mode check
net: sfp: cope with SFPs that set both LOS normal and LOS inverted
xen/events: fix setting irq affinity
Linux 5.10.31
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I19a7cfbdaab23e578dd82c552aea86d367c2f40f
[ Upstream commit 3edf5346e4f2ce2fa0c94651a90a8dda169565ee ]
For multiple split bios, if one of the bio is fail, the whole
should return error to application. But we found there is a race
between bio_integrity_verify_fn and bio complete, which return
io success to application after one of the bio fail. The race as
following:
split bio(READ) kworker
nvme_complete_rq
blk_update_request //split error=0
bio_endio
bio_integrity_endio
queue_work(kintegrityd_wq, &bip->bip_work);
bio_integrity_verify_fn
bio_endio //split bio
__bio_chain_endio
if (!parent->bi_status)
<interrupt entry>
nvme_irq
blk_update_request //parent error=7
req_bio_endio
bio->bi_status = 7 //parent bio
<interrupt exit>
parent->bi_status = 0
parent->bi_end_io() // return bi_status=0
The bio has been split as two: split and parent. When split
bio completed, it depends on kworker to do endio, while
bio_integrity_verify_fn have been interrupted by parent bio
complete irq handler. Then, parent bio->bi_status which have
been set in irq handler will overwrite by kworker.
In fact, even without the above race, we also need to conside
the concurrency beteen mulitple split bio complete and update
the same parent bi_status. Normally, multiple split bios will
be issued to the same hctx and complete from the same irq
vector. But if we have updated queue map between multiple split
bios, these bios may complete on different hw queue and different
irq vector. Then the concurrency update parent bi_status may
cause the final status error.
Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210331115359.1125679-1-yuyufen@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 5.10.27
mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument
mm/memcg: set memcg when splitting page
mt76: fix tx skb error handling in mt76_dma_tx_queue_skb
net: stmmac: fix dma physical address of descriptor when display ring
net: fec: ptp: avoid register access when ipg clock is disabled
powerpc/4xx: Fix build errors from mfdcr()
atm: eni: dont release is never initialized
atm: lanai: dont run lanai_dev_close if not open
Revert "r8152: adjust the settings about MAC clock speed down for RTL8153"
ALSA: hda: ignore invalid NHLT table
ixgbe: Fix memleak in ixgbe_configure_clsu32
scsi: ufs: ufs-qcom: Disable interrupt in reset path
blk-cgroup: Fix the recursive blkg rwstat
net: tehuti: fix error return code in bdx_probe()
net: intel: iavf: fix error return code of iavf_init_get_resources()
sun/niu: fix wrong RXMAC_BC_FRM_CNT_COUNT count
gianfar: fix jumbo packets+napi+rx overrun crash
cifs: ask for more credit on async read/write code paths
gfs2: fix use-after-free in trans_drain
cpufreq: blacklist Arm Vexpress platforms in cpufreq-dt-platdev
gpiolib: acpi: Add missing IRQF_ONESHOT
nfs: fix PNFS_FLEXFILE_LAYOUT Kconfig default
NFS: Correct size calculation for create reply length
net: hisilicon: hns: fix error return code of hns_nic_clear_all_rx_fetch()
net: wan: fix error return code of uhdlc_init()
net: davicom: Use platform_get_irq_optional()
net: enetc: set MAC RX FIFO to recommended value
atm: uPD98402: fix incorrect allocation
atm: idt77252: fix null-ptr-dereference
cifs: change noisy error message to FYI
irqchip/ingenic: Add support for the JZ4760
kbuild: add image_name to no-sync-config-targets
kbuild: dummy-tools: fix inverted tests for gcc
umem: fix error return code in mm_pci_probe()
sparc64: Fix opcode filtering in handling of no fault loads
habanalabs: Call put_pid() when releasing control device
staging: rtl8192e: fix kconfig dependency on CRYPTO
u64_stats,lockdep: Fix u64_stats_init() vs lockdep
kselftest: arm64: Fix exit code of sve-ptrace
regulator: qcom-rpmh: Correct the pmic5_hfsmps515 buck
block: Fix REQ_OP_ZONE_RESET_ALL handling
drm/amd/display: Revert dram_clock_change_latency for DCN2.1
drm/amdgpu: fb BO should be ttm_bo_type_device
drm/radeon: fix AGP dependency
nvme: simplify error logic in nvme_validate_ns()
nvme: add NVME_REQ_CANCELLED flag in nvme_cancel_request()
nvme-fc: set NVME_REQ_CANCELLED in nvme_fc_terminate_exchange()
nvme-fc: return NVME_SC_HOST_ABORTED_CMD when a command has been aborted
nvme-core: check ctrl css before setting up zns
nvme-rdma: Fix a use after free in nvmet_rdma_write_data_done
nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a
nfs: we don't support removing system.nfs4_acl
block: Suppress uevent for hidden device when removed
mm/fork: clear PASID for new mm
ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
static_call: Pull some static_call declarations to the type headers
static_call: Allow module use without exposing static_call_key
static_call: Fix the module key fixup
static_call: Fix static_call_set_init()
KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish
btrfs: fix sleep while in non-sleep context during qgroup removal
selinux: don't log MAC_POLICY_LOAD record on failed policy load
selinux: fix variable scope issue in live sidtab conversion
netsec: restore phy power state after controller reset
platform/x86: intel-vbtn: Stop reporting SW_DOCK events
psample: Fix user API breakage
z3fold: prevent reclaim/free race for headless pages
squashfs: fix inode lookup sanity checks
squashfs: fix xattr id and id lookup sanity checks
hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
kasan: fix per-page tags for non-page_alloc pages
gcov: fix clang-11+ support
ACPI: video: Add missing callback back for Sony VPCEH3U1E
ACPICA: Always create namespace nodes using acpi_ns_create_node()
arm64: stacktrace: don't trace arch_stack_walk()
arm64: dts: ls1046a: mark crypto engine dma coherent
arm64: dts: ls1012a: mark crypto engine dma coherent
arm64: dts: ls1043a: mark crypto engine dma coherent
ARM: dts: at91: sam9x60: fix mux-mask for PA7 so it can be set to A, B and C
ARM: dts: at91: sam9x60: fix mux-mask to match product's datasheet
ARM: dts: at91-sama5d27_som1: fix phy address to 7
integrity: double check iint_cache was initialized
drm/etnaviv: Use FOLL_FORCE for userptr
drm/amd/pm: workaround for audio noise issue
drm/amdgpu/display: restore AUX_DPHY_TX_CONTROL for DCN2.x
drm/amdgpu: Add additional Sienna Cichlid PCI ID
drm/i915: Fix the GT fence revocation runtime PM logic
dm verity: fix DM_VERITY_OPTS_MAX value
dm ioctl: fix out of bounds array access when no devices
bus: omap_l3_noc: mark l3 irqs as IRQF_NO_THREAD
ARM: OMAP2+: Fix smartreflex init regression after dropping legacy data
soc: ti: omap-prm: Fix occasional abort on reset deassert for dra7 iva
veth: Store queue_mapping independently of XDP prog presence
bpf: Change inode_storage's lookup_elem return value from NULL to -EBADF
libbpf: Fix INSTALL flag order
net/mlx5e: RX, Mind the MPWQE gaps when calculating offsets
net/mlx5e: When changing XDP program without reset, take refs for XSK RQs
net/mlx5e: Don't match on Geneve options in case option masks are all zero
ipv6: fix suspecious RCU usage warning
drop_monitor: Perform cleanup upon probe registration failure
macvlan: macvlan_count_rx() needs to be aware of preemption
net: sched: validate stab values
net: dsa: bcm_sf2: Qualify phydev->dev_flags based on port
igc: reinit_locked() should be called with rtnl_lock
igc: Fix Pause Frame Advertising
igc: Fix Supported Pause Frame Link Setting
igc: Fix igc_ptp_rx_pktstamp()
e1000e: add rtnl_lock() to e1000_reset_task
e1000e: Fix error handling in e1000_set_d0_lplu_state_82571
net/qlcnic: Fix a use after free in qlcnic_83xx_get_minidump_template
net: phy: broadcom: Add power down exit reset state delay
ftgmac100: Restart MAC HW once
clk: qcom: gcc-sc7180: Use floor ops for the correct sdcc1 clk
net: ipa: terminate message handler arrays
net: qrtr: fix a kernel-infoleak in qrtr_recvmsg()
flow_dissector: fix byteorder of dissected ICMP ID
selftests/bpf: Set gopt opt_class to 0 if get tunnel opt failed
netfilter: ctnetlink: fix dump of the expect mask attribute
net: hdlc_x25: Prevent racing between "x25_close" and "x25_xmit"/"x25_rx"
net: phylink: Fix phylink_err() function name error in phylink_major_config
tipc: better validate user input in tipc_nl_retrieve_key()
tcp: relookup sock for RST+ACK packets handled by obsolete req sock
can: isotp: isotp_setsockopt(): only allow to set low level TX flags for CAN-FD
can: isotp: TX-path: ensure that CAN frame flags are initialized
can: peak_usb: add forgotten supported devices
can: flexcan: flexcan_chip_freeze(): fix chip freeze for missing bitrate
can: kvaser_pciefd: Always disable bus load reporting
can: c_can_pci: c_can_pci_remove(): fix use-after-free
can: c_can: move runtime PM enable/disable to c_can_platform
can: m_can: m_can_do_rx_poll(): fix extraneous msg loss warning
can: m_can: m_can_rx_peripheral(): fix RX being blocked by errors
mac80211: fix rate mask reset
mac80211: Allow HE operation to be longer than expected.
selftests/net: fix warnings on reuseaddr_ports_exhausted
nfp: flower: fix unsupported pre_tunnel flows
nfp: flower: add ipv6 bit to pre_tunnel control message
nfp: flower: fix pre_tun mask id allocation
ftrace: Fix modify_ftrace_direct.
drm/msm/dsi: fix check-before-set in the 7nm dsi_pll code
ionic: linearize tso skb with too many frags
net/sched: cls_flower: fix only mask bit check in the validate_ct_state
netfilter: nftables: report EOPNOTSUPP on unsupported flowtable flags
netfilter: nftables: allow to update flowtable flags
netfilter: flowtable: Make sure GC works periodically in idle system
libbpf: Fix error path in bpf_object__elf_init()
libbpf: Use SOCK_CLOEXEC when opening the netlink socket
ARM: dts: imx6ull: fix ubi filesystem mount failed
ipv6: weaken the v4mapped source check
octeontx2-af: Formatting debugfs entry rsrc_alloc.
octeontx2-af: Modify default KEX profile to extract TX packet fields
octeontx2-af: Remove TOS field from MKEX TX
octeontx2-af: Fix irq free in rvu teardown
octeontx2-pf: Clear RSS enable flag on interace down
octeontx2-af: fix infinite loop in unmapping NPC counter
net: check all name nodes in __dev_alloc_name
net: cdc-phonet: fix data-interface release on probe failure
igb: check timestamp validity
r8152: limit the RX buffer size of RTL8153A for USB 2.0
net: stmmac: dwmac-sun8i: Provide TX and RX fifo sizes
selinux: vsock: Set SID for socket returned by accept()
selftests: forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value
libbpf: Fix BTF dump of pointer-to-array-of-struct
bpf: Fix umd memory leak in copy_process()
can: isotp: tx-path: zero initialize outgoing CAN frames
drm/msm: fix shutdown hook in case GPU components failed to bind
drm/msm: Fix suspend/resume on i.MX5
arm64: kdump: update ppos when reading elfcorehdr
PM: runtime: Defer suspending suppliers
net/mlx5: Add back multicast stats for uplink representor
net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP
net/mlx5e: Offload tuple rewrite for non-CT flows
net/mlx5e: Fix error path for ethtool set-priv-flag
PM: EM: postpone creating the debugfs dir till fs_initcall
net: bridge: don't notify switchdev for local FDB addresses
octeontx2-af: Fix memory leak of object buf
xen/x86: make XEN_BALLOON_MEMORY_HOTPLUG_LIMIT depend on MEMORY_HOTPLUG
RDMA/cxgb4: Fix adapter LE hash errors while destroying ipv6 listening server
bpf: Don't do bpf_cgroup_storage_set() for kuprobe/tp programs
net: Consolidate common blackhole dst ops
net, bpf: Fix ip6ip6 crash with collect_md populated skbs
igb: avoid premature Rx buffer reuse
net: axienet: Properly handle PCS/PMA PHY for 1000BaseX mode
net: axienet: Fix probe error cleanup
net: phy: introduce phydev->port
net: phy: broadcom: Avoid forward for bcm54xx_config_clock_delay()
net: phy: broadcom: Set proper 1000BaseX/SGMII interface mode for BCM54616S
net: phy: broadcom: Fix RGMII delays for BCM50160 and BCM50610M
Revert "netfilter: x_tables: Switch synchronization to RCU"
netfilter: x_tables: Use correct memory barriers.
dm table: Fix zoned model check and zone sectors check
mm/mmu_notifiers: ensure range_end() is paired with range_start()
Revert "netfilter: x_tables: Update remaining dereference to RCU"
ACPI: scan: Rearrange memory allocation in acpi_device_add()
ACPI: scan: Use unique number for instance_no
perf auxtrace: Fix auxtrace queue conflict
perf synthetic events: Avoid write of uninitialized memory when generating PERF_RECORD_MMAP* records
io_uring: fix provide_buffers sign extension
block: recalculate segment count for multi-segment discards correctly
scsi: Revert "qla2xxx: Make sure that aborted commands are freed"
scsi: qedi: Fix error return code of qedi_alloc_global_queues()
scsi: mpt3sas: Fix error return code of mpt3sas_base_attach()
smb3: fix cached file size problems in duplicate extents (reflink)
cifs: Adjust key sizes and key generation routines for AES256 encryption
locking/mutex: Fix non debug version of mutex_lock_io_nested()
x86/mem_encrypt: Correct physical address calculation in __set_clr_pte_enc()
mm/memcg: fix 5.10 backport of splitting page memcg
fs/cachefiles: Remove wait_bit_key layout dependency
ch_ktls: fix enum-conversion warning
can: dev: Move device back to init netns on owning netns delete
r8169: fix DMA being used after buffer free if WoL is enabled
net: dsa: b53: VLAN filtering is global to all users
mac80211: fix double free in ibss_leave
ext4: add reclaim checks to xattr code
fs/ext4: fix integer overflow in s_log_groups_per_flex
Revert "xen: fix p2m size in dom0 for disabled memory hotplug case"
Revert "net: bonding: fix error return code of bond_neigh_init()"
nvme: fix the nsid value to print in nvme_validate_or_alloc_ns
can: peak_usb: Revert "can: peak_usb: add forgotten supported devices"
xen-blkback: don't leak persistent grants from xen_blkbk_map()
Linux 5.10.27
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7eafe976fd6bf33db6db4adb8ebf2ff087294a23
[ Upstream commit a958937ff166fc60d1c3a721036f6ff41bfa2821 ]
When a stacked block device inserts a request into another block device
using blk_insert_cloned_request, the request's nr_phys_segments field gets
recalculated by a call to blk_recalc_rq_segments in
blk_cloned_rq_check_limits. But blk_recalc_rq_segments does not know how to
handle multi-segment discards. For disk types which can handle
multi-segment discards like nvme, this results in discard requests which
claim a single segment when it should report several, triggering a warning
in nvme and causing nvme to fail the discard from the invalid state.
WARNING: CPU: 5 PID: 191 at drivers/nvme/host/core.c:700 nvme_setup_discard+0x170/0x1e0 [nvme_core]
...
nvme_setup_cmd+0x217/0x270 [nvme_core]
nvme_loop_queue_rq+0x51/0x1b0 [nvme_loop]
__blk_mq_try_issue_directly+0xe7/0x1b0
blk_mq_request_issue_directly+0x41/0x70
? blk_account_io_start+0x40/0x50
dm_mq_queue_rq+0x200/0x3e0
blk_mq_dispatch_rq_list+0x10a/0x7d0
? __sbitmap_queue_get+0x25/0x90
? elv_rb_del+0x1f/0x30
? deadline_remove_request+0x55/0xb0
? dd_dispatch_request+0x181/0x210
__blk_mq_do_dispatch_sched+0x144/0x290
? bio_attempt_discard_merge+0x134/0x1f0
__blk_mq_sched_dispatch_requests+0x129/0x180
blk_mq_sched_dispatch_requests+0x30/0x60
__blk_mq_run_hw_queue+0x47/0xe0
__blk_mq_delay_run_hw_queue+0x15b/0x170
blk_mq_sched_insert_requests+0x68/0xe0
blk_mq_flush_plug_list+0xf0/0x170
blk_finish_plug+0x36/0x50
xlog_cil_committed+0x19f/0x290 [xfs]
xlog_cil_process_committed+0x57/0x80 [xfs]
xlog_state_do_callback+0x1e0/0x2a0 [xfs]
xlog_ioend_work+0x2f/0x80 [xfs]
process_one_work+0x1b6/0x350
worker_thread+0x53/0x3e0
? process_one_work+0x350/0x350
kthread+0x11b/0x140
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x22/0x30
This patch fixes blk_recalc_rq_segments to be aware of devices which can
have multi-segment discards. It calculates the correct discard segment
count by counting the number of bio as each discard bio is considered its
own segment.
Fixes: 1e739730c5 ("block: optionally merge discontiguous discard bios into a single request")
Signed-off-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Link: https://lore.kernel.org/r/20210211143807.GA115624@redhat
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ec491447b90ad6a4056a9656b13f0b3a1e83043 ]
register_disk() suppress uevents for devices with the GENHD_FL_HIDDEN
but enables uevents at the end again in order to announce disk after
possible partitions are created.
When the device is removed the uevents are still on and user land sees
'remove' messages for devices which were never 'add'ed to the system.
KERNEL[95481.571887] remove /devices/virtual/nvme-fabrics/ctl/nvme5/nvme0c5n1 (block)
Let's suppress the uevents for GENHD_FL_HIDDEN by not enabling the
uevents at all.
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Link: https://lore.kernel.org/r/20210311151917.136091-1-dwagner@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit faa44c69daf9ccbd5b8a1aee13e0e0d037c0be17 ]
Similarly to a single zone reset operation (REQ_OP_ZONE_RESET), execute
REQ_OP_ZONE_RESET_ALL operations with REQ_SYNC set.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4f44657d74873735e93a50eb25014721a66aac19 ]
The current blkio.throttle.io_service_bytes_recursive doesn't
work correctly.
As an example, for the following blkcg hierarchy:
(Made 1GB READ in test1, 512MB READ in test2)
test
/ \
test1 test2
$ head -n 1 test/test1/blkio.throttle.io_service_bytes_recursive
8:0 Read 1073684480
$ head -n 1 test/test2/blkio.throttle.io_service_bytes_recursive
8:0 Read 537448448
$ head -n 1 test/blkio.throttle.io_service_bytes_recursive
8:0 Read 537448448
Clearly, above data of "test" reflects "test2" not "test1"+"test2".
Do the correct summary in blkg_rwstat_recursive_sum().
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmBSKS0ACgkQONu9yGCS
aT7Ngg//c4C1WnWC0sNWzP3xT2paCkLnUUyjQTmrkbPvLtr2DvehW5Bvp/32pGiS
8mDMoTLq3QxNrfrU6SY3KavZRC9Pa+migAsVmuujygQwNphqv95/XxnFemFEAYTl
b8b5OJPyomzMMEwHzx1Tr+7/d58czrqXo97QI0lmaDlHl+9JKTg2SMX9AkHkU8pK
zYjbtzdhd9UZCTdVYY1ZFkQ1ik1iAWo3Xv0G2aMeQQpuGcZIh/Y66xBuyH+8g+Yz
3mInhPQvhkb+c+m4ZJ9NhOUVEW4Fl0fq0mVrrYkfHqXe0D36Vj/yYvO/yTSBqb4+
XQ5PLXX3KFVDFl1id94unXGgP3c0zBe30JZPqKdpSET+PzOlGiZTxMCfjPeTgu/Z
7xc2qSX1zn273HMTRrT1daO4/NXQ85kE04mZMzq7cqDpum7ltfKrEMum/Gma+dJz
Knn47oZHbSW4Er/WcAwHSeZpxvD7AahG/GlsQRy+IVPu/jMXJHmo2/Nv1fLJWp+G
7VVWRXug69hywGr7hFiT3USG2C5g5cV3/dEO8NFFjGKRa5CbLbQD6B3+Dz3dXyBH
jE3MGIoqoNk+SvJOAf2ogu7SS6wLynZWOchmAVvIQ4QEzcP2jroeFHKD49MYxDUE
dKcq0dtfMc4nUaUZ/XRfWtS9fSm+T4XonmvEY4yXnAyfZ0aeEM8=
=FdFm
-----END PGP SIGNATURE-----
Merge 5.10.24 into android12-5.10-lts
Changes in 5.10.24
uapi: nfnetlink_cthelper.h: fix userspace compilation error
powerpc/perf: Fix handling of privilege level checks in perf interrupt context
powerpc/pseries: Don't enforce MSI affinity with kdump
ethernet: alx: fix order of calls on resume
crypto: mips/poly1305 - enable for all MIPS processors
ath9k: fix transmitting to stations in dynamic SMPS mode
net: Fix gro aggregation for udp encaps with zero csum
net: check if protocol extracted by virtio_net_hdr_set_proto is correct
net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
net: l2tp: reduce log level of messages in receive path, add counter instead
can: skb: can_skb_set_owner(): fix ref counting if socket was closed before setting skb ownership
can: flexcan: assert FRZ bit in flexcan_chip_freeze()
can: flexcan: enable RX FIFO after FRZ/HALT valid
can: flexcan: invoke flexcan_chip_freeze() to enter freeze mode
can: tcan4x5x: tcan4x5x_init(): fix initialization - clear MRAM before entering Normal Mode
tcp: Fix sign comparison bug in getsockopt(TCP_ZEROCOPY_RECEIVE)
tcp: add sanity tests to TCP_QUEUE_SEQ
netfilter: nf_nat: undo erroneous tcp edemux lookup
netfilter: x_tables: gpf inside xt_find_revision()
net: always use icmp{,v6}_ndo_send from ndo_start_xmit
net: phy: fix save wrong speed and duplex problem if autoneg is on
selftests/bpf: Use the last page in test_snprintf_btf on s390
selftests/bpf: No need to drop the packet when there is no geneve opt
selftests/bpf: Mask bpf_csum_diff() return value to 16 bits in test_verifier
samples, bpf: Add missing munmap in xdpsock
libbpf: Clear map_info before each bpf_obj_get_info_by_fd
ibmvnic: Fix possibly uninitialized old_num_tx_queues variable warning.
ibmvnic: always store valid MAC address
mt76: dma: do not report truncated frames to mac80211
powerpc/603: Fix protection of user pages mapped with PROT_NONE
mount: fix mounting of detached mounts onto targets that reside on shared mounts
cifs: return proper error code in statfs(2)
Revert "mm, slub: consider rest of partial list if acquire_slab() fails"
docs: networking: drop special stable handling
net: dsa: tag_rtl4_a: fix egress tags
sh_eth: fix TRSCER mask for SH771x
net: enetc: don't overwrite the RSS indirection table when initializing
net: enetc: take the MDIO lock only once per NAPI poll cycle
net: enetc: fix incorrect TPID when receiving 802.1ad tagged packets
net: enetc: don't disable VLAN filtering in IFF_PROMISC mode
net: enetc: force the RGMII speed and duplex instead of operating in inband mode
net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdr
net: enetc: keep RX ring consumer index in sync with hardware
net: ethernet: mtk-star-emac: fix wrong unmap in RX handling
net/mlx4_en: update moderation when config reset
net: stmmac: fix incorrect DMA channel intr enable setting of EQoS v4.10
nexthop: Do not flush blackhole nexthops when loopback goes down
net: sched: avoid duplicates in classes dump
net: mscc: ocelot: properly reject destination IP keys in VCAP IS1
net: dsa: sja1105: fix SGMII PCS being forced to SPEED_UNKNOWN instead of SPEED_10
net: usb: qmi_wwan: allow qmimux add/del with master up
netdevsim: init u64 stats for 32bit hardware
cipso,calipso: resolve a number of problems with the DOI refcounts
net: stmmac: Fix VLAN filter delete timeout issue in Intel mGBE SGMII
stmmac: intel: Fixes clock registration error seen for multiple interfaces
net: lapbether: Remove netif_start_queue / netif_stop_queue
net: davicom: Fix regulator not turned off on failed probe
net: davicom: Fix regulator not turned off on driver removal
net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
net: qrtr: fix error return code of qrtr_sendmsg()
s390/qeth: fix memory leak after failed TX Buffer allocation
r8169: fix r8168fp_adjust_ocp_cmd function
ixgbe: fail to create xfrm offload of IPsec tunnel mode SA
tools/resolve_btfids: Fix build error with older host toolchains
perf build: Fix ccache usage in $(CC) when generating arch errno table
net: stmmac: stop each tx channel independently
net: stmmac: fix watchdog timeout during suspend/resume stress test
net: stmmac: fix wrongly set buffer2 valid when sph unsupport
ethtool: fix the check logic of at least one channel for RX/TX
net: phy: make mdio_bus_phy_suspend/resume as __maybe_unused
selftests: forwarding: Fix race condition in mirror installation
mlxsw: spectrum_ethtool: Add an external speed to PTYS register
perf traceevent: Ensure read cmdlines are null terminated.
perf report: Fix -F for branch & mem modes
net: hns3: fix query vlan mask value error for flow director
net: hns3: fix bug when calculating the TCAM table info
s390/cio: return -EFAULT if copy_to_user() fails again
bnxt_en: reliably allocate IRQ table on reset to avoid crash
gpiolib: acpi: Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk
gpiolib: acpi: Allow to find GpioInt() resource by name and index
gpio: pca953x: Set IRQ type when handle Intel Galileo Gen 2
gpio: fix gpio-device list corruption
drm/compat: Clear bounce structures
drm/amd/display: Add a backlight module option
drm/amdgpu/display: use GFP_ATOMIC in dcn21_validate_bandwidth_fp()
drm/amd/display: Fix nested FPU context in dcn21_validate_bandwidth()
drm/amd/pm: bug fix for pcie dpm
drm/amdgpu/display: simplify backlight setting
drm/amdgpu/display: don't assert in set backlight function
drm/amdgpu/display: handle aux backlight in backlight_get_brightness
drm/shmem-helper: Check for purged buffers in fault handler
drm/shmem-helper: Don't remove the offset in vm_area_struct pgoff
drm: Use USB controller's DMA mask when importing dmabufs
drm: meson_drv add shutdown function
drm/shmem-helpers: vunmap: Don't put pages for dma-buf
drm/i915: Wedge the GPU if command parser setup fails
s390/cio: return -EFAULT if copy_to_user() fails
s390/crypto: return -EFAULT if copy_to_user() fails
qxl: Fix uninitialised struct field head.surface_id
sh_eth: fix TRSCER mask for R7S9210
media: usbtv: Fix deadlock on suspend
media: rkisp1: params: fix wrong bits settings
media: v4l: vsp1: Fix uif null pointer access
media: v4l: vsp1: Fix bru null pointer access
media: rc: compile rc-cec.c into rc-core
cifs: fix credit accounting for extra channel
net: hns3: fix error mask definition of flow director
s390/qeth: don't replace a fully completed async TX buffer
s390/qeth: remove QETH_QDIO_BUF_HANDLED_DELAYED state
s390/qeth: improve completion of pending TX buffers
s390/qeth: fix notification for pending buffers during teardown
net: dsa: implement a central TX reallocation procedure
net: dsa: tag_ksz: don't allocate additional memory for padding/tagging
net: dsa: trailer: don't allocate additional memory for padding/tagging
net: dsa: tag_qca: let DSA core deal with TX reallocation
net: dsa: tag_ocelot: let DSA core deal with TX reallocation
net: dsa: tag_mtk: let DSA core deal with TX reallocation
net: dsa: tag_lan9303: let DSA core deal with TX reallocation
net: dsa: tag_edsa: let DSA core deal with TX reallocation
net: dsa: tag_brcm: let DSA core deal with TX reallocation
net: dsa: tag_dsa: let DSA core deal with TX reallocation
net: dsa: tag_gswip: let DSA core deal with TX reallocation
net: dsa: tag_ar9331: let DSA core deal with TX reallocation
net: dsa: tag_mtk: fix 802.1ad VLAN egress
enetc: Fix unused var build warning for CONFIG_OF
net: enetc: initialize RFS/RSS memories for unused ports too
ath11k: peer delete synchronization with firmware
ath11k: start vdev if a bss peer is already created
ath11k: fix AP mode for QCA6390
i2c: rcar: faster irq code to minimize HW race condition
i2c: rcar: optimize cacheline to minimize HW race condition
scsi: ufs: WB is only available on LUN #0 to #7
udf: fix silent AED tagLocation corruption
iommu/vt-d: Clear PRQ overflow only when PRQ is empty
mmc: mxs-mmc: Fix a resource leak in an error handling path in 'mxs_mmc_probe()'
mmc: mediatek: fix race condition between msdc_request_timeout and irq
mmc: sdhci-iproc: Add ACPI bindings for the RPi
Platform: OLPC: Fix probe error handling
powerpc/pci: Add ppc_md.discover_phbs()
spi: stm32: make spurious and overrun interrupts visible
powerpc: improve handling of unrecoverable system reset
powerpc/perf: Record counter overflow always if SAMPLE_IP is unset
HID: logitech-dj: add support for the new lightspeed connection iteration
powerpc/64: Fix stack trace not displaying final frame
iommu/amd: Fix performance counter initialization
clk: qcom: gdsc: Implement NO_RET_PERIPH flag
sparc32: Limit memblock allocation to low memory
sparc64: Use arch_validate_flags() to validate ADI flag
Input: applespi - don't wait for responses to commands indefinitely.
PCI: xgene-msi: Fix race in installing chained irq handler
PCI: mediatek: Add missing of_node_put() to fix reference leak
drivers/base: build kunit tests without structleak plugin
PCI/LINK: Remove bandwidth notification
ext4: don't try to processed freed blocks until mballoc is initialized
kbuild: clamp SUBLEVEL to 255
PCI: Fix pci_register_io_range() memory leak
i40e: Fix memory leak in i40e_probe
kasan: fix memory corruption in kasan_bitops_tags test
s390/smp: __smp_rescan_cpus() - move cpumask away from stack
drivers/base/memory: don't store phys_device in memory blocks
sysctl.c: fix underflow value setting risk in vm_table
scsi: libiscsi: Fix iscsi_prep_scsi_cmd_pdu() error handling
scsi: target: core: Add cmd length set before cmd complete
scsi: target: core: Prevent underflow for service actions
clk: qcom: gpucc-msm8998: Add resets, cxc, fix flags on gpu_gx_gdsc
mmc: sdhci: Update firmware interface API
ARM: 9029/1: Make iwmmxt.S support Clang's integrated assembler
ARM: assembler: introduce adr_l, ldr_l and str_l macros
ARM: efistub: replace adrl pseudo-op with adr_l macro invocation
ALSA: usb: Add Plantronics C320-M USB ctrl msg delay quirk
ALSA: hda/hdmi: Cancel pending works before suspend
ALSA: hda/conexant: Add quirk for mute LED control on HP ZBook G5
ALSA: hda/ca0132: Add Sound BlasterX AE-5 Plus support
ALSA: hda: Drop the BATCH workaround for AMD controllers
ALSA: hda: Flush pending unsolicited events before suspend
ALSA: hda: Avoid spurious unsol event handling during S3/S4
ALSA: usb-audio: Fix "cannot get freq eq" errors on Dell AE515 sound bar
ALSA: usb-audio: Apply the control quirk to Plantronics headsets
ALSA: usb-audio: Disable USB autosuspend properly in setup_disable_autosuspend()
ALSA: usb-audio: fix NULL ptr dereference in usb_audio_probe
ALSA: usb-audio: fix use after free in usb_audio_disconnect
Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities")
block: Discard page cache of zone reset target range
block: Try to handle busy underlying device on discard
arm64: kasan: fix page_alloc tagging with DEBUG_VIRTUAL
arm64: mte: Map hotplugged memory as Normal Tagged
arm64: perf: Fix 64-bit event counter read truncation
s390/dasd: fix hanging DASD driver unbind
s390/dasd: fix hanging IO request during DASD driver unbind
software node: Fix node registration
xen/events: reset affinity of 2-level event when tearing it down
mmc: mmci: Add MMC_CAP_NEED_RSP_BUSY for the stm32 variants
mmc: core: Fix partition switch time for eMMC
mmc: cqhci: Fix random crash when remove mmc module/card
cifs: do not send close in compound create+close requests
Goodix Fingerprint device is not a modem
USB: gadget: udc: s3c2410_udc: fix return value check in s3c2410_udc_probe()
USB: gadget: u_ether: Fix a configfs return code
usb: gadget: f_uac2: always increase endpoint max_packet_size by one audio slot
usb: gadget: f_uac1: stop playback on function disable
usb: dwc3: qcom: Add missing DWC3 OF node refcount decrement
usb: dwc3: qcom: add URS Host support for sdm845 ACPI boot
usb: dwc3: qcom: add ACPI device id for sc8180x
usb: dwc3: qcom: Honor wakeup enabled/disabled state
USB: usblp: fix a hang in poll() if disconnected
usb: renesas_usbhs: Clear PIPECFG for re-enabling pipe with other EPNUM
usb: xhci: do not perform Soft Retry for some xHCI hosts
xhci: Improve detection of device initiated wake signal.
usb: xhci: Fix ASMedia ASM1042A and ASM3242 DMA addressing
xhci: Fix repeated xhci wake after suspend due to uncleared internal wake state
USB: serial: io_edgeport: fix memory leak in edge_startup
USB: serial: ch341: add new Product ID
USB: serial: cp210x: add ID for Acuity Brands nLight Air Adapter
USB: serial: cp210x: add some more GE USB IDs
usbip: fix stub_dev to check for stream socket
usbip: fix vhci_hcd to check for stream socket
usbip: fix vudc to check for stream socket
usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
usbip: fix vhci_hcd attach_store() races leading to gpf
usbip: fix vudc usbip_sockfd_store races leading to gpf
Revert "serial: max310x: rework RX interrupt handling"
misc/pvpanic: Export module FDT device table
misc: fastrpc: restrict user apps from sending kernel RPC messages
staging: rtl8192u: fix ->ssid overflow in r8192_wx_set_scan()
staging: rtl8188eu: prevent ->ssid overflow in rtw_wx_set_scan()
staging: rtl8712: unterminated string leads to read overflow
staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
staging: ks7010: prevent buffer overflow in ks_wlan_set_scan()
staging: rtl8712: Fix possible buffer overflow in r8712_sitesurvey_cmd
staging: rtl8192e: Fix possible buffer overflow in _rtl92e_wx_set_scan
staging: comedi: addi_apci_1032: Fix endian problem for COS sample
staging: comedi: addi_apci_1500: Fix endian problem for command sample
staging: comedi: adv_pci1710: Fix endian problem for AI command data
staging: comedi: das6402: Fix endian problem for AI command data
staging: comedi: das800: Fix endian problem for AI command data
staging: comedi: dmm32at: Fix endian problem for AI command data
staging: comedi: me4000: Fix endian problem for AI command data
staging: comedi: pcl711: Fix endian problem for AI command data
staging: comedi: pcl818: Fix endian problem for AI command data
sh_eth: fix TRSCER mask for R7S72100
cpufreq: qcom-hw: fix dereferencing freed memory 'data'
cpufreq: qcom-hw: Fix return value check in qcom_cpufreq_hw_cpu_init()
arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
SUNRPC: Set memalloc_nofs_save() for sync tasks
NFS: Don't revalidate the directory permissions on a lookup failure
NFS: Don't gratuitously clear the inode cache when lookup failed
NFSv4.2: fix return value of _nfs4_get_security_label()
block: rsxx: fix error return code of rsxx_pci_probe()
nvme-fc: fix racing controller reset and create association
configfs: fix a use-after-free in __configfs_open_file
arm64: mm: use a 48-bit ID map when possible on 52-bit VA builds
perf/core: Flush PMU internal buffers for per-CPU events
perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR
hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
powerpc/64s/exception: Clean up a missed SRR specifier
seqlock,lockdep: Fix seqcount_latch_init()
stop_machine: mark helpers __always_inline
include/linux/sched/mm.h: use rcu_dereference in in_vfork()
zram: fix return value on writeback_store
linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
sched/membarrier: fix missing local execution of ipi_sync_rq_state()
efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table
powerpc/64s: Fix instruction encoding for lis in ppc_function_entry()
powerpc: Fix inverted SET_FULL_REGS bitop
powerpc: Fix missing declaration of [en/dis]able_kernel_vsx()
binfmt_misc: fix possible deadlock in bm_register_write
x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
x86/sev-es: Introduce ip_within_syscall_gap() helper
x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack
x86/entry: Move nmi entry/exit into common code
x86/sev-es: Correctly track IRQ states in runtime #VC handler
x86/sev-es: Use __copy_from_user_inatomic()
x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
KVM: arm64: Fix range alignment when walking page tables
KVM: arm64: Avoid corrupting vCPU context register in guest exit
KVM: arm64: nvhe: Save the SPE context early
KVM: arm64: Reject VM creation when the default IPA size is unsupported
KVM: arm64: Fix exclusive limit for IPA size
mm/userfaultfd: fix memory corruption due to writeprotect
mm/madvise: replace ptrace attach requirement for process_madvise
KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
mm/page_alloc.c: refactor initialization of struct page for holes in memory layout
xen/events: don't unmask an event channel when an eoi is pending
xen/events: avoid handling the same event on two cpus at the same time
KVM: arm64: Fix nVHE hyp panic host context restore
RDMA/umem: Use ib_dma_max_seg_size instead of dma_get_max_seg_size
Linux 5.10.24
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie53a3c1963066a18d41357b6be41cff00690bd40
commit e5113505904ea1c1c0e1f92c1cfa91fbf4da1694 upstream.
When zone reset ioctl and data read race for a same zone on zoned block
devices, the data read leaves stale page cache even though the zone
reset ioctl zero clears all the zone data on the device. To avoid
non-zero data read from the stale page cache after zone reset, discard
page cache of reset target zones in blkdev_zone_mgmt_ioctl(). Introduce
the helper function blkdev_truncate_zone_range() to discard the page
cache. Ensure the page cache discarded by calling the helper function
before and after zone reset in same manner as fallocate does.
This patch can be applied back to the stable kernel version v5.10.y.
Rework is needed for older stable kernels.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 3ed05a987e ("blk-zoned: implement ioctls")
Cc: <stable@vger.kernel.org> # 5.10+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210311072546.678999-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.10.20
vmlinux.lds.h: add DWARF v5 sections
vdpa/mlx5: fix param validation in mlx5_vdpa_get_config()
debugfs: be more robust at handling improper input in debugfs_lookup()
debugfs: do not attempt to create a new file before the filesystem is initalized
scsi: libsas: docs: Remove notify_ha_event()
scsi: qla2xxx: Fix mailbox Ch erroneous error
kdb: Make memory allocations more robust
w1: w1_therm: Fix conversion result for negative temperatures
PCI: qcom: Use PHY_REFCLK_USE_PAD only for ipq8064
PCI: Decline to resize resources if boot config must be preserved
virt: vbox: Do not use wait_event_interruptible when called from kernel context
bfq: Avoid false bfq queue merging
ALSA: usb-audio: Fix PCM buffer allocation in non-vmalloc mode
MIPS: vmlinux.lds.S: add missing PAGE_ALIGNED_DATA() section
vmlinux.lds.h: Define SANTIZER_DISCARDS with CONFIG_GCOV_KERNEL=y
random: fix the RNDRESEEDCRNG ioctl
ALSA: pcm: Call sync_stop at disconnection
ALSA: pcm: Assure sync with the pending stop operation at suspend
ALSA: pcm: Don't call sync_stop if it hasn't been stopped
drm/i915/gt: One more flush for Baytrail clear residuals
ath10k: Fix error handling in case of CE pipe init failure
Bluetooth: btqcomsmd: Fix a resource leak in error handling paths in the probe function
Bluetooth: hci_uart: Fix a race for write_work scheduling
Bluetooth: Fix initializing response id after clearing struct
arm64: dts: renesas: beacon kit: Fix choppy Bluetooth Audio
arm64: dts: renesas: beacon: Fix audio-1.8V pin enable
ARM: dts: exynos: correct PMIC interrupt trigger level on Artik 5
ARM: dts: exynos: correct PMIC interrupt trigger level on Monk
ARM: dts: exynos: correct PMIC interrupt trigger level on Rinato
ARM: dts: exynos: correct PMIC interrupt trigger level on Spring
ARM: dts: exynos: correct PMIC interrupt trigger level on Arndale Octa
ARM: dts: exynos: correct PMIC interrupt trigger level on Odroid XU3 family
arm64: dts: exynos: correct PMIC interrupt trigger level on TM2
arm64: dts: exynos: correct PMIC interrupt trigger level on Espresso
memory: mtk-smi: Fix PM usage counter unbalance in mtk_smi ops
Bluetooth: hci_qca: Fix memleak in qca_controller_memdump
staging: vchiq: Fix bulk userdata handling
staging: vchiq: Fix bulk transfers on 64-bit builds
arm64: dts: qcom: msm8916-samsung-a5u: Fix iris compatible
net: stmmac: dwmac-meson8b: fix enabling the timing-adjustment clock
bpf: Add bpf_patch_call_args prototype to include/linux/bpf.h
bpf: Avoid warning when re-casting __bpf_call_base into __bpf_call_base_args
firmware: arm_scmi: Fix call site of scmi_notification_exit
arm64: dts: allwinner: A64: properly connect USB PHY to port 0
arm64: dts: allwinner: H6: properly connect USB PHY to port 0
arm64: dts: allwinner: Drop non-removable from SoPine/LTS SD card
arm64: dts: allwinner: H6: Allow up to 150 MHz MMC bus frequency
arm64: dts: allwinner: A64: Limit MMC2 bus frequency to 150 MHz
arm64: dts: qcom: msm8916-samsung-a2015: Fix sensors
cpufreq: brcmstb-avs-cpufreq: Free resources in error path
cpufreq: brcmstb-avs-cpufreq: Fix resource leaks in ->remove()
arm64: dts: rockchip: rk3328: Add clock_in_out property to gmac2phy node
ACPICA: Fix exception code class checks
usb: gadget: u_audio: Free requests only after callback
arm64: dts: qcom: sdm845-db845c: Fix reset-pin of ov8856 node
soc: qcom: socinfo: Fix an off by one in qcom_show_pmic_model()
soc: ti: pm33xx: Fix some resource leak in the error handling paths of the probe function
staging: media: atomisp: Fix size_t format specifier in hmm_alloc() debug statemenet
Bluetooth: drop HCI device reference before return
Bluetooth: Put HCI device if inquiry procedure interrupts
memory: ti-aemif: Drop child node when jumping out loop
ARM: dts: Configure missing thermal interrupt for 4430
usb: dwc2: Do not update data length if it is 0 on inbound transfers
usb: dwc2: Abort transaction after errors with unknown reason
usb: dwc2: Make "trimming xfer length" a debug message
staging: rtl8723bs: wifi_regd.c: Fix incorrect number of regulatory rules
x86/MSR: Filter MSR writes through X86_IOC_WRMSR_REGS ioctl too
arm64: dts: renesas: beacon: Fix EEPROM compatible value
can: mcp251xfd: mcp251xfd_probe(): fix errata reference
ARM: dts: armada388-helios4: assign pinctrl to LEDs
ARM: dts: armada388-helios4: assign pinctrl to each fan
arm64: dts: armada-3720-turris-mox: rename u-boot mtd partition to a53-firmware
opp: Correct debug message in _opp_add_static_v2()
Bluetooth: btusb: Fix memory leak in btusb_mtk_wmt_recv
soc: qcom: ocmem: don't return NULL in of_get_ocmem
arm64: dts: msm8916: Fix reserved and rfsa nodes unit address
arm64: dts: meson: fix broken wifi node for Khadas VIM3L
iwlwifi: mvm: set enabled in the PPAG command properly
ARM: s3c: fix fiq for clang IAS
optee: simplify i2c access
staging: wfx: fix possible panic with re-queued frames
ARM: at91: use proper asm syntax in pm_suspend
ath10k: Fix suspicious RCU usage warning in ath10k_wmi_tlv_parse_peer_stats_info()
ath10k: Fix lockdep assertion warning in ath10k_sta_statistics
ath11k: fix a locking bug in ath11k_mac_op_start()
soc: aspeed: snoop: Add clock control logic
iwlwifi: mvm: fix the type we use in the PPAG table validity checks
iwlwifi: mvm: store PPAG enabled/disabled flag properly
iwlwifi: mvm: send stored PPAG command instead of local
iwlwifi: mvm: assign SAR table revision to the command later
iwlwifi: mvm: don't check if CSA event is running before removing
bpf_lru_list: Read double-checked variable once without lock
iwlwifi: pnvm: set the PNVM again if it was already loaded
iwlwifi: pnvm: increment the pointer before checking the TLV
ath9k: fix data bus crash when setting nf_override via debugfs
selftests/bpf: Convert test_xdp_redirect.sh to bash
ibmvnic: Set to CLOSED state even on error
bnxt_en: reverse order of TX disable and carrier off
bnxt_en: Fix devlink info's stored fw.psid version format.
xen/netback: fix spurious event detection for common event case
dpaa2-eth: fix memory leak in XDP_REDIRECT
net: phy: consider that suspend2ram may cut off PHY power
net/mlx5e: Don't change interrupt moderation params when DIM is enabled
net/mlx5e: Change interrupt moderation channel params also when channels are closed
net/mlx5: Fix health error state handling
net/mlx5e: Replace synchronize_rcu with synchronize_net
net/mlx5e: kTLS, Use refcounts to free kTLS RX priv context
net/mlx5: Disable devlink reload for multi port slave device
net/mlx5: Disallow RoCE on multi port slave device
net/mlx5: Disallow RoCE on lag device
net/mlx5: Disable devlink reload for lag devices
net/mlx5e: CT: manage the lifetime of the ct entry object
net/mlx5e: Check tunnel offload is required before setting SWP
mac80211: fix potential overflow when multiplying to u32 integers
libbpf: Ignore non function pointer member in struct_ops
bpf: Fix an unitialized value in bpf_iter
bpf, devmap: Use GFP_KERNEL for xdp bulk queue allocation
bpf: Fix bpf_fib_lookup helper MTU check for SKB ctx
selftests: mptcp: fix ACKRX debug message
tcp: fix SO_RCVLOWAT related hangs under mem pressure
net: axienet: Handle deferred probe on clock properly
cxgb4/chtls/cxgbit: Keeping the max ofld immediate data size same in cxgb4 and ulds
b43: N-PHY: Fix the update of coef for the PHY revision >= 3case
bpf: Clear subreg_def for global function return values
ibmvnic: add memory barrier to protect long term buffer
ibmvnic: skip send_request_unmap for timeout reset
net: dsa: felix: perform teardown in reverse order of setup
net: dsa: felix: don't deinitialize unused ports
net: phy: mscc: adding LCPLL reset to VSC8514
net: amd-xgbe: Reset the PHY rx data path when mailbox command timeout
net: amd-xgbe: Fix NETDEV WATCHDOG transmit queue timeout warning
net: amd-xgbe: Reset link when the link never comes back
net: amd-xgbe: Fix network fluctuations when using 1G BELFUSE SFP
net: mvneta: Remove per-cpu queue mapping for Armada 3700
net: enetc: fix destroyed phylink dereference during unbind
tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer
tty: implement read_iter
fbdev: aty: SPARC64 requires FB_ATY_CT
drm/gma500: Fix error return code in psb_driver_load()
gma500: clean up error handling in init
drm/fb-helper: Add missed unlocks in setcmap_legacy()
drm/panel: mantix: Tweak init sequence
drm/vc4: hdmi: Take into account the clock doubling flag in atomic_check
crypto: sun4i-ss - linearize buffers content must be kept
crypto: sun4i-ss - fix kmap usage
crypto: arm64/aes-ce - really hide slower algos when faster ones are enabled
hwrng: ingenic - Fix a resource leak in an error handling path
media: allegro: Fix use after free on error
kcsan: Rewrite kcsan_prandom_u32_max() without prandom_u32_state()
drm: rcar-du: Fix PM reference leak in rcar_cmm_enable()
drm: rcar-du: Fix crash when using LVDS1 clock for CRTC
drm: rcar-du: Fix the return check of of_parse_phandle and of_find_device_by_node
drm/amdgpu: Fix macro name _AMDGPU_TRACE_H_ in preprocessor if condition
MIPS: c-r4k: Fix section mismatch for loongson2_sc_init
MIPS: lantiq: Explicitly compare LTQ_EBU_PCC_ISTAT against 0
drm/virtio: make sure context is created in gem open
drm/fourcc: fix Amlogic format modifier masks
media: ipu3-cio2: Build only for x86
media: i2c: ov5670: Fix PIXEL_RATE minimum value
media: imx: Unregister csc/scaler only if registered
media: imx: Fix csc/scaler unregister
media: mtk-vcodec: fix error return code in vdec_vp9_decode()
media: camss: missing error code in msm_video_register()
media: vsp1: Fix an error handling path in the probe function
media: em28xx: Fix use-after-free in em28xx_alloc_urbs
media: media/pci: Fix memleak in empress_init
media: tm6000: Fix memleak in tm6000_start_stream
media: aspeed: fix error return code in aspeed_video_setup_video()
ASoC: cs42l56: fix up error handling in probe
ASoC: qcom: qdsp6: Move frontend AIFs to q6asm-dai
evm: Fix memleak in init_desc
crypto: bcm - Rename struct device_private to bcm_device_private
sched/fair: Avoid stale CPU util_est value for schedutil in task dequeue
drm/sun4i: tcon: fix inverted DCLK polarity
media: imx7: csi: Fix regression for parallel cameras on i.MX6UL
media: imx7: csi: Fix pad link validation
media: ti-vpe: cal: fix write to unallocated memory
MIPS: properly stop .eh_frame generation
MIPS: Compare __SYNC_loongson3_war against 0
drm/tegra: Fix reference leak when pm_runtime_get_sync() fails
drm/amdgpu: toggle on DF Cstate after finishing xgmi injection
bsg: free the request before return error code
macintosh/adb-iop: Use big-endian autopoll mask
drm/amd/display: Fix 10/12 bpc setup in DCE output bit depth reduction.
drm/amd/display: Fix HDMI deep color output for DCE 6-11.
media: software_node: Fix refcounts in software_node_get_next_child()
media: lmedm04: Fix misuse of comma
media: vidtv: psi: fix missing crc for PMT
media: atomisp: Fix a buffer overflow in debug code
media: qm1d1c0042: fix error return code in qm1d1c0042_init()
media: cx25821: Fix a bug when reallocating some dma memory
media: mtk-vcodec: fix argument used when DEBUG is defined
media: pxa_camera: declare variable when DEBUG is defined
media: uvcvideo: Accept invalid bFormatIndex and bFrameIndex values
sched/eas: Don't update misfit status if the task is pinned
f2fs: compress: fix potential deadlock
ASoC: qcom: lpass-cpu: Remove bit clock state check
ASoC: SOF: Intel: hda: cancel D0i3 work during runtime suspend
perf/arm-cmn: Fix PMU instance naming
perf/arm-cmn: Move IRQs when migrating context
mtd: parser: imagetag: fix error codes in bcm963xx_parse_imagetag_partitions()
crypto: talitos - Work around SEC6 ERRATA (AES-CTR mode data size error)
crypto: talitos - Fix ctr(aes) on SEC1
drm/nouveau: bail out of nouveau_channel_new if channel init fails
mm: proc: Invalidate TLB after clearing soft-dirty page state
ata: ahci_brcm: Add back regulators management
ASoC: cpcap: fix microphone timeslot mask
ASoC: codecs: add missing max_register in regmap config
mtd: parsers: afs: Fix freeing the part name memory in failure
f2fs: fix to avoid inconsistent quota data
drm/amdgpu: Prevent shift wrapping in amdgpu_read_mask()
f2fs: fix a wrong condition in __submit_bio
ASoC: qcom: Fix typo error in HDMI regmap config callbacks
KVM: nSVM: Don't strip host's C-bit from guest's CR3 when reading PDPTRs
drm/mediatek: Check if fb is null
Drivers: hv: vmbus: Avoid use-after-free in vmbus_onoffer_rescind()
ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A5E
ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A3E
locking/lockdep: Avoid unmatched unlock
ASoC: qcom: lpass: Fix i2s ctl register bit map
ASoC: rt5682: Fix panic in rt5682_jack_detect_handler happening during system shutdown
ASoC: SOF: debug: Fix a potential issue on string buffer termination
btrfs: clarify error returns values in __load_free_space_cache
btrfs: fix double accounting of ordered extent for subpage case in btrfs_invalidapge
KVM: x86: Restore all 64 bits of DR6 and DR7 during RSM on x86-64
s390/zcrypt: return EIO when msg retry limit reached
drm/vc4: hdmi: Move hdmi reset to bind
drm/vc4: hdmi: Fix register offset with longer CEC messages
drm/vc4: hdmi: Fix up CEC registers
drm/vc4: hdmi: Restore cec physical address on reconnect
drm/vc4: hdmi: Compute the CEC clock divider from the clock rate
drm/vc4: hdmi: Update the CEC clock divider on HSM rate change
drm/lima: fix reference leak in lima_pm_busy
drm/dp_mst: Don't cache EDIDs for physical ports
hwrng: timeriomem - Fix cooldown period calculation
crypto: ecdh_helper - Ensure 'len >= secret.len' in decode_key()
io_uring: fix possible deadlock in io_uring_poll
nvmet-tcp: fix receive data digest calculation for multiple h2cdata PDUs
nvmet-tcp: fix potential race of tcp socket closing accept_work
nvme-multipath: set nr_zones for zoned namespaces
nvmet: remove extra variable in identify ns
nvmet: set status to 0 in case for invalid nsid
ASoC: SOF: sof-pci-dev: add missing Up-Extreme quirk
ima: Free IMA measurement buffer on error
ima: Free IMA measurement buffer after kexec syscall
ASoC: simple-card-utils: Fix device module clock
fs/jfs: fix potential integer overflow on shift of a int
jffs2: fix use after free in jffs2_sum_write_data()
ubifs: Fix memleak in ubifs_init_authentication
ubifs: replay: Fix high stack usage, again
ubifs: Fix error return code in alloc_wbufs()
irqchip/imx: IMX_INTMUX should not default to y, unconditionally
smp: Process pending softirqs in flush_smp_call_function_from_idle()
drm/amdgpu/display: remove hdcp_srm sysfs on device removal
capabilities: Don't allow writing ambiguous v3 file capabilities
HSI: Fix PM usage counter unbalance in ssi_hw_init
power: supply: cpcap: Add missing IRQF_ONESHOT to fix regression
clk: meson: clk-pll: fix initializing the old rate (fallback) for a PLL
clk: meson: clk-pll: make "ret" a signed integer
clk: meson: clk-pll: propagate the error from meson_clk_pll_set_rate()
selftests/powerpc: Make the test check in eeh-basic.sh posix compliant
regulator: qcom-rpmh-regulator: add pm8009-1 chip revision
arm64: dts: qcom: qrb5165-rb5: fix pm8009 regulators
quota: Fix memory leak when handling corrupted quota file
i2c: iproc: handle only slave interrupts which are enabled
i2c: iproc: update slave isr mask (ISR_MASK_SLAVE)
i2c: iproc: handle master read request
spi: cadence-quadspi: Abort read if dummy cycles required are too many
clk: sunxi-ng: h6: Fix CEC clock
clk: renesas: r8a779a0: Remove non-existent S2 clock
clk: renesas: r8a779a0: Fix parent of CBFUSA clock
HID: core: detect and skip invalid inputs to snto32()
RDMA/siw: Fix handling of zero-sized Read and Receive Queues.
dmaengine: fsldma: Fix a resource leak in the remove function
dmaengine: fsldma: Fix a resource leak in an error handling path of the probe function
dmaengine: owl-dma: Fix a resource leak in the remove function
dmaengine: hsu: disable spurious interrupt
mfd: bd9571mwv: Use devm_mfd_add_devices()
power: supply: cpcap-charger: Fix missing power_supply_put()
power: supply: cpcap-battery: Fix missing power_supply_put()
power: supply: cpcap-charger: Fix power_supply_put on null battery pointer
fdt: Properly handle "no-map" field in the memory region
of/fdt: Make sure no-map does not remove already reserved regions
RDMA/rtrs: Extend ibtrs_cq_qp_create
RDMA/rtrs-srv: Release lock before call into close_sess
RDMA/rtrs-srv: Use sysfs_remove_file_self for disconnect
RDMA/rtrs-clt: Set mininum limit when create QP
RDMA/rtrs: Call kobject_put in the failure path
RDMA/rtrs-srv: Fix missing wr_cqe
RDMA/rtrs-clt: Refactor the failure cases in alloc_clt
RDMA/rtrs-srv: Init wr_cnt as 1
power: reset: at91-sama5d2_shdwc: fix wkupdbc mask
rtc: s5m: select REGMAP_I2C
dmaengine: idxd: set DMA channel to be private
power: supply: fix sbs-charger build, needs REGMAP_I2C
clocksource/drivers/ixp4xx: Select TIMER_OF when needed
clocksource/drivers/mxs_timer: Add missing semicolon when DEBUG is defined
spi: imx: Don't print error on -EPROBEDEFER
RDMA/mlx5: Use the correct obj_id upon DEVX TIR creation
IB/mlx5: Add mutex destroy call to cap_mask_mutex mutex
clk: sunxi-ng: h6: Fix clock divider range on some clocks
platform/chrome: cros_ec_proto: Use EC_HOST_EVENT_MASK not BIT
platform/chrome: cros_ec_proto: Add LID and BATTERY to default mask
regulator: axp20x: Fix reference cout leak
watch_queue: Drop references to /dev/watch_queue
certs: Fix blacklist flag type confusion
regulator: s5m8767: Fix reference count leak
spi: atmel: Put allocated master before return
regulator: s5m8767: Drop regulators OF node reference
power: supply: axp20x_usb_power: Init work before enabling IRQs
power: supply: smb347-charger: Fix interrupt usage if interrupt is unavailable
regulator: core: Avoid debugfs: Directory ... already present! error
isofs: release buffer head before return
watchdog: intel-mid_wdt: Postpone IRQ handler registration till SCU is ready
auxdisplay: ht16k33: Fix refresh rate handling
objtool: Fix error handling for STD/CLD warnings
objtool: Fix retpoline detection in asm code
objtool: Fix ".cold" section suffix check for newer versions of GCC
scsi: lpfc: Fix ancient double free
iommu: Switch gather->end to the inclusive end
IB/umad: Return EIO in case of when device disassociated
IB/umad: Return EPOLLERR in case of when device disassociated
KVM: PPC: Make the VMX instruction emulation routines static
powerpc/47x: Disable 256k page size
powerpc/time: Enable sched clock for irqtime
mmc: owl-mmc: Fix a resource leak in an error handling path and in the remove function
mmc: sdhci-sprd: Fix some resource leaks in the remove function
mmc: usdhi6rol0: Fix a resource leak in the error handling path of the probe
mmc: renesas_sdhi_internal_dmac: Fix DMA buffer alignment from 8 to 128-bytes
ARM: 9046/1: decompressor: Do not clear SCTLR.nTLSMD for ARMv7+ cores
i2c: qcom-geni: Store DMA mapping data in geni_i2c_dev struct
amba: Fix resource leak for drivers without .remove
iommu: Move iotlb_sync_map out from __iommu_map
iommu: Properly pass gfp_t in _iommu_map() to avoid atomic sleeping
IB/mlx5: Return appropriate error code instead of ENOMEM
IB/cm: Avoid a loop when device has 255 ports
tracepoint: Do not fail unregistering a probe due to memory failure
rtc: zynqmp: depend on HAS_IOMEM
perf tools: Fix DSO filtering when not finding a map for a sampled address
perf vendor events arm64: Fix Ampere eMag event typo
RDMA/rxe: Fix coding error in rxe_recv.c
RDMA/rxe: Fix coding error in rxe_rcv_mcast_pkt
RDMA/rxe: Correct skb on loopback path
spi: stm32: properly handle 0 byte transfer
mfd: altera-sysmgr: Fix physical address storing more
mfd: wm831x-auxadc: Prevent use after free in wm831x_auxadc_read_irq()
powerpc/pseries/dlpar: handle ibm, configure-connector delay status
powerpc/8xx: Fix software emulation interrupt
clk: qcom: gcc-msm8998: Fix Alpha PLL type for all GPLLs
kunit: tool: fix unit test cleanup handling
kselftests: dmabuf-heaps: Fix Makefile's inclusion of the kernel's usr/include dir
RDMA/hns: Fixed wrong judgments in the goto branch
RDMA/siw: Fix calculation of tx_valid_cpus size
RDMA/hns: Fix type of sq_signal_bits
RDMA/hns: Disable RQ inline by default
clk: divider: fix initialization with parent_hw
spi: pxa2xx: Fix the controller numbering for Wildcat Point
powerpc/uaccess: Avoid might_fault() when user access is enabled
powerpc/kuap: Restore AMR after replaying soft interrupts
regulator: qcom-rpmh: fix pm8009 ldo7
clk: aspeed: Fix APLL calculate formula from ast2600-A2
selftests/ftrace: Update synthetic event syntax errors
perf symbols: Use (long) for iterator for bfd symbols
regulator: bd718x7, bd71828, Fix dvs voltage levels
spi: dw: Avoid stack content exposure
spi: Skip zero-length transfers in spi_transfer_one_message()
printk: avoid prb_first_valid_seq() where possible
perf symbols: Fix return value when loading PE DSO
nfsd: register pernet ops last, unregister first
svcrdma: Hold private mutex while invoking rdma_accept()
ceph: fix flush_snap logic after putting caps
RDMA/hns: Fixes missing error code of CMDQ
RDMA/ucma: Fix use-after-free bug in ucma_create_uevent
RDMA/rtrs-srv: Fix stack-out-of-bounds
RDMA/rtrs: Only allow addition of path to an already established session
RDMA/rtrs-srv: fix memory leak by missing kobject free
RDMA/rtrs-srv-sysfs: fix missing put_device
RDMA/rtrs-srv: Do not pass a valid pointer to PTR_ERR()
Input: sur40 - fix an error code in sur40_probe()
perf record: Fix continue profiling after draining the buffer
perf intel-pt: Fix missing CYC processing in PSB
perf intel-pt: Fix premature IPC
perf intel-pt: Fix IPC with CYC threshold
perf test: Fix unaligned access in sample parsing test
Input: elo - fix an error code in elo_connect()
sparc64: only select COMPAT_BINFMT_ELF if BINFMT_ELF is set
sparc: fix led.c driver when PROC_FS is not enabled
Input: zinitix - fix return type of zinitix_init_touch()
ARM: 9065/1: OABI compat: fix build when EPOLL is not enabled
misc: eeprom_93xx46: Fix module alias to enable module autoprobe
phy: rockchip-emmc: emmc_phy_init() always return 0
phy: cadence-torrent: Fix error code in cdns_torrent_phy_probe()
misc: eeprom_93xx46: Add module alias to avoid breaking support for non device tree users
PCI: rcar: Always allocate MSI addresses in 32bit space
soundwire: cadence: fix ACK/NAK handling
pwm: rockchip: Enable APB clock during register access while probing
pwm: rockchip: rockchip_pwm_probe(): Remove superfluous clk_unprepare()
pwm: rockchip: Eliminate potential race condition when probing
PCI: xilinx-cpm: Fix reference count leak on error path
VMCI: Use set_page_dirty_lock() when unregistering guest memory
PCI: Align checking of syscall user config accessors
mei: hbm: call mei_set_devstate() on hbm stop response
drm/msm: Fix MSM_INFO_GET_IOVA with carveout
drm/msm/dsi: Correct io_start for MSM8994 (20nm PHY)
drm/msm/mdp5: Fix wait-for-commit for cmd panels
drm/msm: Fix race of GPU init vs timestamp power management.
drm/msm: Fix races managing the OOB state for timestamp vs timestamps.
drm/msm/dp: trigger unplug event in msm_dp_display_disable
vfio/iommu_type1: Populate full dirty when detach non-pinned group
vfio/iommu_type1: Fix some sanity checks in detach group
vfio-pci/zdev: fix possible segmentation fault issue
ext4: fix potential htree index checksum corruption
phy: USB_LGM_PHY should depend on X86
coresight: etm4x: Skip accessing TRCPDCR in save/restore
nvmem: core: Fix a resource leak on error in nvmem_add_cells_from_of()
nvmem: core: skip child nodes not matching binding
soundwire: bus: use sdw_update_no_pm when initializing a device
soundwire: bus: use sdw_write_no_pm when setting the bus scale registers
soundwire: export sdw_write/read_no_pm functions
soundwire: bus: fix confusion on device used by pm_runtime
misc: fastrpc: fix incorrect usage of dma_map_sgtable
remoteproc/mediatek: acknowledge watchdog IRQ after handled
regmap: sdw: use _no_pm functions in regmap_read/write
ext: EXT4_KUNIT_TESTS should depend on EXT4_FS instead of selecting it
mailbox: sprd: correct definition of SPRD_OUTBOX_FIFO_FULL
device-dax: Fix default return code of range_parse()
PCI: pci-bridge-emul: Fix array overruns, improve safety
PCI: cadence: Fix DMA range mapping early return error
i40e: Fix flow for IPv6 next header (extension header)
i40e: Add zero-initialization of AQ command structures
i40e: Fix overwriting flow control settings during driver loading
i40e: Fix addition of RX filters after enabling FW LLDP agent
i40e: Fix VFs not created
Take mmap lock in cacheflush syscall
nios2: fixed broken sys_clone syscall
i40e: Fix add TC filter for IPv6
octeontx2-af: Fix an off by one in rvu_dbg_qsize_write()
pwm: iqs620a: Fix overflow and optimize calculations
vfio/type1: Use follow_pte()
ice: report correct max number of TCs
ice: Account for port VLAN in VF max packet size calculation
ice: Fix state bits on LLDP mode switch
ice: update the number of available RSS queues
net: stmmac: fix CBS idleslope and sendslope calculation
net/mlx4_core: Add missed mlx4_free_cmd_mailbox()
PCI: rockchip: Make 'ep-gpios' DT property optional
vxlan: move debug check after netdev unregister
wireguard: device: do not generate ICMP for non-IP packets
wireguard: kconfig: use arm chacha even with no neon
ocfs2: fix a use after free on error
mm: memcontrol: fix NR_ANON_THPS accounting in charge moving
mm: memcontrol: fix slub memory accounting
mm/memory.c: fix potential pte_unmap_unlock pte error
mm/hugetlb: fix potential double free in hugetlb_register_node() error path
mm/hugetlb: suppress wrong warning info when alloc gigantic page
mm/compaction: fix misbehaviors of fast_find_migrateblock()
r8169: fix jumbo packet handling on RTL8168e
NFSv4: Fixes for nfs4_bitmask_adjust()
KVM: SVM: Intercept INVPCID when it's disabled to inject #UD
KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pages
arm64: Add missing ISB after invalidating TLB in __primary_switch
i2c: brcmstb: Fix brcmstd_send_i2c_cmd condition
i2c: exynos5: Preserve high speed master code
mm,thp,shmem: make khugepaged obey tmpfs mount flags
mm: fix memory_failure() handling of dax-namespace metadata
mm/rmap: fix potential pte_unmap on an not mapped pte
proc: use kvzalloc for our kernel buffer
csky: Fix a size determination in gpr_get()
scsi: bnx2fc: Fix Kconfig warning & CNIC build errors
scsi: sd: sd_zbc: Don't pass GFP_NOIO to kvcalloc
block: reopen the device in blkdev_reread_part
ide/falconide: Fix module unload
scsi: sd: Fix Opal support
blk-settings: align max_sectors on "logical_block_size" boundary
soundwire: intel: fix possible crash when no device is detected
ACPI: property: Fix fwnode string properties matching
ACPI: configfs: add missing check after configfs_register_default_group()
cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known
HID: logitech-dj: add support for keyboard events in eQUAD step 4 Gaming
HID: wacom: Ignore attempts to overwrite the touch_max value from HID
Input: raydium_ts_i2c - do not send zero length
Input: xpad - add support for PowerA Enhanced Wired Controller for Xbox Series X|S
Input: joydev - prevent potential read overflow in ioctl
Input: i8042 - add ASUS Zenbook Flip to noselftest list
media: mceusb: Fix potential out-of-bounds shift
USB: serial: option: update interface mapping for ZTE P685M
usb: musb: Fix runtime PM race in musb_queue_resume_work
usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1
usb: dwc3: gadget: Fix dep->interval for fullspeed interrupt
USB: serial: ftdi_sio: fix FTX sub-integer prescaler
USB: serial: pl2303: fix line-speed handling on newer chips
USB: serial: mos7840: fix error code in mos7840_write()
USB: serial: mos7720: fix error code in mos7720_write()
phy: lantiq: rcu-usb2: wait after clock enable
ALSA: fireface: fix to parse sync status register of latter protocol
ALSA: hda: Add another CometLake-H PCI ID
ALSA: hda/hdmi: Drop bogus check at closing a stream
ALSA: hda/realtek: modify EAPD in the ALC886
ALSA: hda/realtek: Quirk for HP Spectre x360 14 amp setup
MIPS: Ingenic: Disable HPTLB for D0 XBurst CPUs too
MIPS: Support binutils configured with --enable-mips-fix-loongson3-llsc=yes
MIPS: VDSO: Use CLANG_FLAGS instead of filtering out '--target='
Revert "MIPS: Octeon: Remove special handling of CONFIG_MIPS_ELF_APPENDED_DTB=y"
Revert "bcache: Kill btree_io_wq"
bcache: Give btree_io_wq correct semantics again
bcache: Move journal work to new flush wq
Revert "drm/amd/display: Update NV1x SR latency values"
drm/amd/display: Add FPU wrappers to dcn21_validate_bandwidth()
drm/amd/display: Remove Assert from dcn10_get_dig_frontend
drm/amd/display: Add vupdate_no_lock interrupts for DCN2.1
drm/amdkfd: Fix recursive lock warnings
drm/amdgpu: Set reference clock to 100Mhz on Renoir (v2)
drm/nouveau/kms: handle mDP connectors
drm/modes: Switch to 64bit maths to avoid integer overflow
drm/sched: Cancel and flush all outstanding jobs before finish.
drm/panel: kd35t133: allow using non-continuous dsi clock
drm/rockchip: Require the YTR modifier for AFBC
ASoC: siu: Fix build error by a wrong const prefix
selinux: fix inconsistency between inode_getxattr and inode_listsecurity
erofs: initialized fields can only be observed after bit is set
tpm_tis: Fix check_locality for correct locality acquisition
tpm_tis: Clean up locality release
KEYS: trusted: Fix incorrect handling of tpm_get_random()
KEYS: trusted: Fix migratable=1 failing
KEYS: trusted: Reserve TPM for seal and unseal operations
btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node
btrfs: do not warn if we can't find the reloc root when looking up backref
btrfs: add asserts for deleting backref cache nodes
btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root
btrfs: fix reloc root leak with 0 ref reloc roots on recovery
btrfs: splice remaining dirty_bg's onto the transaction dirty bg list
btrfs: handle space_info::total_bytes_pinned inside the delayed ref itself
btrfs: account for new extents being deleted in total_bytes_pinned
btrfs: fix extent buffer leak on failure to copy root
drm/i915/gt: Flush before changing register state
drm/i915/gt: Correct surface base address for renderclear
crypto: arm64/sha - add missing module aliases
crypto: aesni - prevent misaligned buffers on the stack
crypto: michael_mic - fix broken misalignment handling
crypto: sun4i-ss - checking sg length is not sufficient
crypto: sun4i-ss - IV register does not work on A10 and A13
crypto: sun4i-ss - handle BigEndian for cipher
crypto: sun4i-ss - initialize need_fallback
soc: samsung: exynos-asv: don't defer early on not-supported SoCs
soc: samsung: exynos-asv: handle reading revision register error
seccomp: Add missing return in non-void function
arm64: ptrace: Fix seccomp of traced syscall -1 (NO_SYSCALL)
misc: rtsx: init of rts522a add OCP power off when no card is present
drivers/misc/vmw_vmci: restrict too big queue size in qp_host_alloc_queue
pstore: Fix typo in compression option name
dts64: mt7622: fix slow sd card access
arm64: dts: agilex: fix phy interface bit shift for gmac1 and gmac2
staging/mt7621-dma: mtk-hsdma.c->hsdma-mt7621.c
staging: gdm724x: Fix DMA from stack
staging: rtl8188eu: Add Edimax EW-7811UN V2 to device table
floppy: reintroduce O_NDELAY fix
media: i2c: max9286: fix access to unallocated memory
media: ir_toy: add another IR Droid device
media: ipu3-cio2: Fix mbus_code processing in cio2_subdev_set_fmt()
media: marvell-ccic: power up the device on mclk enable
media: smipcie: fix interrupt handling and IR timeout
x86/virt: Eat faults on VMXOFF in reboot flows
x86/reboot: Force all cpus to exit VMX root if VMX is supported
x86/fault: Fix AMD erratum #91 errata fixup for user code
x86/entry: Fix instrumentation annotation
powerpc/prom: Fix "ibm,arch-vec-5-platform-support" scan
rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
rcu/nocb: Perform deferred wake up before last idle's need_resched() check
kprobes: Fix to delay the kprobes jump optimization
arm64: Extend workaround for erratum 1024718 to all versions of Cortex-A55
iommu/arm-smmu-qcom: Fix mask extraction for bootloader programmed SMRs
arm64: kexec_file: fix memory leakage in create_dtb() when fdt_open_into() fails
arm64: uprobe: Return EOPNOTSUPP for AARCH32 instruction probing
arm64 module: set plt* section addresses to 0x0
arm64: spectre: Prevent lockdep splat on v4 mitigation enable path
riscv: Disable KSAN_SANITIZE for vDSO
watchdog: qcom: Remove incorrect usage of QCOM_WDT_ENABLE_IRQ
watchdog: mei_wdt: request stop on unregister
coresight: etm4x: Handle accesses to TRCSTALLCTLR
mtd: spi-nor: sfdp: Fix last erase region marking
mtd: spi-nor: sfdp: Fix wrong erase type bitmask for overlaid region
mtd: spi-nor: core: Fix erase type discovery for overlaid region
mtd: spi-nor: core: Add erase size check for erase command initialization
mtd: spi-nor: hisi-sfc: Put child node np on error path
fs/affs: release old buffer head on error path
seq_file: document how per-entry resources are managed.
x86: fix seq_file iteration for pat/memtype.c
mm: memcontrol: fix swap undercounting in cgroup2
mm: memcontrol: fix get_active_memcg return value
hugetlb: fix update_and_free_page contig page struct assumption
hugetlb: fix copy_huge_page_from_user contig page struct assumption
mm/vmscan: restore zone_reclaim_mode ABI
mm, compaction: make fast_isolate_freepages() stay within zone
KVM: nSVM: fix running nested guests when npt=0
nvmem: qcom-spmi-sdam: Fix uninitialized pdev pointer
module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols
mmc: sdhci-esdhc-imx: fix kernel panic when remove module
mmc: sdhci-pci-o2micro: Bug fix for SDR104 HW tuning failure
powerpc/32: Preserve cr1 in exception prolog stack check to fix build error
powerpc/kexec_file: fix FDT size estimation for kdump kernel
powerpc/32s: Add missing call to kuep_lock on syscall entry
spmi: spmi-pmic-arb: Fix hw_irq overflow
mei: fix transfer over dma with extended header
mei: me: emmitsburg workstation DID
mei: me: add adler lake point S DID
mei: me: add adler lake point LP DID
gpio: pcf857x: Fix missing first interrupt
mfd: gateworks-gsc: Fix interrupt type
printk: fix deadlock when kernel panic
exfat: fix shift-out-of-bounds in exfat_fill_super()
zonefs: Fix file size of zones in full condition
kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error
cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks
cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument
cpufreq: intel_pstate: Get per-CPU max freq via MSR_HWP_CAPABILITIES if available
proc: don't allow async path resolution of /proc/thread-self components
s390/vtime: fix inline assembly clobber list
virtio/s390: implement virtio-ccw revision 2 correctly
um: mm: check more comprehensively for stub changes
um: defer killing userspace on page table update failures
irqchip/loongson-pch-msi: Use bitmap_zalloc() to allocate bitmap
f2fs: fix out-of-repair __setattr_copy()
f2fs: enforce the immutable flag on open files
f2fs: flush data when enabling checkpoint back
sparc32: fix a user-triggerable oops in clear_user()
spi: fsl: invert spisel_boot signal on MPC8309
spi: spi-synquacer: fix set_cs handling
gfs2: fix glock confusion in function signal_our_withdraw
gfs2: Don't skip dlm unlock if glock has an lvb
gfs2: Lock imbalance on error path in gfs2_recover_one
gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end
dm: fix deadlock when swapping to encrypted device
dm table: fix iterate_devices based device capability checks
dm table: fix DAX iterate_devices based device capability checks
dm table: fix zoned iterate_devices based device capability checks
dm writecache: fix performance degradation in ssd mode
dm writecache: return the exact table values that were set
dm writecache: fix writing beyond end of underlying device when shrinking
dm era: Recover committed writeset after crash
dm era: Update in-core bitset after committing the metadata
dm era: Verify the data block size hasn't changed
dm era: Fix bitset memory leaks
dm era: Use correct value size in equality function of writeset tree
dm era: Reinitialize bitset cache before digesting a new writeset
dm era: only resize metadata in preresume
drm/i915: Reject 446-480MHz HDMI clock on GLK
kgdb: fix to kill breakpoints on initmem after boot
ipv6: silence compilation warning for non-IPV6 builds
net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending
wireguard: selftests: test multiple parallel streams
wireguard: queueing: get rid of per-peer ring buffers
net: sched: fix police ext initialization
net: qrtr: Fix memory leak in qrtr_tun_open
net_sched: fix RTNL deadlock again caused by request_module()
ARM: dts: aspeed: Add LCLK to lpc-snoop
Linux 5.10.20
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3fbcecd9413ce212dac68d5cc800c9457feba56a
commit 97f433c3601a24d3513d06f575a389a2ca4e11e4 upstream.
We get I/O errors when we run md-raid1 on the top of dm-integrity on the
top of ramdisk.
device-mapper: integrity: Bio not aligned on 8 sectors: 0xff00, 0xff
device-mapper: integrity: Bio not aligned on 8 sectors: 0xff00, 0xff
device-mapper: integrity: Bio not aligned on 8 sectors: 0xffff, 0x1
device-mapper: integrity: Bio not aligned on 8 sectors: 0xffff, 0x1
device-mapper: integrity: Bio not aligned on 8 sectors: 0x8048, 0xff
device-mapper: integrity: Bio not aligned on 8 sectors: 0x8147, 0xff
device-mapper: integrity: Bio not aligned on 8 sectors: 0x8246, 0xff
device-mapper: integrity: Bio not aligned on 8 sectors: 0x8345, 0xbb
The ramdisk device has logical_block_size 512 and max_sectors 255. The
dm-integrity device uses logical_block_size 4096 and it doesn't affect the
"max_sectors" value - thus, it inherits 255 from the ramdisk. So, we have
a device with max_sectors not aligned on logical_block_size.
The md-raid device sees that the underlying leg has max_sectors 255 and it
will split the bios on 255-sector boundary, making the bios unaligned on
logical_block_size.
In order to fix the bug, we round down max_sectors to logical_block_size.
Cc: stable@vger.kernel.org
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4601b4b130de2329fe06df80ed5d77265f2058e5 ]
Historically the BLKRRPART ioctls called into the now defunct ->revalidate
method, which caused the sd driver to check if any media is present.
When the ->revalidate method was removed this revalidation was lost,
leading to lots of I/O errors when using the eject command. Fix this by
reopening the device to rescan the partitions, and thus calling the
revalidation logic in the sd driver.
Fixes: 471bd0af54 ("sd: use bdev_check_media_change")
Reported--by: Tom Seewald <tseewald@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Tom Seewald <tseewald@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 41e76c85660c022c6bf5713bfb6c21e64a487cec upstream.
bfq_setup_cooperator() uses bfqd->in_serv_last_pos so detect whether it
makes sense to merge current bfq queue with the in-service queue.
However if the in-service queue is freshly scheduled and didn't dispatch
any requests yet, bfqd->in_serv_last_pos is stale and contains value
from the previously scheduled bfq queue which can thus result in a bogus
decision that the two queues should be merged. This bug can be observed
for example with the following fio jobfile:
[global]
direct=0
ioengine=sync
invalidate=1
size=1g
rw=read
[reader]
numjobs=4
directory=/mnt
where the 4 processes will end up in the one shared bfq queue although
they do IO to physically very distant files (for some reason I was able to
observe this only with slice_idle=1ms setting).
Fix the problem by invalidating bfqd->in_serv_last_pos when switching
in-service queue.
Fixes: 058fdecc6d ("block, bfq: fix in-service-queue check for queue merging")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a resource-managed variant of blk_ksm_init() so that drivers don't
have to worry about calling blk_ksm_destroy().
Note that the implementation uses a custom devres action to call
blk_ksm_destroy() rather than switching the two allocations to be
directly devres-managed, e.g. with devm_kmalloc(). This is because we
need to keep zeroing the memory containing the keyslots when it is
freed, and also because we want to continue using kvmalloc() (and there
is no devm_kvmalloc()).
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Satya Tangirala <satyat@google.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20210121082155.111333-2-ebiggers@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
(cherry picked from commit 5851d3b042b694839d2241fbb3200ce958135cdf)
Bug: 161256007
Change-Id: I3cda7fc5f7509f8d967882a88ca30ddc6f6a0426
Signed-off-by: Eric Biggers <ebiggers@google.com>
Replace the following patches with upstream versions
(well, almost upstream; as of 2021-02-12 they are queued for 5.12 at
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=for-next):
ANDROID-dm-add-support-for-passing-through-inline-crypto-support.patch
ANDROID-dm-enable-may_passthrough_inline_crypto-on-some-targets.patch
ANDROID-block-Introduce-passthrough-keyslot-manager.patch
Also, resolve conflicts with the following non-upstream patches for
hardware-wrapped key support. Notably, we need to handle the field
blk_keyslot_manager::features in a few places:
ANDROID-block-add-hardware-wrapped-key-support.patch
ANDROID-dm-add-support-for-passing-through-derive_raw_secret.patch
Finally, update non-upstream device-mapper targets (dm-bow and
dm-default-key) to use the new way of specifying inline crypto
passthrough support (DM_TARGET_PASSES_CRYPTO) rather than the old way
(may_passthrough_inline_crypto). These changes should be folded into:
ANDROID-dm-bow-Add-dm-bow-feature.patch
ANDROID-dm-add-dm-default-key-target-for-metadata-encryption.patch
Test: tested on db845c; verified that inline crypto support gets passed
through over dm-linear.
Bug: 162257830
Change-Id: I5e3dea1aa09fc1215c90857b5b51d9e3720ef7db
Signed-off-by: Eric Biggers <ebiggers@google.com>
Changes in 5.10.17
objtool: Fix seg fault with Clang non-section symbols
Revert "dts: phy: add GPIO number and active state used for phy reset"
gpio: mxs: GPIO_MXS should not default to y unconditionally
gpio: ep93xx: fix BUG_ON port F usage
gpio: ep93xx: Fix single irqchip with multi gpiochips
tracing: Do not count ftrace events in top level enable output
tracing: Check length before giving out the filter buffer
drm/i915: Fix overlay frontbuffer tracking
arm/xen: Don't probe xenbus as part of an early initcall
cgroup: fix psi monitor for root cgroup
Revert "drm/amd/display: Update NV1x SR latency values"
drm/i915/tgl+: Make sure TypeC FIA is powered up when initializing it
drm/dp_mst: Don't report ports connected if nothing is attached to them
dmaengine: move channel device_node deletion to driver
tmpfs: disallow CONFIG_TMPFS_INODE64 on s390
tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
soc: ti: omap-prm: Fix boot time errors for rst_map_012 bits 0 and 1
arm64: dts: rockchip: Fix PCIe DT properties on rk3399
arm64: dts: qcom: sdm845: Reserve LPASS clocks in gcc
ARM: OMAP2+: Fix suspcious RCU usage splats for omap_enter_idle_coupled
arm64: dts: rockchip: remove interrupt-names property from rk3399 vdec node
platform/x86: hp-wmi: Disable tablet-mode reporting by default
arm64: dts: rockchip: Disable display for NanoPi R2S
ovl: perform vfs_getxattr() with mounter creds
cap: fix conversions on getxattr
ovl: skip getxattr of security labels
scsi: lpfc: Fix EEH encountering oops with NVMe traffic
x86/split_lock: Enable the split lock feature on another Alder Lake CPU
nvme-pci: ignore the subsysem NQN on Phison E16
drm/amd/display: Fix DPCD translation for LTTPR AUX_RD_INTERVAL
drm/amd/display: Add more Clock Sources to DCN2.1
drm/amd/display: Release DSC before acquiring
drm/amd/display: Fix dc_sink kref count in emulated_link_detect
drm/amd/display: Free atomic state after drm_atomic_commit
drm/amd/display: Decrement refcount of dc_sink before reassignment
riscv: virt_addr_valid must check the address belongs to linear mapping
bfq-iosched: Revert "bfq: Fix computation of shallow depth"
ARM: dts: lpc32xx: Revert set default clock rate of HCLK PLL
kallsyms: fix nonconverging kallsyms table with lld
ARM: ensure the signal page contains defined contents
ARM: kexec: fix oops after TLB are invalidated
ubsan: implement __ubsan_handle_alignment_assumption
Revert "lib: Restrict cpumask_local_spread to houskeeping CPUs"
x86/efi: Remove EFI PGD build time checks
lkdtm: don't move ctors to .rodata
KVM: x86: cleanup CR3 reserved bits checks
cgroup-v1: add disabled controller check in cgroup1_parse_param()
dmaengine: idxd: fix misc interrupt completion
ath9k: fix build error with LEDS_CLASS=m
mt76: dma: fix a possible memory leak in mt76_add_fragment()
drm/vc4: hvs: Fix buffer overflow with the dlist handling
dmaengine: idxd: check device state before issue command
bpf: Unbreak BPF_PROG_TYPE_KPROBE when kprobe is called via do_int3
bpf: Check for integer overflow when using roundup_pow_of_two()
netfilter: xt_recent: Fix attempt to update deleted entry
selftests: netfilter: fix current year
netfilter: nftables: fix possible UAF over chains from packet path in netns
netfilter: flowtable: fix tcp and udp header checksum update
xen/netback: avoid race in xenvif_rx_ring_slots_available()
net: hdlc_x25: Return meaningful error code in x25_open
net: ipa: set error code in gsi_channel_setup()
hv_netvsc: Reset the RSC count if NVSP_STAT_FAIL in netvsc_receive()
net: enetc: initialize the RFS and RSS memories
selftests: txtimestamp: fix compilation issue
net: stmmac: set TxQ mode back to DCB after disabling CBS
ibmvnic: Clear failover_pending if unable to schedule
netfilter: conntrack: skip identical origin tuple in same zone only
scsi: scsi_debug: Fix a memory leak
x86/build: Disable CET instrumentation in the kernel for 32-bit too
net: dsa: felix: implement port flushing on .phylink_mac_link_down
net: hns3: add a check for queue_id in hclge_reset_vf_queue()
net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx()
net: hns3: add a check for index in hclge_get_rss_key()
firmware_loader: align .builtin_fw to 8
drm/sun4i: tcon: set sync polarity for tcon1 channel
drm/sun4i: dw-hdmi: always set clock rate
drm/sun4i: Fix H6 HDMI PHY configuration
drm/sun4i: dw-hdmi: Fix max. frequency for H6
clk: sunxi-ng: mp: fix parent rate change flag check
i2c: stm32f7: fix configuration of the digital filter
h8300: fix PREEMPTION build, TI_PRE_COUNT undefined
scripts: set proper OpenSSL include dir also for sign-file
x86/pci: Create PCI/MSI irqdomain after x86_init.pci.arch_init()
arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page
rxrpc: Fix clearance of Tx/Rx ring when releasing a call
udp: fix skb_copy_and_csum_datagram with odd segment sizes
net: dsa: call teardown method on probe failure
cpufreq: ACPI: Extend frequency tables to cover boost frequencies
cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there
net: gro: do not keep too many GRO packets in napi->rx_list
net: fix iteration for sctp transport seq_files
net/vmw_vsock: fix NULL pointer dereference
net/vmw_vsock: improve locking in vsock_connect_timeout()
net: watchdog: hold device global xmit lock during tx disable
bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state
switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STAT
vsock/virtio: update credit only if socket is not closed
vsock: fix locking in vsock_shutdown()
net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGS
net/qrtr: restrict user-controlled length in qrtr_tun_write_iter()
ovl: expand warning in ovl_d_real()
kcov, usb: only collect coverage from __usb_hcd_giveback_urb in softirq
Linux 5.10.17
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id0300681f52b51d3f466f1e66ec3a6c25f65f4d3
[ Upstream commit 388c705b95f23f317fa43e6abf9ff07b583b721a ]
This reverts commit 6d4d273588378c65915acaf7b2ee74e9dd9c130a.
bfq.limit_depth passes word_depths[] as shallow_depth down to sbitmap core
sbitmap_get_shallow, which uses just the number to limit the scan depth of
each bitmap word, formula:
scan_percentage_for_each_word = shallow_depth / (1 << sbimap->shift) * 100%
That means the comments's percentiles 50%, 75%, 18%, 37% of bfq are correct.
But after commit patch 'bfq: Fix computation of shallow depth', we use
sbitmap.depth instead, as a example in following case:
sbitmap.depth = 256, map_nr = 4, shift = 6; sbitmap_word.depth = 64.
The resulsts of computed bfqd->word_depths[] are {128, 192, 48, 96}, and
three of the numbers exceed core dirver's 'sbitmap_word.depth=64' limit
nothing.
Signed-off-by: Lin Feng <linf@wangsu.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmAnzLcACgkQONu9yGCS
aT5FUQ/+LUBYHpWjyV1wrnjwf3AAtcUnZGtPUOEsv9d9lAcituistag2zHXive9g
K7HGria7BVcARnAtdcOLWB7ur9Vj+Ch1XVOVhSdI8EgGPslxoWKxmM03FQtSjQak
OYZAHc/A/mrTtG+rYROx4gp+jxaiaUx8e/zleFgNeN1GU9/owR2H8+d/a2L3bnzN
mgaYG4/0GTy1JfDXwsmiNa376dIViPAkukjS8AV+dPZKFag+TmcE0d/qTtDlmiQO
gSboV/8FzwKgUIxjOt6Rw6AniCfGTew/Dy/NkRiGB4ge5+aMZe78+IZ6xzRlbVix
d1/+7Iviy40pTWOZdRxwefAj0/MS9zZeVrDSA/Ips24EfD/0qxq9QEa3cEXvQkZF
ih5AX9obPBxHRsFwn7x9siP3ZW1W2jaEYzrXxIWBJxFDVRRh3/DMo5rljSkUWxzS
8dBpxfNiRMggsbgKPBtuV5+4Dzdbx5Dn1sbaMgT9pU1f+U0LH0KjIU1evuCFqUo6
C/Y61pDjc8GotBFuKjcbCYBMWpAJ/UwqRn4HrMBRMN+ZOpBQr/2RLaM8ROla8H3W
GrhADQlDuHForKHRuiuBpaUxZGLeZw2dpZClrV0WwzHLLV0KsQC0+xE9ge0/GPtQ
rnJPxYiKg2WJctVBlH2i5uLw6s25+dq4ufSZBmr2AOg8u0YccU4=
=BFeH
-----END PGP SIGNATURE-----
Merge 5.10.16 into android12-5.10
Changes in 5.10.16
io_uring: simplify io_task_match()
io_uring: add a {task,files} pair matching helper
io_uring: don't iterate io_uring_cancel_files()
io_uring: pass files into kill timeouts/poll
io_uring: always batch cancel in *cancel_files()
io_uring: fix files cancellation
io_uring: account io_uring internal files as REQ_F_INFLIGHT
io_uring: if we see flush on exit, cancel related tasks
io_uring: fix __io_uring_files_cancel() with TASK_UNINTERRUPTIBLE
io_uring: replace inflight_wait with tctx->wait
io_uring: fix cancellation taking mutex while TASK_UNINTERRUPTIBLE
io_uring: fix flush cqring overflow list while TASK_INTERRUPTIBLE
io_uring: fix list corruption for splice file_get
io_uring: fix sqo ownership false positive warning
io_uring: reinforce cancel on flush during exit
io_uring: drop mm/files between task_work_submit
gpiolib: cdev: clear debounce period if line set to output
powerpc/64/signal: Fix regression in __kernel_sigtramp_rt64() semantics
af_key: relax availability checks for skb size calculation
regulator: core: avoid regulator_resolve_supply() race condition
ASoC: wm_adsp: Fix control name parsing for multi-fw
drm/nouveau/nvif: fix method count when pushing an array
mac80211: 160MHz with extended NSS BW in CSA
ASoC: Intel: Skylake: Zero snd_ctl_elem_value
chtls: Fix potential resource leak
pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process()
pNFS/NFSv4: Improve rejection of out-of-order layouts
ALSA: hda: intel-dsp-config: add PCI id for TGL-H
ASoC: ak4458: correct reset polarity
ASoC: Intel: sof_sdw: set proper flags for Dell TGL-H SKU 0A5E
iwlwifi: mvm: skip power command when unbinding vif during CSA
iwlwifi: mvm: take mutex for calling iwl_mvm_get_sync_time()
iwlwifi: pcie: add a NULL check in iwl_pcie_txq_unmap
iwlwifi: pcie: fix context info memory leak
iwlwifi: mvm: invalidate IDs of internal stations at mvm start
iwlwifi: pcie: add rules to match Qu with Hr2
iwlwifi: mvm: guard against device removal in reprobe
iwlwifi: queue: bail out on invalid freeing
SUNRPC: Move simple_get_bytes and simple_get_netobj into private header
SUNRPC: Handle 0 length opaque XDR object data properly
i2c: mediatek: Move suspend and resume handling to NOIRQ phase
blk-cgroup: Use cond_resched() when destroy blkgs
regulator: Fix lockdep warning resolving supplies
bpf: Fix verifier jmp32 pruning decision logic
bpf: Fix 32 bit src register truncation on div/mod
bpf: Fix verifier jsgt branch analysis on max bound
drm/i915: Fix ICL MG PHY vswing handling
drm/i915: Skip vswing programming for TBT
nilfs2: make splice write available again
Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"
squashfs: avoid out of bounds writes in decompressors
squashfs: add more sanity checks in id lookup
squashfs: add more sanity checks in inode lookup
squashfs: add more sanity checks in xattr id lookup
Linux 5.10.16
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie3d667eb0c90288b118c756a33c70c8ceb097405
[ Upstream commit 6c635caef410aa757befbd8857c1eadde5cc22ed ]
On !PREEMPT kernel, we can get below softlockup when doing stress
testing with creating and destroying block cgroup repeatly. The
reason is it may take a long time to acquire the queue's lock in
the loop of blkcg_destroy_blkgs(), or the system can accumulate a
huge number of blkgs in pathological cases. We can add a need_resched()
check on each loop and release locks and do cond_resched() if true
to avoid this issue, since the blkcg_destroy_blkgs() is not called
from atomic contexts.
[ 4757.010308] watchdog: BUG: soft lockup - CPU#11 stuck for 94s!
[ 4757.010698] Call trace:
[ 4757.010700] blkcg_destroy_blkgs+0x68/0x150
[ 4757.010701] cgwb_release_workfn+0x104/0x158
[ 4757.010702] process_one_work+0x1bc/0x3f0
[ 4757.010704] worker_thread+0x164/0x468
[ 4757.010705] kthread+0x108/0x138
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 5.10.13
iwlwifi: provide gso_type to GSO packets
nbd: freeze the queue while we're adding connections
tty: avoid using vfs_iocb_iter_write() for redirected console writes
ACPI: sysfs: Prefer "compatible" modalias
ACPI: thermal: Do not call acpi_thermal_check() directly
kernel: kexec: remove the lock operation of system_transition_mutex
ALSA: hda/realtek: Enable headset of ASUS B1400CEPE with ALC256
ALSA: hda/via: Apply the workaround generically for Clevo machines
parisc: Enable -mlong-calls gcc option by default when !CONFIG_MODULES
media: cec: add stm32 driver
media: cedrus: Fix H264 decoding
media: hantro: Fix reset_raw_fmt initialization
media: rc: fix timeout handling after switch to microsecond durations
media: rc: ite-cir: fix min_timeout calculation
media: rc: ensure that uevent can be read directly after rc device register
ARM: dts: tbs2910: rename MMC node aliases
ARM: dts: ux500: Reserve memory carveouts
ARM: dts: imx6qdl-gw52xx: fix duplicate regulator naming
wext: fix NULL-ptr-dereference with cfg80211's lack of commit()
x86/xen: avoid warning in Xen pv guest with CONFIG_AMD_MEM_ENCRYPT enabled
ASoC: AMD Renoir - refine DMI entries for some Lenovo products
Revert "drm/amdgpu/swsmu: drop set_fan_speed_percent (v2)"
drm/nouveau/kms/gk104-gp1xx: Fix > 64x64 cursors
drm/i915: Always flush the active worker before returning from the wait
drm/i915/gt: Always try to reserve GGTT address 0x0
drivers/nouveau/kms/nv50-: Reject format modifiers for cursor planes
bcache: only check feature sets when sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES
net: usb: qmi_wwan: added support for Thales Cinterion PLSx3 modem family
s390: uv: Fix sysfs max number of VCPUs reporting
s390/vfio-ap: No need to disable IRQ after queue reset
PM: hibernate: flush swap writer after marking
x86/entry: Emit a symbol for register restoring thunk
efi/apple-properties: Reinstate support for boolean properties
crypto: marvel/cesa - Fix tdma descriptor on 64-bit
drivers: soc: atmel: Avoid calling at91_soc_init on non AT91 SoCs
drivers: soc: atmel: add null entry at the end of at91_soc_allowed_list[]
btrfs: fix lockdep warning due to seqcount_mutex on 32bit arch
btrfs: fix possible free space tree corruption with online conversion
KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[]
KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh()
KVM: arm64: Filter out v8.1+ events on v8.0 HW
KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit
KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX
KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration
KVM: x86: get smi pending status correctly
KVM: Forbid the use of tagged userspace addresses for memslots
io_uring: fix wqe->lock/completion_lock deadlock
xen: Fix XenStore initialisation for XS_LOCAL
leds: trigger: fix potential deadlock with libata
arm64: dts: broadcom: Fix USB DMA address translation for Stingray
mt7601u: fix kernel crash unplugging the device
mt76: mt7663s: fix rx buffer refcounting
mt7601u: fix rx buffer refcounting
iwlwifi: Fix IWL_SUBDEVICE_NO_160 macro to use the correct bit.
drm/i915/gt: Clear CACHE_MODE prior to clearing residuals
drm/i915/pmu: Don't grab wakeref when enabling events
net/mlx5e: Fix IPSEC stats
ARM: dts: imx6qdl-kontron-samx6i: fix pwms for lcd-backlight
drm/nouveau/svm: fail NOUVEAU_SVM_INIT ioctl on unsupported devices
drm/vc4: Correct lbm size and calculation
drm/vc4: Correct POS1_SCL for hvs5
drm/nouveau/dispnv50: Restore pushing of all data.
drm/i915: Check for all subplatform bits
drm/i915/selftest: Fix potential memory leak
uapi: fix big endian definition of ipv6_rpl_sr_hdr
KVM: Documentation: Fix spec for KVM_CAP_ENABLE_CAP_VM
tee: optee: replace might_sleep with cond_resched
xen-blkfront: allow discard-* nodes to be optional
blk-mq: test QUEUE_FLAG_HCTX_ACTIVE for sbitmap_shared in hctx_may_queue
clk: imx: fix Kconfig warning for i.MX SCU clk
clk: mmp2: fix build without CONFIG_PM
clk: qcom: gcc-sm250: Use floor ops for sdcc clks
ARM: imx: build suspend-imx6.S with arm instruction set
ARM: zImage: atags_to_fdt: Fix node names on added root nodes
netfilter: nft_dynset: add timeout extension to template
Revert "RDMA/mlx5: Fix devlink deadlock on net namespace deletion"
Revert "block: simplify set_init_blocksize" to regain lost performance
xfrm: Fix oops in xfrm_replay_advance_bmp
xfrm: fix disable_xfrm sysctl when used on xfrm interfaces
selftests: xfrm: fix test return value override issue in xfrm_policy.sh
xfrm: Fix wraparound in xfrm_policy_addr_delta()
arm64: dts: ls1028a: fix the offset of the reset register
ARM: imx: fix imx8m dependencies
ARM: dts: imx6qdl-kontron-samx6i: fix i2c_lcd/cam default status
ARM: dts: imx6qdl-sr-som: fix some cubox-i platforms
arm64: dts: imx8mp: Correct the gpio ranges of gpio3
firmware: imx: select SOC_BUS to fix firmware build
RDMA/cxgb4: Fix the reported max_recv_sge value
ASoC: dt-bindings: lpass: Fix and common up lpass dai ids
ASoC: qcom: Fix incorrect volatile registers
ASoC: qcom: Fix broken support to MI2S TERTIARY and QUATERNARY
ASoC: qcom: lpass-ipq806x: fix bitwidth regmap field
spi: altera: Fix memory leak on error path
ASoC: Intel: Skylake: skl-topology: Fix OOPs ib skl_tplg_complete
powerpc/64s: prevent recursive replay_soft_interrupts causing superfluous interrupt
pNFS/NFSv4: Fix a layout segment leak in pnfs_layout_process()
pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn
ASoC: SOF: Intel: soundwire: fix select/depend unmet dependencies
ASoC: qcom: lpass: Fix out-of-bounds DAI ID lookup
iwlwifi: pcie: avoid potential PNVM leaks
iwlwifi: pnvm: don't skip everything when not reloading
iwlwifi: pnvm: don't try to load after failures
iwlwifi: pcie: set LTR on more devices
iwlwifi: pcie: use jiffies for memory read spin time limit
iwlwifi: pcie: reschedule in long-running memory reads
mac80211: pause TX while changing interface type
ice: fix FDir IPv6 flexbyte
ice: Implement flow for IPv6 next header (extension header)
ice: update dev_addr in ice_set_mac_address even if HW filter exists
ice: Don't allow more channels than LAN MSI-X available
ice: Fix MSI-X vector fallback logic
i40e: acquire VSI pointer only after VF is initialized
igc: fix link speed advertising
net/mlx5: Fix memory leak on flow table creation error flow
net/mlx5e: E-switch, Fix rate calculation for overflow
net/mlx5e: free page before return
net/mlx5e: Reduce tc unsupported key print level
net/mlx5: Maintain separate page trees for ECPF and PF functions
net/mlx5e: Disable hw-tc-offload when MLX5_CLS_ACT config is disabled
net/mlx5e: Fix CT rule + encap slow path offload and deletion
net/mlx5e: Correctly handle changing the number of queues when the interface is down
net/mlx5e: Revert parameters on errors when changing trust state without reset
net/mlx5e: Revert parameters on errors when changing MTU and LRO state without reset
net/mlx5: CT: Fix incorrect removal of tuple_nat_node from nat rhashtable
can: dev: prevent potential information leak in can_fill_info()
ACPI/IORT: Do not blindly trust DMA masks from firmware
of/device: Update dma_range_map only when dev has valid dma-ranges
iommu/amd: Use IVHD EFR for early initialization of IOMMU features
iommu/vt-d: Correctly check addr alignment in qi_flush_dev_iotlb_pasid()
nvme-multipath: Early exit if no path is available
selftests: forwarding: Specify interface when invoking mausezahn
rxrpc: Fix memory leak in rxrpc_lookup_local
NFC: fix resource leak when target index is invalid
NFC: fix possible resource leak
ASoC: mediatek: mt8183-da7219: ignore TDM DAI link by default
ASoC: mediatek: mt8183-mt6358: ignore TDM DAI link by default
ASoC: topology: Properly unregister DAI on removal
ASoC: topology: Fix memory corruption in soc_tplg_denum_create_values()
scsi: qla2xxx: Fix description for parameter ql2xenforce_iocb_limit
team: protect features update by RCU to avoid deadlock
tcp: make TCP_USER_TIMEOUT accurate for zero window probes
tcp: fix TLP timer not set when CA_STATE changes from DISORDER to OPEN
vsock: fix the race conditions in multi-transport support
Linux 5.10.13
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I75f419b25f24da559e446d62f75ce6bb9b0a5396
commit 2569063c7140c65a0d0ad075e95ddfbcda9ba3c0 upstream.
In case of blk_mq_is_sbitmap_shared(), we should test QUEUE_FLAG_HCTX_ACTIVE against
q->queue_flags instead of BLK_MQ_S_TAG_ACTIVE.
So fix it.
Cc: John Garry <john.garry@huawei.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Fixes: f1b49fdc1c ("blk-mq: Record active_queues_shared_sbitmap per tag_set for when using shared sbitmap")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It may happen that the underlying device's runtime-pm is
not controlled by block-pm. So it's possible that when
commands are sent to the device, it's suspended and may not
be resumed by blk-pm. Hence explicitly resume the parent
which is the platform device.
Bug: 178653131
Link: https://lore.kernel.org/linux-arm-msm/b1db5394aa3f6cf44cd9adb9c8d569caa0c9e4f5.1611803264.git.asutoshd@codeaurora.org/T/#u
Change-Id: Iead4102ea609d339549694164771e47debb58a00
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bao D. Nguyen <nguyenb@codeaurora.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmAHFpcACgkQONu9yGCS
aT4Vhw/+JLscHnfK//hbS6Nx95MY95VMzy+p2ccADXRy3O/5nr0HwGKnXTKB4Bg+
05S3Hv9ZU/XSszLWvgFQ0Z0peU241ASPz1uLTgtpziBT5plXa5eJULBZ+WknWMef
dNKpvKPphpEbQ0yz6o/4sbNAdiI9BzyGCOicQ2dl9nY7R/JA9YHquUD7iHMnvbs+
yxwwawNHVwszUT/fJT3iFzOAehHGAttHdf3z/bGPS1ogy2S7J5IluJgTAibd3P7G
5o7OUUA5ujEtjBLIkA61fqeL2Qaci83Ff/8KEPEfF1JeLBbMHYcLHnz3RAwBaLZh
nlM4smyTeekcnHIzyRGw16OmpoYwY3MQAt+UFLCzKhlnscB0UqCNkA9zQA9k/taw
cy7/fe5hWFU9DRv4uTUT2H1tkP+pNQ5eIaejPHMtld5JlYXoDN4RyQq7sAyMQgBj
CXADStYSR/f5sWWgRbRs1F7E0lrePsVpjOcqHXxbsS+52yN2CZSKazlOIJ9xArfM
cTzzLUuYbMZoHjIDdMMkjA41VMmyJ+BKrqEgzu3LsJQs57o/ckjnQx4VV5YiHhci
v35OL8oa9IZi8WQikB9bx2WZRWUChOGKwMNeeUwEFD4Zmye1OtyyHuzYQf9QSjRv
zbf1owwsg3xnfkvLcfru8mNMgJkgG8RpuNNVPO8boWZ4pgPu2tk=
=5K55
-----END PGP SIGNATURE-----
Merge 5.10.9 into android12-5.10
Changes in 5.10.9
btrfs: reloc: fix wrong file extent type check to avoid false ENOENT
btrfs: prevent NULL pointer dereference in extent_io_tree_panic
ALSA: hda/realtek: fix right sounds and mute/micmute LEDs for HP machines
ALSA: doc: Fix reference to mixart.rst
ASoC: AMD Renoir - add DMI entry for Lenovo ThinkPad X395
ASoC: dapm: remove widget from dirty list on free
x86/hyperv: check cpu mask after interrupt has been disabled
drm/amdgpu: add green_sardine device id (v2)
drm/amdgpu: fix DRM_INFO flood if display core is not supported (bug 210921)
Revert "drm/amd/display: Fixed Intermittent blue screen on OLED panel"
drm/amdgpu: add new device id for Renior
drm/i915: Allow the sysadmin to override security mitigations
drm/i915/gt: Limit VFE threads based on GT
drm/i915/backlight: fix CPU mode backlight takeover on LPT
drm/bridge: sii902x: Refactor init code into separate function
dt-bindings: display: sii902x: Add supply bindings
drm/bridge: sii902x: Enable I/O and core VCC supplies if present
tracing/kprobes: Do the notrace functions check without kprobes on ftrace
tools/bootconfig: Add tracing_on support to helper scripts
ext4: use IS_ERR instead of IS_ERR_OR_NULL and set inode null when IS_ERR
ext4: fix wrong list_splice in ext4_fc_cleanup
ext4: fix bug for rename with RENAME_WHITEOUT
cifs: check pointer before freeing
cifs: fix interrupted close commands
riscv: Drop a duplicated PAGE_KERNEL_EXEC
riscv: return -ENOSYS for syscall -1
riscv: Fixup CONFIG_GENERIC_TIME_VSYSCALL
riscv: Fix KASAN memory mapping.
mips: fix Section mismatch in reference
mips: lib: uncached: fix non-standard usage of variable 'sp'
MIPS: boot: Fix unaligned access with CONFIG_MIPS_RAW_APPENDED_DTB
MIPS: Fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps
MIPS: relocatable: fix possible boot hangup with KASLR enabled
RDMA/ocrdma: Fix use after free in ocrdma_dealloc_ucontext_pd()
ACPI: scan: Harden acpi_device_add() against device ID overflows
xen/privcmd: allow fetching resource sizes
compiler.h: Raise minimum version of GCC to 5.1 for arm64
mm/vmalloc.c: fix potential memory leak
mm/hugetlb: fix potential missing huge page size info
mm/process_vm_access.c: include compat.h
dm raid: fix discard limits for raid1
dm snapshot: flush merged data before committing metadata
dm integrity: fix flush with external metadata device
dm integrity: fix the maximum number of arguments
dm crypt: use GFP_ATOMIC when allocating crypto requests from softirq
dm crypt: do not wait for backlogged crypto request completion in softirq
dm crypt: do not call bio_endio() from the dm-crypt tasklet
dm crypt: defer decryption to a tasklet if interrupts disabled
stmmac: intel: change all EHL/TGL to auto detect phy addr
r8152: Add Lenovo Powered USB-C Travel Hub
btrfs: tree-checker: check if chunk item end overflows
ext4: don't leak old mountpoint samples
io_uring: don't take files/mm for a dead task
io_uring: drop mm and files after task_work_run
ARC: build: remove non-existing bootpImage from KBUILD_IMAGE
ARC: build: add uImage.lzma to the top-level target
ARC: build: add boot_targets to PHONY
ARC: build: move symlink creation to arch/arc/Makefile to avoid race
ARM: omap2: pmic-cpcap: fix maximum voltage to be consistent with defaults on xt875
ath11k: fix crash caused by NULL rx_channel
netfilter: ipset: fixes possible oops in mtype_resize
ath11k: qmi: try to allocate a big block of DMA memory first
btrfs: fix async discard stall
btrfs: merge critical sections of discard lock in workfn
btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan
regulator: bd718x7: Add enable times
ethernet: ucc_geth: fix definition and size of ucc_geth_tx_global_pram
ARM: dts: ux500/golden: Set display max brightness
habanalabs: adjust pci controller init to new firmware
habanalabs/gaudi: retry loading TPC f/w on -EINTR
habanalabs: register to pci shutdown callback
staging: spmi: hisi-spmi-controller: Fix some error handling paths
spi: altera: fix return value for altera_spi_txrx()
habanalabs: Fix memleak in hl_device_reset
hwmon: (pwm-fan) Ensure that calculation doesn't discard big period values
lib/raid6: Let $(UNROLL) rules work with macOS userland
kconfig: remove 'kvmconfig' and 'xenconfig' shorthands
spi: fix the divide by 0 error when calculating xfer waiting time
io_uring: drop file refs after task cancel
bfq: Fix computation of shallow depth
arch/arc: add copy_user_page() to <asm/page.h> to fix build error on ARC
misdn: dsp: select CONFIG_BITREVERSE
net: ethernet: fs_enet: Add missing MODULE_LICENSE
selftests: fix the return value for UDP GRO test
nvme-pci: mark Samsung PM1725a as IGNORE_DEV_SUBNQN
nvme: avoid possible double fetch in handling CQE
nvmet-rdma: Fix list_del corruption on queue establishment failure
drm/amd/display: fix sysfs amdgpu_current_backlight_pwm NULL pointer issue
drm/amdgpu: fix a GPU hang issue when remove device
drm/amd/pm: fix the failure when change power profile for renoir
drm/amdgpu: fix potential memory leak during navi12 deinitialization
usb: typec: Fix copy paste error for NVIDIA alt-mode description
iommu/vt-d: Fix lockdep splat in sva bind()/unbind()
ACPI: scan: add stub acpi_create_platform_device() for !CONFIG_ACPI
drm/msm: Call msm_init_vram before binding the gpu
ARM: picoxcell: fix missing interrupt-parent properties
poll: fix performance regression due to out-of-line __put_user()
rcu-tasks: Move RCU-tasks initialization to before early_initcall()
bpf: Simplify task_file_seq_get_next()
bpf: Save correct stopping point in file seq iteration
x86/sev-es: Fix SEV-ES OUT/IN immediate opcode vc handling
cfg80211: select CONFIG_CRC32
nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context
iommu/vt-d: Update domain geometry in iommu_ops.at(de)tach_dev
net/mlx5e: CT: Use per flow counter when CT flow accounting is enabled
net/mlx5: Fix passing zero to 'PTR_ERR'
net/mlx5: E-Switch, fix changing vf VLANID
blk-mq-debugfs: Add decode for BLK_MQ_F_TAG_HCTX_SHARED
mm: fix clear_refs_write locking
mm: don't play games with pinned pages in clear_page_refs
mm: don't put pinned pages into the swap cache
perf intel-pt: Fix 'CPU too large' error
dump_common_audit_data(): fix racy accesses to ->d_name
ASoC: meson: axg-tdm-interface: fix loopback
ASoC: meson: axg-tdmin: fix axg skew offset
ASoC: Intel: fix error code cnl_set_dsp_D0()
nvmet-rdma: Fix NULL deref when setting pi_enable and traddr INADDR_ANY
nvme: don't intialize hwmon for discovery controllers
nvme-tcp: fix possible data corruption with bio merges
nvme-tcp: Fix warning with CONFIG_DEBUG_PREEMPT
NFS4: Fix use-after-free in trace_event_raw_event_nfs4_set_lock
pNFS: We want return-on-close to complete when evicting the inode
pNFS: Mark layout for return if return-on-close was not sent
pNFS: Stricter ordering of layoutget and layoutreturn
NFS: Adjust fs_context error logging
NFS/pNFS: Don't call pnfs_free_bucket_lseg() before removing the request
NFS/pNFS: Don't leak DS commits in pnfs_generic_retry_commit()
NFS/pNFS: Fix a leak of the layout 'plh_outstanding' counter
NFS: nfs_delegation_find_inode_server must first reference the superblock
NFS: nfs_igrab_and_active must first reference the superblock
scsi: ufs: Fix possible power drain during system suspend
ext4: fix superblock checksum failure when setting password salt
RDMA/restrack: Don't treat as an error allocation ID wrapping
RDMA/usnic: Fix memleak in find_free_vf_and_create_qp_grp
bnxt_en: Improve stats context resource accounting with RDMA driver loaded.
RDMA/mlx5: Fix wrong free of blue flame register on error
IB/mlx5: Fix error unwinding when set_has_smi_cap fails
umount(2): move the flag validity checks first
dm zoned: select CONFIG_CRC32
drm/i915/dsi: Use unconditional msleep for the panel_on_delay when there is no reset-deassert MIPI-sequence
drm/i915/icl: Fix initing the DSI DSC power refcount during HW readout
drm/i915/gt: Restore clear-residual mitigations for Ivybridge, Baytrail
mm, slub: consider rest of partial list if acquire_slab() fails
riscv: Trace irq on only interrupt is enabled
iommu/vt-d: Fix unaligned addresses for intel_flush_svm_range_dev()
net: sunrpc: interpret the return value of kstrtou32 correctly
selftests: netfilter: Pass family parameter "-f" to conntrack tool
dm: eliminate potential source of excessive kernel log noise
ALSA: fireface: Fix integer overflow in transmit_midi_msg()
ALSA: firewire-tascam: Fix integer overflow in midi_port_work()
netfilter: conntrack: fix reading nf_conntrack_buckets
netfilter: nf_nat: Fix memleak in nf_nat_init
Linux 5.10.9
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I609e501511889081e03d2d18ee7e1be95406f396
[ Upstream commit 02f938e9fed1681791605ca8b96c2d9da9355f6a ]
Showing the hctx flags for when BLK_MQ_F_TAG_HCTX_SHARED is set gives
something like:
root@debian:/home/john# more /sys/kernel/debug/block/sda/hctx0/flags
alloc_policy=FIFO SHOULD_MERGE|TAG_QUEUE_SHARED|3
Add the decoding for that flag.
Fixes: 32bc15afed ("blk-mq: Facilitate a shared sbitmap per tagset")
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d4d273588378c65915acaf7b2ee74e9dd9c130a ]
BFQ computes number of tags it allows to be allocated for each request type
based on tag bitmap. However it uses 1 << bitmap.shift as number of
available tags which is wrong. 'shift' is just an internal bitmap value
containing logarithm of how many bits bitmap uses in each bitmap word.
Thus number of tags allowed for some request types can be far to low.
Use proper bitmap.depth which has the number of tags instead.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 5.10.8
powerpc/32s: Fix RTAS machine check with VMAP stack
io_uring: synchronise IOPOLL on task_submit fail
io_uring: limit {io|sq}poll submit locking scope
io_uring: patch up IOPOLL overflow_flush sync
RDMA/hns: Avoid filling sl in high 3 bits of vlan_id
iommu/arm-smmu-qcom: Initialize SCTLR of the bypass context
drm/panfrost: Don't corrupt the queue mutex on open/close
io_uring: Fix return value from alloc_fixed_file_ref_node
scsi: ufs: Fix -Wsometimes-uninitialized warning
btrfs: skip unnecessary searches for xattrs when logging an inode
btrfs: fix deadlock when cloning inline extent and low on free metadata space
btrfs: shrink delalloc pages instead of full inodes
net: cdc_ncm: correct overhead in delayed_ndp_size
net: hns3: fix incorrect handling of sctp6 rss tuple
net: hns3: fix the number of queues actually used by ARQ
net: hns3: fix a phy loopback fail issue
net: stmmac: dwmac-sun8i: Fix probe error handling
net: stmmac: dwmac-sun8i: Balance internal PHY resource references
net: stmmac: dwmac-sun8i: Balance internal PHY power
net: stmmac: dwmac-sun8i: Balance syscon (de)initialization
net: vlan: avoid leaks on register_vlan_dev() failures
net/sonic: Fix some resource leaks in error handling paths
net: bareudp: add missing error handling for bareudp_link_config()
ptp: ptp_ines: prevent build when HAS_IOMEM is not set
net: ipv6: fib: flush exceptions when purging route
tools: selftests: add test for changing routes with PTMU exceptions
net: fix pmtu check in nopmtudisc mode
net: ip: always refragment ip defragmented packets
chtls: Fix hardware tid leak
chtls: Remove invalid set_tcb call
chtls: Fix panic when route to peer not configured
chtls: Avoid unnecessary freeing of oreq pointer
chtls: Replace skb_dequeue with skb_peek
chtls: Added a check to avoid NULL pointer dereference
chtls: Fix chtls resources release sequence
octeontx2-af: fix memory leak of lmac and lmac->name
nexthop: Fix off-by-one error in error path
nexthop: Unlink nexthop group entry in error path
nexthop: Bounce NHA_GATEWAY in FDB nexthop groups
s390/qeth: fix deadlock during recovery
s390/qeth: fix locking for discipline setup / removal
s390/qeth: fix L2 header access in qeth_l3_osa_features_check()
net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE
net/mlx5: Use port_num 1 instead of 0 when delete a RoCE address
net/mlx5e: ethtool, Fix restriction of autoneg with 56G
net/mlx5e: In skb build skip setting mark in switchdev mode
net/mlx5: Check if lag is supported before creating one
scsi: lpfc: Fix variable 'vport' set but not used in lpfc_sli4_abts_err_handler()
ionic: start queues before announcing link up
HID: wacom: Fix memory leakage caused by kfifo_alloc
fanotify: Fix sys_fanotify_mark() on native x86-32
ARM: OMAP2+: omap_device: fix idling of devices during probe
i2c: sprd: use a specific timeout to avoid system hang up issue
dmaengine: dw-edma: Fix use after free in dw_edma_alloc_chunk()
selftests/bpf: Clarify build error if no vmlinux
can: tcan4x5x: fix bittiming const, use common bittiming from m_can driver
can: m_can: m_can_class_unregister(): remove erroneous m_can_clk_stop()
can: kvaser_pciefd: select CONFIG_CRC32
spi: spi-geni-qcom: Fail new xfers if xfer/cancel/abort pending
cpufreq: powernow-k8: pass policy rather than use cpufreq_cpu_get()
spi: spi-geni-qcom: Fix geni_spi_isr() NULL dereference in timeout case
spi: stm32: FIFO threshold level - fix align packet size
i2c: i801: Fix the i2c-mux gpiod_lookup_table not being properly terminated
i2c: mediatek: Fix apdma and i2c hand-shake timeout
bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET
interconnect: imx: Add a missing of_node_put after of_device_is_available
interconnect: qcom: fix rpmh link failures
dmaengine: mediatek: mtk-hsdma: Fix a resource leak in the error handling path of the probe function
dmaengine: milbeaut-xdmac: Fix a resource leak in the error handling path of the probe function
dmaengine: xilinx_dma: check dma_async_device_register return value
dmaengine: xilinx_dma: fix incompatible param warning in _child_probe()
dmaengine: xilinx_dma: fix mixed_enum_type coverity warning
arm64: mm: Fix ARCH_LOW_ADDRESS_LIMIT when !CONFIG_ZONE_DMA
qed: select CONFIG_CRC32
phy: dp83640: select CONFIG_CRC32
wil6210: select CONFIG_CRC32
block: rsxx: select CONFIG_CRC32
lightnvm: select CONFIG_CRC32
zonefs: select CONFIG_CRC32
iommu/vt-d: Fix misuse of ALIGN in qi_flush_piotlb()
iommu/intel: Fix memleak in intel_irq_remapping_alloc
bpftool: Fix compilation failure for net.o with older glibc
nvme-tcp: Fix possible race of io_work and direct send
net/mlx5e: Fix memleak in mlx5e_create_l2_table_groups
net/mlx5e: Fix two double free cases
regmap: debugfs: Fix a memory leak when calling regmap_attach_dev
wan: ds26522: select CONFIG_BITREVERSE
arm64: cpufeature: remove non-exist CONFIG_KVM_ARM_HOST
regulator: qcom-rpmh-regulator: correct hfsmps515 definition
net: mvpp2: disable force link UP during port init procedure
drm/i915/dp: Track pm_qos per connector
net: mvneta: fix error message when MTU too large for XDP
selftests: fib_nexthops: Fix wrong mausezahn invocation
KVM: arm64: Don't access PMCR_EL0 when no PMU is available
xsk: Fix race in SKB mode transmit with shared cq
xsk: Rollback reservation at NETDEV_TX_BUSY
block/rnbd-clt: avoid module unload race with close confirmation
can: isotp: isotp_getname(): fix kernel information leak
block: fix use-after-free in disk_part_iter_next
net: drop bogus skb with CHECKSUM_PARTIAL and offset beyond end of trimmed packet
regmap: debugfs: Fix a reversed if statement in regmap_debugfs_init()
drm/panfrost: Remove unused variables in panfrost_job_close()
tools headers UAPI: Sync linux/fscrypt.h with the kernel sources
Linux 5.10.8
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib8272ec9f47a3c3813509bcacece3b16137332e1
commit aebf5db917055b38f4945ed6d621d9f07a44ff30 upstream.
Make sure that bdgrab() is done on the 'block_device' instance before
referring to it for avoiding use-after-free.
Cc: <stable@vger.kernel.org>
Reported-by: syzbot+825f0f9657d4e528046e@syzkaller.appspotmail.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.10.7
i40e: Fix Error I40E_AQ_RC_EINVAL when removing VFs
iavf: fix double-release of rtnl_lock
net/sched: sch_taprio: ensure to reset/destroy all child qdiscs
net: mvpp2: Add TCAM entry to drop flow control pause frames
net: mvpp2: prs: fix PPPoE with ipv6 packet parse
net: systemport: set dev->max_mtu to UMAC_MAX_MTU_SIZE
ethernet: ucc_geth: fix use-after-free in ucc_geth_remove()
ethernet: ucc_geth: set dev->max_mtu to 1518
ionic: account for vlan tag len in rx buffer len
atm: idt77252: call pci_disable_device() on error path
net: mvpp2: Fix GoP port 3 Networking Complex Control configurations
net: stmmac: dwmac-meson8b: ignore the second clock input
ibmvnic: fix login buffer memory leak
ibmvnic: continue fatal error reset after passive init
net: ethernet: mvneta: Fix error handling in mvneta_probe
qede: fix offload for IPIP tunnel packets
virtio_net: Fix recursive call to cpus_read_lock()
net/ncsi: Use real net-device for response handler
net: ethernet: Fix memleak in ethoc_probe
net-sysfs: take the rtnl lock when storing xps_cpus
net-sysfs: take the rtnl lock when accessing xps_cpus_map and num_tc
net-sysfs: take the rtnl lock when storing xps_rxqs
net-sysfs: take the rtnl lock when accessing xps_rxqs_map and num_tc
net: ethernet: ti: cpts: fix ethtool output when no ptp_clock registered
tun: fix return value when the number of iovs exceeds MAX_SKB_FRAGS
e1000e: Only run S0ix flows if shutdown succeeded
e1000e: bump up timeout to wait when ME un-configures ULP mode
Revert "e1000e: disable s0ix entry and exit flows for ME systems"
e1000e: Export S0ix flags to ethtool
bnxt_en: Check TQM rings for maximum supported value.
net: mvpp2: fix pkt coalescing int-threshold configuration
bnxt_en: Fix AER recovery.
ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()
net: sched: prevent invalid Scell_log shift count
net: hns: fix return value check in __lb_other_process()
erspan: fix version 1 check in gre_parse_header()
net: hdlc_ppp: Fix issues when mod_timer is called while timer is running
bareudp: set NETIF_F_LLTX flag
bareudp: Fix use of incorrect min_headroom size
vhost_net: fix ubuf refcount incorrectly when sendmsg fails
r8169: work around power-saving bug on some chip versions
net: dsa: lantiq_gswip: Enable GSWIP_MII_CFG_EN also for internal PHYs
net: dsa: lantiq_gswip: Fix GSWIP_MII_CFG(p) register access
CDC-NCM: remove "connected" log message
ibmvnic: fix: NULL pointer dereference.
net: usb: qmi_wwan: add Quectel EM160R-GL
selftests: mlxsw: Set headroom size of correct port
stmmac: intel: Add PCI IDs for TGL-H platform
selftests/vm: fix building protection keys test
block: add debugfs stanza for QUEUE_FLAG_NOWAIT
workqueue: Kick a worker based on the actual activation of delayed works
scsi: ufs: Fix wrong print message in dev_err()
scsi: ufs-pci: Fix restore from S4 for Intel controllers
scsi: ufs-pci: Ensure UFS device is in PowerDown mode for suspend-to-disk ->poweroff()
scsi: ufs-pci: Fix recovery from hibernate exit errors for Intel controllers
scsi: ufs-pci: Enable UFSHCD_CAP_RPM_AUTOSUSPEND for Intel controllers
scsi: block: Introduce BLK_MQ_REQ_PM
scsi: ide: Do not set the RQF_PREEMPT flag for sense requests
scsi: ide: Mark power management requests with RQF_PM instead of RQF_PREEMPT
scsi: scsi_transport_spi: Set RQF_PM for domain validation commands
scsi: core: Only process PM requests if rpm_status != RPM_ACTIVE
local64.h: make <asm/local64.h> mandatory
lib/genalloc: fix the overflow when size is too big
depmod: handle the case of /sbin/depmod without /sbin in PATH
scsi: ufs: Clear UAC for FFU and RPMB LUNs
kbuild: don't hardcode depmod path
Bluetooth: revert: hci_h5: close serdev device and free hu in h5_close
scsi: block: Remove RQF_PREEMPT and BLK_MQ_REQ_PREEMPT
scsi: block: Do not accept any requests while suspended
crypto: ecdh - avoid buffer overflow in ecdh_set_secret()
crypto: asym_tpm: correct zero out potential secrets
powerpc: Handle .text.{hot,unlikely}.* in linker script
Staging: comedi: Return -EFAULT if copy_to_user() fails
staging: mt7621-dma: Fix a resource leak in an error handling path
usb: gadget: enable super speed plus
USB: cdc-acm: blacklist another IR Droid device
USB: cdc-wdm: Fix use after free in service_outstanding_interrupt().
usb: typec: intel_pmc_mux: Configure HPD first for HPD+IRQ request
usb: dwc3: meson-g12a: disable clk on error handling path in probe
usb: dwc3: gadget: Restart DWC3 gadget when enabling pullup
usb: dwc3: gadget: Clear wait flag on dequeue
usb: dwc3: ulpi: Use VStsDone to detect PHY regs access completion
usb: dwc3: ulpi: Replace CPU-based busyloop with Protocol-based one
usb: dwc3: ulpi: Fix USB2.0 HS/FS/LS PHY suspend regression
usb: chipidea: ci_hdrc_imx: add missing put_device() call in usbmisc_get_init_data()
USB: xhci: fix U1/U2 handling for hardware with XHCI_INTEL_HOST quirk set
usb: usbip: vhci_hcd: protect shift size
usb: uas: Add PNY USB Portable SSD to unusual_uas
USB: serial: iuu_phoenix: fix DMA from stack
USB: serial: option: add LongSung M5710 module support
USB: serial: option: add Quectel EM160R-GL
USB: yurex: fix control-URB timeout handling
USB: usblp: fix DMA to stack
ALSA: usb-audio: Fix UBSAN warnings for MIDI jacks
usb: gadget: select CONFIG_CRC32
USB: Gadget: dummy-hcd: Fix shift-out-of-bounds bug
usb: gadget: f_uac2: reset wMaxPacketSize
usb: gadget: function: printer: Fix a memory leak for interface descriptor
usb: gadget: u_ether: Fix MTU size mismatch with RX packet size
USB: gadget: legacy: fix return error code in acm_ms_bind()
usb: gadget: Fix spinlock lockup on usb_function_deactivate
usb: gadget: configfs: Preserve function ordering after bind failure
usb: gadget: configfs: Fix use-after-free issue with udc_name
USB: serial: keyspan_pda: remove unused variable
hwmon: (amd_energy) fix allocation of hwmon_channel_info config
mm: make wait_on_page_writeback() wait for multiple pending writebacks
x86/mm: Fix leak of pmd ptlock
KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte()
KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE
kvm: check tlbs_dirty directly
KVM: x86/mmu: Ensure TDP MMU roots are freed after yield
x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR
x86/resctrl: Don't move a task to the same resource group
blk-iocost: fix NULL iocg deref from racing against initialization
ALSA: hda/via: Fix runtime PM for Clevo W35xSS
ALSA: hda/conexant: add a new hda codec CX11970
ALSA: hda/realtek - Fix speaker volume control on Lenovo C940
ALSA: hda/realtek: Add mute LED quirk for more HP laptops
ALSA: hda/realtek: Enable mute and micmute LED on HP EliteBook 850 G7
ALSA: hda/realtek: Add two "Intel Reference board" SSID in the ALC256.
iommu/vt-d: Move intel_iommu info from struct intel_svm to struct intel_svm_dev
btrfs: qgroup: don't try to wait flushing if we're already holding a transaction
btrfs: send: fix wrong file path when there is an inode with a pending rmdir
Revert "device property: Keep secondary firmware node secondary by type"
dmabuf: fix use-after-free of dmabuf's file->f_inode
arm64: link with -z norelro for LLD or aarch64-elf
drm/i915: clear the shadow batch
drm/i915: clear the gpu reloc batch
bcache: fix typo from SUUP to SUPP in features.h
bcache: check unsupported feature sets for bcache register
bcache: introduce BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE for large bucket
net/mlx5e: Fix SWP offsets when vlan inserted by driver
ARM: dts: OMAP3: disable AES on N950/N9
netfilter: x_tables: Update remaining dereference to RCU
netfilter: ipset: fix shift-out-of-bounds in htable_bits()
netfilter: xt_RATEEST: reject non-null terminated string from userspace
netfilter: nft_dynset: report EOPNOTSUPP on missing set feature
dmaengine: idxd: off by one in cleanup code
x86/mtrr: Correct the range check before performing MTRR type lookups
KVM: x86: fix shift out of bounds reported by UBSAN
xsk: Fix memory leak for failed bind
rtlwifi: rise completion at the last step of firmware callback
scsi: target: Fix XCOPY NAA identifier lookup
Linux 5.10.7
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1a7c195af35831fe362b027fe013c0c7e4dc20ea
commit d16baa3f1453c14d680c5fee01cd122a22d0e0ce upstream.
When initializing iocost for a queue, its rqos should be registered before
the blkcg policy is activated to allow policy data initiailization to lookup
the associated ioc. This unfortunately means that the rqos methods can be
called on bios before iocgs are attached to all existing blkgs.
While the race is theoretically possible on ioc_rqos_throttle(), it mostly
happened in ioc_rqos_merge() due to the difference in how they lookup ioc.
The former determines it from the passed in @rqos and then bails before
dereferencing iocg if the looked up ioc is disabled, which most likely is
the case if initialization is still in progress. The latter looked up ioc by
dereferencing the possibly NULL iocg making it a lot more prone to actually
triggering the bug.
* Make ioc_rqos_merge() use the same method as ioc_rqos_throttle() to look
up ioc for consistency.
* Make ioc_rqos_throttle() and ioc_rqos_merge() test for NULL iocg before
dereferencing it.
* Explain the danger of NULL iocgs in blk_iocost_init().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jonathan Lemon <bsd@fb.com>
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 52abca64fd9410ea6c9a3a74eab25663b403d7da ]
blk_queue_enter() accepts BLK_MQ_REQ_PM requests independent of the runtime
power management state. Now that SCSI domain validation no longer depends
on this behavior, modify the behavior of blk_queue_enter() as follows:
- Do not accept any requests while suspended.
- Only process power management requests while suspending or resuming.
Submitting BLK_MQ_REQ_PM requests to a device that is runtime suspended
causes runtime-suspended devices not to resume as they should. The request
which should cause a runtime resume instead gets issued directly, without
resuming the device first. Of course the device can't handle it properly,
the I/O fails, and the device remains suspended.
The problem is fixed by checking that the queue's runtime-PM status isn't
RPM_SUSPENDED before allowing a request to be issued, and queuing a
runtime-resume request if it is. In particular, the inline
blk_pm_request_resume() routine is renamed blk_pm_resume_queue() and the
code is unified by merging the surrounding checks into the routine. If the
queue isn't set up for runtime PM, or there currently is no restriction on
allowed requests, the request is allowed. Likewise if the BLK_MQ_REQ_PM
flag is set and the status isn't RPM_SUSPENDED. Otherwise a runtime resume
is queued and the request is blocked until conditions are more suitable.
[ bvanassche: modified commit message and removed Cc: stable because
without the previous patches from this series this patch would break
parallel SCSI domain validation + introduced queue_rpm_status() ]
Link: https://lore.kernel.org/r/20201209052951.16136-9-bvanassche@acm.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Can Guo <cang@codeaurora.org>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-and-tested-by: Martin Kepplinger <martin.kepplinger@puri.sm>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a4d34da715e3cb7e0741fe603dcd511bed067e00 ]
Remove flag RQF_PREEMPT and BLK_MQ_REQ_PREEMPT since these are no longer
used by any kernel code.
Link: https://lore.kernel.org/r/20201209052951.16136-8-bvanassche@acm.org
Cc: Can Guo <cang@codeaurora.org>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Martin Kepplinger <martin.kepplinger@puri.sm>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0854bcdcdec26aecdc92c303816f349ee1fba2bc ]
Introduce the BLK_MQ_REQ_PM flag. This flag makes the request allocation
functions set RQF_PM. This is the first step towards removing
BLK_MQ_REQ_PREEMPT.
Link: https://lore.kernel.org/r/20201209052951.16136-3-bvanassche@acm.org
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Can Guo <cang@codeaurora.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dc30432605bbbd486dfede3852ea4d42c40a84b4 ]
This was missed in 021a24460d. Leads to the numeric value of
QUEUE_FLAG_NOWAIT (i.e. 29) showing up in
/sys/kernel/debug/block/*/state.
Fixes: 021a24460d
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 5.10.5
net/sched: sch_taprio: reset child qdiscs before freeing them
mptcp: fix security context on server socket
ethtool: fix error paths in ethnl_set_channels()
ethtool: fix string set id check
md/raid10: initialize r10_bio->read_slot before use.
drm/amd/display: Add get_dig_frontend implementation for DCEx
io_uring: close a small race gap for files cancel
jffs2: Allow setting rp_size to zero during remounting
jffs2: Fix NULL pointer dereference in rp_size fs option parsing
spi: dw-bt1: Fix undefined devm_mux_control_get symbol
opp: fix memory leak in _allocate_opp_table
opp: Call the missing clk_put() on error
scsi: block: Fix a race in the runtime power management code
mm/hugetlb: fix deadlock in hugetlb_cow error path
mm: memmap defer init doesn't work as expected
lib/zlib: fix inflating zlib streams on s390
io_uring: don't assume mm is constant across submits
io_uring: use bottom half safe lock for fixed file data
io_uring: add a helper for setting a ref node
io_uring: fix io_sqe_files_unregister() hangs
uapi: move constants from <linux/kernel.h> to <linux/const.h>
tools headers UAPI: Sync linux/const.h with the kernel headers
cgroup: Fix memory leak when parsing multiple source parameters
zlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c
scsi: cxgb4i: Fix TLS dependency
Bluetooth: hci_h5: close serdev device and free hu in h5_close
fbcon: Disable accelerated scrolling
reiserfs: add check for an invalid ih_entry_count
misc: vmw_vmci: fix kernel info-leak by initializing dbells in vmci_ctx_get_chkpt_doorbells()
media: gp8psk: initialize stats at power control logic
f2fs: fix shift-out-of-bounds in sanity_check_raw_super()
ALSA: seq: Use bool for snd_seq_queue internal flags
ALSA: rawmidi: Access runtime->avail always in spinlock
bfs: don't use WARNING: string when it's just info.
ext4: check for invalid block size early when mounting a file system
fcntl: Fix potential deadlock in send_sig{io, urg}()
io_uring: check kthread stopped flag when sq thread is unparked
rtc: sun6i: Fix memleak in sun6i_rtc_clk_init
module: set MODULE_STATE_GOING state when a module fails to load
quota: Don't overflow quota file offsets
rtc: pl031: fix resource leak in pl031_probe
powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()
i3c master: fix missing destroy_workqueue() on error in i3c_master_register
NFSv4: Fix a pNFS layout related use-after-free race when freeing the inode
f2fs: avoid race condition for shrinker count
f2fs: fix race of pending_pages in decompression
module: delay kobject uevent until after module init call
powerpc/64: irq replay remove decrementer overflow check
fs/namespace.c: WARN if mnt_count has become negative
watchdog: rti-wdt: fix reference leak in rti_wdt_probe
um: random: Register random as hwrng-core device
um: ubd: Submit all data segments atomically
NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow
ceph: fix inode refcount leak when ceph_fill_inode on non-I_NEW inode fails
drm/amd/display: updated wm table for Renoir
tick/sched: Remove bogus boot "safety" check
s390: always clear kernel stack backchain before calling functions
io_uring: remove racy overflow list fast checks
ALSA: pcm: Clear the full allocated memory at hw_params
dm verity: skip verity work if I/O error when system is shutting down
ext4: avoid s_mb_prefetch to be zero in individual scenarios
device-dax: Fix range release
Linux 5.10.5
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2b481bfac06bafdef2cf3cc1ac2c2a4ddf9913dc
commit fa4d0f1992a96f6d7c988ef423e3127e613f6ac9 upstream.
With the current implementation the following race can happen:
* blk_pre_runtime_suspend() calls blk_freeze_queue_start() and
blk_mq_unfreeze_queue().
* blk_queue_enter() calls blk_queue_pm_only() and that function returns
true.
* blk_queue_enter() calls blk_pm_request_resume() and that function does
not call pm_request_resume() because the queue runtime status is
RPM_ACTIVE.
* blk_pre_runtime_suspend() changes the queue status into RPM_SUSPENDING.
Fix this race by changing the queue runtime status into RPM_SUSPENDING
before switching q_usage_counter to atomic mode.
Link: https://lore.kernel.org/r/20201209052951.16136-2-bvanassche@acm.org
Fixes: 986d413b7c ("blk-mq: Enable support for runtime power management")
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Stanley Chu <stanley.chu@mediatek.com>
Co-developed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/L/n0QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgptxfD/9Dvasr1LIF9jTL5nU3iwwCAKVneI0VXsvn
x9xLv+3AlvyhKJSpDxjyYrrsNwo2r2Hay/3Yix079wMyDNcjuUeaDVscQ2Ed6WXI
slOhefyd4sgZf0kpB7TkwSrU1idoSEhOgcbZSiPLnlb071rYbJm5M6W15j9knUW0
HfomlryBgTX6b0et+YHvboTQ+31ZWTpZdMk9XEvQhCVynPOwU858rRKnFZUQSJ04
LGKXcA5xchxD8HVImwGv6QFTtFSzoyT3g68fEXEM9rPoEM8T0utVFPfF7o8L4Oui
nLxpBThwN18xwcc2iei8Fko2H6RgAmaD4cR4rgZaesa4QavPtT8N+JebMX47Tvhy
BBUbD/gRMPt1nO9ufPuLDyYBda2Ne5Z1DkBSIuYgZrreiOBdYGaJDbGIJ9uGwi+j
u4QvooSMoXb/7XGojxaJCjPDlNyhOb+fbZaOJDU49xQR3k01B6vteDnxGBlvbYHn
xuC82LTpTkWQT65ciaFAsW0L2lE6zlgkWWP3ahzH/qAg0HvODT9LDPNxvUpgZqUT
zoUHua3jAY2JBudUMLlacD+/ZS/T7krMl2XLSvyKaOIAyAlaluxK/uykQ+Z3sWgF
XGYQ/yv/4mQ1rTozqRFKndRPN++FCRzIwcHk8iBQCRlh8yLB/gVBr/Q6SgJrdnEj
WWwc1tRmwA==
=o4GN
-----END PGP SIGNATURE-----
Merge tag 'block-5.10-2020-12-05' of git://git.kernel.dk/linux-block
Pull block fix from Jens Axboe:
"Single fix for an issue with chunk_sectors and stacked devices"
* tag 'block-5.10-2020-12-05' of git://git.kernel.dk/linux-block:
block: use gcd() to fix chunk_sectors limit stacking
Restores splitting in terms of varied per-target ti->max_io_len
rather than use block core's single stacked 'chunk_sectors' limit.
- Like DM crypt, update DM integrity to not use crypto drivers that
have CRYPTO_ALG_ALLOCATES_MEMORY set.
- Fix DM writecache target's argument parsing and status display.
- Remove needless BUG() from dm writecache's persistent_memory_claim()
- Remove old gcc workaround in DM cache target's block_div() for ARM
link errors now that gcc >= 4.9 is required.
- Fix RCU locking in dm_blk_report_zones and dm_dax_zero_page_range.
- Remove old, and now frowned upon, BUG_ON(in_interrupt()) in
dm_table_event().
- Remove invalid sparse annotations from dm_prepare_ioctl() and
dm_unprepare_ioctl().
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl/KonYTHHNuaXR6ZXJA
cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWgSJCACNYEndubrROJZL+FOUQixzZQiOfphw
9Brb/XbdXWXIv7F+JV85E6olOqz7JjTGrO91uD5kwHEtVhDx5zT/GCm+5FoBrLa/
FuTphPRWNimZSU1umJe2AG9hOiDpPJJUe/wwj3QkBH2TeEHwBHblB8BkRFzxnP+p
0dGybrQBMtrH3GO65YG7qaASeBPl1+G3mVHfzViyhk1uoZL1y9pKbzPK60TkHcsa
VCGTPke5Ri3hvd85hmpDcXmyxjxZfCA8Jc/DrQ+DDEwakHoJFwlSzP7fqwHnpKHT
RDL4iOID54SViSGqzNcxlGtr/EHyN9Mom2d4Nnb0cgsRG4woCeJWJZMM
=+o1m
-----END PGP SIGNATURE-----
Merge tag 'for-5.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM's bio splitting changes that were made during v5.9. This
restores splitting in terms of varied per-target ti->max_io_len
rather than use block core's single stacked 'chunk_sectors' limit.
- Like DM crypt, update DM integrity to not use crypto drivers that
have CRYPTO_ALG_ALLOCATES_MEMORY set.
- Fix DM writecache target's argument parsing and status display.
- Remove needless BUG() from dm writecache's persistent_memory_claim()
- Remove old gcc workaround in DM cache target's block_div() for ARM
link errors now that gcc >= 4.9 is required.
- Fix RCU locking in dm_blk_report_zones and dm_dax_zero_page_range.
- Remove old, and now frowned upon, BUG_ON(in_interrupt()) in
dm_table_event().
- Remove invalid sparse annotations from dm_prepare_ioctl() and
dm_unprepare_ioctl().
* tag 'for-5.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: remove invalid sparse __acquires and __releases annotations
dm: fix double RCU unlock in dm_dax_zero_page_range() error path
dm: fix IO splitting
dm writecache: remove BUG() and fail gracefully instead
dm table: Remove BUG_ON(in_interrupt())
dm: fix bug with RCU locking in dm_blk_report_zones
Revert "dm cache: fix arm link errors with inline"
dm writecache: fix the maximum number of arguments
dm writecache: advance the number of arguments when reporting max_age
dm integrity: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
Commit 882ec4e609 ("dm table: stack 'chunk_sectors' limit to account
for target-specific splitting") caused a couple regressions:
1) Using lcm_not_zero() when stacking chunk_sectors was a bug because
chunk_sectors must reflect the most limited of all devices in the
IO stack.
2) DM targets that set max_io_len but that do _not_ provide an
.iterate_devices method no longer had there IO split properly.
And commit 5091cdec56 ("dm: change max_io_len() to use
blk_max_size_offset()") also caused a regression where DM no longer
supported varied (per target) IO splitting. The implication being the
potential for severely reduced performance for IO stacks that use a DM
target like dm-cache to hide performance limitations of a slower
device (e.g. one that requires 4K IO splitting).
Coming full circle: Fix all these issues by discontinuing stacking
chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(),
add optional chunk_sectors override argument to blk_max_size_offset()
and update DM's max_io_len() to pass ti->max_io_len to its
blk_max_size_offset() call.
Passing in an optional chunk_sectors override to blk_max_size_offset()
allows for code reuse of block's centralized calculation for max IO
size based on provided offset and split boundary.
Fixes: 882ec4e609 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
Fixes: 5091cdec56 ("dm: change max_io_len() to use blk_max_size_offset()")
Cc: stable@vger.kernel.org
Reported-by: John Dorminy <jdorminy@redhat.com>
Reported-by: Bruce Johnston <bjohnsto@redhat.com>
Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: John Dorminy <jdorminy@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
commit 22ada802ed ("block: use lcm_not_zero() when stacking
chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must
reflect the most limited of all devices in the IO stack.
Otherwise malformed IO may result. E.g.: prior to this fix,
->chunk_sectors = lcm_not_zero(8, 128) would result in
blk_max_size_offset() splitting IO at 128 sectors rather than the
required more restrictive 8 sectors.
And since commit 07d098e6bb ("block: allow 'chunk_sectors' to be
non-power-of-2") care must be taken to properly stack chunk_sectors to
be compatible with the possibility that a non-power-of-2 chunk_sectors
may be stacked. This is why gcd() is used instead of reverting back
to using min_not_zero().
Fixes: 22ada802ed ("block: use lcm_not_zero() when stacking chunk_sectors")
Fixes: 07d098e6bb ("block: allow 'chunk_sectors' to be non-power-of-2")
Reported-by: John Dorminy <jdorminy@redhat.com>
Reported-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: John Dorminy <jdorminy@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If there is only one keyslot, then blk_ksm_init() computes
slot_hashtable_size=1 and log_slot_ht_size=0. This causes
blk_ksm_find_keyslot() to crash later because it uses
hash_ptr(key, log_slot_ht_size) to find the hash bucket containing the
key, and hash_ptr() doesn't support the bits == 0 case.
Fix this by making the hash table always have at least 2 buckets.
Tested by running:
kvm-xfstests -c ext4 -g encrypt -m inlinecrypt \
-o blk-crypto-fallback.num_keyslots=1
Fixes: 1b26283970 ("block: Keyslot Manager for Inline Encryption")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
disk_get_part needs to be paired with a disk_put_part.
Cc: stable@vger.kernel.org
Fixes: ef45fe470e ("blk-cgroup: show global disk stats in root cgroup io.stat")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For avoiding use-after-free on flush request, we call its .end_io() from
both timeout code path and __blk_mq_end_request().
When flush request's ref doesn't drop to zero, it is still used, we
can't mark it as IDLE, so fix it by marking IDLE when its refcount drops
to zero really.
Fixes: 65ff5cd045 ("blk-mq: mark flush request as IDLE in flush_end_io()")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Cc: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Return if the function ended up sending an uevent or not.
Cc: stable@vger.kernel.org # v5.9
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Mark flush request as IDLE in its .end_io(), aligning it with how normal
requests behave. The flush request stays in in-flight tags if we're not
using an IO scheduler, so we need to change its state into IDLE.
Otherwise, we will hang in blk_mq_tagset_wait_completed_request() during
error recovery because flush the request state is kept as COMPLETED.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Cc: Chao Leng <lengchao@huawei.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When the bio's size reaches max_append_sectors, bio_add_hw_page returns
0 then __bio_iov_append_get_pages returns -EINVAL. This is an expected
result of building a small enough bio not to be split in the IO path.
However, iov_iter is not advanced in this case, causing the same pages
are filled for the bio again and again.
Fix the case by properly advancing the iov_iter for already processed
pages.
Fixes: 0512a75b98 ("block: Introduce REQ_OP_ZONE_APPEND")
Cc: stable@vger.kernel.org # 5.8+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Similarly to commit 457e490f2b ("blkcg: allocate struct blkcg_gq
outside request queue spinlock"), blkg_create can also trigger
occasional -ENOMEM failures at the radix insertion because any
allocation inside blkg_create has to be non-blocking, making it more
likely to fail. This causes trouble for userspace tools trying to
configure io weights who need to deal with this condition.
This patch reduces the occurrence of -ENOMEMs on this path by preloading
the radix tree element on a GFP_KERNEL context, such that we guarantee
the later non-blocking insertion won't fail.
A similar solution exists in blkcg_init_queue for the same situation.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Steps on the way to 5.10-rc1
Resolves conflicts in:
fs/userfaultfd.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie3fe3c818f1f6565cfd4fa551de72d2b72ef60af
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+UQjkQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpvN9D/9iHU7Vgi8J3SiLrHYUiDtMSI5VnEmSBo6K
Ej/wbbrk4tm2UYi550krOk0dMaHxWD3XWSTlpnrE0sUfjs69G676yzxrlnPt50f5
XbcMc0YOHZfffeu9xXykUO8Q2918PTPC08eLaTK1I8lhKAuuTFCT/syGYu+prfd7
AogyuczaDok8nqJEK9QNr0iaEUbe17GQwmvpWyjHl/qfKhWvV2r6jCZZf6pzQj2c
zv3kbiT3u6xw9OEuhY0sgpTEfhAHEXbNIln6Ob4qVgxmOjwgiZdU/QXyw1i2s6pc
ks7e28P43r3VfNYGBfr/hQCeAJT9gOeUG5yBiQr7ooX6uNPL6GOCG7DO/g5y2thQ
NkV4hub/FjYWbSmRzDlJGj1fWn4L+3r/O8g5nMr+F1L3JYeaW0hOyStqBQ4O74Cj
04tvWQ8ndXdPQrm/iDhM6KxfCvR5TC6k4fy9XPpRW8JOxauhIwTZQJyEQUnXTH3v
pwv3IxRmuWGa3mrJZ5kGhsNAEGHdZCL5soLI+BXAD2MUW2IB5v2HpD/z1bvWL/51
uYiVIt/2LxgLkF7BXP40PnY0qqTsOwGxdd6wQhi5Jn9Et+JkmAAR6cVwXx4AhuQg
FT5mq7ZTQBZrErQu4Mr1k3UyqBFm4MB+mbJhWrVWnUnnyA6pcr1NUsUTz5JcyrWz
jWI7T1Si7w==
=dFJi
-----END PGP SIGNATURE-----
Merge tag 'block-5.10-2020-10-24' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request from Christoph
- rdma error handling fixes (Chao Leng)
- fc error handling and reconnect fixes (James Smart)
- fix the qid displace when tracing ioctl command (Keith Busch)
- don't use BLK_MQ_REQ_NOWAIT for passthru (Chaitanya Kulkarni)
- fix MTDT for passthru (Logan Gunthorpe)
- blacklist Write Same on more devices (Kai-Heng Feng)
- fix an uninitialized work struct (zhenwei pi)"
- lightnvm out-of-bounds fix (Colin)
- SG allocation leak fix (Doug)
- rnbd fixes (Gioh, Guoqing, Jack)
- zone error translation fixes (Keith)
- kerneldoc markup fix (Mauro)
- zram lockdep fix (Peter)
- Kill unused io_context members (Yufen)
- NUMA memory allocation cleanup (Xianting)
- NBD config wakeup fix (Xiubo)
* tag 'block-5.10-2020-10-24' of git://git.kernel.dk/linux-block: (27 commits)
block: blk-mq: fix a kernel-doc markup
nvme-fc: shorten reconnect delay if possible for FC
nvme-fc: wait for queues to freeze before calling update_hr_hw_queues
nvme-fc: fix error loop in create_hw_io_queues
nvme-fc: fix io timeout to abort I/O
null_blk: use zone status for max active/open
nvmet: don't use BLK_MQ_REQ_NOWAIT for passthru
nvmet: cleanup nvmet_passthru_map_sg()
nvmet: limit passthru MTDS by BIO_MAX_PAGES
nvmet: fix uninitialized work for zero kato
nvme-pci: disable Write Zeroes on Sandisk Skyhawk
nvme: use queuedata for nvme_req_qid
nvme-rdma: fix crash due to incorrect cqe
nvme-rdma: fix crash when connect rejected
block: remove unused members for io_context
blk-mq: remove the calling of local_memory_node()
zram: Fix __zram_bvec_{read,write}() locking order
skd_main: remove unused including <linux/version.h>
sgl_alloc_order: fix memory leak
lightnvm: fix out-of-bounds write to array devices->info[]
...
This reverts commit 9d3a39a5f1 as that
commit causes a bunch of SELinux errors.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I73dc15c4ecddcf1950ca7fca2cc107be27da8f3f
Steps on the way to 5.10-rc1
Resolves conflicts in:
include/linux/blk-crypto.h
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4012850c2e4b804d9e87e90b8e03a3b9ce21b5e7
We don't need to check whether the node is memoryless numa node before
calling allocator interface. SLUB(and SLAB,SLOB) relies on the page
allocator to pick a node. Page allocator should deal with memoryless
nodes just fine. It has zonelists constructed for each possible nodes.
And it will automatically fall back into a node which is closest to the
requested node. As long as __GFP_THISNODE is not enforced of course.
The code comments of kmem_cache_alloc_node() of SLAB also showed this:
* Fallback to other node is possible if __GFP_THISNODE is not set.
blk-mq code doesn't set __GFP_THISNODE, so we can remove the calling
of local_memory_node().
Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fix this warning:
./block/bio.c:1098: WARNING: Inline emphasis start-string without end-string.
The thing is that *iter is not a valid markup.
That seems to be a typo:
*iter -> @iter
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Using "@bio's parent" causes the following waring:
./block/bio.c:10: WARNING: Inline emphasis start-string without end-string.
The main problem here is that this would be converted into:
**bio**'s parent
By kernel-doc, which is not a valid notation. It would be
possible to use, instead, this kernel-doc markup:
``bio's`` parent
Yet, here, is probably simpler to just use an altenative language:
the parent of @bio
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
fix bio splitting for bios that were deferred to the worker thread
due to a DM device being suspended.
- Remove DM core's special handling of NVMe devices now that block
core has internalized efficiencies drivers previously needed to
be concerned about (via now removed direct_make_request).
- Fix request-based DM to not bounce through indirect dm_submit_bio;
instead have block core make direct call to blk_mq_submit_bio().
- Various DM core cleanups to simplify and improve code.
- Update DM cryot to not use drivers that set
CRYPTO_ALG_ALLOCATES_MEMORY.
- Fix DM raid's raid1 and raid10 discard limits for the purposes of
linux-stable. But then remove DM raid's discard limits settings now
that MD raid can efficiently handle large discards.
- A couple small cleanups across various targets.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl+Fx1gTHHNuaXR6ZXJA
cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWk5iB/9pONYmtfQ5oBx4jg/PU8cVYYIfOtwS
ZtItFbw7T9bkHVZ8d4hDr5LTq898cADuRD5edlR82gDOcXkiJlb5PqU39RoOTVvF
Xz87sWzHdGAK7rdnCMAc2hiX3oQOje9o7NxGeGQ/uPaNU+U/vJS0AZtEAwltocBd
j9MGESddBC636Gzbg5C0c0frikXd0am6qp6SCYJNpP5I0G2beHk2YX5Jqt9c7zMk
8kyQend5b5RvkPNWTAjkVfWUsIjwYHh6MF48ZoGvD0X3lWjIBiwyxC0UX5hSXq63
kB+nqxbXcvQLEBtJuDZ2bjyvrwzCVLpmfgLgzxOOU8fI5Q2U0zpsPaa0
=6YDu
-----END PGP SIGNATURE-----
Merge tag 'for-5.10/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Improve DM core's bio splitting to use blk_max_size_offset(). Also
fix bio splitting for bios that were deferred to the worker thread
due to a DM device being suspended.
- Remove DM core's special handling of NVMe devices now that block core
has internalized efficiencies drivers previously needed to be
concerned about (via now removed direct_make_request).
- Fix request-based DM to not bounce through indirect dm_submit_bio;
instead have block core make direct call to blk_mq_submit_bio().
- Various DM core cleanups to simplify and improve code.
- Update DM cryot to not use drivers that set
CRYPTO_ALG_ALLOCATES_MEMORY.
- Fix DM raid's raid1 and raid10 discard limits for the purposes of
linux-stable. But then remove DM raid's discard limits settings now
that MD raid can efficiently handle large discards.
- A couple small cleanups across various targets.
* tag 'for-5.10/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: fix request-based DM to not bounce through indirect dm_submit_bio
dm: remove special-casing of bio-based immutable singleton target on NVMe
dm: export dm_copy_name_and_uuid
dm: fix comment in __dm_suspend()
dm: fold dm_process_bio() into dm_submit_bio()
dm: fix missing imposition of queue_limits from dm_wq_work() thread
dm snap persistent: simplify area_io()
dm thin metadata: Remove unused local variable when create thin and snap
dm raid: remove unnecessary discard limits for raid10
dm raid: fix discard limits for raid1 and raid10
dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
dm: use dm_table_get_device_name() where appropriate in targets
dm table: make 'struct dm_table' definition accessible to all of DM core
dm: eliminate need for start_io_acct() forward declaration
dm: simplify __process_abnormal_io()
dm: push use of on-stack flush_bio down to __send_empty_flush()
dm: optimize max_io_len() by inlining max_io_len_target_boundary()
dm: push md->immutable_target optimization down to __process_bio()
dm: change max_io_len() to use blk_max_size_offset()
dm table: stack 'chunk_sectors' limit to account for target-specific splitting
A zoned device with limited resources to open or activate zones may
return an error when the host exceeds those limits. The same command may
be successful if retried later, but the host needs to wait for specific
zone states before it should expect a retry to succeed. Have the block
layer provide an appropriate status for these conditions so applications
can distinuguish this error for special handling.
Cc: linux-api@vger.kernel.org
Cc: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+EYWYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpsCgD/9Izy/mbiQMmcBPBuQFds2b2SwPAoB4RVcU
NU7pcI3EbAlcj7xDF08Z74Sr6MKyg+JhGid15iw47o+qFq6cxDKiESYLIrFmb70R
lUDkPr9J4OLNDSZ6hpM4sE6Qg9bzDPhRbAceDQRtVlqjuQdaOS2qZAjNG4qjO8by
3PDO7XHCW+X4HhXiu2PDCKuwyDlHxggYzhBIFZNf58US2BU8+tLn2gvTSvmTb27F
w0s5WU1Q5Q0W9RLrp4YTQi4SIIOq03BTSqpRjqhomIzhSQMieH95XNKGRitLjdap
2mFNJ+5I+DTB/TW2BDBrBRXnoV/QNBJsR0DDFnUZsHEejjXKEVt5BRCpSQC9A0WW
XUyVE1K+3GwgIxSI8tjPtyPEGzzhnqJjzHPq4LJLGlQje95v9JZ6bpODB7HHtZQt
rbNp8IoVQ0n01nIvkkt/vnzCE9VFbWFFQiiu5/+x26iKZXW0pAF9Dnw46nFHoYZi
llYvbKDcAUhSdZI8JuqnSnKhi7sLRNPnApBxs52mSX8qaE91sM2iRFDewYXzaaZG
NjijYCcUtopUvojwxYZaLnIpnKWG4OZqGTNw1IdgzUtfdxoazpg6+4wAF9vo7FEP
AePAUTKrfkGBm95uAP4bRvXBzS9UhXJvBrFW3grzRZybMj617F01yAR4N0xlMXeN
jMLrGe7sWA==
=xE9E
-----END PGP SIGNATURE-----
Merge tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
"Here are the driver updates for 5.10.
A few SCSI updates in here too, in coordination with Martin as they
depend on core block changes for the shared tag bitmap.
This contains:
- NVMe pull requests via Christoph:
- fix keep alive timer modification (Amit Engel)
- order the PCI ID list more sensibly (Andy Shevchenko)
- cleanup the open by controller helper (Chaitanya Kulkarni)
- use an xarray for the CSE log lookup (Chaitanya Kulkarni)
- support ZNS in nvmet passthrough mode (Chaitanya Kulkarni)
- fix nvme_ns_report_zones (Christoph Hellwig)
- add a sanity check to nvmet-fc (James Smart)
- fix interrupt allocation when too many polled queues are
specified (Jeffle Xu)
- small nvmet-tcp optimization (Mark Wunderlich)
- fix a controller refcount leak on init failure (Chaitanya
Kulkarni)
- misc cleanups (Chaitanya Kulkarni)
- major refactoring of the scanning code (Christoph Hellwig)
- MD updates via Song:
- Bug fixes in bitmap code, from Zhao Heming
- Fix a work queue check, from Guoqing Jiang
- Fix raid5 oops with reshape, from Song Liu
- Clean up unused code, from Jason Yan
- Discard improvements, from Xiao Ni
- raid5/6 page offset support, from Yufen Yu
- Shared tag bitmap for SCSI/hisi_sas/null_blk (John, Kashyap,
Hannes)
- null_blk open/active zone limit support (Niklas)
- Set of bcache updates (Coly, Dongsheng, Qinglang)"
* tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (78 commits)
md/raid5: fix oops during stripe resizing
md/bitmap: fix memory leak of temporary bitmap
md: fix the checking of wrong work queue
md/bitmap: md_bitmap_get_counter returns wrong blocks
md/bitmap: md_bitmap_read_sb uses wrong bitmap blocks
md/raid0: remove unused function is_io_in_chunk_boundary()
nvme-core: remove extra condition for vwc
nvme-core: remove extra variable
nvme: remove nvme_identify_ns_list
nvme: refactor nvme_validate_ns
nvme: move nvme_validate_ns
nvme: query namespace identifiers before adding the namespace
nvme: revalidate zone bitmaps in nvme_update_ns_info
nvme: remove nvme_update_formats
nvme: update the known admin effects
nvme: set the queue limits in nvme_update_ns_info
nvme: remove the 0 lba_shift check in nvme_update_ns_info
nvme: clean up the check for too large logic block sizes
nvme: freeze the queue over ->lba_shift updates
nvme: factor out a nvme_configure_metadata helper
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+EWUgQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpnoxEADCVSNBRkpV0OVkOEC3wf8EGhXhk01Jnjtl
u5Mg2V55hcgJ0thQxBV/V28XyqmsEBrmAVi0Yf8Vr9Qbq4Ze08Wae4ChS4rEOyh1
jTcGYWx5aJB3ChLvV/HI0nWQ3bkj03mMrL3SW8rhhf5DTyKHsVeTenpx42Qu/FKf
fRzi09FSr3Pjd0B+EX6gunwJnlyXQC5Fa4AA0GhnXJzAznANXxHkkcXu8a6Yw75x
e28CfhIBliORsK8sRHLoUnPpeTe1vtxCBhBMsE+gJAj9ZUOWMzvNFIPP4FvfawDy
6cCQo2m1azJ/IdZZCDjFUWyjh+wxdKMp+NNryEcoV+VlqIoc3n98rFwrSL+GIq5Z
WVwEwq+AcwoMCsD29Lu1ytL2PQ/RVqcJP5UheMrbL4vzefNfJFumQVZLIcX0k943
8dFL2QHL+H/hM9Dx5y5rjeiWkAlq75v4xPKVjh/DHb4nehddCqn/+DD5HDhNANHf
c1kmmEuYhvLpIaC4DHjE6DwLh8TPKahJjwsGuBOTr7D93NUQD+OOWsIhX6mNISIl
FFhP8cd0/ZZVV//9j+q+5B4BaJsT+ZtwmrelKFnPdwPSnh+3iu8zPRRWO+8P8fRC
YvddxuJAmE6BLmsAYrdz6Xb/wqfyV44cEiyivF0oBQfnhbtnXwDnkDWSfJD1bvCm
ZwfpDh2+Tg==
=LzyE
-----END PGP SIGNATURE-----
Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
- Series of merge handling cleanups (Baolin, Christoph)
- Series of blk-throttle fixes and cleanups (Baolin)
- Series cleaning up BDI, seperating the block device from the
backing_dev_info (Christoph)
- Removal of bdget() as a generic API (Christoph)
- Removal of blkdev_get() as a generic API (Christoph)
- Cleanup of is-partition checks (Christoph)
- Series reworking disk revalidation (Christoph)
- Series cleaning up bio flags (Christoph)
- bio crypt fixes (Eric)
- IO stats inflight tweak (Gabriel)
- blk-mq tags fixes (Hannes)
- Buffer invalidation fixes (Jan)
- Allow soft limits for zone append (Johannes)
- Shared tag set improvements (John, Kashyap)
- Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)
- DM no-wait support (Mike, Konstantin)
- Request allocation improvements (Ming)
- Allow md/dm/bcache to use IO stat helpers (Song)
- Series improving blk-iocost (Tejun)
- Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
Xianting, Yang, Yufen, yangerkun)
* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
block: fix uapi blkzoned.h comments
blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
blk-mq: get rid of the dead flush handle code path
block: get rid of unnecessary local variable
block: fix comment and add lockdep assert
blk-mq: use helper function to test hw stopped
block: use helper function to test queue register
block: remove redundant mq check
block: invoke blk_mq_exit_sched no matter whether have .exit_sched
percpu_ref: don't refer to ref->data if it isn't allocated
block: ratelimit handle_bad_sector() message
blk-throttle: Re-use the throtl_set_slice_end()
blk-throttle: Open code __throtl_de/enqueue_tg()
blk-throttle: Move service tree validation out of the throtl_rb_first()
blk-throttle: Move the list operation after list validation
blk-throttle: Fix IO hang for a corner case
blk-throttle: Avoid tracking latency if low limit is invalid
blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
block: Remove redundant 'return' statement
...
Pull compat iovec cleanups from Al Viro:
"Christoph's series around import_iovec() and compat variant thereof"
* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
security/keys: remove compat_keyctl_instantiate_key_iov
mm: remove compat_process_vm_{readv,writev}
fs: remove compat_sys_vmsplice
fs: remove the compat readv/writev syscalls
fs: remove various compat readv/writev helpers
iov_iter: transparently handle compat iovecs in import_iovec
iov_iter: refactor rw_copy_check_uvector and import_iovec
iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
compat.h: fix a spelling error in <linux/compat.h>
blk_exit_queue will free elevator_data, while blk_mq_run_work_fn
will access it. Move cancel of hctx->run_work to the front of
blk_exit_queue to avoid use-after-free.
Fixes: 1b97871b50 ("blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release")
Signed-off-by: Yang Yang <yang.yang@vivo.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
After commit 923218f616 ("blk-mq: don't allocate driver tag upfront
for flush rq"), blk_mq_submit_bio() will call blk_insert_flush()
directly to handle flush request rather than blk_mq_sched_insert_request()
in the case of elevator.
Then, all flush request either have set RQF_FLUSH_SEQ flag when call
blk_mq_sched_insert_request(), or have inserted into hctx->dispatch.
So, remove the dead code path.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since whole elevator register is protectd by sysfs_lock, we
don't need extras 'has_elevator'. Just use q->elevator directly.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
After commit b89f625e28 ("block: don't release queue's sysfs
lock during switching elevator"), whole elevator register and
unregister function are covered by sysfs_lock. So, remove wrong
comment and add lockdep assert.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We have introduced helper function blk_mq_hctx_stopped() to test
BLK_MQ_S_STOPPED.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We have defined common interface blk_queue_registered() to
test QUEUE_FLAG_REGISTERED. Just use it.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
elv_support_iosched() will check queue_is_mq() for us. So, remove
the redundant check to clean code.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We will register debugfs for scheduler no matter whether it have
defined callback funciton .exit_sched. So, blk_mq_exit_sched()
is always needed to unregister debugfs. Also, q->elevator should
be set as NULL after exiting scheduler.
For now, since all register scheduler have defined .exit_sched,
it will not cause any actual problem. But It will be more reasonable
to do this change.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9/uU0QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpnQvD/wNEBP6d4ISx2/I6sDon9SKJgiY3CLF7x3f
F//GHMYP9+ZzoLdQRlebGiP6c5PVRL6ExJUVNT+Wc4h5jOuThuxy63j/zvv/RSFw
WH9lFiTG44zjbWjp3sCDOuIlHnCTsqA4zYb6os62q3v4SzenW/TA65C+yLn823AF
1VKeVvcoHDu3bvLwtLmAyqZAm2iJH02yKdclKgyaLSKdaGGPX2MJ4tW3GxqzA71i
7R/qer8KqYXSdJdghGI5eFycLnv/TE/bky02TlE+qUhIFwIhDNyo69IQzlMSQXmw
ECaAxMJYvzh6ruztkdJP0wOjYEryLY1oCusQEseB9M//qMlue/4Mi2D3bX5Ni1g4
blQQbIi1gu1J/fZrFtW7G/qHxDvT8oA5cFSv5e/72QRIghvavV6cvEP3s9Uu9v9l
3pA2LcErEgVellzvAe9q192mPpAUgR42VlUyYi7P74By+m7pWob2jWR0WsSbXqNk
pVhhW3s02hIf9HUAwJkqH46Y3FZmbpTBQvYByFnQh1VSRzmx69zZxs4SrKJTJq9L
Id83gBW+r1cuJ8QuZUX4D3ttIGuaZ7J8IdSY4JUBJPMOavbykb6YiWtZ4W5IW5R/
VYcuVTmJr37hcSBHJLw3FmlEN4IH/2QX+mrtJvCEWgeJACo3TVpv0QGw+gD1V5iS
EQzTCgctTg==
=THH6
-----END PGP SIGNATURE-----
Merge tag 'block5.9-2020-10-08' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A few fixes that should go into this release:
- NVMe controller error path reference fix (Chaitanya)
- Fix regression with IBM partitions on non-dasd devices (Christoph)
- Fix a missing clear in the compat CDROM packet structure (Peilin)"
* tag 'block5.9-2020-10-08' of git://git.kernel.dk/linux-block:
partitions/ibm: fix non-DASD devices
nvme-core: put ctrl ref when module ref get fail
block/scsi-ioctl: Fix kernel-infoleak in scsi_put_cdrom_generic_arg()
syzbot is reporting unkillable task [1], for the caller is failing to
handle a corrupted filesystem image which attempts to access beyond
the end of the device. While we need to fix the caller, flooding the
console with handle_bad_sector() message is unlikely useful.
[1] https://syzkaller.appspot.com/bug?id=f1f49fb971d7a3e01bd8ab8cff2ff4572ccf3092
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The __throtl_de/enqueue_tg() functions are only be called by
throtl_de/enqueue_tg(), thus we can just open code them to
make code more readable.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The throtl_schedule_next_dispatch() will validate if the service queue
is empty before calling update_min_dispatch_time(), and the
update_min_dispatch_time() will call throtl_rb_first(), which will
validate service queue again.
Thus we can move the service queue validation out of the
throtl_rb_first() to remove the redundant validation in the fast path.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We should move the list operation after validation.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It can not scale up in throtl_adjusted_limit() if we set bps or iops is
1, which will cause IO hang when enable low limit. Thus we should treat
1 as a illegal value to avoid this issue.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The IO latency tracking is only for LOW limit, so we should add a
validation to avoid redundant latency tracking if the LOW limit
is not valid.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We only update the tg->last_finish_time when the low limitaion is
enabled, so we can move the tg->last_finish_time validation a little
forward to avoid getting the unnecessary current time stamp if the
the low limitation is not enabled.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The throtl_downgrade_state() is always used to change to LIMIT_LOW
limitation, thus remove the latter meaningless parameter which
indicates the limitation index.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It is unnecessary to force request-based DM to call into bio-based
dm_submit_bio (via indirect disk->fops->submit_bio) only to have it then
call blk_mq_submit_bio().
Fix this by establishing a request-based DM block_device_operations
(dm_rq_blk_dops, which doesn't have .submit_bio) and update
dm_setup_md_queue() to set md->disk->fops to it for
DM_TYPE_REQUEST_BASED.
Remove DM_TYPE_REQUEST_BASED conditional in dm_submit_bio and unexport
blk_mq_submit_bio.
Fixes: c62b37d96b ("block: move ->make_request_fn to struct block_device_operations")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Don't error out if the dasd_biodasdinfo symbol is not available.
Cc: stable@vger.kernel.org
Fixes: 26d7e28e38 ("s390/dasd: remove ioctl_by_bdev calls")
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
According to Documentation/block/stat.rst, inflight should not include
I/O requests that are in the queue but not yet dispatched to the device,
but blk-mq identifies as inflight any request that has a tag allocated,
which, for queues without elevator, happens at request allocation time
and before it is queued in the ctx (default case in blk_mq_submit_bio).
In addition, current behavior is different for queues with elevator from
queues without it, since for the former the driver tag is allocated at
dispatch time. A more precise approach would be to only consider
requests with state MQ_RQ_IN_FLIGHT.
This effectively reverts commit 6131837b1d ("blk-mq: count allocated
but not started requests in iostats inflight") to consolidate blk-mq
behavior with itself (elevator case) and with original documentation,
but it differs from the behavior used by the legacy path.
This version differs from v1 by using blk_mq_rq_state to access the
state attribute. Avoid using blk_mq_request_started, which was
suggested, since we don't want to include MQ_RQ_COMPLETE.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move blk_mq_sched_try_merge to blk-merge.c, which allows to mark
a lot of the merge infrastructure static there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Also move the definition from the public blkdev.h to the private
block/blk.h header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Also move the definition from the public blkdev.h to the private
block/blk.h header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bio_crypt_set_ctx() assumes its gfp_mask argument always includes
__GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed.
For now this assumption is still fine, since no callers violate it.
Making bio_crypt_set_ctx() able to fail would add unneeded complexity.
However, if a caller didn't use __GFP_DIRECT_RECLAIM, it would be very
hard to notice the bug. Make it easier by adding a WARN_ON_ONCE().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Satya Tangirala <satyat@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Satya Tangirala <satyat@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk_crypto_rq_bio_prep() assumes its gfp_mask argument always includes
__GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed.
However, blk_crypto_rq_bio_prep() might be called with GFP_ATOMIC via
setup_clone() in drivers/md/dm-rq.c.
This case isn't currently reachable with a bio that actually has an
encryption context. However, it's fragile to rely on this. Just make
blk_crypto_rq_bio_prep() able to fail.
Suggested-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Satya Tangirala <satyat@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bio_crypt_clone() assumes its gfp_mask argument always includes
__GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed.
However, bio_crypt_clone() might be called with GFP_ATOMIC via
setup_clone() in drivers/md/dm-rq.c, or with GFP_NOWAIT via
kcryptd_io_read() in drivers/md/dm-crypt.c.
Neither case is currently reachable with a bio that actually has an
encryption context. However, it's fragile to rely on this. Just make
bio_crypt_clone() able to fail, analogous to bio_integrity_clone().
Reported-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Satya Tangirala <satyat@google.com>
Cc: Satya Tangirala <satyat@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
All remaining callers of bdget() outside of fs/block_dev.c want to get a
reference to the struct block_device for a given struct hd_struct. Add
a helper just for that and then mark bdget static.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use in compat_syscall to import either native or the compat iovecs, and
remove the now superflous compat_import_iovec.
This removes the need for special compat logic in most callers, and
the remaining ones can still be simplified by using __import_iovec
with a bool compat parameter.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
scsi_put_cdrom_generic_arg() is copying uninitialized stack memory to
userspace, since the compiler may leave a 3-byte hole in the middle of
`cgc32`. Fix it by adding a padding field to `struct
compat_cdrom_generic_command`.
Cc: stable@vger.kernel.org
Fixes: f3ee6e63a9 ("compat_ioctl: move CDROM_SEND_PACKET handling into scsi")
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: syzbot+85433a479a646a064ab3@syzkaller.appspotmail.com
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
DM depends on these block 5.10 commits:
22ada802ed block: use lcm_not_zero() when stacking chunk_sectors
07d098e6bb block: allow 'chunk_sectors' to be non-power-of-2
021a24460d block: add QUEUE_FLAG_NOWAIT
6abc49468e dm: add support for REQ_NOWAIT and enable it for linear target
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
'f5bbbbe4d635 ("blk-mq: sync the update nr_hw_queues with
blk_mq_queue_tag_busy_iter")' introduce a bug what we may sleep between
rcu lock. Then '530ca2c9bd69 ("blk-mq: Allow blocking queue tag iter
callbacks")' fix it by get request_queue's ref. And 'a9a808084d6a ("block:
Remove the synchronize_rcu() call from __blk_mq_update_nr_hw_queues()")'
remove the synchronize_rcu in __blk_mq_update_nr_hw_queues. We need
update the confused comments in blk_mq_queue_tag_busy_iter.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Blk-mq should call commit_rqs once 'bd.last != true' and no more
request will come(so virtscsi can kick the virtqueue, e.g.). We already
do that in 'blk_mq_dispatch_rq_list/blk_mq_try_issue_list_directly' while
list not empty and 'queued > 0'. However, we can seen the same scene
once the last request in list call queue_rq and return error like
BLK_STS_IOERR which will not requeue the request, and lead that list
empty but need call commit_rqs too(Or the request for virtscsi will stay
timeout until other request kick virtqueue).
We found this problem by do fsstress test with offline/online virtscsi
device repeat quickly.
Fixes: d666ba98f8 ("blk-mq: add mq_ops->commit_rqs()")
Reported-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We found blk_mq_alloc_rq_maps() takes more time in kernel space when
testing nvme device hot-plugging. The test and anlysis as below.
Debug code,
1, blk_mq_alloc_rq_maps():
u64 start, end;
depth = set->queue_depth;
start = ktime_get_ns();
pr_err("[%d:%s switch:%ld,%ld] queue depth %d, nr_hw_queues %d\n",
current->pid, current->comm, current->nvcsw, current->nivcsw,
set->queue_depth, set->nr_hw_queues);
do {
err = __blk_mq_alloc_rq_maps(set);
if (!err)
break;
set->queue_depth >>= 1;
if (set->queue_depth < set->reserved_tags + BLK_MQ_TAG_MIN) {
err = -ENOMEM;
break;
}
} while (set->queue_depth);
end = ktime_get_ns();
pr_err("[%d:%s switch:%ld,%ld] all hw queues init cost time %lld ns\n",
current->pid, current->comm,
current->nvcsw, current->nivcsw, end - start);
2, __blk_mq_alloc_rq_maps():
u64 start, end;
for (i = 0; i < set->nr_hw_queues; i++) {
start = ktime_get_ns();
if (!__blk_mq_alloc_rq_map(set, i))
goto out_unwind;
end = ktime_get_ns();
pr_err("hw queue %d init cost time %lld ns\n", i, end - start);
}
Test nvme hot-plugging with above debug code, we found it totally cost more
than 3ms in kernel space without being scheduled out when alloc rqs for all
16 hw queues with depth 1023, each hw queue cost about 140-250us. The cost
time will be increased with hw queue number and queue depth increasing. And
in an extreme case, if __blk_mq_alloc_rq_maps() returns -ENOMEM, it will try
"queue_depth >>= 1", more time will be consumed.
[ 428.428771] nvme nvme0: pci function 10000:01:00.0
[ 428.428798] nvme 10000:01:00.0: enabling device (0000 -> 0002)
[ 428.428806] pcieport 10000:00:00.0: can't derive routing for PCI INT A
[ 428.428809] nvme 10000:01:00.0: PCI INT A: no GSI
[ 432.593374] [4688:kworker/u33:8 switch:663,2] queue depth 30, nr_hw_queues 1
[ 432.593404] hw queue 0 init cost time 22883 ns
[ 432.593408] [4688:kworker/u33:8 switch:663,2] all hw queues init cost time 35960 ns
[ 432.595953] nvme nvme0: 16/0/0 default/read/poll queues
[ 432.595958] [4688:kworker/u33:8 switch:700,2] queue depth 1023, nr_hw_queues 16
[ 432.596203] hw queue 0 init cost time 242630 ns
[ 432.596441] hw queue 1 init cost time 235913 ns
[ 432.596659] hw queue 2 init cost time 216461 ns
[ 432.596877] hw queue 3 init cost time 215851 ns
[ 432.597107] hw queue 4 init cost time 228406 ns
[ 432.597336] hw queue 5 init cost time 227298 ns
[ 432.597564] hw queue 6 init cost time 224633 ns
[ 432.597785] hw queue 7 init cost time 219954 ns
[ 432.597937] hw queue 8 init cost time 150930 ns
[ 432.598082] hw queue 9 init cost time 143496 ns
[ 432.598231] hw queue 10 init cost time 147261 ns
[ 432.598397] hw queue 11 init cost time 164522 ns
[ 432.598542] hw queue 12 init cost time 143401 ns
[ 432.598692] hw queue 13 init cost time 148934 ns
[ 432.598841] hw queue 14 init cost time 147194 ns
[ 432.598991] hw queue 15 init cost time 148942 ns
[ 432.598993] [4688:kworker/u33:8 switch:700,2] all hw queues init cost time 3035099 ns
[ 432.602611] nvme0n1: p1
So use this patch to trigger schedule between each hw queue init, to avoid
other threads getting stuck. It is not in atomic context when executing
__blk_mq_alloc_rq_maps(), so it is safe to call cond_resched().
Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fixes up a merge issue in:
net/ipv6/route.c
on the way to a 5.9-rc7 release.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4eb508eb3761b95ad8f39dd79f03b3352481ceaf
Three fixes: one in drivers (lpfc) and two for zoned block devices.
The latter also impinges on the block layer but only to introduce a
new block API for setting the zone model rather than fiddling with the
queue directly in the zoned block driver.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCX29mRyYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishabnAP48vMYD
/cjyGAJfq/0k/U/t6pRPc5tUm89LOWcOJz0SjwD/YXcQNz7mx8MxnypAV1jbWXR7
iyWkPMYVc4EJh7oTARE=
=SQhI
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three fixes: one in drivers (lpfc) and two for zoned block devices.
The latter also impinges on the block layer but only to introduce a
new block API for setting the zone model rather than fiddling with the
queue directly in the zoned block driver"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: sd_zbc: Fix ZBC disk initialization
scsi: sd: sd_zbc: Fix handling of host-aware ZBC disks
scsi: lpfc: Fix initial FLOGI failure due to BBSCN not supported
An iocg may have 0 debt but non-zero delay. The current debt forgiveness
logic doesn't act on such iocgs. This can lead to unexpected behaviors - an
iocg with a little bit of debt will have its delay canceled through debt
forgiveness but one w/o any debt but active delay will have to wait out
until its delay decays out.
This patch updates the debt handling logic so that it treats delays the same
as debts. If either debt or delay is active, debt forgiveness logic kicks in
and acts on both the same way.
Also, avoid turning the debt and delay directly to zero as that can confuse
state transitions.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Debt forgiveness logic was counting the number of consecutive !busy periods
as the trigger condition. While this usually works, it can easily be thrown
off by temporary fluctuations especially on configurations w/ short periods.
This patch reimplements debt forgiveness so that:
* Use the average usage over the forgiveness period instead of counting
consecutive periods.
* Debt is reduced at around the target rate (1/2 every 100ms) regardless of
ioc period duration.
* Usage threshold is raised to 50%. Combined with the preceding changes and
the switch to average usage, this makes debt forgivness a lot more
effective at reducing the amount of unnecessary idleness.
* Constants are renamed with DFGV_ prefix.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Debt sets the initial delay duration which is decayed over time. The current
debt reduction halved the debt but didn't change the delay. It prevented
future debts from increasing delay but didn't do anything to lower the
existing delay, limiting the mechanism's ability to reduce unnecessary
idling.
Reset iocg->delay to 0 after debt reduction so that iocg_kick_waitq()
recalculates new delay value based on the reduced debt amount.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Debt reduction was blocked if any iocg was short on budget in the past
period to avoid reducing debts while some iocgs are saturated. However, this
ends up unnecessarily blocking debt reduction due to temporary local
imbalances when the device is generally being underutilized, while also
failing to block when the underlying device is overwhelmed and the usage
becomes low from high latency.
Given that debt accumulation mostly happens with swapout bursts which can
significantly deteriorate the underlying device's latency response, the
current logic is not great.
Let's replace it with ioc->busy_level based condition so that we block debt
reduction when the underlying device is being saturated. ioc_forgive_debts()
call is moved after busy_level determination.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Debt reduction logic is going to be improved and expanded. Factor it out
into ioc_forgive_debts() and generalize the comment a bit. No functional
change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add QUEUE_FLAG_NOWAIT to allow a block device to advertise support for
REQ_NOWAIT. Bio-based devices may set QUEUE_FLAG_NOWAIT where
applicable.
Update QUEUE_FLAG_MQ_DEFAULT to include QUEUE_FLAG_NOWAIT. Also
update submit_bio_checks() to verify it is set for REQ_NOWAIT bios.
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
No need to go through the hd_struct to find the partition number.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a littler helper to make the somewhat arcane bd_contains checks a
little more obvious.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* for-5.10/block: (140 commits)
bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flag
bdi: invert BDI_CAP_NO_ACCT_WB
bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag
mm: use SWP_SYNCHRONOUS_IO more intelligently
bdi: remove BDI_CAP_SYNCHRONOUS_IO
bdi: remove BDI_CAP_CGROUP_WRITEBACK
block: lift setting the readahead size into the block layer
md: update the optimal I/O size on reshape
bdi: initialize ->ra_pages and ->io_pages in bdi_init
aoe: set an optimal I/O size
bcache: inherit the optimal I/O size
drbd: remove dead code in device_to_statistics
fs: remove the unused SB_I_MULTIROOT flag
block: mark blkdev_get static
PM: mm: cleanup swsusp_swap_check
mm: split swap_type_of
PM: rewrite is_hibernate_resume_dev to not require an inode
mm: cleanup claim_swapfile
ocfs2: cleanup o2hb_region_dev_store
dasd: cleanup dasd_scan_partitions
...
The BDI_CAP_STABLE_WRITES is one of the few bits of information in the
backing_dev_info shared between the block drivers and the writeback code.
To help untangling the dependency replace it with a queue flag and a
superblock flag derived from it. This also helps with the case of e.g.
a file system requiring stable writes due to its own checksumming, but
not forcing it on other users of the block device like the swap code.
One downside is that we an't support the stable_pages_required bdi
attribute in sysfs anymore. It is replaced with a queue attribute which
also is writable for easier testing.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Just checking SB_I_CGROUPWB for cgroup writeback support is enough.
Either the file system allocates its own bdi (e.g. btrfs), in which case
it is known to support cgroup writeback, or the bdi comes from the block
layer, which always supports cgroup writeback.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Drivers shouldn't really mess with the readahead size, as that is a VM
concept. Instead set it based on the optimal I/O size by lifting the
algorithm from the md driver when registering the disk. Also set
bdi->io_pages there as well by applying the same scheme based on
max_sectors. To ensure the limits work well for stacking drivers a
new helper is added to update the readahead limits from the block
limits, which is also called from disk_stack_limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Set up a readahead size by default, as very few users have a good
reason to change it. This means code, ecryptfs, and orangefs now
set up the values while they were previously missing it, while ubifs,
mtd and vboxsf manually set it to 0 to avoid readahead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Acked-by: Richard Weinberger <richard@nod.at> [ubifs, mtd]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use blkdev_get_by_dev instead of open coding it using bdget_disk +
blkdev_get, and split the code to read the partition table into a
separate helper to make it a little more obvious.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We can only scan for partitions on the whole disk, so move the flag
from struct block_device to struct gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It is possible, albeit more unlikely, for a block device to have a non
power-of-2 for chunk_sectors (e.g. 10+2 RAID6 with 128K chunk_sectors,
which results in a full-stripe size of 1280K. This causes the RAID6's
io_opt to be advertised as 1280K, and a stacked device _could_ then be
made to use a blocksize, aka chunk_sectors, that matches non power-of-2
io_opt of underlying RAID6 -- resulting in stacked device's
chunk_sectors being a non power-of-2).
Update blk_queue_chunk_sectors() and blk_max_size_offset() to
accommodate drivers that need a non power-of-2 chunk_sectors.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Like 'io_opt', blk_stack_limits() should stack 'chunk_sectors' using
lcm_not_zero() rather than min_not_zero() -- otherwise the final
'chunk_sectors' could result in sub-optimal alignment of IO to
component devices in the IO stack.
Also, if 'chunk_sectors' isn't a multiple of 'physical_block_size'
then it is a bug in the driver and the device should be flagged as
'misaligned'.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bmd is allocated using kmalloc in bio_alloc_map_data, so make sure
is_null_mapped is properly initialized to false for the !null_mapped
case.
Fixes: f3256075ba ("block: remove the BIO_NULL_MAPPED flag")
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
sg_init_table zeroes its first argument, so the allocation of that argument
doesn't have to.
the semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x;
@@
x =
- kzalloc
+ kmalloc
(...)
...
sg_init_table(x,...)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When CONFIG_BLK_DEV_ZONED is disabled, allow using host-aware ZBC disks as
regular disks. In this case, ensure that command completion is correctly
executed by changing sd_zbc_complete() to return good_bytes instead of 0
and causing a hang during device probe (endless retries).
When CONFIG_BLK_DEV_ZONED is enabled and a host-aware disk is detected to
have partitions, it will be used as a regular disk. In this case, make sure
to not do anything in sd_zbc_revalidate_zones() as that triggers warnings.
Since all these different cases result in subtle settings of the disk queue
zoned model, introduce the block layer helper function
blk_queue_set_zoned() to generically implement setting up the effective
zoned model according to the disk type, the presence of partitions on the
disk and CONFIG_BLK_DEV_ZONED configuration.
Link: https://lore.kernel.org/r/20200915073347.832424-2-damien.lemoal@wdc.com
Fixes: b72053072c ("block: allow partitions on host aware zone devices")
Cc: <stable@vger.kernel.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Do not need check the bps or iops limitation if bps or iops is unlimited.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The tg_may_dispatch() will call tg_with_in_bps_limit() and
tg_with_in_iops_limit() to check if we can dispatch a bio or
not, which will calculate bps/iops limitation multiple times.
But tg_may_dispatch() is always called under queue lock, which
means the bps/iops limitation will not change in tg_may_dispatch().
So we can calculate the bps/iops limitation only once, and pass
them to tg_with_in_bps_limit() and tg_with_in_iops_limit() to
avoid calculating bps/iops limitation repeatedly.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The 'throtl_grp_quantum' and 'throtl_quantum' are both read-only
variables, thus better to use readable macros instead of static
variables, which can also save some spaces for .bss area.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
adjust_inuse_and_calc_cost() is responsible for reducing the amount of
donated weights dynamically in period as the budget runs low. Because we
don't want to do full donation calculation in period, we keep latching up
inuse by INUSE_ADJ_STEP_PCT of the active weight of the cgroup until the
resulting hweight_inuse is satisfactory.
Unfortunately, the adj_step calculation was reading the active weight before
acquiring ioc->lock. Because the current thread could have lost race to
activate the iocg to another thread before entering this function, it may
read the active weight as zero before acquiring ioc->lock. When this
happens, the adj_step is calculated as zero and the incremental adjustment
loop becomes an infinite one.
Fix it by fetching the active weight after acquiring ioc->lock.
Fixes: b0853ab4a2 ("blk-iocost: revamp in-period donation snapbacks")
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Conceptually, root_iocg->hweight_donating must be less than WEIGHT_ONE but
all hweight calculations round up and thus it may end up >= WEIGHT_ONE
triggering divide-by-zero and other issues. Bound the value to avoid
surprises.
Fixes: e08d02aa5f ("blk-iocost: implement Andy's method for donation weight updates")
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
These functions can be used to enable iostat for partitions on devices
like md, bcache.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- NVMe pull request from Christoph:
- cancel async events before freeing them (David Milburn)
- revert a broken race fix (James Smart)
- fix command processing during resets (Sagi Grimberg)
- Fix a kyber crash with requeued flushes (Omar)
- Fix __bio_try_merge_page() same_page error for no merging (Ritesh)
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9boNoQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpm++D/9oEC1RazLFXwZD7rtXUMQ0bWRmbyM77Qtq
P7wn0poSSvHT6fNyd9ytf9STlTXeJz81Gk4jTRiau1HKAhc9GudYEzYFw0baNN82
AX5dO1Gt2vww+k4XAHCM0l0k2/IOgQg8d2hDJBt68bnDIW/T1T3GORqS5Ki0dw9R
EYVFbBePZTyUIAxDWnSKtNRR3TpMrfZfi9AAUpwGkKVcCZkHD4SlrNPGKd0ckD5Z
GnHdJtWjb5mIgVHMbHgWjcIjKhC7BTrL+sCqdBJ55NvfWXZ20QoKKDSx5BWl6rMI
g/eMAJjoYJ6Ih13sjIbrC7fHZBXzPRTRfqKBq8fM6oytD0cO9ZcUfpBeqiCWOyrT
SU3C1MkkqeskDGNXhjOq8lFWeyQlUgBg0rXIDDeFNusUB3QOZa3T7oirqZlfZsOi
G7WVd4/aftr+qB8GVl1HmLCg7U3rO2q6EuJ+aJDGh07TuiFi5qaPwRzmRcykKs62
UJ15W9JaNEHdGQs5rim7evz9qLCTyQqrwF7nDFBpM8hsraPPCNbwGoUbXLACtXGR
htjr5nxEoOEJs9SKZCWl9jXzvyoMkqLp4j6soVS7cZKUJU1qxMhf68FGylbHitEq
Pe1z7dG/3Pq/zV77aGTt1J40tB43tHr3gOSQ2swwjxqvYIjlvbP4xnl6SIHvLlof
blntc17XWQ==
=J16G
-----END PGP SIGNATURE-----
Merge tag 'block-5.9-2020-09-11' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Fix a regression in bdev partition locking (Christoph)
- NVMe pull request from Christoph:
- cancel async events before freeing them (David Milburn)
- revert a broken race fix (James Smart)
- fix command processing during resets (Sagi Grimberg)
- Fix a kyber crash with requeued flushes (Omar)
- Fix __bio_try_merge_page() same_page error for no merging (Ritesh)
* tag 'block-5.9-2020-09-11' of git://git.kernel.dk/linux-block:
block: Set same_page to false in __bio_try_merge_page if ret is false
nvme-fabrics: allow to queue requests for live queues
block: only call sched requeue_request() for scheduled requests
nvme-tcp: cancel async events before freeing event struct
nvme-rdma: cancel async events before freeing event struct
nvme-fc: cancel async events before freeing event struct
nvme: Revert: Fix controller creation races with teardown flow
block: restore a specific error code in bdev_del_partition
NVMe shares tagset between fabric queue and admin queue or between
connect_q and NS queue, so hctx_may_queue() can be called to allocate
request for these queues.
Tags can be reserved in these tagset. Before error recovery, there is
often lots of in-flight requests which can't be completed, and new
reserved request may be needed in error recovery path. However,
hctx_may_queue() can always return false because there is too many
in-flight requests which can't be completed during error handling.
Finally, nothing can proceed.
Fix this issue by always allowing reserved tag allocation in
hctx_may_queue(). This is reasonable because reserved tags are supposed
to always be available.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: David Milburn <dmilburn@redhat.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
scsi/sg.h is included more than once, Remove the one that isn't
necessary.
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The test and the explaination of the patch as bellow.
Before test we added more debug code in blkg_async_bio_workfn():
int count = 0
if (bios.head && bios.head->bi_next) {
need_plug = true;
blk_start_plug(&plug);
}
while ((bio = bio_list_pop(&bios))) {
/*io_punt is a sysctl user interface to control the print*/
if(io_punt) {
printk("[%s:%d] bio start,size:%llu,%d count=%d plug?%d\n",
current->comm, current->pid, bio->bi_iter.bi_sector,
(bio->bi_iter.bi_size)>>9, count++, need_plug);
}
submit_bio(bio);
}
if (need_plug)
blk_finish_plug(&plug);
Steps that need to be set to trigger *PUNT* io before testing:
mount -t btrfs -o compress=lzo /dev/sda6 /btrfs
mount -t cgroup2 nodev /cgroup2
mkdir /cgroup2/cg3
echo "+io" > /cgroup2/cgroup.subtree_control
echo "8:0 wbps=1048576000" > /cgroup2/cg3/io.max #1000M/s
echo $$ > /cgroup2/cg3/cgroup.procs
Then use dd command to test btrfs PUNT io in current shell:
dd if=/dev/zero of=/btrfs/file bs=64K count=100000
Test hardware environment as below:
[root@localhost btrfs]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
With above debug code, test command and test environment, I did the
tests under 3 different system loads, which are triggered by stress:
1, Run 64 threads by command "stress -c 64 &"
[53615.975974] [kworker/u66:18:1490] bio start,size:45583056,8 count=0 plug?1
[53615.975980] [kworker/u66:18:1490] bio start,size:45583064,8 count=1 plug?1
[53615.975984] [kworker/u66:18:1490] bio start,size:45583072,8 count=2 plug?1
[53615.975987] [kworker/u66:18:1490] bio start,size:45583080,8 count=3 plug?1
[53615.975990] [kworker/u66:18:1490] bio start,size:45583088,8 count=4 plug?1
[53615.975993] [kworker/u66:18:1490] bio start,size:45583096,8 count=5 plug?1
... ...
[53615.977041] [kworker/u66:18:1490] bio start,size:45585480,8 count=303 plug?1
[53615.977044] [kworker/u66:18:1490] bio start,size:45585488,8 count=304 plug?1
[53615.977047] [kworker/u66:18:1490] bio start,size:45585496,8 count=305 plug?1
[53615.977050] [kworker/u66:18:1490] bio start,size:45585504,8 count=306 plug?1
[53615.977053] [kworker/u66:18:1490] bio start,size:45585512,8 count=307 plug?1
[53615.977056] [kworker/u66:18:1490] bio start,size:45585520,8 count=308 plug?1
[53615.977058] [kworker/u66:18:1490] bio start,size:45585528,8 count=309 plug?1
2, Run 32 threads by command "stress -c 32 &"
[50586.290521] [kworker/u66:6:32351] bio start,size:45806496,8 count=0 plug?1
[50586.290526] [kworker/u66:6:32351] bio start,size:45806504,8 count=1 plug?1
[50586.290529] [kworker/u66:6:32351] bio start,size:45806512,8 count=2 plug?1
[50586.290531] [kworker/u66:6:32351] bio start,size:45806520,8 count=3 plug?1
[50586.290533] [kworker/u66:6:32351] bio start,size:45806528,8 count=4 plug?1
[50586.290535] [kworker/u66:6:32351] bio start,size:45806536,8 count=5 plug?1
... ...
[50586.299640] [kworker/u66:5:32350] bio start,size:45808576,8 count=252 plug?1
[50586.299643] [kworker/u66:5:32350] bio start,size:45808584,8 count=253 plug?1
[50586.299646] [kworker/u66:5:32350] bio start,size:45808592,8 count=254 plug?1
[50586.299649] [kworker/u66:5:32350] bio start,size:45808600,8 count=255 plug?1
[50586.299652] [kworker/u66:5:32350] bio start,size:45808608,8 count=256 plug?1
[50586.299663] [kworker/u66:5:32350] bio start,size:45808616,8 count=257 plug?1
[50586.299665] [kworker/u66:5:32350] bio start,size:45808624,8 count=258 plug?1
[50586.299668] [kworker/u66:5:32350] bio start,size:45808632,8 count=259 plug?1
3, Don't run thread by stress
[50861.355246] [kworker/u66:19:32376] bio start,size:13544504,8 count=0 plug?0
[50861.355288] [kworker/u66:19:32376] bio start,size:13544512,8 count=0 plug?0
[50861.355322] [kworker/u66:19:32376] bio start,size:13544520,8 count=0 plug?0
[50861.355353] [kworker/u66:19:32376] bio start,size:13544528,8 count=0 plug?0
[50861.355392] [kworker/u66:19:32376] bio start,size:13544536,8 count=0 plug?0
[50861.355431] [kworker/u66:19:32376] bio start,size:13544544,8 count=0 plug?0
[50861.355468] [kworker/u66:19:32376] bio start,size:13544552,8 count=0 plug?0
[50861.355499] [kworker/u66:19:32376] bio start,size:13544560,8 count=0 plug?0
[50861.355532] [kworker/u66:19:32376] bio start,size:13544568,8 count=0 plug?0
[50861.355575] [kworker/u66:19:32376] bio start,size:13544576,8 count=0 plug?0
[50861.355618] [kworker/u66:19:32376] bio start,size:13544584,8 count=0 plug?0
[50861.355659] [kworker/u66:19:32376] bio start,size:13544592,8 count=0 plug?0
[50861.355740] [kworker/u66:0:32346] bio start,size:13544600,8 count=0 plug?1
[50861.355748] [kworker/u66:0:32346] bio start,size:13544608,8 count=1 plug?1
[50861.355962] [kworker/u66:2:32347] bio start,size:13544616,8 count=0 plug?0
[50861.356272] [kworker/u66:7:31962] bio start,size:13544624,8 count=0 plug?0
[50861.356446] [kworker/u66:7:31962] bio start,size:13544632,8 count=0 plug?0
[50861.356567] [kworker/u66:7:31962] bio start,size:13544640,8 count=0 plug?0
[50861.356707] [kworker/u66:19:32376] bio start,size:13544648,8 count=0 plug?0
[50861.356748] [kworker/u66:15:32355] bio start,size:13544656,8 count=0 plug?0
[50861.356825] [kworker/u66:17:31970] bio start,size:13544664,8 count=0 plug?0
Analysis of above 3 test results with different system load:
>From above test, we can see more and more continuous bios can be plugged
with system load increasing. When run "stress -c 64 &", 310 continuous
bios are plugged; When run "stress -c 32 &", 260 continuous bios are
plugged; When don't run stress, at most only 2 continuous bios are
plugged, in most cases, bio_list only contains one single bio.
How to explain above phenomenon:
We know, in submit_bio(), if the bio is a REQ_CGROUP_PUNT io, it will
queue a work to workqueue blkcg_punt_bio_wq. But when the workqueue is
scheduled, it depends on the system load. When system load is low, the
workqueue will be quickly scheduled, and the bio in bio_list will be
quickly processed in blkg_async_bio_workfn(), so there is less chance
that the same io submit thread can add multiple continuous bios to
bio_list before workqueue is scheduled to run. The analysis aligned with
above test "3".
When system load is high, there is some delay before the workqueue can
be scheduled to run, the higher the system load the greater the delay.
So there is more chance that the same io submit thread can add multiple
continuous bios to bio_list. Then when the workqueue is scheduled to run,
there are more continuous bios in bio_list, which will be processed in
blkg_async_bio_workfn(). The analysis aligned with above test "1" and "2".
According to test, we can get io performance improved with the patch,
especially when system load is higher. Another optimazition is to use
the plug only when bio_list contains at least 2 bios.
Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Like check_disk_changed, except that it does not call ->revalidate_disk
but leaves that to the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we hit the UINT_MAX limit of bio->bi_iter.bi_size and so we are anyway
not merging this page in this bio, then it make sense to make same_page
also as false before returning.
Without this patch, we hit below WARNING in iomap.
This mostly happens with very large memory system and / or after tweaking
vm dirty threshold params to delay writeback of dirty data.
WARNING: CPU: 18 PID: 5130 at fs/iomap/buffered-io.c:74 iomap_page_release+0x120/0x150
CPU: 18 PID: 5130 Comm: fio Kdump: loaded Tainted: G W 5.8.0-rc3 #6
Call Trace:
__remove_mapping+0x154/0x320 (unreliable)
iomap_releasepage+0x80/0x180
try_to_release_page+0x94/0xe0
invalidate_inode_page+0xc8/0x110
invalidate_mapping_pages+0x1dc/0x540
generic_fadvise+0x3c8/0x450
xfs_file_fadvise+0x2c/0xe0 [xfs]
vfs_fadvise+0x3c/0x60
ksys_fadvise64_64+0x68/0xe0
sys_fadvise64+0x28/0x40
system_call_exception+0xf8/0x1c0
system_call_common+0xf0/0x278
Fixes: cc90bc6842 ("block: fix "check bi_size overflow before merge"")
Reported-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Yang Yang reported the following crash caused by requeueing a flush
request in Kyber:
[ 2.517297] Unable to handle kernel paging request at virtual address ffffffd8071c0b00
...
[ 2.517468] pc : clear_bit+0x18/0x2c
[ 2.517502] lr : sbitmap_queue_clear+0x40/0x228
[ 2.517503] sp : ffffff800832bc60 pstate : 00c00145
...
[ 2.517599] Process ksoftirqd/5 (pid: 51, stack limit = 0xffffff8008328000)
[ 2.517602] Call trace:
[ 2.517606] clear_bit+0x18/0x2c
[ 2.517619] kyber_finish_request+0x74/0x80
[ 2.517627] blk_mq_requeue_request+0x3c/0xc0
[ 2.517637] __scsi_queue_insert+0x11c/0x148
[ 2.517640] scsi_softirq_done+0x114/0x130
[ 2.517643] blk_done_softirq+0x7c/0xb0
[ 2.517651] __do_softirq+0x208/0x3bc
[ 2.517657] run_ksoftirqd+0x34/0x60
[ 2.517663] smpboot_thread_fn+0x1c4/0x2c0
[ 2.517667] kthread+0x110/0x120
[ 2.517669] ret_from_fork+0x10/0x18
This happens because Kyber doesn't track flush requests, so
kyber_finish_request() reads a garbage domain token. Only call the
scheduler's requeue_request() hook if RQF_ELVPRIV is set (like we do for
the finish_request() hook in blk_mq_free_request()). Now that we're
handling it in blk-mq, also remove the check from BFQ.
Reported-by: Yang Yang <yang.yang@vivo.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Switch to the naming used by the other entries so that we can use the
QUEUE_RW_ENTRY helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add two helpers macros to avoid boilerplate code for the queue sysfs
entries.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
mdadm relies on the fact that deleting an invalid partition returns
-ENXIO or -ENOTTY to detect if a block device is a partition or a
whole device.
Fixes: 08fc1ab6d7 ("block: fix locking in bdev_del_partition")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now we usually free the hctx->sched_data by e->type->ops.exit_hctx(),
and no users will use blk_mq_sched_free_hctx_data() function.
Remove it.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Discarding blocks and buffers under a mounted filesystem is hardly
anything admin wants to do. Usually it will confuse the filesystem and
sometimes the loss of buffer_head state (including b_private field) can
even cause crashes like:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
CPU: 4 PID: 203778 Comm: jbd2/dm-3-8 Kdump: loaded Tainted: G O --------- - - 4.18.0-147.5.0.5.h126.eulerosv2r9.x86_64 #1
Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 1.57 08/11/2015
RIP: 0010:jbd2_journal_grab_journal_head+0x1b/0x40 [jbd2]
...
Call Trace:
__jbd2_journal_insert_checkpoint+0x23/0x70 [jbd2]
jbd2_journal_commit_transaction+0x155f/0x1b60 [jbd2]
kjournald2+0xbd/0x270 [jbd2]
So if we don't have block device open with O_EXCL already, claim the
block device while we truncate buffer cache. This makes sure any
exclusive block device user (such as filesystem) cannot operate on the
device while we are discarding buffer cache.
Reported-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[axboe: fix !CONFIG_BLOCK error in truncate_bdev_range()]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9SWMMQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgphIcD/488Q7rXb2eABp1fGs4gu+VFOCLogeHL8xh
5xHNiOPnZG2SGr8DQJY/7EX2kE65rbZi8/g+2N6anovI2nduRu0tzSra7fRgzbys
ZQC1CUel0MbCd7e8OaEfg108PSHNxBf1PqDcE7zCeyZ0DIs3s4vK/bQtmzzxZHgU
wNw4OIP9gOdqgjowb6GGHo9SLN4GT8rZ0jZVPLa7GwFsvxCTwv/7lHO8rqeSeuCu
5H6i3M/rSbtTXPLHf4Fy97x9WmBmdgu4epTXiwbOxaagpx3lm/7n1P3CpavR+Gcq
O5VGIIzazxPwnZl9y/6rZFLGYqcj38RxUvC8KtK6tDXxEu/BDJa1d6hXI03SyXAO
ZAiEpQTKOkJE3R8ewUDrXLvl3p6FvwZVZ5SIFwUb+0JFrVQYwrgfoRJtzb5SIUan
T9/bSYge7lFRI92FZRIqhvk8rsEBRdu7N/rQCyGf6GuZ0vRXWRAqN7T02iDn3czX
pdGAepU5ymw8CwyUiNNnkY0DUaQLBIO9tCA9epxLwdroQ95vJtMPRBX1STQ65GVk
XvMFAJqDAehQ/nP5xO60cWGZHyL7L/ccpofZlA/ytgAIZRa85GvhrdVy7yc6DKto
wu6h2tkX9+ldoUjVbn/60T+Ft3QUTlfAuDfherkNoFNB/G5i1pzOHbwvL7B3czr3
ZMjoNiOIqA==
=8fvz
-----END PGP SIGNATURE-----
Merge tag 'block-5.9-2020-09-04' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A bit larger than usual this week, mostly due to the NVMe fixes
arriving late for -rc3 and hence didn't make last weeks pull request.
- NVMe:
- instance leak and io boundary fixes from Keith
- fc locking fix from Christophe
- various tcp/rdma reset during traffic fixes from Sagi
- pci use-after-free fix from Tong
- tcp target null deref fix from Ziye
- Locking fix for partition removal (Christoph)
- Ensure bdi->io_pages is always set (me)
- Fixup for hd struct reference (Ming)
- Fix for zero length bvecs (Ming)
- Two small blk-iocost fixes (Tejun)"
* tag 'block-5.9-2020-09-04' of git://git.kernel.dk/linux-block:
block: allow for_each_bvec to support zero len bvec
blk-stat: make q->stats->lock irqsafe
blk-iocost: ioc_pd_free() shouldn't assume irq disabled
block: fix locking in bdev_del_partition
block: release disk reference in hd_struct_free_work
block: ensure bdi->io_pages is always initialized
nvme-pci: cancel nvme device request before disabling
nvme: only use power of two io boundaries
nvme: fix controller instance leak
nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()'
nvme: Fix NULL dereference for pci nvme controllers
nvme-rdma: fix reset hang if controller died in the middle of a reset
nvme-rdma: fix timeout handler
nvme-rdma: serialize controller teardown sequences
nvme-tcp: fix reset hang if controller died in the middle of a reset
nvme-tcp: fix timeout handler
nvme-tcp: serialize controller teardown sequences
nvme: have nvme_wait_freeze_timeout return if it timed out
nvme-fabrics: don't check state NVME_CTRL_NEW for request acceptance
nvmet-tcp: Fix NULL dereference when a connect data comes in h2cdata pdu
High CPU utilization on "native_queued_spin_lock_slowpath" due to lock
contention is possible for mq-deadline and bfq IO schedulers
when nr_hw_queues is more than one.
It is because kblockd work queue can submit IO from all online CPUs
(through blk_mq_run_hw_queues()) even though only one hctx has pending
commands.
The elevator callback .has_work for mq-deadline and bfq scheduler considers
pending work if there are any IOs on request queue but it does not account
hctx context.
Add a per-hctx 'elevator_queued' count to the hctx to avoid triggering
the elevator even though there are no requests queued.
[jpg: Relocated atomic_dec() in dd_dispatch_request(), update commit message per Kashyap]
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For when using a shared sbitmap, no longer should the number of active
request queues per hctx be relied on for when judging how to share the tag
bitmap.
Instead maintain the number of active request queues per tag_set, and make
the judgement based on that.
Originally-from: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace<don.brace@microsemi.com> #SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The per-hctx nr_active value can no longer be used to fairly assign a share
of tag depth per request queue for when using a shared sbitmap, as it does
not consider that the tags are shared tags over all hctx's.
For this case, record the nr_active_requests per request_queue, and make
the judgement based on that value.
Co-developed-with: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace<don.brace@microsemi.com> #SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk-mq.h and blk-mq-tag.h include on each other, which is less than ideal.
Locate hctx_may_queue() to blk-mq.h, as it is not really tag specific code.
In this way, we can drop the blk-mq-tag.h include of blk-mq.h
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
multiple reply queues with single hostwide tags.
In addition, these drivers want to use interrupt assignment in
pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
CPU hotplug may cause in-flight IO completion to not be serviced when an
interrupt is shutdown. That problem is solved in commit bf0beec060
("blk-mq: drain I/O when all CPUs in a hctx are offline").
However, to take advantage of that blk-mq feature, the HBA HW queuess are
required to be mapped to that of the blk-mq hctx's; to do that, the HBA HW
queues need to be exposed to the upper layer.
In making that transition, the per-SCSI command request tags are no
longer unique per Scsi host - they are just unique per hctx. As such, the
HBA LLDD would have to generate this tag internally, which has a certain
performance overhead.
However another problem is that blk-mq assumes the host may accept
(Scsi_host.can_queue * #hw queue) commands. In commit 6eb045e092 ("scsi:
core: avoid host-wide host_busy counter for scsi_mq"), the Scsi host busy
counter was removed, which would stop the LLDD being sent more than
.can_queue commands; however, it should still be ensured that the block
layer does not issue more than .can_queue commands to the Scsi host.
To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
which may be requested at init time.
New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the
tagset to indicate whether the shared sbitmap should be used.
Even when BLK_MQ_F_TAG_HCTX_SHARED is set, a full set of tags and requests
are still allocated per hctx; the reason for this is that if tags and
requests were only allocated for a single hctx - like hctx0 - it may break
block drivers which expect a request be associated with a specific hctx,
i.e. not always hctx0. This will introduce extra memory usage.
This change is based on work originally from Ming Lei in [1] and from
Bart's suggestion in [2].
[0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[1] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[2] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace<don.brace@microsemi.com> #SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Introduce pointers for the blk_mq_tags regular and reserved bitmap tags,
with the goal of later being able to use a common shared tag bitmap across
all HW contexts in a set.
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace<don.brace@microsemi.com> #SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pass hctx/tagset flags argument down to blk_mq_init_tags() and
blk_mq_free_tags() for selective init/free.
For now, make it include the alloc policy flag, which can be evaluated
when needed (in blk_mq_init_tags()).
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since the tags are allocated in blk_mq_init_tags(), it's better practice
to free in that same function upon error, rather than a callee which is to
init the bitmap tags (blk_mq_init_tags()).
[jpg: Split from an earlier patch with a new commit message]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The function does not set the depth, but rather transitions from
shared to non-shared queues and vice versa.
So rename it to blk_mq_update_tag_set_shared() to better reflect
its purpose.
[jpg: take out some unrelated changes in blk_mq_init_bitmap_tags()]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
BLK_MQ_F_TAG_SHARED actually means that tags is shared among request
queues, all of which should belong to LUNs attached to same HBA.
So rename it to make the point explicitly.
[jpg: rebase a few times, add rnbd-clt.c change]
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Only virtio_blk and xen-blkfront set the revalidate argument to true,
and both do not implement the ->revalidate_disk method. So switch
to the helper that just updates the size instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Replace bd_invalidate with a new BDEV_NEED_PART_SCAN flag in a bd_flags
variable to better describe the condition.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove a duplicative condition to remove below cppcheck warnings:
"warning: Redundant condition: sched_allow_merge. '!A || (A && B)' is
equivalent to '!A || B' [redundantCondition]"
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If WRITE_ZERO/WRITE_SAME operation is not supported by the storage,
blk_cloned_rq_check_limits() will return IO error which will cause
device-mapper to fail the paths.
Instead, if the queue limit is set to 0, return BLK_STS_NOTSUPP.
BLK_STS_NOTSUPP will be ignored by device-mapper and will not fail the
paths.
Suggested-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ritika Srivastava <ritika.srivastava@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
These are really cheap to collect and can be useful in debugging iocost
behavior. Add them as debug stats for now.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When an iocg accumulates too much vtime or gets deactivated, we throw away
some vtime, which lowers the overall device utilization. As the exact amount
which is being thrown away is known, we can compensate by accelerating the
vrate accordingly so that the extra vtime generated in the current period
matches what got lost.
This significantly improves work conservation when involving high weight
cgroups with intermittent and bursty IO patterns.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A low weight iocg can amass a large amount of debt, for example, when
anonymous memory gets reclaimed aggressively. If the system has a lot of
memory paired with a slow IO device, the debt can span multiple seconds or
more. If there are no other subsequent IO issuers, the in-debt iocg may end
up blocked paying its debt while the IO device is idle.
This patch implements a mechanism to protect against such pathological
cases. If the device has been sufficiently idle for a substantial amount of
time, the debts are halved. The criteria are on the conservative side as we
want to resolve the rare extreme cases without impacting regular operation
by forgiving debts too readily.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Debt handling had several issues.
* How much inuse a debtor carries wasn't clearly defined. inuse would be
driven down over time from not issuing IOs but it'd be better to clamp it
to minimum immediately once in debt.
* How much can be paid off was determined by hweight_inuse. As inuse was
driven down, the payment amount would fall together regardless of the
debtor's active weight. This means that the debtors were punished harshly.
* ioc_rqos_merge() wasn't calling blkcg_schedule_throttle() after
iocg_kick_delay().
This patch revamps debt handling so that
* Debt handling owns inuse for iocgs in debt and keeps them at zero.
* Payment amount is determined by hweight_active. This is more deterministic
and safer than hweight_inuse but still far from ideal in that it doesn't
factor in possible donations from other iocgs for debt payments. This
likely needs further improvements in the future.
* iocg_rqos_merge() now calls blkcg_schedule_throttle() as necessary.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andy Newell <newella@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, iocost doesn't collect or expose any statistics punting off all
monitoring duties to drgn based iocost_monitor.py. While it works for some
scenarios, there are some usability and data availability challenges. For
example, accurate per-cgroup usage information can't be tracked by vtime
progression at all and the number available in iocg->usages[] are really
short-term snapshots used for control heuristics with possibly significant
errors.
This patch implements per-cgroup absolute usage stat counter and exposes it
through io.stat along with the current vrate. Usage stat collection and
flushing employ the same method as cgroup rstat on the active iocg's and the
only hot path overhead is preemption toggling and adding to a percpu
counter.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Curently, iocost syncs the delay duration to the outstanding debt amount,
which seemed enough to protect the system from anon memory hogs. However,
that was mostly because the delay calcuation was using hweight_inuse which
quickly converges towards zero under debt for delay duration calculation,
often pusnishing debtors overly harshly for longer than deserved.
The previous patch fixed the delay calcuation and now the protection against
anonymous memory hogs isn't enough because the effect of delay is indirect
and non-linear and a huge amount of future debt can accumulate abruptly
while unthrottled.
This patch implements delay hysteresis so that delay is decayed
exponentially over time instead of getting cleared immediately as debt is
paid off. While the overall behavior is similar to the blk-cgroup
implementation used by blk-iolatency, a lot of the details are different and
due to the empirical nature of the mechanism, it's challenging to adapt the
mechanism for one controller without negatively impacting the other.
As the delay is gradually decayed now, there's no point in running it from
its own hrtimer. Periodic updates are now performed from ioc_timer_fn() and
the dedicated hrtimer is removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When the margin drops below the minimum on a donating iocg, donation is
immediately canceled in full. There are a couple shortcomings with the
current behavior.
* It's abrupt. A small temporary budget deficit can lead to a wide swing in
weight allocation and a large surplus.
* It's open coded in the issue path but not implemented for the merge path.
A series of merges at a low inuse can make the iocg incur debts and stall
incorrectly.
This patch reimplements in-period donation snapbacks so that
* inuse adjustment and cost calculations are factored into
adjust_inuse_and_calc_cost() which is called from both the issue and merge
paths.
* Snapbacks are more gradual. It occurs in quarter steps.
* A snapback triggers if the margin goes below the low threshold and is
lower than the budget at the time of the last adjustment.
* For the above, __propagate_weights() stores the margin in
iocg->saved_margin. Move iocg->last_inuse storing together into
__propagate_weights() for consistency.
* Full snapback is guaranteed when there are waiters.
* With precise donation and gradual snapbacks, inuse adjustments are now a
lot more effective and the value of scaling inuse on weight changes isn't
clear. Removed inuse scaling from weight_update().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, debt handling requires only iocg->waitq.lock. In the future, we
want to adjust and propagate inuse changes depending on debt status. Let's
grab ioc->lock in debt handling paths in preparation.
* Because ioc->lock nests outside iocg->waitq.lock, the decision to grab
ioc->lock needs to be made before entering the critical sections.
* Add and use iocg_[un]lock() which handles the conditional double locking.
* Add @pay_debt to iocg_kick_waitq() so that debt payment happens only when
the caller grabbed both locks.
This patch is prepatory and the comments contain references to future
changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
iocost has various safety nets to combat inuse adjustment calculation
inaccuracies. With Andy's method implemented in transfer_surpluses(), inuse
adjustment calculations are now accurate and we can make donation amount
determinations accurate too.
* Stop keeping track of past usage history and using the maximum. Act on the
immediate usage information.
* Remove donation constraints defined by SURPLUS_* constants. Donate
whatever isn't used.
* Determine the donation amount so that the iocg will end up with
MARGIN_TARGET_PCT budget at the end of the coming period assuming the same
usage as the previous period. TARGET is set at 50% of period, which is the
previous maximum. This provides smooth convergence for most repetitive IO
patterns.
* Apply donation logic early at 20% budget. There's no risk in doing so as
the calculation is based on the delta between the current budget and the
target budget at the end of the coming period.
* Remove preemptive iocg activation for zero cost IOs. As donation can reach
near zero now, the mere activation doesn't provide any protection anymore.
In the unlikely case that this becomes a problem, the right solution is
assigning appropriate costs for such IOs.
This significantly improves the donation determination logic while also
simplifying it. Now all donations are immediate, exact and smooth.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andy Newell <newella@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The margin handling was pretty inconsistent.
* ioc->margin_us and ioc->inuse_margin_vtime were used as vtime margin
thresholds. However, the two are in different units with the former
requiring conversion to vtime on use.
* iocg_kick_waitq() was using a quarter of WAITQ_TIMER_MARGIN_PCT of
period_us as the timer slack - ~1.2%. While iocg_kick_delay() was using a
quarter of ioc->margin_us - ~12.5%. There aren't strong reasons to use
different values for the two.
This patch cleans up margin and timer slack handling:
* vtime margins are now recorded in ioc->margins.{min, max} on period
duration changes and used consistently.
* Timer slack is now 1% of period_us and recorded in ioc->timer_slack_ns and
used consistently for iocg_kick_waitq() and iocg_kick_delay().
The only functional change is shortening of timer slack. No meaningful
visible change is expected.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
iocost implements work conservation by reducing iocg->inuse and propagating
the adjustment upwards proportionally. However, while I knew the target
absolute hierarchical proportion - adjusted hweight_inuse, I couldn't figure
out how to determine the iocg->inuse adjustment to achieve that and
approximated the adjustment by scaling iocg->inuse using the proportion of
the needed hweight_inuse changes.
When nested, these scalings aren't accurate even when adjusting a single
node as the donating node also receives the benefit of the donated portion.
When multiple nodes are donating as they often do, they can be wildly wrong.
iocost employed various safety nets to combat the inaccuracies. There are
ample buffers in determining how much to donate, the adjustments are
conservative and gradual. While it can achieve a reasonable level of work
conservation in simple scenarios, the inaccuracies can easily add up leading
to significant loss of total work. This in turn makes it difficult to
closely cap vrate as vrate adjustment is needed to compensate for the loss
of work. The combination of inaccurate donation calculations and vrate
adjustments can lead to wide fluctuations and clunky overall behaviors.
Andy Newell devised a method to calculate the needed ->inuse updates to
achieve the target hweight_inuse's. The method is compatible with the
proportional inuse adjustment propagation which allows all hot path
operations to be local to each iocg.
To roughly summarize, Andy's method divides the tree into donating and
non-donating parts, calculates global donation rate which is used to
determine the target hweight_inuse for each node, and then derives per-level
proportions. There's non-trivial amount of math involved. Please refer to
the following pdfs for detailed descriptions.
https://drive.google.com/file/d/1PsJwxPFtjUnwOY1QJ5AeICCcsL7BM3bohttps://drive.google.com/file/d/1vONz1-fzVO7oY5DXXsLjSxEtYYQbOvsEhttps://drive.google.com/file/d/1WcrltBOSPN0qXVdBgnKm4mdp9FhuEFQN
This patch implements Andy's method in transfer_surpluses(). This makes the
donation calculations accurate per cycle and enables further improvements in
other parts of the donation logic.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andy Newell <newella@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>