-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmXYTLkACgkQONu9yGCS
aT4+fhAAqqR/Cvx53ZKMQ8GZTCudAZnr/Dz6kWYwxhhhIbQjDpCaf9mgsrEDaQS2
ancSZjzYaOUIXq/IsthXxQIUhiZbuM3iuSEi7+odWgSYdkFyzuUt8MWLBGSaB5Er
ojn+APtq7vPXTSnp7uMwqMC3/BHCKkeYIjRVevhhHBKG5d3lzkV1xU8NcvMkLaly
CIRxpWXD3w2b7K0GEbb/zN1GQEHDCQcxjuaJoe/5FKGJkqd3T31eyiJTRumCCMcz
j8vkGkYmcMJpWf04iLgVA1p13I5/HGrXdEBI/GutN8IABIC3Cp42jW8phHYKW5ZM
a4R25LZG5buND1Ubpq+EDrYn3EaPek5XRki0w8ZAXfNa3rYc+N6mQjkzNSOzhJ/5
VNsn3EAE1Dwtar5Z3ASe9ugDbh+0bgx85PbfaADK88V+qWb3DVr1TBWmDNu2vfVP
rv4I0EKu9r3vOE8aNMEBuhAVkIK3mEQUxwab6RKNrMby/5Uwa+ugrrUtQd8V+T1S
j6r6v7u7aZ8mhYO7d6WSvAKL85lCWGbs3WRIKCJZmDRyqWrWW9tVWRN9wrZ2QnRr
iaCQKk8P474P7/j1zwnmih8l4wS1oszveNziWwd0fi1Nn/WQYM+JKYQvpuQijmQ+
J9jLyWo7a59zffIE6mzJdNwFy9hlw9X+VnJmExk/Q88Z7Bt5wPQ=
=laYd
-----END PGP SIGNATURE-----
Merge 5.10.210 into android12-5.10-lts
Changes in 5.10.210
usb: cdns3: Fixes for sparse warnings
usb: cdns3: fix uvc failure work since sg support enabled
usb: cdns3: fix incorrect calculation of ep_buf_size when more than one config
usb: cdns3: fix iso transfer error when mult is not zero
usb: cdns3: Fix uvc fail when DMA cross 4k boundery since sg enabled
PCI: mediatek: Clear interrupt status before dispatching handler
units: change from 'L' to 'UL'
units: add the HZ macros
serial: sc16is7xx: set safe default SPI clock frequency
spi: introduce SPI_MODE_X_MASK macro
serial: sc16is7xx: add check for unsupported SPI modes during probe
iio: adc: ad7091r: Set alert bit in config register
iio: adc: ad7091r: Allow users to configure device events
iio: adc: ad7091r: Enable internal vref if external vref is not supplied
dmaengine: fix NULL pointer in channel unregistration function
iio:adc:ad7091r: Move exports into IIO_AD7091R namespace.
ext4: allow for the last group to be marked as trimmed
crypto: api - Disallow identical driver names
PM: hibernate: Enforce ordering during image compression/decompression
hwrng: core - Fix page fault dead lock on mmap-ed hwrng
crypto: s390/aes - Fix buffer overread in CTR mode
rpmsg: virtio: Free driver_override when rpmsg_remove()
bus: mhi: host: Drop chan lock before queuing buffers
parisc/firmware: Fix F-extend for PDC addresses
async: Split async_schedule_node_domain()
async: Introduce async_schedule_dev_nocall()
arm64: dts: qcom: sdm845: fix USB wakeup interrupt types
arm64: dts: qcom: sdm845: fix USB DP/DM HS PHY interrupts
lsm: new security_file_ioctl_compat() hook
scripts/get_abi: fix source path leak
mmc: core: Use mrq.sbc in close-ended ffu
mmc: mmc_spi: remove custom DMA mapped buffers
rtc: Adjust failure return code for cmos_set_alarm()
nouveau/vmm: don't set addr on the fail path to avoid warning
ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path
rename(): fix the locking of subdirectories
block: Remove special-casing of compound pages
stddef: Introduce DECLARE_FLEX_ARRAY() helper
smb3: Replace smb2pdu 1-element arrays with flex-arrays
mm: vmalloc: introduce array allocation functions
KVM: use __vcalloc for very large allocations
net/smc: fix illegal rmb_desc access in SMC-D connection dump
tcp: make sure init the accept_queue's spinlocks once
bnxt_en: Wait for FLR to complete during probe
vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING
llc: make llc_ui_sendmsg() more robust against bonding changes
llc: Drop support for ETH_P_TR_802_2.
net/rds: Fix UBSAN: array-index-out-of-bounds in rds_cmsg_recv
tracing: Ensure visibility when inserting an element into tracing_map
afs: Hide silly-rename files from userspace
tcp: Add memory barrier to tcp_push()
netlink: fix potential sleeping issue in mqueue_flush_file
ipv6: init the accept_queue's spinlocks in inet6_create
net/mlx5: DR, Use the right GVMI number for drop action
net/mlx5e: fix a double-free in arfs_create_groups
netfilter: nf_tables: restrict anonymous set and map names to 16 bytes
netfilter: nf_tables: validate NFPROTO_* family
net: mvpp2: clear BM pool before initialization
selftests: netdevsim: fix the udp_tunnel_nic test
fjes: fix memleaks in fjes_hw_setup
net: fec: fix the unhandled context fault from smmu
btrfs: ref-verify: free ref cache before clearing mount opt
btrfs: tree-checker: fix inline ref size in error messages
btrfs: don't warn if discard range is not aligned to sector
btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args
btrfs: don't abort filesystem when attempting to snapshot deleted subvolume
rbd: don't move requests to the running list on errors
exec: Fix error handling in begin_new_exec()
wifi: iwlwifi: fix a memory corruption
netfilter: nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress basechain
netfilter: nf_tables: reject QUEUE/DROP verdict parameters
gpiolib: acpi: Ignore touchpad wakeup on GPD G1619-04
drm: Don't unref the same fb many times by mistake due to deadlock handling
drm/bridge: nxp-ptn3460: fix i2c_master_send() error checking
drm/tidss: Fix atomic_flush check
drm/bridge: nxp-ptn3460: simplify some error checking
PM: sleep: Use dev_printk() when possible
PM: sleep: Avoid calling put_device() under dpm_list_mtx
PM: core: Remove unnecessary (void *) conversions
PM: sleep: Fix possible deadlocks in core system-wide PM code
fs/pipe: move check to pipe_has_watch_queue()
pipe: wakeup wr_wait after setting max_usage
ARM: dts: samsung: exynos4210-i9100: Unconditionally enable LDO12
arm64: dts: qcom: sc7180: Use pdc interrupts for USB instead of GIC interrupts
arm64: dts: qcom: sc7180: fix USB wakeup interrupt types
media: mtk-jpeg: Fix use after free bug due to error path handling in mtk_jpeg_dec_device_run
mm: use __pfn_to_section() instead of open coding it
mm/sparsemem: fix race in accessing memory_section->usage
btrfs: remove err variable from btrfs_delete_subvolume
btrfs: avoid copying BTRFS_ROOT_SUBVOL_DEAD flag to snapshot of subvolume being deleted
drm: panel-simple: add missing bus flags for Tianma tm070jvhg[30/33]
drm/exynos: fix accidental on-stack copy of exynos_drm_plane
drm/exynos: gsc: minor fix for loop iteration in gsc_runtime_resume
gpio: eic-sprd: Clear interrupt after set the interrupt type
spi: bcm-qspi: fix SFDP BFPT read by usig mspi read
mips: Call lose_fpu(0) before initializing fcr31 in mips_set_personality_nan
tick/sched: Preserve number of idle sleeps across CPU hotplug events
x86/entry/ia32: Ensure s32 is sign extended to s64
powerpc/mm: Fix null-pointer dereference in pgtable_cache_add
drivers/perf: pmuv3: don't expose SW_INCR event in sysfs
powerpc: Fix build error due to is_valid_bugaddr()
powerpc/mm: Fix build failures due to arch_reserved_kernel_pages()
x86/boot: Ignore NMIs during very early boot
powerpc: pmd_move_must_withdraw() is only needed for CONFIG_TRANSPARENT_HUGEPAGE
powerpc/lib: Validate size for vector operations
x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
perf/core: Fix narrow startup race when creating the perf nr_addr_filters sysfs file
debugobjects: Stop accessing objects after releasing hash bucket lock
regulator: core: Only increment use_count when enable_count changes
audit: Send netlink ACK before setting connection in auditd_set
ACPI: video: Add quirk for the Colorful X15 AT 23 Laptop
PNP: ACPI: fix fortify warning
ACPI: extlog: fix NULL pointer dereference check
PM / devfreq: Synchronize devfreq_monitor_[start/stop]
ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events
FS:JFS:UBSAN:array-index-out-of-bounds in dbAdjTree
UBSAN: array-index-out-of-bounds in dtSplitRoot
jfs: fix slab-out-of-bounds Read in dtSearch
jfs: fix array-index-out-of-bounds in dbAdjTree
jfs: fix uaf in jfs_evict_inode
pstore/ram: Fix crash when setting number of cpus to an odd number
crypto: stm32/crc32 - fix parsing list of devices
afs: fix the usage of read_seqbegin_or_lock() in afs_lookup_volume_rcu()
afs: fix the usage of read_seqbegin_or_lock() in afs_find_server*()
rxrpc_find_service_conn_rcu: fix the usage of read_seqbegin_or_lock()
jfs: fix array-index-out-of-bounds in diNewExt
s390/ptrace: handle setting of fpc register correctly
KVM: s390: fix setting of fpc register
SUNRPC: Fix a suspicious RCU usage warning
ecryptfs: Reject casefold directory inodes
ext4: fix inconsistent between segment fstrim and full fstrim
ext4: unify the type of flexbg_size to unsigned int
ext4: remove unnecessary check from alloc_flex_gd()
ext4: avoid online resizing failures due to oversized flex bg
wifi: rt2x00: restart beacon queue when hardware reset
selftests/bpf: satisfy compiler by having explicit return in btf test
selftests/bpf: Fix pyperf180 compilation failure with clang18
scsi: lpfc: Fix possible file string name overflow when updating firmware
PCI: Add no PM reset quirk for NVIDIA Spectrum devices
bonding: return -ENOMEM instead of BUG in alb_upper_dev_walk
scsi: arcmsr: Support new PCI device IDs 1883 and 1886
ARM: dts: imx7d: Fix coresight funnel ports
ARM: dts: imx7s: Fix lcdif compatible
ARM: dts: imx7s: Fix nand-controller #size-cells
wifi: ath9k: Fix potential array-index-out-of-bounds read in ath9k_htc_txstatus()
bpf: Add map and need_defer parameters to .map_fd_put_ptr()
scsi: libfc: Don't schedule abort twice
scsi: libfc: Fix up timeout error in fc_fcp_rec_error()
bpf: Set uattr->batch.count as zero before batched update or deletion
ARM: dts: rockchip: fix rk3036 hdmi ports node
ARM: dts: imx25/27-eukrea: Fix RTC node name
ARM: dts: imx: Use flash@0,0 pattern
ARM: dts: imx27: Fix sram node
ARM: dts: imx1: Fix sram node
ionic: pass opcode to devcmd_wait
block/rnbd-srv: Check for unlikely string overflow
ARM: dts: imx25: Fix the iim compatible string
ARM: dts: imx25/27: Pass timing0
ARM: dts: imx27-apf27dev: Fix LED name
ARM: dts: imx23-sansa: Use preferred i2c-gpios properties
ARM: dts: imx23/28: Fix the DMA controller node name
net: dsa: mv88e6xxx: Fix mv88e6352_serdes_get_stats error path
block: prevent an integer overflow in bvec_try_merge_hw_page
md: Whenassemble the array, consult the superblock of the freshest device
arm64: dts: qcom: msm8996: Fix 'in-ports' is a required property
arm64: dts: qcom: msm8998: Fix 'out-ports' is a required property
wifi: rtl8xxxu: Add additional USB IDs for RTL8192EU devices
wifi: rtlwifi: rtl8723{be,ae}: using calculate_bit_shift()
wifi: cfg80211: free beacon_ies when overridden from hidden BSS
Bluetooth: qca: Set both WIDEBAND_SPEECH and LE_STATES quirks for QCA2066
Bluetooth: L2CAP: Fix possible multiple reject send
i40e: Fix VF disable behavior to block all traffic
f2fs: fix to check return value of f2fs_reserve_new_block()
ALSA: hda: Refer to correct stream index at loops
ASoC: doc: Fix undefined SND_SOC_DAPM_NOPM argument
fast_dput(): handle underflows gracefully
RDMA/IPoIB: Fix error code return in ipoib_mcast_join
drm/amd/display: Fix tiled display misalignment
f2fs: fix write pointers on zoned device after roll forward
drm/drm_file: fix use of uninitialized variable
drm/framebuffer: Fix use of uninitialized variable
drm/mipi-dsi: Fix detach call without attach
media: stk1160: Fixed high volume of stk1160_dbg messages
media: rockchip: rga: fix swizzling for RGB formats
PCI: add INTEL_HDA_ARL to pci_ids.h
ALSA: hda: Intel: add HDA_ARL PCI ID support
ALSA: hda: intel-dspcfg: add filters for ARL-S and ARL
drm/exynos: Call drm_atomic_helper_shutdown() at shutdown/unbind time
IB/ipoib: Fix mcast list locking
media: ddbridge: fix an error code problem in ddb_probe
drm/msm/dpu: Ratelimit framedone timeout msgs
clk: hi3620: Fix memory leak in hi3620_mmc_clk_init()
clk: mmp: pxa168: Fix memory leak in pxa168_clk_init()
watchdog: it87_wdt: Keep WDTCTRL bit 3 unmodified for IT8784/IT8786
drm/amdgpu: Let KFD sync with VM fences
drm/amdgpu: Drop 'fence' check in 'to_amdgpu_amdkfd_fence()'
leds: trigger: panic: Don't register panic notifier if creating the trigger failed
um: Fix naming clash between UML and scheduler
um: Don't use vfprintf() for os_info()
um: net: Fix return type of uml_net_start_xmit()
i3c: master: cdns: Update maximum prescaler value for i2c clock
xen/gntdev: Fix the abuse of underlying struct page in DMA-buf import
mfd: ti_am335x_tscadc: Fix TI SoC dependencies
PCI: Only override AMD USB controller if required
PCI: switchtec: Fix stdev_release() crash after surprise hot remove
usb: hub: Replace hardcoded quirk value with BIT() macro
tty: allow TIOCSLCKTRMIOS with CAP_CHECKPOINT_RESTORE
fs/kernfs/dir: obey S_ISGID
PCI/AER: Decode Requester ID when no error info found
libsubcmd: Fix memory leak in uniq()
virtio_net: Fix "‘%d’ directive writing between 1 and 11 bytes into a region of size 10" warnings
blk-mq: fix IO hang from sbitmap wakeup race
ceph: fix deadlock or deadcode of misusing dget()
drm/amd/powerplay: Fix kzalloc parameter 'ATOM_Tonga_PPM_Table' in 'get_platform_power_management_table()'
drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()'
perf: Fix the nr_addr_filters fix
wifi: cfg80211: fix RCU dereference in __cfg80211_bss_update
drm: using mul_u32_u32() requires linux/math64.h
scsi: isci: Fix an error code problem in isci_io_request_build()
scsi: core: Introduce enum scsi_disposition
scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler
ip6_tunnel: use dev_sw_netstats_rx_add()
ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()
net-zerocopy: Refactor frag-is-remappable test.
tcp: add sanity checks to rx zerocopy
ixgbe: Remove non-inclusive language
ixgbe: Refactor returning internal error codes
ixgbe: Refactor overtemp event handling
ixgbe: Fix an error handling path in ixgbe_read_iosf_sb_reg_x550()
ipv6: Ensure natural alignment of const ipv6 loopback and router addresses
llc: call sock_orphan() at release time
netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger
netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations
net: ipv4: fix a memleak in ip_setup_cork
af_unix: fix lockdep positive in sk_diag_dump_icons()
net: sysfs: Fix /sys/class/net/<iface> path
HID: apple: Add support for the 2021 Magic Keyboard
HID: apple: Add 2021 magic keyboard FN key mapping
bonding: remove print in bond_verify_device_path
uapi: stddef.h: Fix __DECLARE_FLEX_ARRAY for C++
PM: sleep: Fix error handling in dpm_prepare()
dmaengine: fsl-dpaa2-qdma: Fix the size of dma pools
dmaengine: ti: k3-udma: Report short packet errors
dmaengine: fsl-qdma: Fix a memory leak related to the status queue DMA
dmaengine: fsl-qdma: Fix a memory leak related to the queue command DMA
phy: renesas: rcar-gen3-usb2: Fix returning wrong error code
dmaengine: fix is_slave_direction() return false when DMA_DEV_TO_DEV
phy: ti: phy-omap-usb2: Fix NULL pointer dereference for SRP
drm/msm/dp: return correct Colorimetry for DP_TEST_DYNAMIC_RANGE_CEA case
net: stmmac: xgmac: fix handling of DPP safety error for DMA channels
selftests: net: avoid just another constant wait
tunnels: fix out of bounds access when building IPv6 PMTU error
atm: idt77252: fix a memleak in open_card_ubr0
hwmon: (aspeed-pwm-tacho) mutex for tach reading
hwmon: (coretemp) Fix out-of-bounds memory access
hwmon: (coretemp) Fix bogus core_id to attr name mapping
inet: read sk->sk_family once in inet_recv_error()
rxrpc: Fix response to PING RESPONSE ACKs to a dead call
tipc: Check the bearer type before calling tipc_udp_nl_bearer_add()
ppp_async: limit MRU to 64K
netfilter: nft_compat: reject unused compat flag
netfilter: nft_compat: restrict match/target protocol to u16
netfilter: nft_ct: reject direction for ct id
netfilter: nft_set_pipapo: store index in scratch maps
netfilter: nft_set_pipapo: add helper to release pcpu scratch area
netfilter: nft_set_pipapo: remove scratch_aligned pointer
scsi: core: Move scsi_host_busy() out of host lock if it is for per-command
blk-iocost: Fix an UBSAN shift-out-of-bounds warning
net/af_iucv: clean up a try_then_request_module()
USB: serial: qcserial: add new usb-id for Dell Wireless DW5826e
USB: serial: option: add Fibocom FM101-GL variant
USB: serial: cp210x: add ID for IMST iM871A-USB
usb: host: xhci-plat: Add support for XHCI_SG_TRB_CACHE_SIZE_QUIRK
hrtimer: Report offline hrtimer enqueue
Input: i8042 - fix strange behavior of touchpad on Clevo NS70PU
Input: atkbd - skip ATKBD_CMD_SETLEDS when skipping ATKBD_CMD_GETID
vhost: use kzalloc() instead of kmalloc() followed by memset()
clocksource: Skip watchdog check for large watchdog intervals
net: stmmac: xgmac: use #define for string constants
net: stmmac: xgmac: fix a typo of register name in DPP safety handling
netfilter: nft_set_rbtree: skip end interval element from gc
btrfs: forbid creating subvol qgroups
btrfs: do not ASSERT() if the newly created subvolume already got read
btrfs: forbid deleting live subvol qgroup
btrfs: send: return EOPNOTSUPP on unknown flags
of: unittest: Fix compile in the non-dynamic case
net: openvswitch: limit the number of recursions from action sets
spi: ppc4xx: Drop write-only variable
ASoC: rt5645: Fix deadlock in rt5645_jack_detect_work()
net: sysfs: Fix /sys/class/net/<iface> path for statistics
MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler
i40e: Fix waiting for queues of all VSIs to be disabled
tracing/trigger: Fix to return error if failed to alloc snapshot
mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
ALSA: hda/realtek: Fix the external mic not being recognised for Acer Swift 1 SF114-32
ALSA: hda/realtek: Enable Mute LED on HP Laptop 14-fq0xxx
HID: wacom: generic: Avoid reporting a serial of '0' to userspace
HID: wacom: Do not register input devices until after hid_hw_start
usb: ucsi_acpi: Fix command completion handling
USB: hub: check for alternate port before enabling A_ALT_HNP_SUPPORT
usb: f_mass_storage: forbid async queue when shutdown happen
media: ir_toy: fix a memleak in irtoy_tx
powerpc/kasan: Fix addr error caused by page alignment
i2c: i801: Remove i801_set_block_buffer_mode
i2c: i801: Fix block process call transactions
modpost: trim leading spaces when processing source files list
scsi: Revert "scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock"
lsm: fix the logic in security_inode_getsecctx()
firewire: core: correct documentation of fw_csr_string() kernel API
kbuild: Fix changing ELF file type for output of gen_btf for big endian
nfc: nci: free rx_data_reassembly skb on NCI device cleanup
net: hsr: remove WARN_ONCE() in send_hsr_supervision_frame()
xen-netback: properly sync TX responses
ALSA: hda/realtek: Enable headset mic on Vaio VJFE-ADL
binder: signal epoll threads of self-work
misc: fastrpc: Mark all sessions as invalid in cb_remove
ext4: fix double-free of blocks due to wrong extents moved_len
tracing: Fix wasted memory in saved_cmdlines logic
staging: iio: ad5933: fix type mismatch regression
iio: magnetometer: rm3100: add boundary check for the value read from RM3100_REG_TMRC
iio: accel: bma400: Fix a compilation problem
media: rc: bpf attach/detach requires write permission
hv_netvsc: Fix race condition between netvsc_probe and netvsc_remove
ring-buffer: Clean ring_buffer_poll_wait() error return
serial: max310x: set default value when reading clock ready bit
serial: max310x: improve crystal stable clock detection
x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6
x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
mmc: slot-gpio: Allow non-sleeping GPIO ro
ALSA: hda/conexant: Add quirk for SWS JS201D
nilfs2: fix data corruption in dsync block recovery for small block sizes
nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
crypto: ccp - Fix null pointer dereference in __sev_platform_shutdown_locked
nfp: use correct macro for LengthSelect in BAR config
nfp: flower: prevent re-adding mac index for bonded port
wifi: mac80211: reload info pointer in ieee80211_tx_dequeue()
irqchip/irq-brcmstb-l2: Add write memory barrier before exit
irqchip/gic-v3-its: Fix GICv4.1 VPE affinity update
s390/qeth: Fix potential loss of L3-IP@ in case of network issues
ceph: prevent use-after-free in encode_cap_msg()
of: property: fix typo in io-channels
can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)
pmdomain: core: Move the unused cleanup to a _sync initcall
tracing: Inform kmemleak of saved_cmdlines allocation
Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"
bus: moxtet: Add spi device table
PCI: dwc: endpoint: Fix dw_pcie_ep_raise_msix_irq() alignment support
mips: Fix max_mapnr being uninitialized on early stages
crypto: lib/mpi - Fix unexpected pointer access in mpi_ec_init
serial: Add rs485_supported to uart_port
serial: 8250_exar: Fill in rs485_supported
serial: 8250_exar: Set missing rs485_supported flag
scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm
scripts/decode_stacktrace.sh: support old bash version
scripts: decode_stacktrace: demangle Rust symbols
scripts/decode_stacktrace.sh: optionally use LLVM utilities
netfilter: ipset: fix performance regression in swap operation
netfilter: ipset: Missing gc cancellations fixed
hrtimer: Ignore slack time for RT tasks in schedule_hrtimeout_range()
Revert "arm64: Stash shadow stack pointer in the task struct on interrupt"
net: prevent mss overflow in skb_segment()
sched/membarrier: reduce the ability to hammer on sys_membarrier
nilfs2: fix potential bug in end_buffer_async_write
nilfs2: replace WARN_ONs for invalid DAT metadata block requests
dm: limit the number of targets and parameter size area
PM: runtime: add devm_pm_runtime_enable helper
PM: runtime: Have devm_pm_runtime_enable() handle pm_runtime_dont_use_autosuspend()
drm/msm/dsi: Enable runtime PM
netfilter: nf_tables: fix pointer math issue in nft_byteorder_eval()
net: bcmgenet: Fix EEE implementation
PCI: dwc: Fix a 64bit bug in dw_pcie_ep_raise_msix_irq()
Linux 5.10.210
Change-Id: I5e7327f58dd6abd26ac2b1e328a81c1010d1147c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
android_vh_bio_free: We have added some additional information to
each bio, which needs to be freed simultaneously when the bio is free
Bug: 319582497
Change-Id: I532f98fa0569f2eb8da66cff746349c828e0912c
Signed-off-by: hao lv <hao.lv5@transsion.com>
[ Upstream commit 3f034c374ad55773c12dd8f3c1607328e17c0072 ]
Reordered a check to avoid a possible overflow when adding len to bv_len.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20231204173419.782378-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 1b151e2435fc3a9b10c8946c6aebe9f3e1938c55 upstream.
The special casing was originally added in pre-git history; reproducing
the commit log here:
> commit a318a92567d77
> Author: Andrew Morton <akpm@osdl.org>
> Date: Sun Sep 21 01:42:22 2003 -0700
>
> [PATCH] Speed up direct-io hugetlbpage handling
>
> This patch short-circuits all the direct-io page dirtying logic for
> higher-order pages. Without this, we pointlessly bounce BIOs up to
> keventd all the time.
In the last twenty years, compound pages have become used for more than
just hugetlb. Rewrite these functions to operate on folios instead
of pages and remove the special case for hugetlbfs; I don't think
it's needed any more (and if it is, we can put it back in as a call
to folio_test_hugetlb()).
This was found by inspection; as far as I can tell, this bug can lead
to pages used as the destination of a direct I/O read not being marked
as dirty. If those pages are then reclaimed by the MM without being
dirtied for some other reason, they won't be written out. Then when
they're faulted back in, they will not contain the data they should.
It'll take a pretty unusual setup to produce this problem with several
races all going the wrong way.
This problem predates the folio work; it could for example have been
triggered by mmaping a THP in tmpfs and using that as the target of an
O_DIRECT read.
Fixes: 800d8c63b2 ("shmem: add huge pages support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b82d9fa257cb3725c49d94d2aeafc4677c34448a ]
Returning 0 early from __bio_iov_append_get_pages() for the
max_append_sectors warning just creates an infinite loop since 0 means
success, and the bio will never fill from the unadvancing iov_iter. We
could turn the return into an error value, but it will already be turned
into an error value later on, so just remove the warning. Clearly no one
ever hit it anyway.
Fixes: 0512a75b98 ("block: Introduce REQ_OP_ZONE_APPEND")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220610195830.3574005-2-kbusch@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 3ee859e384d453d6ac68bfd5971f630d9fa46ad3 upstream.
bio_truncate() clears the buffer outside of last block of bdev, however
current bio_truncate() is using the wrong offset of page. So it can
return the uninitialized data.
This happened when both of truncated/corrupted FS and userspace (via
bdev) are trying to read the last of bdev.
Reported-by: syzbot+ac94ae5f68b84197f41c@syzkaller.appspotmail.com
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/875yqt1c9g.fsf@mail.parknet.co.jp
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d9cf3bd531844ffbfe94b16e417037a16efc988d upstream.
__bio_iov_append_get_pages() doesn't put not appended pages on
bio_add_hw_page() failure, so potentially leaking them, fix it. Also, do
the same for __bio_iov_iter_get_pages(), even though it looks like it
can't be triggered by userspace in this case.
Fixes: 0512a75b98 ("block: Introduce REQ_OP_ZONE_APPEND")
Cc: stable@vger.kernel.org # 5.8+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1edfa6a2ffd66d55e6345a477df5387d2c1415d0.1626653825.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3edf5346e4f2ce2fa0c94651a90a8dda169565ee ]
For multiple split bios, if one of the bio is fail, the whole
should return error to application. But we found there is a race
between bio_integrity_verify_fn and bio complete, which return
io success to application after one of the bio fail. The race as
following:
split bio(READ) kworker
nvme_complete_rq
blk_update_request //split error=0
bio_endio
bio_integrity_endio
queue_work(kintegrityd_wq, &bip->bip_work);
bio_integrity_verify_fn
bio_endio //split bio
__bio_chain_endio
if (!parent->bi_status)
<interrupt entry>
nvme_irq
blk_update_request //parent error=7
req_bio_endio
bio->bi_status = 7 //parent bio
<interrupt exit>
parent->bi_status = 0
parent->bi_end_io() // return bi_status=0
The bio has been split as two: split and parent. When split
bio completed, it depends on kworker to do endio, while
bio_integrity_verify_fn have been interrupted by parent bio
complete irq handler. Then, parent bio->bi_status which have
been set in irq handler will overwrite by kworker.
In fact, even without the above race, we also need to conside
the concurrency beteen mulitple split bio complete and update
the same parent bi_status. Normally, multiple split bios will
be issued to the same hctx and complete from the same irq
vector. But if we have updated queue map between multiple split
bios, these bios may complete on different hw queue and different
irq vector. Then the concurrency update parent bi_status may
cause the final status error.
Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210331115359.1125679-1-yuyufen@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
When the bio's size reaches max_append_sectors, bio_add_hw_page returns
0 then __bio_iov_append_get_pages returns -EINVAL. This is an expected
result of building a small enough bio not to be split in the IO path.
However, iov_iter is not advanced in this case, causing the same pages
are filled for the bio again and again.
Fix the case by properly advancing the iov_iter for already processed
pages.
Fixes: 0512a75b98 ("block: Introduce REQ_OP_ZONE_APPEND")
Cc: stable@vger.kernel.org # 5.8+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fix this warning:
./block/bio.c:1098: WARNING: Inline emphasis start-string without end-string.
The thing is that *iter is not a valid markup.
That seems to be a typo:
*iter -> @iter
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Using "@bio's parent" causes the following waring:
./block/bio.c:10: WARNING: Inline emphasis start-string without end-string.
The main problem here is that this would be converted into:
**bio**'s parent
By kernel-doc, which is not a valid notation. It would be
possible to use, instead, this kernel-doc markup:
``bio's`` parent
Yet, here, is probably simpler to just use an altenative language:
the parent of @bio
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+EWUgQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpnoxEADCVSNBRkpV0OVkOEC3wf8EGhXhk01Jnjtl
u5Mg2V55hcgJ0thQxBV/V28XyqmsEBrmAVi0Yf8Vr9Qbq4Ze08Wae4ChS4rEOyh1
jTcGYWx5aJB3ChLvV/HI0nWQ3bkj03mMrL3SW8rhhf5DTyKHsVeTenpx42Qu/FKf
fRzi09FSr3Pjd0B+EX6gunwJnlyXQC5Fa4AA0GhnXJzAznANXxHkkcXu8a6Yw75x
e28CfhIBliORsK8sRHLoUnPpeTe1vtxCBhBMsE+gJAj9ZUOWMzvNFIPP4FvfawDy
6cCQo2m1azJ/IdZZCDjFUWyjh+wxdKMp+NNryEcoV+VlqIoc3n98rFwrSL+GIq5Z
WVwEwq+AcwoMCsD29Lu1ytL2PQ/RVqcJP5UheMrbL4vzefNfJFumQVZLIcX0k943
8dFL2QHL+H/hM9Dx5y5rjeiWkAlq75v4xPKVjh/DHb4nehddCqn/+DD5HDhNANHf
c1kmmEuYhvLpIaC4DHjE6DwLh8TPKahJjwsGuBOTr7D93NUQD+OOWsIhX6mNISIl
FFhP8cd0/ZZVV//9j+q+5B4BaJsT+ZtwmrelKFnPdwPSnh+3iu8zPRRWO+8P8fRC
YvddxuJAmE6BLmsAYrdz6Xb/wqfyV44cEiyivF0oBQfnhbtnXwDnkDWSfJD1bvCm
ZwfpDh2+Tg==
=LzyE
-----END PGP SIGNATURE-----
Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
- Series of merge handling cleanups (Baolin, Christoph)
- Series of blk-throttle fixes and cleanups (Baolin)
- Series cleaning up BDI, seperating the block device from the
backing_dev_info (Christoph)
- Removal of bdget() as a generic API (Christoph)
- Removal of blkdev_get() as a generic API (Christoph)
- Cleanup of is-partition checks (Christoph)
- Series reworking disk revalidation (Christoph)
- Series cleaning up bio flags (Christoph)
- bio crypt fixes (Eric)
- IO stats inflight tweak (Gabriel)
- blk-mq tags fixes (Hannes)
- Buffer invalidation fixes (Jan)
- Allow soft limits for zone append (Johannes)
- Shared tag set improvements (John, Kashyap)
- Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)
- DM no-wait support (Mike, Konstantin)
- Request allocation improvements (Ming)
- Allow md/dm/bcache to use IO stat helpers (Song)
- Series improving blk-iocost (Tejun)
- Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
Xianting, Yang, Yufen, yangerkun)
* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
block: fix uapi blkzoned.h comments
blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
blk-mq: get rid of the dead flush handle code path
block: get rid of unnecessary local variable
block: fix comment and add lockdep assert
blk-mq: use helper function to test hw stopped
block: use helper function to test queue register
block: remove redundant mq check
block: invoke blk_mq_exit_sched no matter whether have .exit_sched
percpu_ref: don't refer to ref->data if it isn't allocated
block: ratelimit handle_bad_sector() message
blk-throttle: Re-use the throtl_set_slice_end()
blk-throttle: Open code __throtl_de/enqueue_tg()
blk-throttle: Move service tree validation out of the throtl_rb_first()
blk-throttle: Move the list operation after list validation
blk-throttle: Fix IO hang for a corner case
blk-throttle: Avoid tracking latency if low limit is invalid
blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
block: Remove redundant 'return' statement
...
bio_crypt_clone() assumes its gfp_mask argument always includes
__GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed.
However, bio_crypt_clone() might be called with GFP_ATOMIC via
setup_clone() in drivers/md/dm-rq.c, or with GFP_NOWAIT via
kcryptd_io_read() in drivers/md/dm-crypt.c.
Neither case is currently reachable with a bio that actually has an
encryption context. However, it's fragile to rely on this. Just make
bio_crypt_clone() able to fail, analogous to bio_integrity_clone().
Reported-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Satya Tangirala <satyat@google.com>
Cc: Satya Tangirala <satyat@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we hit the UINT_MAX limit of bio->bi_iter.bi_size and so we are anyway
not merging this page in this bio, then it make sense to make same_page
also as false before returning.
Without this patch, we hit below WARNING in iomap.
This mostly happens with very large memory system and / or after tweaking
vm dirty threshold params to delay writeback of dirty data.
WARNING: CPU: 18 PID: 5130 at fs/iomap/buffered-io.c:74 iomap_page_release+0x120/0x150
CPU: 18 PID: 5130 Comm: fio Kdump: loaded Tainted: G W 5.8.0-rc3 #6
Call Trace:
__remove_mapping+0x154/0x320 (unreliable)
iomap_releasepage+0x80/0x180
try_to_release_page+0x94/0xe0
invalidate_inode_page+0xc8/0x110
invalidate_mapping_pages+0x1dc/0x540
generic_fadvise+0x3c8/0x450
xfs_file_fadvise+0x2c/0xe0 [xfs]
vfs_fadvise+0x3c/0x60
ksys_fadvise64_64+0x68/0xe0
sys_fadvise64+0x28/0x40
system_call_exception+0xf8/0x1c0
system_call_common+0xf0/0x278
Fixes: cc90bc6842 ("block: fix "check bi_size overflow before merge"")
Reported-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we pass in an offset which is larger than PAGE_SIZE, then
page_is_mergeable() thinks it's not mergeable with the previous bio_vec,
leading to a large number of bio_vecs being used. Use a slightly more
obvious test that the two pages are compatible with each other.
Fixes: 52d52d1c98 ("block: only allow contiguous page structs in a bio_vec")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Drop the repeated words "a" and "the".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
generic_make_request has always been very confusingly misnamed, so rename
it to submit_bio_noacct to make it clear that it is submit_bio minus
accounting and a few checks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bio_associate_blkg_from_page is a special purpose helper for swap bios
that doesn't need access to bio internals. Move it to the swap code
instead of having it in bio.c.
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge __bio_associate_blkg into the only caller, which allows to slightly
reduce the RCU crticial section and better explain the code flow.
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bio_clone_blkg_association is supposed to clone the associatation, but
actually ends up doing a search with a tryget. As we know we have a
reference on the source cgroup just get an unconditional additional
reference to it and call it a day. That also removes the need for
a RCU critical section.
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bio_disassociate_blkg has two callers, of which one immediately assigns
a new value to >bi_blkg. Just open code the function in the two callers.
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.
This code was detected with the help of Coccinelle and, audited and
fixed manually.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Addresses-KSPP-ID: https://github.com/KSPP/linux/issues/83
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The status can be trivially derived from the bio itself. That also avoid
callers like NVMe to incorrectly pass a blk_status_t instead of the errno,
and the overhead of translating the blk_status_t to the errno in the I/O
completion fast path when no tracing is enabled.
Fixes: 35fe0d12c8 ("nvme: trace bio completion")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
All callers are in blk-core.c, so move update_io_ticks over.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove these now unused functions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
part_inc_in_flight and part_dec_in_flight only have one caller each, and
those callers are purely for bio based drivers. Merge each function into
the only caller, and remove the superflous blk-mq checks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We must have some way of letting a storage device driver know what
encryption context it should use for en/decrypting a request. However,
it's the upper layers (like the filesystem/fscrypt) that know about and
manages encryption contexts. As such, when the upper layer submits a bio
to the block layer, and this bio eventually reaches a device driver with
support for inline encryption, the device driver will need to have been
told the encryption context for that bio.
We want to communicate the encryption context from the upper layer to the
storage device along with the bio, when the bio is submitted to the block
layer. To do this, we add a struct bio_crypt_ctx to struct bio, which can
represent an encryption context (note that we can't use the bi_private
field in struct bio to do this because that field does not function to pass
information across layers in the storage stack). We also introduce various
functions to manipulate the bio_crypt_ctx and make the bio/request merging
logic aware of the bio_crypt_ctx.
We also make changes to blk-mq to make it handle bios with encryption
contexts. blk-mq can merge many bios into the same request. These bios need
to have contiguous data unit numbers (the necessary changes to blk-merge
are also made to ensure this) - as such, it suffices to keep the data unit
number of just the first bio, since that's all a storage driver needs to
infer the data unit number to use for each data block in each bio in a
request. blk-mq keeps track of the encryption context to be used for all
the bios in a request with the request's rq_crypt_ctx. When the first bio
is added to an empty request, blk-mq will program the encryption context
of that bio into the request_queue's keyslot manager, and store the
returned keyslot in the request's rq_crypt_ctx. All the functions to
operate on encryption contexts are in blk-crypto.c.
Upper layers only need to call bio_crypt_set_ctx with the encryption key,
algorithm and data_unit_num; they don't have to worry about getting a
keyslot for each encryption context, as blk-mq/blk-crypto handles that.
Blk-crypto also makes it possible for request-based layered devices like
dm-rq to make use of inline encryption hardware by cloning the
rq_crypt_ctx and programming a keyslot in the new request_queue when
necessary.
Note that any user of the block layer can submit bios with an
encryption context, such as filesystems, device-mapper targets, etc.
Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Export bio_release_pages and bio_iov_iter_get_pages, so they can be used
from modular code.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Define REQ_OP_ZONE_APPEND to append-write sectors to a zone of a zoned
block device. This is a no-merge write operation.
A zone append write BIO must:
* Target a zoned block device
* Have a sector position indicating the start sector of the target zone
* The target zone must be a sequential write zone
* The BIO must not cross a zone boundary
* The BIO size must not be split to ensure that a single range of LBAs
is written with a single command.
Implement these checks in generic_make_request_checks() using the
helper function blk_check_zone_append(). To avoid write append BIO
splitting, introduce the new max_zone_append_sectors queue limit
attribute and ensure that a BIO size is always lower than this limit.
Export this new limit through sysfs and check these limits in bio_full().
Also when a LLDD can't dispatch a request to a specific zone, it
will return BLK_STS_ZONE_RESOURCE indicating this request needs to
be delayed, e.g. because the zone it will be dispatched to is still
write-locked. If this happens set the request aside in a local list
to continue trying dispatching requests such as READ requests or a
WRITE/ZONE_APPEND requests targetting other zones. This way we can
still keep a high queue depth without starving other requests even if
one request can't be served due to zone write-locking.
Finally, make sure that the bio sector position indicates the actual
write position as indicated by the device on completion.
Signed-off-by: Keith Busch <kbusch@kernel.org>
[ jth: added zone-append specific add_page and merge_page helpers ]
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Rename __bio_add_pc_page() to bio_add_hw_page() and explicitly pass in a
max_sectors argument.
This max_sectors argument can be used to specify constraints from the
hardware.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[ jth: rebased and made public for blk-map.c ]
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The bio_map_* helpers are just the low-level helpers for the
blk_rq_map_* APIs. Move them together for better logical grouping,
as no there isn't much overlap with other code in bio.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This is bio layer functionality and not related to buffer heads.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Column "time_in_queue" in diskstats is supposed to show total waiting time
of all requests. I.e. value should be equal to the sum of times from other
columns. But this is not true, because column "time_in_queue" is counted
separately in jiffies rather than in nanoseconds as other times.
This patch removes redundant counter for "time_in_queue" and shows total
time of read, write, discard and flush requests.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently io_ticks is approximated by adding one at each start and end of
requests if jiffies counter has changed. This works perfectly for requests
shorter than a jiffy or if one of requests starts/ends at each jiffy.
If disk executes just one request at a time and they are longer than two
jiffies then only first and last jiffies will be accounted.
Fix is simple: at the end of request add up into io_ticks jiffies passed
since last update rather than just one jiffy.
Example: common HDD executes random read 4k requests around 12ms.
fio --name=test --filename=/dev/sdb --rw=randread --direct=1 --runtime=30 &
iostat -x 10 sdb
Note changes of iostat's "%util" 8,43% -> 99,99% before/after patch:
Before:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0,00 0,00 82,60 0,00 330,40 0,00 8,00 0,96 12,09 12,09 0,00 1,02 8,43
After:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0,00 0,00 82,50 0,00 330,00 0,00 8,00 1,00 12,10 12,10 0,00 12,12 99,99
Now io_ticks does not loose time between start and end of requests, but
for queue-depth > 1 some I/O time between adjacent starts might be lost.
For load estimation "%util" is not as useful as average queue length,
but it clearly shows how often disk queue is completely empty.
Fixes: 5b18b5a737 ("block: delete part_round_stats and switch to less precise counting")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Thes functions aren't really related to partition support, so move them
to a more suitable place.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
submit_bio_wait() can be called from ioctl(BLKSECDISCARD), which
may take long time to complete, as Salman mentioned, 4K BLKSECDISCARD
takes up to 100 second on some devices. Also any block I/O operation
that occurs after the BLKSECDISCARD is submitted will also potentially
be affected by the hung task timeouts.
Another report is that task hang can be observed when running mkfs
over raid10 which takes a small max discard sectors limit because
of chunk size.
So prevent hung_check from firing by taking same approach used
in blk_execute_rq(), and the wake-up interval is set as half the
hung_check timer period, which keeps overhead low enough.
Cc: Salman Qazi <sqazi@google.com>
Cc: Jesse Barnes <jsbarnes@google.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Link: https://lkml.org/lkml/2020/2/12/1193
Reported-by: Salman Qazi <sqazi@google.com>
Reviewed-by: Jesse Barnes <jsbarnes@google.com>
Reviewed-by: Salman Qazi <sqazi@google.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 85a8ce62c2 ("block: add bio_truncate to fix guard_bio_eod")
adds bio_truncate() for handling bio EOD. However, bio_truncate()
doesn't use the passed 'op' parameter from guard_bio_eod's callers.
So bio_trunacate() may retrieve wrong 'op', and zering pages may
not be done for READ bio.
Fixes this issue by moving guard_bio_eod() after bio_set_op_attrs()
in submit_bh_wbc() so that bio_truncate() can always retrieve correct
op info.
Meantime remove the 'op' parameter from guard_bio_eod() because it isn't
used any more.
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Fixes: 85a8ce62c2 ("block: add bio_truncate to fix guard_bio_eod")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Fold in kerneldoc and bio_op() change.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Some filesystem, such as vfat, may send bio which crosses device boundary,
and the worse thing is that the IO request starting within device boundaries
can contain more than one segment past EOD.
Commit dce30ca9e3 ("fs: fix guard_bio_eod to check for real EOD errors")
tries to fix this issue by returning -EIO for this situation. However,
this way lets fs user code lose chance to handle -EIO, then sync_inodes_sb()
may hang for ever.
Also the current truncating on last segment is dangerous by updating the
last bvec, given bvec table becomes not immutable any more, and fs bio
users may not retrieve the truncated pages via bio_for_each_segment_all() in
its .end_io callback.
Fixes this issue by supporting multi-segment truncating. And the
approach is simpler:
- just update bio size since block layer can make correct bvec with
the updated bio size. Then bvec table becomes really immutable.
- zero all truncated segments for read bio
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Fixed-by: dce30ca9e3 ("fs: fix guard_bio_eod to check for real EOD errors")
Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This partially reverts commit e3a5d8e386.
Commit e3a5d8e386 ("check bi_size overflow before merge") adds a bio_full
check to __bio_try_merge_page. This will cause __bio_try_merge_page to fail
when the last bi_io_vec has been reached. Instead, what we want here is only
the bi_size overflow check.
Fixes: e3a5d8e386 ("block: check bi_size overflow before merge")
Cc: stable@vger.kernel.org # v5.4+
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
7c20f11680 ("bio-integrity: stop abusing bi_end_io") moves
bio_integrity_free from bio_uninit() to bio_integrity_verify_fn()
and bio_endio(). This way looks wrong because bio may be freed
without calling bio_endio(), for example, blk_rq_unprep_clone() is
called from dm_mq_queue_rq() when the underlying queue of dm-mpath
is busy.
So memory leak of bio integrity data is caused by commit 7c20f11680.
Fixes this issue by re-adding bio_integrity_free() to bio_uninit().
Fixes: 7c20f11680 ("bio-integrity: stop abusing bi_end_io")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by Justin Tee <justin.tee@broadcom.com>
Add commit log, and simplify/fix the original patch wroten by Justin.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hiding page refcount manipulation inside a low-level bio helper is
somewhat awkward. Instead return the same page information to the
callers, where it fits in much better.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Passsthrough bio handling should be the same as normal bio handling,
except that we need to take hardware limitations into account. Thus
use the common try_merge implementation after checking the hardware
limits. This changes behavior in that we now also check segment
and dma boundary settings for same page merges, which is a little
more work but has no effect as those need to be larger than the
page size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we can add more data into an existing segment we do not create a gap
per definition, so move the check for a gap after the attempt to merge
into the segment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
psi tracks the time tasks wait for refaulting pages to become
uptodate, but it does not track the time spent submitting the IO. The
submission part can be significant if backing storage is contended or
when cgroup throttling (io.latency) is in effect - a lot of time is
spent in submit_bio(). In that case, we underreport memory pressure.
Annotate submit_bio() to account submission time as memory stall when
the bio is reading userspace workingset pages.
Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that there no module users left of bio_map_kern, stop exporting the
symbol.
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hans Holmberg <hans@owltronix.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since what the bio splitting functions do is nontrivial, document these
functions.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
To allow the SCSI subsystem scsi_execute_req() function to issue
requests using large buffers that are better allocated with vmalloc()
rather than kmalloc(), modify bio_map_kern() to allow passing a buffer
allocated with vmalloc().
To do so, detect vmalloc-ed buffers using is_vmalloc_addr(). For
vmalloc-ed buffers, flush the buffer using flush_kernel_vmap_range(),
use vmalloc_to_page() instead of virt_to_page() to obtain the pages of
the buffer, and invalidate the buffer addresses with
invalidate_kernel_vmap_range() on completion of read BIOs. This last
point is executed using the function bio_invalidate_vmalloc_pages()
which is defined only if the architecture defines
ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE, that is, if the architecture
actually needs the invalidation done.
Fixes: 515ce60613 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation")
Fixes: e76239a374 ("block: add a report_zones method")
Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
'bio->bi_iter.bi_size' is 'unsigned int', which at most hold 4G - 1
bytes.
Before 07173c3ec2 ("block: enable multipage bvecs"), one bio can
include very limited pages, and usually at most 256, so the fs bio
size won't be bigger than 1M bytes most of times.
Since we support multi-page bvec, in theory one fs bio really can
be added > 1M pages, especially in case of hugepage, or big writeback
with too many dirty pages. Then there is chance in which .bi_size
is overflowed.
Fixes this issue by using bio_full() to check if the added segment may
overflow .bi_size.
Cc: Liu Yiding <liuyd.fnst@cn.fujitsu.com>
Cc: kernel test robot <rong.a.chen@intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 07173c3ec2 ("block: enable multipage bvecs")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>