-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl8GyVUACgkQONu9yGCS
aT6MCBAAxQpPQYQ8XNq40G2U3rENx2gIOgSA9WjkGsCij6Ad8VjkbA2KqqJ6v0GF
McIomgdR+bviZKYPJHfdVlOGWGGTtDe1hi6GVoXuc/CTOPK386/v/vMTumdQ2bLb
5fkuvpFGNdxGmhi+Vpu6zajvF8CVHmCggQQGlrIyIIOXPwN/DB/7XK6lmtx0vcff
YYZI3L1+q1rJz6vV5K5xTwpPwCQDsdYW93KQpcccVKMAmnAzFolaWyvWitvqOgT5
xrnelzjrIy5v2BST0S7jL0nzEeXB6bq1tmJHbd+cTvIaKJx4NR8yx2cXzvospi3l
TpwuycBo/0LRWAiFZ+29+HOHkvUx7sa+BFCLiZEu1dLAxRXdi/W0a/i0QFGb0ex/
YO8dBVEdHVRm0aXlfnipJYuNdTxzSzUG0GHOvZB11h/zxybiLWjJGutofX0tZXYj
JuO9UhP0eSldtfuATrXRPsSsTWWen1igVTWFVHEZPWGvR4TiU8jpFPOTFcrraglM
07M3ooA6FlWrTt7n9YdGLaQY6w2z+pREMgnUesNY9yQeRxY2EAsa95Lu+Y6aTeYP
YfTPQHhbbz3XggFljGYIbvM6wlzgghSLH09CX8zxFqeYyTSFsNDKBuLslmTMS/21
zwpG81hm7fE58N3uUHqSU3/bN8kzCH6GsU78bFAXb+jaNKgG5Qg=
=0uVY
-----END PGP SIGNATURE-----
Merge 5.4.51 into android11-5.4
Changes in 5.4.51
io_uring: make sure async workqueue is canceled on exit
mm: fix swap cache node allocation mask
EDAC/amd64: Read back the scrub rate PCI register on F15h
usbnet: smsc95xx: Fix use-after-free after removal
sched/debug: Make sd->flags sysctl read-only
mm/slub.c: fix corrupted freechain in deactivate_slab()
mm/slub: fix stack overruns with SLUB_STATS
rxrpc: Fix race between incoming ACK parser and retransmitter
usb: usbtest: fix missing kfree(dev->buf) in usbtest_disconnect
tools lib traceevent: Add append() function helper for appending strings
tools lib traceevent: Handle __attribute__((user)) in field names
s390/debug: avoid kernel warning on too large number of pages
nvme-multipath: set bdi capabilities once
nvme-multipath: fix deadlock between ana_work and scan_work
nvme-multipath: fix deadlock due to head->lock
nvme-multipath: fix bogus request queue reference put
kgdb: Avoid suspicious RCU usage warning
selftests: tpm: Use /bin/sh instead of /bin/bash
tpm: Fix TIS locality timeout problems
crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock()
drm/msm/dpu: fix error return code in dpu_encoder_init
rxrpc: Fix afs large storage transmission performance drop
RDMA/counter: Query a counter before release
cxgb4: use unaligned conversion for fetching timestamp
cxgb4: parse TC-U32 key values and masks natively
cxgb4: fix endian conversions for L4 ports in filters
cxgb4: use correct type for all-mask IP address comparison
cxgb4: fix SGE queue dump destination buffer context
hwmon: (max6697) Make sure the OVERT mask is set correctly
hwmon: (acpi_power_meter) Fix potential memory leak in acpi_power_meter_add()
thermal/drivers/mediatek: Fix bank number settings on mt8183
thermal/drivers/rcar_gen3: Fix undefined temperature if negative
nfsd4: fix nfsdfs reference count loop
nfsd: fix nfsdfs inode reference count leak
drm: sun4i: hdmi: Remove extra HPD polling
virtio-blk: free vblk-vqs in error path of virtblk_probe()
SMB3: Honor 'posix' flag for multiuser mounts
nvme: fix identify error status silent ignore
nvme: fix a crash in nvme_mpath_add_disk
samples/vfs: avoid warning in statx override
i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665
i2c: mlxcpld: check correct size of maximum RECV_LEN packet
spi: spi-fsl-dspi: Fix external abort on interrupt in resume or exit paths
nfsd: apply umask on fs without ACL support
Revert "ALSA: usb-audio: Improve frames size computation"
SMB3: Honor 'seal' flag for multiuser mounts
SMB3: Honor persistent/resilient handle flags for multiuser mounts
SMB3: Honor lease disabling for multiuser mounts
SMB3: Honor 'handletimeout' flag for multiuser mounts
cifs: Fix the target file was deleted when rename failed.
MIPS: lantiq: xway: sysctrl: fix the GPHY clock alias names
MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen
drm/amd/display: Only revalidate bandwidth on medium and fast updates
drm/amdgpu: use %u rather than %d for sclk/mclk
drm/amdgpu/atomfirmware: fix vram_info fetching for renoir
dma-buf: Move dma_buf_release() from fops to dentry_ops
irqchip/gic: Atomically update affinity
mm, compaction: fully assume capture is not NULL in compact_zone_order()
mm, compaction: make capture control handling safe wrt interrupts
x86/resctrl: Fix memory bandwidth counter width for AMD
dm zoned: assign max_io_len correctly
efi: Make it possible to disable efivar_ssdt entirely
Linux 5.4.51
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib29e1b61540451995ee757cf90c9ba24e6b8d6b3
commit 7b2377486767503d47265e4d487a63c651f6b55d upstream.
The unit of max_io_len is sector instead of byte (spotted through
code review), so fix it.
Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5qJSMACgkQONu9yGCS
aT6/Dw//Usg9m0cBB4Ip4fYxI0EVz8BgnVe9KSdt+71gM63QCOi1ZeTS0NDMUtO0
MTsQSudUpfntrT8QHCmBwCZ5LlAAZvxDS9UOqnhkWbqNY5jGmUhH5u28RJL28dp2
8wJY6zZKg+pfOWXd81slW86uN27QZvURNEthT81sN2ucxe5DXV1gs87FILSdMpXm
I0Z3LpUoZDjpONeA6WTZqkDNA0J7Z9QjULx9/4LFi/gc0q1ApWC7FV1A9gpQHaBa
w4kDWJCGqq3mNx8Hi9BHau50VUHX5tuKvpn9RcmSl9BBba25pE5h0EVIGo8Dlq+9
T9hkVR5iXeMbFERnLm5iR0DjFHog/mOgAgUHSTTXB3BcdgIKWwUcc2gCcr2Y7KIK
CD7l+kX1nWUk4yYre7zXiG/vO9ilYgeboc8C5Qdq3XR6zaO90+8NUbCOpa2+6yEF
H7kugstb6l+iCJ1k8YJd0ORGOobl68+P79TLxAOFnkNGJRzuAoXmBH+xkqAugz1H
YKKAbE+MzW75sre7PxU1g1uPOHxfMfd5e3uRtUU5OETJv0A2kTte8ay5rqLNbe7H
QYqdfwTr2oFssnWKW5d/KdSopD5A/31/Kjkmzl6ED2xaLMEpA7zyed5p+G/Beu5s
dkPlteya8wCQ1W/KtDJRhbCauoG/NyCKIeoQitHBJwMapcEo8ZU=
=rDP8
-----END PGP SIGNATURE-----
Merge 5.4.25 into android-5.4
Changes in 5.4.25
block, bfq: get extra ref to prevent a queue from being freed during a group move
block, bfq: do not insert oom queue into position tree
ALSA: hda/realtek - Fix a regression for mute led on Lenovo Carbon X1
net: dsa: bcm_sf2: Forcibly configure IMP port for 1Gb/sec
net: stmmac: fix notifier registration
dm thin metadata: fix lockdep complaint
RDMA/core: Fix pkey and port assignment in get_new_pps
RDMA/core: Fix use of logical OR in get_new_pps
kbuild: fix 'No such file or directory' warning when cleaning
kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic
blktrace: fix dereference after null check
ALSA: hda: do not override bus codec_mask in link_get()
serial: ar933x_uart: set UART_CS_{RX,TX}_READY_ORIDE
selftests: fix too long argument
usb: gadget: composite: Support more than 500mA MaxPower
usb: gadget: ffs: ffs_aio_cancel(): Save/restore IRQ flags
usb: gadget: serial: fix Tx stall after buffer overflow
habanalabs: halt the engines before hard-reset
habanalabs: do not halt CoreSight during hard reset
habanalabs: patched cb equals user cb in device memset
drm/msm/mdp5: rate limit pp done timeout warnings
drm: msm: Fix return type of dsi_mgr_connector_mode_valid for kCFI
drm/modes: Make sure to parse valid rotation value from cmdline
drm/modes: Allow DRM_MODE_ROTATE_0 when applying video mode parameters
scsi: megaraid_sas: silence a warning
drm/msm/dsi: save pll state before dsi host is powered off
drm/msm/dsi/pll: call vco set rate explicitly
selftests: forwarding: use proto icmp for {gretap, ip6gretap}_mac testing
selftests: forwarding: vxlan_bridge_1d: fix tos value
net: atlantic: check rpc result and wait for rpc address
net: ks8851-ml: Remove 8-bit bus accessors
net: ks8851-ml: Fix 16-bit data access
net: ks8851-ml: Fix 16-bit IO operation
net: ethernet: dm9000: Handle -EPROBE_DEFER in dm9000_parse_dt()
watchdog: da9062: do not ping the hw during stop()
s390/cio: cio_ignore_proc_seq_next should increase position index
s390: make 'install' not depend on vmlinux
efi: Only print errors about failing to get certs if EFI vars are found
net/mlx5: DR, Fix matching on vport gvmi
iommu/amd: Disable IOMMU on Stoney Ridge systems
nvme/pci: Add sleep quirk for Samsung and Toshiba drives
nvme-pci: Use single IRQ vector for old Apple models
x86/boot/compressed: Don't declare __force_order in kaslr_64.c
s390/qdio: fill SL with absolute addresses
nvme: Fix uninitialized-variable warning
ice: Don't tell the OS that link is going down
x86/xen: Distribute switch variables for initialization
net: thunderx: workaround BGX TX Underflow issue
csky/mm: Fixup export invalid_pte_table symbol
csky: Set regs->usp to kernel sp, when the exception is from kernel
csky/smp: Fixup boot failed when CONFIG_SMP
csky: Fixup ftrace modify panic
csky: Fixup compile warning for three unimplemented syscalls
arch/csky: fix some Kconfig typos
selftests: forwarding: vxlan_bridge_1d: use more proper tos value
firmware: imx: scu: Ensure sequential TX
binder: prevent UAF for binderfs devices
binder: prevent UAF for binderfs devices II
ALSA: hda/realtek - Add Headset Mic supported
ALSA: hda/realtek - Add Headset Button supported for ThinkPad X1
ALSA: hda/realtek - Fix silent output on Gigabyte X570 Aorus Master
ALSA: hda/realtek - Enable the headset of ASUS B9450FA with ALC294
cifs: don't leak -EAGAIN for stat() during reconnect
cifs: fix rename() by ensuring source handle opened with DELETE bit
usb: storage: Add quirk for Samsung Fit flash
usb: quirks: add NO_LPM quirk for Logitech Screen Share
usb: dwc3: gadget: Update chain bit correctly when using sg list
usb: cdns3: gadget: link trb should point to next request
usb: cdns3: gadget: toggle cycle bit before reset endpoint
usb: core: hub: fix unhandled return by employing a void function
usb: core: hub: do error out if usb_autopm_get_interface() fails
usb: core: port: do error out if usb_autopm_get_interface() fails
vgacon: Fix a UAF in vgacon_invert_region
mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa
mm: fix possible PMD dirty bit lost in set_pmd_migration_entry()
mm, hotplug: fix page online with DEBUG_PAGEALLOC compiled but not enabled
fat: fix uninit-memory access for partial initialized inode
btrfs: fix RAID direct I/O reads with alternate csums
arm64: dts: socfpga: agilex: Fix gmac compatible
arm: dts: dra76x: Fix mmc3 max-frequency
tty:serial:mvebu-uart:fix a wrong return
tty: serial: fsl_lpuart: free IDs allocated by IDA
serial: 8250_exar: add support for ACCES cards
vt: selection, close sel_buffer race
vt: selection, push console lock down
vt: selection, push sel_lock up
media: hantro: Fix broken media controller links
media: mc-entity.c: use & to check pad flags, not ==
media: vicodec: process all 4 components for RGB32 formats
media: v4l2-mem2mem.c: fix broken links
perf intel-pt: Fix endless record after being terminated
perf intel-bts: Fix endless record after being terminated
perf cs-etm: Fix endless record after being terminated
perf arm-spe: Fix endless record after being terminated
spi: spidev: Fix CS polarity if GPIO descriptors are used
x86/pkeys: Manually set X86_FEATURE_OSPKE to preserve existing changes
s390/pci: Fix unexpected write combine on resource
s390/mm: fix panic in gup_fast on large pud
dmaengine: imx-sdma: fix context cache
dmaengine: imx-sdma: Fix the event id check to include RX event for UART6
dmaengine: tegra-apb: Fix use-after-free
dmaengine: tegra-apb: Prevent race conditions of tasklet vs free list
dm integrity: fix recalculation when moving from journal mode to bitmap mode
dm integrity: fix a deadlock due to offloading to an incorrect workqueue
dm integrity: fix invalid table returned due to argument count mismatch
dm cache: fix a crash due to incorrect work item cancelling
dm: report suspended device during destroy
dm writecache: verify watermark during resume
dm zoned: Fix reference counter initial value of chunk works
dm: fix congested_fn for request-based device
arm64: dts: meson-sm1-sei610: add missing interrupt-names
ARM: dts: ls1021a: Restore MDIO compatible to gianfar
spi: bcm63xx-hsspi: Really keep pll clk enabled
drm/virtio: make resource id workaround runtime switchable.
drm/virtio: fix resource id creation race
ASoC: topology: Fix memleak in soc_tplg_link_elems_load()
ASoC: topology: Fix memleak in soc_tplg_manifest_load()
ASoC: SOF: Fix snd_sof_ipc_stream_posn()
ASoC: intel: skl: Fix pin debug prints
ASoC: intel: skl: Fix possible buffer overflow in debug outputs
powerpc: define helpers to get L1 icache sizes
powerpc: Convert flush_icache_range & friends to C
powerpc/mm: Fix missing KUAP disable in flush_coherent_icache()
ASoC: pcm: Fix possible buffer overflow in dpcm state sysfs output
ASoC: pcm512x: Fix unbalanced regulator enable call in probe error path
ASoC: Intel: Skylake: Fix available clock counter incrementation
ASoC: dapm: Correct DAPM handling of active widgets during shutdown
spi: atmel-quadspi: fix possible MMIO window size overrun
drm/panfrost: Don't try to map on error faults
drm: kirin: Revert "Fix for hikey620 display offset problem"
drm/sun4i: Add separate DE3 VI layer formats
drm/sun4i: Fix DE2 VI layer format support
drm/sun4i: de2/de3: Remove unsupported VI layer formats
drm/i915: Program MBUS with rmw during initialization
drm/i915/selftests: Fix return in assert_mmap_offset()
phy: mapphone-mdm6600: Fix timeouts by adding wake-up handling
phy: mapphone-mdm6600: Fix write timeouts with shorter GPIO toggle interval
ARM: dts: imx6: phycore-som: fix emmc supply
arm64: dts: imx8qxp-mek: Remove unexisting Ethernet PHY
firmware: imx: misc: Align imx sc msg structs to 4
firmware: imx: scu-pd: Align imx sc msg structs to 4
firmware: imx: Align imx_sc_msg_req_cpu_start to 4
soc: imx-scu: Align imx sc msg structs to 4
Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"
RDMA/rw: Fix error flow during RDMA context initialization
RDMA/nldev: Fix crash when set a QP to a new counter but QPN is missing
RDMA/siw: Fix failure handling during device creation
RDMA/iwcm: Fix iwcm work deallocation
RDMA/core: Fix protection fault in ib_mr_pool_destroy
regulator: stm32-vrefbuf: fix a possible overshoot when re-enabling
RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()
IB/hfi1, qib: Ensure RCU is locked when accessing list
ARM: imx: build v7_cpu_resume() unconditionally
ARM: dts: am437x-idk-evm: Fix incorrect OPP node names
ARM: dts: dra7xx-clocks: Fixup IPU1 mux clock parent source
ARM: dts: imx7-colibri: Fix frequency for sd/mmc
hwmon: (adt7462) Fix an error return in ADT7462_REG_VOLT()
dma-buf: free dmabuf->name in dma_buf_release()
dmaengine: coh901318: Fix a double lock bug in dma_tc_handle()
arm64: dts: meson: fix gxm-khadas-vim2 wifi
bus: ti-sysc: Fix 1-wire reset quirk
EDAC/synopsys: Do not print an error with back-to-back snprintf() calls
powerpc: fix hardware PMU exception bug on PowerVM compatibility mode systems
efi/x86: Align GUIDs to their size in the mixed mode runtime wrapper
efi/x86: Handle by-ref arguments covering multiple pages in mixed mode
efi: READ_ONCE rng seed size before munmap
block, bfq: get a ref to a group when adding it to a service tree
block, bfq: remove ifdefs from around gets/puts of bfq groups
csky: Implement copy_thread_tls
drm/virtio: module_param_named() requires linux/moduleparam.h
Linux 5.4.25
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8ba29f273c7a2b02bfa54593f7a9087c34607cd5
commit ee63634bae02e13c8c0df1209a6a0ca5326f3189 upstream.
Dm-zoned initializes reference counters of new chunk works with zero
value and refcount_inc() is called to increment the counter. However, the
refcount_inc() function handles the addition to zero value as an error
and triggers the warning as follows:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 7 PID: 1506 at lib/refcount.c:25 refcount_warn_saturate+0x68/0xf0
...
CPU: 7 PID: 1506 Comm: systemd-udevd Not tainted 5.4.0+ #134
...
Call Trace:
dmz_map+0x2d2/0x350 [dm_zoned]
__map_bio+0x42/0x1a0
__split_and_process_non_flush+0x14a/0x1b0
__split_and_process_bio+0x83/0x240
? kmem_cache_alloc+0x165/0x220
dm_process_bio+0x90/0x230
? generic_make_request_checks+0x2e7/0x680
dm_make_request+0x3e/0xb0
generic_make_request+0xcf/0x320
? memcg_drain_all_list_lrus+0x1c0/0x1c0
submit_bio+0x3c/0x160
? guard_bio_eod+0x2c/0x130
mpage_readpages+0x182/0x1d0
? bdev_evict_inode+0xf0/0xf0
read_pages+0x6b/0x1b0
__do_page_cache_readahead+0x1ba/0x1d0
force_page_cache_readahead+0x93/0x100
generic_file_read_iter+0x83a/0xe40
? __seccomp_filter+0x7b/0x670
new_sync_read+0x12a/0x1c0
vfs_read+0x9d/0x150
ksys_read+0x5f/0xe0
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9
...
After this warning, following refcount API calls for the counter all fail
to change the counter value.
Fix this by setting the initial reference counter value not zero but one
for the new chunk works. Instead, do not call refcount_inc() via
dmz_get_chunk_work() for the new chunks works.
The failure was observed with linux version 5.4 with CONFIG_REFCOUNT_FULL
enabled. Refcount rework was merged to linux version 5.5 by the
commit 168829ad09ca ("Merge branch 'locking-core-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"). After this
commit, CONFIG_REFCOUNT_FULL was removed and the failure was observed
regardless of kernel configuration.
Linux version 4.20 merged the commit 092b564876 ("dm zoned: target: use
refcount_t for dm zoned reference counters"). Before this commit, dm
zoned used atomic_t APIs which does not check addition to zero, then this
fix is not necessary.
Fixes: 092b564876 ("dm zoned: target: use refcount_t for dm zoned reference counters")
Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merged in v5.5-rc1.
* aosp/upstream-f2fs-stable-linux-5.4.y:
docs: fs-verity: mention statx() support
f2fs: support STATX_ATTR_VERITY
ext4: support STATX_ATTR_VERITY
statx: define STATX_ATTR_VERITY
docs: fs-verity: document first supported kernel version
f2fs: add support for IV_INO_LBLK_64 encryption policies
ext4: add support for IV_INO_LBLK_64 encryption policies
fscrypt: add support for IV_INO_LBLK_64 policies
fscrypt: avoid data race on fscrypt_mode::logged_impl_name
fscrypt: zeroize fscrypt_info before freeing
fscrypt: remove struct fscrypt_ctx
fscrypt: invoke crypto API for ESSIV handling
null_blk: remove unused variable warning on !CONFIG_BLK_DEV_ZONED
block: set the zone size in blk_revalidate_disk_zones atomically
block: don't handle bio based drivers in blk_revalidate_disk_zones
null_blk: cleanup null_gendisk_register
null_blk: fix zone size paramter check
block: allocate the zone bitmaps lazily
block: replace seq_zones_bitmap with conv_zones_bitmap
block: simplify blkdev_nr_zones
block: remove the empty line at the end of blk-zoned.c
scsi: sd_zbc: Improve report zones error printout
scsi: sd_zbc: Remove set but not used variable 'buflen'
block: rework zone reporting
scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()
null_blk: clean up report zones
null_blk: clean up the block device operations
null_blk: return fixed zoned reads > write pointer
scsi: sd_zbc: add zone open, close, and finish support
block: Remove partition support for zoned block devices
block: Simplify report zones execution
block: cleanup the !zoned case in blk_revalidate_disk_zones
block: Enhance blk_revalidate_disk_zones()
block: add zone open, close and finish ioctl support
block: add zone open, close and finish operations
block: Simplify REQ_OP_ZONE_RESET_ALL handling
block: Remove REQ_OP_ZONE_RESET plugging
f2fs: stop GC when the victim becomes fully valid
f2fs: expose main_blkaddr in sysfs
f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()
f2fs: Fix deadlock in f2fs_gc() context during atomic files handling
f2fs: show f2fs instance in printk_ratelimited
f2fs: fix potential overflow
f2fs: fix to update dir's i_pino during cross_rename
f2fs: support aligned pinned file
f2fs: avoid kernel panic on corruption test
f2fs: fix wrong description in document
f2fs: cache global IPU bio
f2fs: fix to avoid memory leakage in f2fs_listxattr
f2fs: check total_segments from devices in raw_super
f2fs: update multi-dev metadata in resize_fs
f2fs: mark recovery flag correctly in read_raw_super_block()
f2fs: fix to update time in lazytime mode
Change-Id: I9325127228fb82b67f064ce8b3bc8d40ac76e65b
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Simplify the arguments to blkdev_nr_zones by passing a gendisk instead
of the block_device and capacity. This also removes the need for
__blkdev_nr_zones as all callers are outside the fast path and can
deal with the additional branch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
commit e7fad909b68aa37470d9f2d2731b5bec355ee5d6 upstream.
Commit 75d66ffb48 added backing device health checks and as a part
of these checks, check_events() block ops template call is invoked in
dm-zoned mapping path as well as in reclaim and flush path. Calling
check_events() with ATA or SCSI backing devices introduces a blocking
scsi_test_unit_ready() call being made in sd_check_events(). Even though
the overhead of calling scsi_test_unit_ready() is small for ATA zoned
devices, it is much larger for SCSI and it affects performance in a very
negative way.
Fix this performance regression by executing check_events() only in case
of any I/O errors. The function dmz_bdev_is_dying() is modified to call
only blk_queue_dying(), while calls to check_events() are made in a new
helper function, dmz_check_bdev().
Reported-by: zhangxiaoxu <zhangxiaoxu5@huawei.com>
Fixes: 75d66ffb48 ("dm zoned: properly handle backing device failure")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
dm-zoned is observed to lock up or livelock in case of hardware
failure or some misconfiguration of the backing zoned device.
This patch adds a new dm-zoned target function that checks the status of
the backing device. If the request queue of the backing device is found
to be in dying state or the SCSI backing device enters offline state,
the health check code sets a dm-zoned target flag prompting all further
incoming I/O to be rejected. In order to detect backing device failures
timely, this new function is called in the request mapping path, at the
beginning of every reclaim run and before performing any metadata I/O.
The proper way out of this situation is to do
dmsetup remove <dm-zoned target>
and recreate the target when the problem with the backing device
is resolved.
Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Some errors are ignored in the I/O path during queueing chunks
for processing by chunk works. Since at least these errors are
transient in nature, it should be possible to retry the failed
incoming commands.
The fix -
Errors that can happen while queueing chunks are carried upwards
to the main mapping function and it now returns DM_MAPIO_REQUEUE
for any incoming requests that can not be properly queued.
Error logging/debug messages are added where needed.
Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
My static checker complains about this line from dmz_get_zoned_device()
aligned_capacity = dev->capacity & ~(blk_queue_zone_sectors(q) - 1);
The problem is that "aligned_capacity" and "dev->capacity" are sector_t
type (which is a u64 under most configs) but blk_queue_zone_sectors(q)
returns a u32 so the higher 32 bits in aligned_capacity are cleared to
zero. This patch adds a cast to address the issue.
Fixes: 114e025968 ("dm zoned: ignore last smaller runt zone")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
There is no need to have DM core split discards on behalf of a DM target
now that blk_queue_split() handles splitting discards based on the
queue_limits. A DM target just needs to set max_discard_sectors,
discard_granularity, etc, in queue_limits.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
struct bioctx includes the ref refcount_t to track the number of I/O
fragments used to process a target BIO as well as ensure that the zone
of the BIO is kept in the active state throughout the lifetime of the
BIO. However, since decrementing of this reference count is done in the
target .end_io method, the function bio_endio() must be called multiple
times for read and write target BIOs, which causes problems with the
value of the __bi_remaining struct bio field for chained BIOs (e.g. the
clone BIO passed by dm core is large and splits into fragments by the
block layer), resulting in incorrect values and inconsistencies with the
BIO_CHAIN flag setting. This is turn triggers the BUG_ON() call:
BUG_ON(atomic_read(&bio->__bi_remaining) <= 0);
in bio_remaining_done() called from bio_endio().
Fix this ensuring that bio_endio() is called only once for any target
BIO by always using internal clone BIOs for processing any read or
write target BIO. This allows reference counting using the target BIO
context counter to trigger the target BIO completion bio_endio() call
once all data, metadata and other zone work triggered by the BIO
complete.
Overall, this simplifies the code too as the target .end_io becomes
unnecessary and differences between read and write BIO issuing and
completion processing disappear.
Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(.request_fn) from request-based DM. Jens has already started
preparing for complete removal of the legacy IO path in 4.21 but this
earlier removal of support from DM has been coordinated with Jens (as
evidenced by the commit being attributed to him). Making
request-based DM exclussively blk-mq only cleans up that portion of DM
core quite nicely.
- Convert the thinp and zoned targets over to using refcount_t where
applicable.
- A couple fixes to the DM zoned target for refcounting and other races
buried in the implementation of metadata block creation and use.
- Small cleanups to remove redundant unlikely() around a couple
WARN_ON_ONCE().
- Simplify how dm-ioctl copies from userspace, eliminating some
potential for a malicious user trying to change the executed ioctl
after its processing has begun.
- Tweaked DM crypt target to use the DM device name when naming the
various workqueues created for a particular DM crypt device (makes the
N workqueues for a DM crypt device more easily understood and enhances
user's accounting capabilities at a glance via "ps")
- Small fixup to remove dead branch in DM writecache's memory_entry().
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJb0zCmAAoJEMUj8QotnQNaUKwIAKFC3cuTtQyh3LomSbT4uAr5
6apBSVcxVhqU+isriW3HmBPkO4HGyqMWjX5oQGHrOj0YK0i1H65Nq3qH9ATaiSHn
awdo8A4YmClF5Mojc51UebXIH0IfnGSOKH/FHNhQzT3jAdn+vYinMSZ28JwFPgKW
DsVOSM1dlJZBWRXhQNpyCjVl9Xb3rRUOnkfG0endyMfOsnoxKurhwSkXoStzCdQn
O5ubt1XT3wMKoI1k9QWjfrBU1NtZZYD+kQ6EfkYXfL9RNhhZjwzO/eNtqT3jnKsq
qbcd8/0JIUttPf7+F0URG9mbMbebfJGqNAaJWcnlRbCHmgUBBGVWsnl8MTWOLkw=
=MdDN
-----END PGP SIGNATURE-----
Merge tag 'for-4.20/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- The biggest change this cycle is to remove support for the legacy IO
path (.request_fn) from request-based DM.
Jens has already started preparing for complete removal of the legacy
IO path in 4.21 but this earlier removal of support from DM has been
coordinated with Jens (as evidenced by the commit being attributed to
him).
Making request-based DM exclussively blk-mq only cleans up that
portion of DM core quite nicely.
- Convert the thinp and zoned targets over to using refcount_t where
applicable.
- A couple fixes to the DM zoned target for refcounting and other races
buried in the implementation of metadata block creation and use.
- Small cleanups to remove redundant unlikely() around a couple
WARN_ON_ONCE().
- Simplify how dm-ioctl copies from userspace, eliminating some
potential for a malicious user trying to change the executed ioctl
after its processing has begun.
- Tweaked DM crypt target to use the DM device name when naming the
various workqueues created for a particular DM crypt device (makes
the N workqueues for a DM crypt device more easily understood and
enhances user's accounting capabilities at a glance via "ps")
- Small fixup to remove dead branch in DM writecache's memory_entry().
* tag 'for-4.20/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm writecache: remove disabled code in memory_entry()
dm zoned: fix various dmz_get_mblock() issues
dm zoned: fix metadata block ref counting
dm raid: avoid bitmap with raid4/5/6 journal device
dm crypt: make workqueue names device-specific
dm: add dm_table_device_name()
dm ioctl: harden copy_params()'s copy_from_user() from malicious users
dm: remove unnecessary unlikely() around WARN_ON_ONCE()
dm zoned: target: use refcount_t for dm zoned reference counters
dm thin: use refcount_t for thin_c reference counting
dm table: require that request-based DM be layered on blk-mq devices
dm: rename DM_TYPE_MQ_REQUEST_BASED to DM_TYPE_REQUEST_BASED
dm: remove legacy request-based IO path
Introduce the blkdev_nr_zones() helper function to get the total
number of zones of a zoned block device. This number is always 0 for a
regular block device (q->limits.zoned == BLK_ZONED_NONE case).
Replace hard-coded number of zones calculation in dmz_get_zoned_device()
with a call to this helper.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The API surrounding refcount_t should be used in place of atomic_t
when variables are being used as reference counters. This API can
prevent issues such as counter overflows and use-after-free
conditions. Within the dm zoned target stack, the atomic_t API is
used for bioctx->ref and cw->refcount. Change these to use
refcount_t, avoiding the issues mentioned.
Signed-off-by: John Pittman <jpittman@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Eliminate most holes in DM data structures that were modified by
commit 6f1c819c21 ("dm: convert to bioset_init()/mempool_init()").
Also prevent structure members from unnecessarily spanning cache
lines.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Convert dm to embedded bio sets.
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use the fmode_t that is passed to dm_blk_ioctl() rather than
inconsistently (varies across targets) drop it on the floor by
overriding it with the fmode_t stored in 'struct dm_dev'.
All the persistent reservation functions weren't using the fmode_t they
got back from .prepare_ioctl so remove them.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The SCSI layer allows ZBC drives to have a smaller last runt zone. For
such a device, specifying the entire capacity for a dm-zoned target
table entry fails because the specified capacity is not aligned on a
device zone size indicated in the request queue structure of the
device.
Fix this problem by ignoring the last runt zone in the entry length
when seting up the dm-zoned target (ctr method) and when iterating table
entries of the target (iterate_devices method). This allows dm-zoned
users to still easily setup a target using the entire device capacity
(as mandated by dm-zoned) or the aligned capacity excluding the last
runt zone.
While at it, replace direct references to the device queue chunk_sectors
limit with calls to the accessor blk_queue_zone_sectors().
Reported-by: Peter Desnoyers <pjd@ccs.neu.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This way we don't need a block_device structure to submit I/O. The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open. Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).
For the actual I/O path all that we need is the gendisk, which exists
once per block device. But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.
Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use GFP_NOIO for memory allocations in the I/O path. Other memory
allocations in the initialization path can use GFP_KERNEL.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The value REQ_OP_FLUSH is only used by the block code for
request-based devices.
Remove the tests for REQ_OP_FLUSH from the bio-based dm-zoned-target.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The dm-zoned device mapper target provides transparent write access
to zoned block devices (ZBC and ZAC compliant block devices).
dm-zoned hides to the device user (a file system or an application
doing raw block device accesses) any constraint imposed on write
requests by the device, equivalent to a drive-managed zoned block
device model.
Write requests are processed using a combination of on-disk buffering
using the device conventional zones and direct in-place processing for
requests aligned to a zone sequential write pointer position.
A background reclaim process implemented using dm_kcopyd_copy ensures
that conventional zones are always available for executing unaligned
write requests. The reclaim process overhead is minimized by managing
buffer zones in a least-recently-written order and first targeting the
oldest buffer zones. Doing so, blocks under regular write access (such
as metadata blocks of a file system) remain stored in conventional
zones, resulting in no apparent overhead.
dm-zoned implementation focus on simplicity and on minimizing overhead
(CPU, memory and storage overhead). For a 14TB host-managed disk with
256 MB zones, dm-zoned memory usage per disk instance is at most about
3 MB and as little as 5 zones will be used internally for storing metadata
and performing buffer zone reclaim operations. This is achieved using
zone level indirection rather than a full block indirection system for
managing block movement between zones.
dm-zoned primary target is host-managed zoned block devices but it can
also be used with host-aware device models to mitigate potential
device-side performance degradation due to excessive random writing.
Zoned block devices can be formatted and checked for use with the dm-zoned
target using the dmzadm utility available at:
https://github.com/hgst/dm-zoned-tools
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
[Mike Snitzer partly refactored Damien's original work to cleanup the code]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>