Commit Graph

27 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
4022b5a85f This is the 5.4.51 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl8GyVUACgkQONu9yGCS
 aT6MCBAAxQpPQYQ8XNq40G2U3rENx2gIOgSA9WjkGsCij6Ad8VjkbA2KqqJ6v0GF
 McIomgdR+bviZKYPJHfdVlOGWGGTtDe1hi6GVoXuc/CTOPK386/v/vMTumdQ2bLb
 5fkuvpFGNdxGmhi+Vpu6zajvF8CVHmCggQQGlrIyIIOXPwN/DB/7XK6lmtx0vcff
 YYZI3L1+q1rJz6vV5K5xTwpPwCQDsdYW93KQpcccVKMAmnAzFolaWyvWitvqOgT5
 xrnelzjrIy5v2BST0S7jL0nzEeXB6bq1tmJHbd+cTvIaKJx4NR8yx2cXzvospi3l
 TpwuycBo/0LRWAiFZ+29+HOHkvUx7sa+BFCLiZEu1dLAxRXdi/W0a/i0QFGb0ex/
 YO8dBVEdHVRm0aXlfnipJYuNdTxzSzUG0GHOvZB11h/zxybiLWjJGutofX0tZXYj
 JuO9UhP0eSldtfuATrXRPsSsTWWen1igVTWFVHEZPWGvR4TiU8jpFPOTFcrraglM
 07M3ooA6FlWrTt7n9YdGLaQY6w2z+pREMgnUesNY9yQeRxY2EAsa95Lu+Y6aTeYP
 YfTPQHhbbz3XggFljGYIbvM6wlzgghSLH09CX8zxFqeYyTSFsNDKBuLslmTMS/21
 zwpG81hm7fE58N3uUHqSU3/bN8kzCH6GsU78bFAXb+jaNKgG5Qg=
 =0uVY
 -----END PGP SIGNATURE-----

Merge 5.4.51 into android11-5.4

Changes in 5.4.51
	io_uring: make sure async workqueue is canceled on exit
	mm: fix swap cache node allocation mask
	EDAC/amd64: Read back the scrub rate PCI register on F15h
	usbnet: smsc95xx: Fix use-after-free after removal
	sched/debug: Make sd->flags sysctl read-only
	mm/slub.c: fix corrupted freechain in deactivate_slab()
	mm/slub: fix stack overruns with SLUB_STATS
	rxrpc: Fix race between incoming ACK parser and retransmitter
	usb: usbtest: fix missing kfree(dev->buf) in usbtest_disconnect
	tools lib traceevent: Add append() function helper for appending strings
	tools lib traceevent: Handle __attribute__((user)) in field names
	s390/debug: avoid kernel warning on too large number of pages
	nvme-multipath: set bdi capabilities once
	nvme-multipath: fix deadlock between ana_work and scan_work
	nvme-multipath: fix deadlock due to head->lock
	nvme-multipath: fix bogus request queue reference put
	kgdb: Avoid suspicious RCU usage warning
	selftests: tpm: Use /bin/sh instead of /bin/bash
	tpm: Fix TIS locality timeout problems
	crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock()
	drm/msm/dpu: fix error return code in dpu_encoder_init
	rxrpc: Fix afs large storage transmission performance drop
	RDMA/counter: Query a counter before release
	cxgb4: use unaligned conversion for fetching timestamp
	cxgb4: parse TC-U32 key values and masks natively
	cxgb4: fix endian conversions for L4 ports in filters
	cxgb4: use correct type for all-mask IP address comparison
	cxgb4: fix SGE queue dump destination buffer context
	hwmon: (max6697) Make sure the OVERT mask is set correctly
	hwmon: (acpi_power_meter) Fix potential memory leak in acpi_power_meter_add()
	thermal/drivers/mediatek: Fix bank number settings on mt8183
	thermal/drivers/rcar_gen3: Fix undefined temperature if negative
	nfsd4: fix nfsdfs reference count loop
	nfsd: fix nfsdfs inode reference count leak
	drm: sun4i: hdmi: Remove extra HPD polling
	virtio-blk: free vblk-vqs in error path of virtblk_probe()
	SMB3: Honor 'posix' flag for multiuser mounts
	nvme: fix identify error status silent ignore
	nvme: fix a crash in nvme_mpath_add_disk
	samples/vfs: avoid warning in statx override
	i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665
	i2c: mlxcpld: check correct size of maximum RECV_LEN packet
	spi: spi-fsl-dspi: Fix external abort on interrupt in resume or exit paths
	nfsd: apply umask on fs without ACL support
	Revert "ALSA: usb-audio: Improve frames size computation"
	SMB3: Honor 'seal' flag for multiuser mounts
	SMB3: Honor persistent/resilient handle flags for multiuser mounts
	SMB3: Honor lease disabling for multiuser mounts
	SMB3: Honor 'handletimeout' flag for multiuser mounts
	cifs: Fix the target file was deleted when rename failed.
	MIPS: lantiq: xway: sysctrl: fix the GPHY clock alias names
	MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen
	drm/amd/display: Only revalidate bandwidth on medium and fast updates
	drm/amdgpu: use %u rather than %d for sclk/mclk
	drm/amdgpu/atomfirmware: fix vram_info fetching for renoir
	dma-buf: Move dma_buf_release() from fops to dentry_ops
	irqchip/gic: Atomically update affinity
	mm, compaction: fully assume capture is not NULL in compact_zone_order()
	mm, compaction: make capture control handling safe wrt interrupts
	x86/resctrl: Fix memory bandwidth counter width for AMD
	dm zoned: assign max_io_len correctly
	efi: Make it possible to disable efivar_ssdt entirely
	Linux 5.4.51

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib29e1b61540451995ee757cf90c9ba24e6b8d6b3
2020-07-09 10:41:26 +02:00
Hou Tao
43986c32ee dm zoned: assign max_io_len correctly
commit 7b2377486767503d47265e4d487a63c651f6b55d upstream.

The unit of max_io_len is sector instead of byte (spotted through
code review), so fix it.

Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-09 09:37:57 +02:00
Greg Kroah-Hartman
6d52041543 This is the 5.4.25 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5qJSMACgkQONu9yGCS
 aT6/Dw//Usg9m0cBB4Ip4fYxI0EVz8BgnVe9KSdt+71gM63QCOi1ZeTS0NDMUtO0
 MTsQSudUpfntrT8QHCmBwCZ5LlAAZvxDS9UOqnhkWbqNY5jGmUhH5u28RJL28dp2
 8wJY6zZKg+pfOWXd81slW86uN27QZvURNEthT81sN2ucxe5DXV1gs87FILSdMpXm
 I0Z3LpUoZDjpONeA6WTZqkDNA0J7Z9QjULx9/4LFi/gc0q1ApWC7FV1A9gpQHaBa
 w4kDWJCGqq3mNx8Hi9BHau50VUHX5tuKvpn9RcmSl9BBba25pE5h0EVIGo8Dlq+9
 T9hkVR5iXeMbFERnLm5iR0DjFHog/mOgAgUHSTTXB3BcdgIKWwUcc2gCcr2Y7KIK
 CD7l+kX1nWUk4yYre7zXiG/vO9ilYgeboc8C5Qdq3XR6zaO90+8NUbCOpa2+6yEF
 H7kugstb6l+iCJ1k8YJd0ORGOobl68+P79TLxAOFnkNGJRzuAoXmBH+xkqAugz1H
 YKKAbE+MzW75sre7PxU1g1uPOHxfMfd5e3uRtUU5OETJv0A2kTte8ay5rqLNbe7H
 QYqdfwTr2oFssnWKW5d/KdSopD5A/31/Kjkmzl6ED2xaLMEpA7zyed5p+G/Beu5s
 dkPlteya8wCQ1W/KtDJRhbCauoG/NyCKIeoQitHBJwMapcEo8ZU=
 =rDP8
 -----END PGP SIGNATURE-----

Merge 5.4.25 into android-5.4

Changes in 5.4.25
	block, bfq: get extra ref to prevent a queue from being freed during a group move
	block, bfq: do not insert oom queue into position tree
	ALSA: hda/realtek - Fix a regression for mute led on Lenovo Carbon X1
	net: dsa: bcm_sf2: Forcibly configure IMP port for 1Gb/sec
	net: stmmac: fix notifier registration
	dm thin metadata: fix lockdep complaint
	RDMA/core: Fix pkey and port assignment in get_new_pps
	RDMA/core: Fix use of logical OR in get_new_pps
	kbuild: fix 'No such file or directory' warning when cleaning
	kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic
	blktrace: fix dereference after null check
	ALSA: hda: do not override bus codec_mask in link_get()
	serial: ar933x_uart: set UART_CS_{RX,TX}_READY_ORIDE
	selftests: fix too long argument
	usb: gadget: composite: Support more than 500mA MaxPower
	usb: gadget: ffs: ffs_aio_cancel(): Save/restore IRQ flags
	usb: gadget: serial: fix Tx stall after buffer overflow
	habanalabs: halt the engines before hard-reset
	habanalabs: do not halt CoreSight during hard reset
	habanalabs: patched cb equals user cb in device memset
	drm/msm/mdp5: rate limit pp done timeout warnings
	drm: msm: Fix return type of dsi_mgr_connector_mode_valid for kCFI
	drm/modes: Make sure to parse valid rotation value from cmdline
	drm/modes: Allow DRM_MODE_ROTATE_0 when applying video mode parameters
	scsi: megaraid_sas: silence a warning
	drm/msm/dsi: save pll state before dsi host is powered off
	drm/msm/dsi/pll: call vco set rate explicitly
	selftests: forwarding: use proto icmp for {gretap, ip6gretap}_mac testing
	selftests: forwarding: vxlan_bridge_1d: fix tos value
	net: atlantic: check rpc result and wait for rpc address
	net: ks8851-ml: Remove 8-bit bus accessors
	net: ks8851-ml: Fix 16-bit data access
	net: ks8851-ml: Fix 16-bit IO operation
	net: ethernet: dm9000: Handle -EPROBE_DEFER in dm9000_parse_dt()
	watchdog: da9062: do not ping the hw during stop()
	s390/cio: cio_ignore_proc_seq_next should increase position index
	s390: make 'install' not depend on vmlinux
	efi: Only print errors about failing to get certs if EFI vars are found
	net/mlx5: DR, Fix matching on vport gvmi
	iommu/amd: Disable IOMMU on Stoney Ridge systems
	nvme/pci: Add sleep quirk for Samsung and Toshiba drives
	nvme-pci: Use single IRQ vector for old Apple models
	x86/boot/compressed: Don't declare __force_order in kaslr_64.c
	s390/qdio: fill SL with absolute addresses
	nvme: Fix uninitialized-variable warning
	ice: Don't tell the OS that link is going down
	x86/xen: Distribute switch variables for initialization
	net: thunderx: workaround BGX TX Underflow issue
	csky/mm: Fixup export invalid_pte_table symbol
	csky: Set regs->usp to kernel sp, when the exception is from kernel
	csky/smp: Fixup boot failed when CONFIG_SMP
	csky: Fixup ftrace modify panic
	csky: Fixup compile warning for three unimplemented syscalls
	arch/csky: fix some Kconfig typos
	selftests: forwarding: vxlan_bridge_1d: use more proper tos value
	firmware: imx: scu: Ensure sequential TX
	binder: prevent UAF for binderfs devices
	binder: prevent UAF for binderfs devices II
	ALSA: hda/realtek - Add Headset Mic supported
	ALSA: hda/realtek - Add Headset Button supported for ThinkPad X1
	ALSA: hda/realtek - Fix silent output on Gigabyte X570 Aorus Master
	ALSA: hda/realtek - Enable the headset of ASUS B9450FA with ALC294
	cifs: don't leak -EAGAIN for stat() during reconnect
	cifs: fix rename() by ensuring source handle opened with DELETE bit
	usb: storage: Add quirk for Samsung Fit flash
	usb: quirks: add NO_LPM quirk for Logitech Screen Share
	usb: dwc3: gadget: Update chain bit correctly when using sg list
	usb: cdns3: gadget: link trb should point to next request
	usb: cdns3: gadget: toggle cycle bit before reset endpoint
	usb: core: hub: fix unhandled return by employing a void function
	usb: core: hub: do error out if usb_autopm_get_interface() fails
	usb: core: port: do error out if usb_autopm_get_interface() fails
	vgacon: Fix a UAF in vgacon_invert_region
	mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa
	mm: fix possible PMD dirty bit lost in set_pmd_migration_entry()
	mm, hotplug: fix page online with DEBUG_PAGEALLOC compiled but not enabled
	fat: fix uninit-memory access for partial initialized inode
	btrfs: fix RAID direct I/O reads with alternate csums
	arm64: dts: socfpga: agilex: Fix gmac compatible
	arm: dts: dra76x: Fix mmc3 max-frequency
	tty:serial:mvebu-uart:fix a wrong return
	tty: serial: fsl_lpuart: free IDs allocated by IDA
	serial: 8250_exar: add support for ACCES cards
	vt: selection, close sel_buffer race
	vt: selection, push console lock down
	vt: selection, push sel_lock up
	media: hantro: Fix broken media controller links
	media: mc-entity.c: use & to check pad flags, not ==
	media: vicodec: process all 4 components for RGB32 formats
	media: v4l2-mem2mem.c: fix broken links
	perf intel-pt: Fix endless record after being terminated
	perf intel-bts: Fix endless record after being terminated
	perf cs-etm: Fix endless record after being terminated
	perf arm-spe: Fix endless record after being terminated
	spi: spidev: Fix CS polarity if GPIO descriptors are used
	x86/pkeys: Manually set X86_FEATURE_OSPKE to preserve existing changes
	s390/pci: Fix unexpected write combine on resource
	s390/mm: fix panic in gup_fast on large pud
	dmaengine: imx-sdma: fix context cache
	dmaengine: imx-sdma: Fix the event id check to include RX event for UART6
	dmaengine: tegra-apb: Fix use-after-free
	dmaengine: tegra-apb: Prevent race conditions of tasklet vs free list
	dm integrity: fix recalculation when moving from journal mode to bitmap mode
	dm integrity: fix a deadlock due to offloading to an incorrect workqueue
	dm integrity: fix invalid table returned due to argument count mismatch
	dm cache: fix a crash due to incorrect work item cancelling
	dm: report suspended device during destroy
	dm writecache: verify watermark during resume
	dm zoned: Fix reference counter initial value of chunk works
	dm: fix congested_fn for request-based device
	arm64: dts: meson-sm1-sei610: add missing interrupt-names
	ARM: dts: ls1021a: Restore MDIO compatible to gianfar
	spi: bcm63xx-hsspi: Really keep pll clk enabled
	drm/virtio: make resource id workaround runtime switchable.
	drm/virtio: fix resource id creation race
	ASoC: topology: Fix memleak in soc_tplg_link_elems_load()
	ASoC: topology: Fix memleak in soc_tplg_manifest_load()
	ASoC: SOF: Fix snd_sof_ipc_stream_posn()
	ASoC: intel: skl: Fix pin debug prints
	ASoC: intel: skl: Fix possible buffer overflow in debug outputs
	powerpc: define helpers to get L1 icache sizes
	powerpc: Convert flush_icache_range & friends to C
	powerpc/mm: Fix missing KUAP disable in flush_coherent_icache()
	ASoC: pcm: Fix possible buffer overflow in dpcm state sysfs output
	ASoC: pcm512x: Fix unbalanced regulator enable call in probe error path
	ASoC: Intel: Skylake: Fix available clock counter incrementation
	ASoC: dapm: Correct DAPM handling of active widgets during shutdown
	spi: atmel-quadspi: fix possible MMIO window size overrun
	drm/panfrost: Don't try to map on error faults
	drm: kirin: Revert "Fix for hikey620 display offset problem"
	drm/sun4i: Add separate DE3 VI layer formats
	drm/sun4i: Fix DE2 VI layer format support
	drm/sun4i: de2/de3: Remove unsupported VI layer formats
	drm/i915: Program MBUS with rmw during initialization
	drm/i915/selftests: Fix return in assert_mmap_offset()
	phy: mapphone-mdm6600: Fix timeouts by adding wake-up handling
	phy: mapphone-mdm6600: Fix write timeouts with shorter GPIO toggle interval
	ARM: dts: imx6: phycore-som: fix emmc supply
	arm64: dts: imx8qxp-mek: Remove unexisting Ethernet PHY
	firmware: imx: misc: Align imx sc msg structs to 4
	firmware: imx: scu-pd: Align imx sc msg structs to 4
	firmware: imx: Align imx_sc_msg_req_cpu_start to 4
	soc: imx-scu: Align imx sc msg structs to 4
	Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"
	RDMA/rw: Fix error flow during RDMA context initialization
	RDMA/nldev: Fix crash when set a QP to a new counter but QPN is missing
	RDMA/siw: Fix failure handling during device creation
	RDMA/iwcm: Fix iwcm work deallocation
	RDMA/core: Fix protection fault in ib_mr_pool_destroy
	regulator: stm32-vrefbuf: fix a possible overshoot when re-enabling
	RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()
	IB/hfi1, qib: Ensure RCU is locked when accessing list
	ARM: imx: build v7_cpu_resume() unconditionally
	ARM: dts: am437x-idk-evm: Fix incorrect OPP node names
	ARM: dts: dra7xx-clocks: Fixup IPU1 mux clock parent source
	ARM: dts: imx7-colibri: Fix frequency for sd/mmc
	hwmon: (adt7462) Fix an error return in ADT7462_REG_VOLT()
	dma-buf: free dmabuf->name in dma_buf_release()
	dmaengine: coh901318: Fix a double lock bug in dma_tc_handle()
	arm64: dts: meson: fix gxm-khadas-vim2 wifi
	bus: ti-sysc: Fix 1-wire reset quirk
	EDAC/synopsys: Do not print an error with back-to-back snprintf() calls
	powerpc: fix hardware PMU exception bug on PowerVM compatibility mode systems
	efi/x86: Align GUIDs to their size in the mixed mode runtime wrapper
	efi/x86: Handle by-ref arguments covering multiple pages in mixed mode
	efi: READ_ONCE rng seed size before munmap
	block, bfq: get a ref to a group when adding it to a service tree
	block, bfq: remove ifdefs from around gets/puts of bfq groups
	csky: Implement copy_thread_tls
	drm/virtio: module_param_named() requires linux/moduleparam.h
	Linux 5.4.25

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8ba29f273c7a2b02bfa54593f7a9087c34607cd5
2020-03-12 17:09:04 +01:00
Shin'ichiro Kawasaki
5c929bcb7a dm zoned: Fix reference counter initial value of chunk works
commit ee63634bae02e13c8c0df1209a6a0ca5326f3189 upstream.

Dm-zoned initializes reference counters of new chunk works with zero
value and refcount_inc() is called to increment the counter. However, the
refcount_inc() function handles the addition to zero value as an error
and triggers the warning as follows:

refcount_t: addition on 0; use-after-free.
WARNING: CPU: 7 PID: 1506 at lib/refcount.c:25 refcount_warn_saturate+0x68/0xf0
...
CPU: 7 PID: 1506 Comm: systemd-udevd Not tainted 5.4.0+ #134
...
Call Trace:
 dmz_map+0x2d2/0x350 [dm_zoned]
 __map_bio+0x42/0x1a0
 __split_and_process_non_flush+0x14a/0x1b0
 __split_and_process_bio+0x83/0x240
 ? kmem_cache_alloc+0x165/0x220
 dm_process_bio+0x90/0x230
 ? generic_make_request_checks+0x2e7/0x680
 dm_make_request+0x3e/0xb0
 generic_make_request+0xcf/0x320
 ? memcg_drain_all_list_lrus+0x1c0/0x1c0
 submit_bio+0x3c/0x160
 ? guard_bio_eod+0x2c/0x130
 mpage_readpages+0x182/0x1d0
 ? bdev_evict_inode+0xf0/0xf0
 read_pages+0x6b/0x1b0
 __do_page_cache_readahead+0x1ba/0x1d0
 force_page_cache_readahead+0x93/0x100
 generic_file_read_iter+0x83a/0xe40
 ? __seccomp_filter+0x7b/0x670
 new_sync_read+0x12a/0x1c0
 vfs_read+0x9d/0x150
 ksys_read+0x5f/0xe0
 do_syscall_64+0x5b/0x180
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
...

After this warning, following refcount API calls for the counter all fail
to change the counter value.

Fix this by setting the initial reference counter value not zero but one
for the new chunk works. Instead, do not call refcount_inc() via
dmz_get_chunk_work() for the new chunks works.

The failure was observed with linux version 5.4 with CONFIG_REFCOUNT_FULL
enabled. Refcount rework was merged to linux version 5.5 by the
commit 168829ad09ca ("Merge branch 'locking-core-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"). After this
commit, CONFIG_REFCOUNT_FULL was removed and the failure was observed
regardless of kernel configuration.

Linux version 4.20 merged the commit 092b564876 ("dm zoned: target: use
refcount_t for dm zoned reference counters"). Before this commit, dm
zoned used atomic_t APIs which does not check addition to zero, then this
fix is not necessary.

Fixes: 092b564876 ("dm zoned: target: use refcount_t for dm zoned reference counters")
Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-12 13:00:24 +01:00
Jaegeuk Kim
57df3030e1 Merge remote-tracking branch 'aosp/upstream-f2fs-stable-linux-5.4.y' into android-5.4
Merged in v5.5-rc1.

* aosp/upstream-f2fs-stable-linux-5.4.y:
  docs: fs-verity: mention statx() support
  f2fs: support STATX_ATTR_VERITY
  ext4: support STATX_ATTR_VERITY
  statx: define STATX_ATTR_VERITY
  docs: fs-verity: document first supported kernel version
  f2fs: add support for IV_INO_LBLK_64 encryption policies
  ext4: add support for IV_INO_LBLK_64 encryption policies
  fscrypt: add support for IV_INO_LBLK_64 policies
  fscrypt: avoid data race on fscrypt_mode::logged_impl_name
  fscrypt: zeroize fscrypt_info before freeing
  fscrypt: remove struct fscrypt_ctx
  fscrypt: invoke crypto API for ESSIV handling
  null_blk: remove unused variable warning on !CONFIG_BLK_DEV_ZONED
  block: set the zone size in blk_revalidate_disk_zones atomically
  block: don't handle bio based drivers in blk_revalidate_disk_zones
  null_blk: cleanup null_gendisk_register
  null_blk: fix zone size paramter check
  block: allocate the zone bitmaps lazily
  block: replace seq_zones_bitmap with conv_zones_bitmap
  block: simplify blkdev_nr_zones
  block: remove the empty line at the end of blk-zoned.c
  scsi: sd_zbc: Improve report zones error printout
  scsi: sd_zbc: Remove set but not used variable 'buflen'
  block: rework zone reporting
  scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()
  null_blk: clean up report zones
  null_blk: clean up the block device operations
  null_blk: return fixed zoned reads > write pointer
  scsi: sd_zbc: add zone open, close, and finish support
  block: Remove partition support for zoned block devices
  block: Simplify report zones execution
  block: cleanup the !zoned case in blk_revalidate_disk_zones
  block: Enhance blk_revalidate_disk_zones()
  block: add zone open, close and finish ioctl support
  block: add zone open, close and finish operations
  block: Simplify REQ_OP_ZONE_RESET_ALL handling
  block: Remove REQ_OP_ZONE_RESET plugging
  f2fs: stop GC when the victim becomes fully valid
  f2fs: expose main_blkaddr in sysfs
  f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()
  f2fs: Fix deadlock in f2fs_gc() context during atomic files handling
  f2fs: show f2fs instance in printk_ratelimited
  f2fs: fix potential overflow
  f2fs: fix to update dir's i_pino during cross_rename
  f2fs: support aligned pinned file
  f2fs: avoid kernel panic on corruption test
  f2fs: fix wrong description in document
  f2fs: cache global IPU bio
  f2fs: fix to avoid memory leakage in f2fs_listxattr
  f2fs: check total_segments from devices in raw_super
  f2fs: update multi-dev metadata in resize_fs
  f2fs: mark recovery flag correctly in read_raw_super_block()
  f2fs: fix to update time in lazytime mode

Change-Id: I9325127228fb82b67f064ce8b3bc8d40ac76e65b
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2020-01-15 16:41:51 -08:00
Christoph Hellwig
a354b75ed5 block: simplify blkdev_nr_zones
Simplify the arguments to blkdev_nr_zones by passing a gendisk instead
of the block_device and capacity.  This also removes the need for
__blkdev_nr_zones as all callers are outside the fast path and can
deal with the additional branch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 16:02:49 -08:00
Dmitry Fomichev
fca436251d dm zoned: reduce overhead of backing device checks
commit e7fad909b68aa37470d9f2d2731b5bec355ee5d6 upstream.

Commit 75d66ffb48 added backing device health checks and as a part
of these checks, check_events() block ops template call is invoked in
dm-zoned mapping path as well as in reclaim and flush path. Calling
check_events() with ATA or SCSI backing devices introduces a blocking
scsi_test_unit_ready() call being made in sd_check_events(). Even though
the overhead of calling scsi_test_unit_ready() is small for ATA zoned
devices, it is much larger for SCSI and it affects performance in a very
negative way.

Fix this performance regression by executing check_events() only in case
of any I/O errors. The function dmz_bdev_is_dying() is modified to call
only blk_queue_dying(), while calls to check_events() are made in a new
helper function, dmz_check_bdev().

Reported-by: zhangxiaoxu <zhangxiaoxu5@huawei.com>
Fixes: 75d66ffb48 ("dm zoned: properly handle backing device failure")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17 19:56:12 +01:00
Mikulas Patocka
0c8e9c2d66 dm zoned: fix invalid memory access
Commit 75d66ffb48 ("dm zoned: properly
handle backing device failure") triggers a coverity warning:

*** CID 1452808:  Memory - illegal accesses  (USE_AFTER_FREE)
/drivers/md/dm-zoned-target.c: 137 in dmz_submit_bio()
131             clone->bi_private = bioctx;
132
133             bio_advance(bio, clone->bi_iter.bi_size);
134
135             refcount_inc(&bioctx->ref);
136             generic_make_request(clone);
>>>     CID 1452808:  Memory - illegal accesses  (USE_AFTER_FREE)
>>>     Dereferencing freed pointer "clone".
137             if (clone->bi_status == BLK_STS_IOERR)
138                     return -EIO;
139
140             if (bio_op(bio) == REQ_OP_WRITE && dmz_is_seq(zone))
141                     zone->wp_block += nr_blocks;
142

The "clone" bio may be processed and freed before the check
"clone->bi_status == BLK_STS_IOERR" - so this check can access invalid
memory.

Fixes: 75d66ffb48 ("dm zoned: properly handle backing device failure")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-08-26 10:33:58 -04:00
Dmitry Fomichev
bae9a0aa33 dm zoned: add SPDX license identifiers
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-08-15 15:57:42 -04:00
Dmitry Fomichev
75d66ffb48 dm zoned: properly handle backing device failure
dm-zoned is observed to lock up or livelock in case of hardware
failure or some misconfiguration of the backing zoned device.

This patch adds a new dm-zoned target function that checks the status of
the backing device. If the request queue of the backing device is found
to be in dying state or the SCSI backing device enters offline state,
the health check code sets a dm-zoned target flag prompting all further
incoming I/O to be rejected. In order to detect backing device failures
timely, this new function is called in the request mapping path, at the
beginning of every reclaim run and before performing any metadata I/O.

The proper way out of this situation is to do

dmsetup remove <dm-zoned target>

and recreate the target when the problem with the backing device
is resolved.

Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-08-15 15:57:42 -04:00
Dmitry Fomichev
d7428c5011 dm zoned: improve error handling in i/o map code
Some errors are ignored in the I/O path during queueing chunks
for processing by chunk works. Since at least these errors are
transient in nature, it should be possible to retry the failed
incoming commands.

The fix -

Errors that can happen while queueing chunks are carried upwards
to the main mapping function and it now returns DM_MAPIO_REQUEUE
for any incoming requests that can not be properly queued.

Error logging/debug messages are added where needed.

Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-08-15 15:57:41 -04:00
Dan Carpenter
a3839bc635 dm zoned: Silence a static checker warning
My static checker complains about this line from dmz_get_zoned_device()

	aligned_capacity = dev->capacity & ~(blk_queue_zone_sectors(q) - 1);

The problem is that "aligned_capacity" and "dev->capacity" are sector_t
type (which is a u64 under most configs) but blk_queue_zone_sectors(q)
returns a u32 so the higher 32 bits in aligned_capacity are cleared to
zero.  This patch adds a cast to address the issue.

Fixes: 114e025968 ("dm zoned: ignore last smaller runt zone")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-18 16:16:01 -04:00
Mike Snitzer
61697a6abd dm: eliminate 'split_discard_bios' flag from DM target interface
There is no need to have DM core split discards on behalf of a DM target
now that blk_queue_split() handles splitting discards based on the
queue_limits.  A DM target just needs to set max_discard_sectors,
discard_granularity, etc, in queue_limits.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-02-20 23:24:55 -05:00
Damien Le Moal
d57f9da890 dm zoned: Fix target BIO completion handling
struct bioctx includes the ref refcount_t to track the number of I/O
fragments used to process a target BIO as well as ensure that the zone
of the BIO is kept in the active state throughout the lifetime of the
BIO. However, since decrementing of this reference count is done in the
target .end_io method, the function bio_endio() must be called multiple
times for read and write target BIOs, which causes problems with the
value of the __bi_remaining struct bio field for chained BIOs (e.g. the
clone BIO passed by dm core is large and splits into fragments by the
block layer), resulting in incorrect values and inconsistencies with the
BIO_CHAIN flag setting. This is turn triggers the BUG_ON() call:

BUG_ON(atomic_read(&bio->__bi_remaining) <= 0);

in bio_remaining_done() called from bio_endio().

Fix this ensuring that bio_endio() is called only once for any target
BIO by always using internal clone BIOs for processing any read or
write target BIO. This allows reference counting using the target BIO
context counter to trigger the target BIO completion bio_endio() call
once all data, metadata and other zone work triggered by the BIO
complete.

Overall, this simplifies the code too as the target .end_io becomes
unnecessary and differences between read and write BIO issuing and
completion processing disappear.

Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-12-07 16:04:31 -05:00
Linus Torvalds
71f4d95b23 - Biggest change this cycle is to remove support for the legacy IO path
(.request_fn) from request-based DM.  Jens has already started
   preparing for complete removal of the legacy IO path in 4.21 but this
   earlier removal of support from DM has been coordinated with Jens (as
   evidenced by the commit being attributed to him).  Making
   request-based DM exclussively blk-mq only cleans up that portion of DM
   core quite nicely.
 
 - Convert the thinp and zoned targets over to using refcount_t where
   applicable.
 
 - A couple fixes to the DM zoned target for refcounting and other races
   buried in the implementation of metadata block creation and use.
 
 - Small cleanups to remove redundant unlikely() around a couple
   WARN_ON_ONCE().
 
 - Simplify how dm-ioctl copies from userspace, eliminating some
   potential for a malicious user trying to change the executed ioctl
   after its processing has begun.
 
 - Tweaked DM crypt target to use the DM device name when naming the
   various workqueues created for a particular DM crypt device (makes the
   N workqueues for a DM crypt device more easily understood and enhances
   user's accounting capabilities at a glance via "ps")
 
 - Small fixup to remove dead branch in DM writecache's memory_entry().
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJb0zCmAAoJEMUj8QotnQNaUKwIAKFC3cuTtQyh3LomSbT4uAr5
 6apBSVcxVhqU+isriW3HmBPkO4HGyqMWjX5oQGHrOj0YK0i1H65Nq3qH9ATaiSHn
 awdo8A4YmClF5Mojc51UebXIH0IfnGSOKH/FHNhQzT3jAdn+vYinMSZ28JwFPgKW
 DsVOSM1dlJZBWRXhQNpyCjVl9Xb3rRUOnkfG0endyMfOsnoxKurhwSkXoStzCdQn
 O5ubt1XT3wMKoI1k9QWjfrBU1NtZZYD+kQ6EfkYXfL9RNhhZjwzO/eNtqT3jnKsq
 qbcd8/0JIUttPf7+F0URG9mbMbebfJGqNAaJWcnlRbCHmgUBBGVWsnl8MTWOLkw=
 =MdDN
 -----END PGP SIGNATURE-----

Merge tag 'for-4.20/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - The biggest change this cycle is to remove support for the legacy IO
   path (.request_fn) from request-based DM.

   Jens has already started preparing for complete removal of the legacy
   IO path in 4.21 but this earlier removal of support from DM has been
   coordinated with Jens (as evidenced by the commit being attributed to
   him).

   Making request-based DM exclussively blk-mq only cleans up that
   portion of DM core quite nicely.

 - Convert the thinp and zoned targets over to using refcount_t where
   applicable.

 - A couple fixes to the DM zoned target for refcounting and other races
   buried in the implementation of metadata block creation and use.

 - Small cleanups to remove redundant unlikely() around a couple
   WARN_ON_ONCE().

 - Simplify how dm-ioctl copies from userspace, eliminating some
   potential for a malicious user trying to change the executed ioctl
   after its processing has begun.

 - Tweaked DM crypt target to use the DM device name when naming the
   various workqueues created for a particular DM crypt device (makes
   the N workqueues for a DM crypt device more easily understood and
   enhances user's accounting capabilities at a glance via "ps")

 - Small fixup to remove dead branch in DM writecache's memory_entry().

* tag 'for-4.20/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm writecache: remove disabled code in memory_entry()
  dm zoned: fix various dmz_get_mblock() issues
  dm zoned: fix metadata block ref counting
  dm raid: avoid bitmap with raid4/5/6 journal device
  dm crypt: make workqueue names device-specific
  dm: add dm_table_device_name()
  dm ioctl: harden copy_params()'s copy_from_user() from malicious users
  dm: remove unnecessary unlikely() around WARN_ON_ONCE()
  dm zoned: target: use refcount_t for dm zoned reference counters
  dm thin: use refcount_t for thin_c reference counting
  dm table: require that request-based DM be layered on blk-mq devices
  dm: rename DM_TYPE_MQ_REQUEST_BASED to DM_TYPE_REQUEST_BASED
  dm: remove legacy request-based IO path
2018-10-26 12:57:38 -07:00
Damien Le Moal
a91e138022 block: Introduce blkdev_nr_zones() helper
Introduce the blkdev_nr_zones() helper function to get the total
number of zones of a zoned block device. This number is always 0 for a
regular block device (q->limits.zoned == BLK_ZONED_NONE case).

Replace hard-coded number of zones calculation in dmz_get_zoned_device()
with a call to this helper.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-25 11:17:40 -06:00
John Pittman
092b564876 dm zoned: target: use refcount_t for dm zoned reference counters
The API surrounding refcount_t should be used in place of atomic_t
when variables are being used as reference counters.  This API can
prevent issues such as counter overflows and use-after-free
conditions.  Within the dm zoned target stack, the atomic_t API is
used for bioctx->ref and cw->refcount.  Change these to use
refcount_t, avoiding the issues mentioned.

Signed-off-by: John Pittman <jpittman@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-16 14:27:38 -04:00
Bart Van Assche
2d0b2d64d3 dm zoned: avoid triggering reclaim from inside dmz_map()
This patch avoids that lockdep reports the following:

======================================================
WARNING: possible circular locking dependency detected
4.18.0-rc1 #62 Not tainted
------------------------------------------------------
kswapd0/84 is trying to acquire lock:
00000000c313516d (&xfs_nondir_ilock_class){++++}, at: xfs_free_eofblocks+0xa2/0x1e0

but task is already holding lock:
00000000591c83ae (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (fs_reclaim){+.+.}:
  kmem_cache_alloc+0x2c/0x2b0
  radix_tree_node_alloc.constprop.19+0x3d/0xc0
  __radix_tree_create+0x161/0x1c0
  __radix_tree_insert+0x45/0x210
  dmz_map+0x245/0x2d0 [dm_zoned]
  __map_bio+0x40/0x260
  __split_and_process_non_flush+0x116/0x220
  __split_and_process_bio+0x81/0x180
  __dm_make_request.isra.32+0x5a/0x100
  generic_make_request+0x36e/0x690
  submit_bio+0x6c/0x140
  mpage_readpages+0x19e/0x1f0
  read_pages+0x6d/0x1b0
  __do_page_cache_readahead+0x21b/0x2d0
  force_page_cache_readahead+0xc4/0x100
  generic_file_read_iter+0x7c6/0xd20
  __vfs_read+0x102/0x180
  vfs_read+0x9b/0x140
  ksys_read+0x55/0xc0
  do_syscall_64+0x5a/0x1f0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #1 (&dmz->chunk_lock){+.+.}:
  dmz_map+0x133/0x2d0 [dm_zoned]
  __map_bio+0x40/0x260
  __split_and_process_non_flush+0x116/0x220
  __split_and_process_bio+0x81/0x180
  __dm_make_request.isra.32+0x5a/0x100
  generic_make_request+0x36e/0x690
  submit_bio+0x6c/0x140
  _xfs_buf_ioapply+0x31c/0x590
  xfs_buf_submit_wait+0x73/0x520
  xfs_buf_read_map+0x134/0x2f0
  xfs_trans_read_buf_map+0xc3/0x580
  xfs_read_agf+0xa5/0x1e0
  xfs_alloc_read_agf+0x59/0x2b0
  xfs_alloc_pagf_init+0x27/0x60
  xfs_bmap_longest_free_extent+0x43/0xb0
  xfs_bmap_btalloc_nullfb+0x7f/0xf0
  xfs_bmap_btalloc+0x428/0x7c0
  xfs_bmapi_write+0x598/0xcc0
  xfs_iomap_write_allocate+0x15a/0x330
  xfs_map_blocks+0x1cf/0x3f0
  xfs_do_writepage+0x15f/0x7b0
  write_cache_pages+0x1ca/0x540
  xfs_vm_writepages+0x65/0xa0
  do_writepages+0x48/0xf0
  __writeback_single_inode+0x58/0x730
  writeback_sb_inodes+0x249/0x5c0
  wb_writeback+0x11e/0x550
  wb_workfn+0xa3/0x670
  process_one_work+0x228/0x670
  worker_thread+0x3c/0x390
  kthread+0x11c/0x140
  ret_from_fork+0x3a/0x50

-> #0 (&xfs_nondir_ilock_class){++++}:
  down_read_nested+0x43/0x70
  xfs_free_eofblocks+0xa2/0x1e0
  xfs_fs_destroy_inode+0xac/0x270
  dispose_list+0x51/0x80
  prune_icache_sb+0x52/0x70
  super_cache_scan+0x127/0x1a0
  shrink_slab.part.47+0x1bd/0x590
  shrink_node+0x3b5/0x470
  balance_pgdat+0x158/0x3b0
  kswapd+0x1ba/0x600
  kthread+0x11c/0x140
  ret_from_fork+0x3a/0x50

other info that might help us debug this:

Chain exists of:
  &xfs_nondir_ilock_class --> &dmz->chunk_lock --> fs_reclaim

Possible unsafe locking scenario:

     CPU0                    CPU1
     ----                    ----
lock(fs_reclaim);
                             lock(&dmz->chunk_lock);
                             lock(fs_reclaim);
lock(&xfs_nondir_ilock_class);

*** DEADLOCK ***

3 locks held by kswapd0/84:
 #0: 00000000591c83ae (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
 #1: 000000000f8208f5 (shrinker_rwsem){++++}, at: shrink_slab.part.47+0x3f/0x590
 #2: 00000000cacefa54 (&type->s_umount_key#43){.+.+}, at: trylock_super+0x16/0x50

stack backtrace:
CPU: 7 PID: 84 Comm: kswapd0 Not tainted 4.18.0-rc1 #62
Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015
Call Trace:
 dump_stack+0x85/0xcb
 print_circular_bug.isra.36+0x1ce/0x1db
 __lock_acquire+0x124e/0x1310
 lock_acquire+0x9f/0x1f0
 down_read_nested+0x43/0x70
 xfs_free_eofblocks+0xa2/0x1e0
 xfs_fs_destroy_inode+0xac/0x270
 dispose_list+0x51/0x80
 prune_icache_sb+0x52/0x70
 super_cache_scan+0x127/0x1a0
 shrink_slab.part.47+0x1bd/0x590
 shrink_node+0x3b5/0x470
 balance_pgdat+0x158/0x3b0
 kswapd+0x1ba/0x600
 kthread+0x11c/0x140
 ret_from_fork+0x3a/0x50

Reported-by: Masato Suzuki <masato.suzuki@wdc.com>
Fixes: 4218a95546 ("dm zoned: use GFP_NOIO in I/O path")
Cc: <stable@vger.kernel.org>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-06-22 14:51:12 -04:00
Mike Snitzer
72d711c876 dm: adjust structure members to improve alignment
Eliminate most holes in DM data structures that were modified by
commit 6f1c819c21 ("dm: convert to bioset_init()/mempool_init()").
Also prevent structure members from unnecessarily spanning cache
lines.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-06-08 11:53:14 -04:00
Kent Overstreet
6f1c819c21 dm: convert to bioset_init()/mempool_init()
Convert dm to embedded bio sets.

Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-30 15:33:32 -06:00
Mike Snitzer
5bd5e8d891 dm: remove fmode_t argument from .prepare_ioctl hook
Use the fmode_t that is passed to dm_blk_ioctl() rather than
inconsistently (varies across targets) drop it on the floor by
overriding it with the fmode_t stored in 'struct dm_dev'.

All the persistent reservation functions weren't using the fmode_t they
got back from .prepare_ioctl so remove them.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-04 12:12:39 -04:00
Mike Snitzer
d5ffebdd79 dm: backfill missing calls to mutex_destroy()
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17 09:16:15 -05:00
Damien Le Moal
114e025968 dm zoned: ignore last smaller runt zone
The SCSI layer allows ZBC drives to have a smaller last runt zone. For
such a device, specifying the entire capacity for a dm-zoned target
table entry fails because the specified capacity is not aligned on a
device zone size indicated in the request queue structure of the
device.

Fix this problem by ignoring the last runt zone in the entry length
when seting up the dm-zoned target (ctr method) and when iterating table
entries of the target (iterate_devices method). This allows dm-zoned
users to still easily setup a target using the entire device capacity
(as mandated by dm-zoned) or the aligned capacity excluding the last
runt zone.

While at it, replace direct references to the device queue chunk_sectors
limit with calls to the accessor blk_queue_zone_sectors().

Reported-by: Peter Desnoyers <pjd@ccs.neu.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:44:53 -05:00
Christoph Hellwig
74d46992e0 block: replace bi_bdev with a gendisk pointer and partitions index
This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-23 12:49:55 -06:00
Damien Le Moal
4218a95546 dm zoned: use GFP_NOIO in I/O path
Use GFP_NOIO for memory allocations in the I/O path.  Other memory
allocations in the initialization path can use GFP_KERNEL.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-07-26 15:55:43 -04:00
Mikulas Patocka
edbe9597ac dm zoned: remove test for impossible REQ_OP_FLUSH conditions
The value REQ_OP_FLUSH is only used by the block code for
request-based devices.

Remove the tests for REQ_OP_FLUSH from the bio-based dm-zoned-target.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-07-25 15:12:17 -04:00
Damien Le Moal
3b1a94c88b dm zoned: drive-managed zoned block device target
The dm-zoned device mapper target provides transparent write access
to zoned block devices (ZBC and ZAC compliant block devices).
dm-zoned hides to the device user (a file system or an application
doing raw block device accesses) any constraint imposed on write
requests by the device, equivalent to a drive-managed zoned block
device model.

Write requests are processed using a combination of on-disk buffering
using the device conventional zones and direct in-place processing for
requests aligned to a zone sequential write pointer position.
A background reclaim process implemented using dm_kcopyd_copy ensures
that conventional zones are always available for executing unaligned
write requests. The reclaim process overhead is minimized by managing
buffer zones in a least-recently-written order and first targeting the
oldest buffer zones. Doing so, blocks under regular write access (such
as metadata blocks of a file system) remain stored in conventional
zones, resulting in no apparent overhead.

dm-zoned implementation focus on simplicity and on minimizing overhead
(CPU, memory and storage overhead). For a 14TB host-managed disk with
256 MB zones, dm-zoned memory usage per disk instance is at most about
3 MB and as little as 5 zones will be used internally for storing metadata
and performing buffer zone reclaim operations. This is achieved using
zone level indirection rather than a full block indirection system for
managing block movement between zones.

dm-zoned primary target is host-managed zoned block devices but it can
also be used with host-aware device models to mitigate potential
device-side performance degradation due to excessive random writing.

Zoned block devices can be formatted and checked for use with the dm-zoned
target using the dmzadm utility available at:

https://github.com/hgst/dm-zoned-tools

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
[Mike Snitzer partly refactored Damien's original work to cleanup the code]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-06-19 11:05:20 -04:00