Commit Graph

4925 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
f520ca124c This is the 5.4.44 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl7XQXMACgkQONu9yGCS
 aT4OHw//YYuI/61rkff6/3qAE4gDwTZolVywu5HHzT5W7t7qeHPzJin2u04RBiS8
 4S8Mut0RUSK/0IyB0B3S342ntia1v41Q04veWm0K90iAScScjjUapLDXC/P3StA0
 iitGKJ8QFDS49+PFKFYkyXEsv6HYlDbtTmS0yxVoooSr+uqeR7m6rS1jsDsfUTaR
 T4tvfX8VPHkgfkfkOKCUq8/rM3uDW3lSk3JflIbPwRBQo9KvNPnfBetU9p//dCHG
 CB1K9K3sB6xLkKe7Ut7PlwoTq/Lc8qOma535xy3A8Iv6fVq4+hPE2jsB93WGI270
 WoEZbHpon7W6g/bU+C+CGfov2zBtz1dKHfWNcK5+dEkEQjjzKvvigfUvaKjyUUKB
 Vo5rQ3GZQ4JsMkHEJaLOlp3/SkdRd6RV/E0YErBISNeswzqsOgTrX8mz6wfQInwd
 Ww7V9LKdwSD6h2DuzutUbEm1X8i8glXammWEOUuh6zzQ3+WS57R1L+Nkr/6WxpgN
 w2g7F0+5enUbE1kIdq5OCzY1D0gBpT1o5YlrZgdL2GF5lU1b/lhsGhV6P83fl2Mf
 rTGFtg5M1pNgjbUkSH3VHHof35PM9vQZ6lrYbKMCjwymVY+BcR6nsCadfLqjMGnW
 NCYeiAmoIVCJX7q0hONww+TevZ3T+SLUjQ2os3WzooPC51MPOAQ=
 =5p6V
 -----END PGP SIGNATURE-----

Merge 5.4.44 into android-5.4-stable

Changes in 5.4.44
	ax25: fix setsockopt(SO_BINDTODEVICE)
	dpaa_eth: fix usage as DSA master, try 3
	net: don't return invalid table id error when we fall back to PF_UNSPEC
	net: dsa: mt7530: fix roaming from DSA user ports
	net: ethernet: ti: cpsw: fix ASSERT_RTNL() warning during suspend
	__netif_receive_skb_core: pass skb by reference
	net: inet_csk: Fix so_reuseport bind-address cache in tb->fast*
	net: ipip: fix wrong address family in init error path
	net/mlx5: Add command entry handling completion
	net: mvpp2: fix RX hashing for non-10G ports
	net: nlmsg_cancel() if put fails for nhmsg
	net: qrtr: Fix passing invalid reference to qrtr_local_enqueue()
	net: revert "net: get rid of an signed integer overflow in ip_idents_reserve()"
	net sched: fix reporting the first-time use timestamp
	net/tls: fix race condition causing kernel panic
	nexthop: Fix attribute checking for groups
	r8152: support additional Microsoft Surface Ethernet Adapter variant
	sctp: Don't add the shutdown timer if its already been added
	sctp: Start shutdown on association restart if in SHUTDOWN-SENT state and socket is closed
	tipc: block BH before using dst_cache
	net/mlx5e: kTLS, Destroy key object after destroying the TIS
	net/mlx5e: Fix inner tirs handling
	net/mlx5: Fix memory leak in mlx5_events_init
	net/mlx5e: Update netdev txq on completions during closure
	net/mlx5: Fix error flow in case of function_setup failure
	net/mlx5: Annotate mutex destroy for root ns
	net/tls: fix encryption error checking
	net/tls: free record only on encryption error
	net: sun: fix missing release regions in cas_init_one().
	net/mlx4_core: fix a memory leak bug.
	mlxsw: spectrum: Fix use-after-free of split/unsplit/type_set in case reload fails
	ARM: dts: rockchip: fix phy nodename for rk3228-evb
	ARM: dts: rockchip: fix phy nodename for rk3229-xms6
	arm64: dts: rockchip: fix status for &gmac2phy in rk3328-evb.dts
	arm64: dts: rockchip: swap interrupts interrupt-names rk3399 gpu node
	ARM: dts: rockchip: swap clock-names of gpu nodes
	ARM: dts: rockchip: fix pinctrl sub nodename for spi in rk322x.dtsi
	gpio: tegra: mask GPIO IRQs during IRQ shutdown
	ALSA: usb-audio: add mapping for ASRock TRX40 Creator
	net: microchip: encx24j600: add missed kthread_stop
	gfs2: move privileged user check to gfs2_quota_lock_check
	gfs2: Grab glock reference sooner in gfs2_add_revoke
	drm/amdgpu: drop unnecessary cancel_delayed_work_sync on PG ungate
	drm/amd/powerplay: perform PG ungate prior to CG ungate
	drm/amdgpu: Use GEM obj reference for KFD BOs
	cachefiles: Fix race between read_waiter and read_copier involving op->to_do
	usb: dwc3: pci: Enable extcon driver for Intel Merrifield
	usb: phy: twl6030-usb: Fix a resource leak in an error handling path in 'twl6030_usb_probe()'
	usb: gadget: legacy: fix redundant initialization warnings
	net: freescale: select CONFIG_FIXED_PHY where needed
	IB/i40iw: Remove bogus call to netdev_master_upper_dev_get()
	riscv: stacktrace: Fix undefined reference to `walk_stackframe'
	clk: ti: am33xx: fix RTC clock parent
	csky: Fixup msa highest 3 bits mask
	csky: Fixup perf callchain unwind
	csky: Fixup remove duplicate irq_disable
	hwmon: (nct7904) Fix incorrect range of temperature limit registers
	cifs: Fix null pointer check in cifs_read
	csky: Fixup raw_copy_from_user()
	samples: bpf: Fix build error
	drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c
	Input: usbtouchscreen - add support for BonXeon TP
	Input: evdev - call input_flush_device() on release(), not flush()
	Input: xpad - add custom init packet for Xbox One S controllers
	Input: dlink-dir685-touchkeys - fix a typo in driver name
	Input: i8042 - add ThinkPad S230u to i8042 reset list
	Input: synaptics-rmi4 - really fix attn_data use-after-free
	Input: synaptics-rmi4 - fix error return code in rmi_driver_probe()
	ARM: 8970/1: decompressor: increase tag size
	ARM: uaccess: consolidate uaccess asm to asm/uaccess-asm.h
	ARM: uaccess: integrate uaccess_save and uaccess_restore
	ARM: uaccess: fix DACR mismatch with nested exceptions
	gpio: exar: Fix bad handling for ida_simple_get error path
	arm64: dts: mt8173: fix vcodec-enc clock
	soc: mediatek: cmdq: return send msg error code
	gpu/drm: Ingenic: Fix opaque pointer casted to wrong type
	IB/qib: Call kobject_put() when kobject_init_and_add() fails
	ARM: dts/imx6q-bx50v3: Set display interface clock parents
	ARM: dts: bcm2835-rpi-zero-w: Fix led polarity
	ARM: dts: bcm: HR2: Fix PPI interrupt types
	mmc: block: Fix use-after-free issue for rpmb
	gpio: pxa: Fix return value of pxa_gpio_probe()
	gpio: bcm-kona: Fix return value of bcm_kona_gpio_probe()
	RDMA/pvrdma: Fix missing pci disable in pvrdma_pci_probe()
	ALSA: hwdep: fix a left shifting 1 by 31 UB bug
	ALSA: hda/realtek - Add a model for Thinkpad T570 without DAC workaround
	ALSA: usb-audio: mixer: volume quirk for ESS Technology Asus USB DAC
	exec: Always set cap_ambient in cap_bprm_set_creds
	clk: qcom: gcc: Fix parent for gpll0_out_even
	ALSA: usb-audio: Quirks for Gigabyte TRX40 Aorus Master onboard audio
	ALSA: hda/realtek - Add new codec supported for ALC287
	libceph: ignore pool overlay and cache logic on redirects
	ceph: flush release queue when handling caps for unknown inode
	RDMA/core: Fix double destruction of uobject
	drm/amd/display: drop cursor position check in atomic test
	IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode
	mm,thp: stop leaking unreleased file pages
	mm: remove VM_BUG_ON(PageSlab()) from page_mapcount()
	fs/binfmt_elf.c: allocate initialized memory in fill_thread_core_info()
	include/asm-generic/topology.h: guard cpumask_of_node() macro argument
	Revert "block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT"
	gpio: fix locking open drain IRQ lines
	iommu: Fix reference count leak in iommu_group_alloc.
	parisc: Fix kernel panic in mem_init()
	cfg80211: fix debugfs rename crash
	x86/syscalls: Revert "x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long"
	mac80211: mesh: fix discovery timer re-arming issue / crash
	x86/dma: Fix max PFN arithmetic overflow on 32 bit systems
	copy_xstate_to_kernel(): don't leave parts of destination uninitialized
	xfrm: allow to accept packets with ipv6 NEXTHDR_HOP in xfrm_input
	xfrm: do pskb_pull properly in __xfrm_transport_prep
	xfrm: remove the xfrm_state_put call becofe going to out_reset
	xfrm: call xfrm_output_gso when inner_protocol is set in xfrm_output
	xfrm interface: fix oops when deleting a x-netns interface
	xfrm: fix a warning in xfrm_policy_insert_list
	xfrm: fix a NULL-ptr deref in xfrm_local_error
	xfrm: fix error in comment
	ip_vti: receive ipip packet by calling ip_tunnel_rcv
	netfilter: nft_reject_bridge: enable reject with bridge vlan
	netfilter: ipset: Fix subcounter update skip
	netfilter: conntrack: make conntrack userspace helpers work again
	netfilter: nfnetlink_cthelper: unbreak userspace helper support
	netfilter: nf_conntrack_pptp: prevent buffer overflows in debug code
	esp6: get the right proto for transport mode in esp6_gso_encap
	bnxt_en: Fix accumulation of bp->net_stats_prev.
	ieee80211: Fix incorrect mask for default PE duration
	xsk: Add overflow check for u64 division, stored into u32
	qlcnic: fix missing release in qlcnic_83xx_interrupt_test.
	crypto: chelsio/chtls: properly set tp->lsndtime
	nexthops: Move code from remove_nexthop_from_groups to remove_nh_grp_entry
	nexthops: don't modify published nexthop groups
	nexthop: Expand nexthop_is_multipath in a few places
	ipv4: nexthop version of fib_info_nh_uses_dev
	net: dsa: declare lockless TX feature for slave ports
	bonding: Fix reference count leak in bond_sysfs_slave_add.
	netfilter: conntrack: comparison of unsigned in cthelper confirmation
	netfilter: conntrack: Pass value of ctinfo to __nf_conntrack_update
	netfilter: nf_conntrack_pptp: fix compilation warning with W=1 build
	perf: Make perf able to build with latest libbfd
	Linux 5.4.44

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Idd547df1abb0bea116f30e3224a80387529adb0b
2020-06-04 08:23:35 +02:00
Jens Axboe
bba91cdba6 Revert "block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT"
[ Upstream commit b0beb28097fa04177b3769f4bb7a0d0d9c4ae76e ]

This reverts commit c58c1f83436b501d45d4050fd1296d71a9760bcb.

io_uring does do the right thing for this case, and we're still returning
-EAGAIN to userspace for the cases we don't support. Revert this change
to avoid doing endless spins of resubmits.

Cc: stable@vger.kernel.org # v5.6
Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-03 08:21:27 +02:00
Eric Biggers
1af79e47ae ANDROID: block: backport the ability to specify max_dun_bytes
Backport a fix from the v7 inline crypto patchset which ensures that the
block layer knows the number of DUN bytes the inline encryption hardware
supports, so that hardware isn't used when it shouldn't be.

(This unfortunately means introducing some increasing long argument
lists; this was all already fixed up in later versions of the patchset.)

To avoid breaking the KMI for drivers, don't add a dun_bytes argument to
keyslot_manager_create() but rather allow drivers to call
keyslot_manager_set_max_dun_bytes() to override the default.  Also,
don't add dun_bytes as a new field in 'struct blk_crypto_key' but rather
pack it into the existing 'hash' field which is for block layer use.

Bug: 144046242
Bug: 153512828
Change-Id: I285f36557fb3eafc5f2f64727ef1740938b59dd7
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-05-15 22:01:33 +00:00
Greg Kroah-Hartman
5e169f689f This is the 5.4.41 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl683gYACgkQONu9yGCS
 aT4UrQ/+OWH+sgNXQW2TtBAtDt+b6WCPCwsAe74YdsbqzVf/mxVGVpHKvFJCEXxA
 KDRrBqdICgrjZ+L8Y2MPzNhyD2/nLHwB8M99ARx4B6mvOu4pT0+/xATViGmotqDN
 tzpQ3HvnFLlR/z74/xDanXgXrTAv591hpSQlpUmf6NCiBZNlhndId4qnh/z8Eumn
 wVLseK1r2CY3s3mMZTw6BXmHmj6zGA70Ckuvhp9JmxiKs9fg+pmDlUaRPHex91Xh
 LtSJd7CdpVr5YrMIC9DcQ2TN46KsZZkoo+l/W8jVNVG3ggqWUrHn7wGamwTHafd1
 TkoU7eQt9ps15p7Sj4Z19de30Y1m/g+Qq7L4NrgGcX8bhnCHdgfdbAj40GINOaB2
 WLHRVu3PgEUCbLCSixE5BRLmBTECjWapIiW50fp/jogGmeRiBbJFFnWbVtiEwyme
 KU7ZJRw/sfKNzIN0QioJ/EadK7ZkvIfr/ajinpXdxIA+4gteyKRrNb0323FRG3Ev
 JoStdR2g+dv+yEJYLmsCl3N0eEETzHK8fRJbp0lkSKjEaxW/yDRpIdhREXmWGd2V
 Hprcoiyknae0MEIFFnTvA4Oj7wOYezxP0tQg14nOdtXZX5afry5qP/lryE0kYxiV
 JcI4BrwfWI8hOwdaFd413qp+JG7eKV3RhanhaPimroQJn0WKB9Q=
 =Ipyc
 -----END PGP SIGNATURE-----

Merge 5.4.41 into android-5.4-stable

Changes in 5.4.41
	USB: serial: qcserial: Add DW5816e support
	nvme: refactor nvme_identify_ns_descs error handling
	nvme: fix possible hang when ns scanning fails during error recovery
	tracing/kprobes: Fix a double initialization typo
	net: macb: Fix runtime PM refcounting
	drm/amdgpu: move kfd suspend after ip_suspend_phase1
	drm/amdgpu: drop redundant cg/pg ungate on runpm enter
	vt: fix unicode console freeing with a common interface
	tty: xilinx_uartps: Fix missing id assignment to the console
	devlink: fix return value after hitting end in region read
	dp83640: reverse arguments to list_add_tail
	fq_codel: fix TCA_FQ_CODEL_DROP_BATCH_SIZE sanity checks
	ipv6: Use global sernum for dst validation with nexthop objects
	mlxsw: spectrum_acl_tcam: Position vchunk in a vregion list properly
	neigh: send protocol value in neighbor create notification
	net: dsa: Do not leave DSA master with NULL netdev_ops
	net: macb: fix an issue about leak related system resources
	net: macsec: preserve ingress frame ordering
	net/mlx4_core: Fix use of ENOSPC around mlx4_counter_alloc()
	net_sched: sch_skbprio: add message validation to skbprio_change()
	net: stricter validation of untrusted gso packets
	net: tc35815: Fix phydev supported/advertising mask
	net/tls: Fix sk_psock refcnt leak in bpf_exec_tx_verdict()
	net/tls: Fix sk_psock refcnt leak when in tls_data_ready()
	net: usb: qmi_wwan: add support for DW5816e
	nfp: abm: fix a memory leak bug
	sch_choke: avoid potential panic in choke_reset()
	sch_sfq: validate silly quantum values
	tipc: fix partial topology connection closure
	tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040
	bnxt_en: Fix VF anti-spoof filter setup.
	bnxt_en: Reduce BNXT_MSIX_VEC_MAX value to supported CQs per PF.
	bnxt_en: Improve AER slot reset.
	bnxt_en: Return error when allocating zero size context memory.
	bnxt_en: Fix VLAN acceleration handling in bnxt_fix_features().
	net/mlx5: DR, On creation set CQ's arm_db member to right value
	net/mlx5: Fix forced completion access non initialized command entry
	net/mlx5: Fix command entry leak in Internal Error State
	net: mvpp2: prevent buffer overflow in mvpp22_rss_ctx()
	net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del()
	HID: wacom: Read HID_DG_CONTACTMAX directly for non-generic devices
	sctp: Fix bundling of SHUTDOWN with COOKIE-ACK
	Revert "HID: wacom: generic: read the number of expected touches on a per collection basis"
	HID: usbhid: Fix race between usbhid_close() and usbhid_stop()
	HID: wacom: Report 2nd-gen Intuos Pro S center button status over BT
	USB: uas: add quirk for LaCie 2Big Quadra
	usb: chipidea: msm: Ensure proper controller reset using role switch API
	USB: serial: garmin_gps: add sanity checking for data length
	tracing: Add a vmalloc_sync_mappings() for safe measure
	crypto: arch/nhpoly1305 - process in explicit 4k chunks
	KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction
	KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path
	KVM: arm: vgic: Fix limit condition when writing to GICD_I[CS]ACTIVER
	KVM: arm64: Fix 32bit PC wrap-around
	arm64: hugetlb: avoid potential NULL dereference
	drm: ingenic-drm: add MODULE_DEVICE_TABLE
	ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
	epoll: atomically remove wait entry on wake up
	eventpoll: fix missing wakeup for ovflist in ep_poll_callback
	mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
	mm: limit boost_watermark on small zones
	ceph: fix endianness bug when handling MDS session feature bits
	ceph: demote quotarealm lookup warning to a debug message
	staging: gasket: Check the return value of gasket_get_bar_index()
	coredump: fix crash when umh is disabled
	riscv: set max_pfn to the PFN of the last page
	iocost: protect iocg->abs_vdebt with iocg->waitq.lock
	batman-adv: fix batadv_nc_random_weight_tq
	batman-adv: Fix refcnt leak in batadv_show_throughput_override
	batman-adv: Fix refcnt leak in batadv_store_throughput_override
	batman-adv: Fix refcnt leak in batadv_v_ogm_process
	x86/entry/64: Fix unwind hints in register clearing code
	x86/entry/64: Fix unwind hints in kernel exit path
	x86/entry/64: Fix unwind hints in rewind_stack_do_exit()
	x86/unwind/orc: Don't skip the first frame for inactive tasks
	x86/unwind/orc: Prevent unwinding before ORC initialization
	x86/unwind/orc: Fix error path for bad ORC entry type
	x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
	KVM: x86: Fixes posted interrupt check for IRQs delivery modes
	arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
	netfilter: nat: never update the UDP checksum when it's 0
	netfilter: nf_osf: avoid passing pointer to local var
	objtool: Fix stack offset tracking for indirect CFAs
	iommu/virtio: Reverse arguments to list_add
	scripts/decodecode: fix trapping instruction formatting
	mm, memcg: fix error return value of mem_cgroup_css_alloc()
	bdi: move bdi_dev_name out of line
	bdi: add a ->dev_name field to struct backing_dev_info
	fsnotify: replace inode pointer with an object id
	fanotify: merge duplicate events on parent and child
	Linux 5.4.41

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie6695b1dace8ca62579a57084608e9268e52fde9
2020-05-14 08:55:48 +02:00
Tejun Heo
34ca080088 iocost: protect iocg->abs_vdebt with iocg->waitq.lock
commit 0b80f9866e6bbfb905140ed8787ff2af03652c0c upstream.

abs_vdebt is an atomic_64 which tracks how much over budget a given cgroup
is and controls the activation of use_delay mechanism. Once a cgroup goes
over budget from forced IOs, it has to pay it back with its future budget.
The progress guarantee on debt paying comes from the iocg being active -
active iocgs are processed by the periodic timer, which ensures that as time
passes the debts dissipate and the iocg returns to normal operation.

However, both iocg activation and vdebt handling are asynchronous and a
sequence like the following may happen.

1. The iocg is in the process of being deactivated by the periodic timer.

2. A bio enters ioc_rqos_throttle(), calls iocg_activate() which returns
   without anything because it still sees that the iocg is already active.

3. The iocg is deactivated.

4. The bio from #2 is over budget but needs to be forced. It increases
   abs_vdebt and goes over the threshold and enables use_delay.

5. IO control is enabled for the iocg's subtree and now IOs are attributed
   to the descendant cgroups and the iocg itself no longer issues IOs.

This leaves the iocg with stuck abs_vdebt - it has debt but inactive and no
further IOs which can activate it. This can end up unduly punishing all the
descendants cgroups.

The usual throttling path has the same issue - the iocg must be active while
throttled to ensure that future event will wake it up - and solves the
problem by synchronizing the throttling path with a spinlock. abs_vdebt
handling is another form of overage handling and shares a lot of
characteristics including the fact that it isn't in the hottest path.

This patch fixes the above and other possible races by strictly
synchronizing abs_vdebt and use_delay handling with iocg->waitq.lock.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Vlad Dmitriev <vvd@fb.com>
Cc: stable@vger.kernel.org # v5.4+
Fixes: e1518f63f2 ("blk-iocost: Don't let merges push vtime into the future")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14 07:58:27 +02:00
Greg Kroah-Hartman
ae0dae9ffc This is the 5.4.37 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl6tF/4ACgkQONu9yGCS
 aT7GdA//U9Nzp0upthsH5IMqIOwaJQBEwXF83fTResLKPSNjq6wAYO6kQwdTBMZ1
 PUo/ZEmOnDigdHM3PCGw+Z779UCb9/2laH+KPnPTnst9LcM0sLJMsgoCIuqsyl8J
 mPDLCbx4f7/ffkw/cSb+JrqCn/2mFib3uCwktTSqxVWm9S7EcE3CRxSTEE1XP/z6
 FzDPCjeNijNa3U96NnHFcKXEo/vcaEKHIB9bgdR7kUuRKGBhXSjv7LWUV/940F2w
 eyGgW5A+o94dsCORx2aOgBwOoujAto/DxDihv4jm/S5HTg68hqWQxqWerlsy0PFP
 k7j854aaHamIJjt5SE2MTm9YxnvWh4rpbXjuYDLYLM1jLaACZ+5mIj+w18yrpmOs
 7vjlHBBBTt4xNbODML4KLrj+fCdXk4uEBy7sWi/qYPUmrV3CLK1DqcqRQ9toS+yh
 o22JwyVYuD2os0YMYikqSVRlCe4UwJcW0ZZfOFg2cpB9anG7i+DrzW9Lc6CuPsHo
 ZC9rdVNEHLh9Ti9zcXrs8AFjxoIbP/m0n+ZH7bQPo1/rWE4+fzj14wtKslGtkT0B
 00/Vo9mtmmBC0MVBignbWsq5aE3bFLWTOveJppjgAVXYJ7mQhtnvw4eFSJahtBa0
 s+SB9M6kGNvWpL11cokqIaVfklDjaMo0Jeakd78KdobeNOgBvug=
 =TNyS
 -----END PGP SIGNATURE-----

Merge 5.4.37 into android-5.4-stable

Changes in 5.4.37
	remoteproc: Fix wrong rvring index computation
	ubifs: Fix ubifs_tnc_lookup() usage in do_kill_orphans()
	printk: queue wake_up_klogd irq_work only if per-CPU areas are ready
	ASoC: stm32: sai: fix sai probe
	usb: dwc3: gadget: Do link recovery for SS and SSP
	kbuild: fix DT binding schema rule again to avoid needless rebuilds
	usb: gadget: udc: bdc: Remove unnecessary NULL checks in bdc_req_complete
	usb: gadget: udc: atmel: Fix vbus disconnect handling
	afs: Make record checking use TASK_UNINTERRUPTIBLE when appropriate
	afs: Fix to actually set AFS_SERVER_FL_HAVE_EPOCH
	iio:ad7797: Use correct attribute_group
	propagate_one(): mnt_set_mountpoint() needs mount_lock
	counter: 104-quad-8: Add lock guards - generic interface
	s390/ftrace: fix potential crashes when switching tracers
	ASoC: q6dsp6: q6afe-dai: add missing channels to MI2S DAIs
	ASoC: tas571x: disable regulators on failed probe
	ASoC: meson: axg-card: fix codec-to-codec link setup
	ASoC: wm8960: Fix wrong clock after suspend & resume
	drivers: soc: xilinx: fix firmware driver Kconfig dependency
	nfsd: memory corruption in nfsd4_lock()
	bpf: Forbid XADD on spilled pointers for unprivileged users
	i2c: altera: use proper variable to hold errno
	rxrpc: Fix DATA Tx to disable nofrag for UDP on AF_INET6 socket
	net/cxgb4: Check the return from t4_query_params properly
	xfs: acquire superblock freeze protection on eofblocks scans
	svcrdma: Fix trace point use-after-free race
	svcrdma: Fix leak of svc_rdma_recv_ctxt objects
	net/mlx5e: Don't trigger IRQ multiple times on XSK wakeup to avoid WQ overruns
	net/mlx5e: Get the latest values from counters in switchdev mode
	PCI: Avoid ASMedia XHCI USB PME# from D0 defect
	PCI: Add ACS quirk for Zhaoxin multi-function devices
	PCI: Make ACS quirk implementations more uniform
	PCI: Unify ACS quirk desired vs provided checking
	PCI: Add Zhaoxin Vendor ID
	PCI: Add ACS quirk for Zhaoxin Root/Downstream Ports
	PCI: Move Apex Edge TPU class quirk to fix BAR assignment
	ARM: dts: bcm283x: Disable dsi0 node
	cpumap: Avoid warning when CONFIG_DEBUG_PER_CPU_MAPS is enabled
	s390/pci: do not set affinity for floating irqs
	net/mlx5: Fix failing fw tracer allocation on s390
	sched/core: Fix reset-on-fork from RT with uclamp
	perf/core: fix parent pid/tid in task exit events
	netfilter: nat: fix error handling upon registering inet hook
	PM: sleep: core: Switch back to async_schedule_dev()
	blk-iocost: Fix error on iocost_ioc_vrate_adj
	um: ensure `make ARCH=um mrproper` removes arch/$(SUBARCH)/include/generated/
	bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension
	bpf, x86_32: Fix clobbering of dst for BPF_JSET
	bpf, x86_32: Fix logic error in BPF_LDX zero-extension
	mm: shmem: disable interrupt when acquiring info->lock in userfaultfd_copy path
	xfs: clear PF_MEMALLOC before exiting xfsaild thread
	bpf, x86: Fix encoding for lower 8-bit registers in BPF_STX BPF_B
	libbpf: Initialize *nl_pid so gcc 10 is happy
	net: fec: set GPR bit on suspend by DT configuration.
	x86: hyperv: report value of misc_features
	signal: check sig before setting info in kill_pid_usb_asyncio
	afs: Fix length of dump of bad YFSFetchStatus record
	xfs: fix partially uninitialized structure in xfs_reflink_remap_extent
	ALSA: hda: Release resources at error in delayed probe
	ALSA: hda: Keep the controller initialization even if no codecs found
	ALSA: hda: Explicitly permit using autosuspend if runtime PM is supported
	scsi: target: fix PR IN / READ FULL STATUS for FC
	scsi: target: tcmu: reset_ring should reset TCMU_DEV_BIT_BROKEN
	objtool: Fix CONFIG_UBSAN_TRAP unreachable warnings
	objtool: Support Clang non-section symbols in ORC dump
	xen/xenbus: ensure xenbus_map_ring_valloc() returns proper grant status
	ALSA: hda: call runtime_allow() for all hda controllers
	net: stmmac: socfpga: Allow all RGMII modes
	mac80211: fix channel switch trigger from unknown mesh peer
	arm64: Delete the space separator in __emit_inst
	ext4: use matching invalidatepage in ext4_writepage
	ext4: increase wait time needed before reuse of deleted inode numbers
	ext4: convert BUG_ON's to WARN_ON's in mballoc.c
	blk-mq: Put driver tag in blk_mq_dispatch_rq_list() when no budget
	hwmon: (jc42) Fix name to have no illegal characters
	taprio: do not use BIT() in TCA_TAPRIO_ATTR_FLAG_* definitions
	qed: Fix race condition between scheduling and destroying the slowpath workqueue
	Crypto: chelsio - Fixes a hang issue during driver registration
	net: use indirect call wrappers for skb_copy_datagram_iter()
	qed: Fix use after free in qed_chain_free
	ext4: check for non-zero journal inum in ext4_calculate_overhead
	ASoC: soc-core: disable route checks for legacy devices
	ASoC: stm32: spdifrx: fix regmap status check
	Linux 5.4.37

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ice2ab2e77117b798ed22e9442f72a44f39be28dc
2020-05-02 09:01:51 +02:00
John Garry
c7b6c51298 blk-mq: Put driver tag in blk_mq_dispatch_rq_list() when no budget
[ Upstream commit 5fe56de799ad03e92d794c7936bf363922b571df ]

If in blk_mq_dispatch_rq_list() we find no budget, then we break of the
dispatch loop, but the request may keep the driver tag, evaulated
in 'nxt' in the previous loop iteration.

Fix by putting the driver tag for that request.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-02 08:48:59 +02:00
Waiman Long
9c5c94c501 blk-iocost: Fix error on iocost_ioc_vrate_adj
commit d6c8e949a35d6906d6c03a50e9a9cdf4e494528a upstream.

Systemtap 4.2 is unable to correctly interpret the "u32 (*missed_ppm)[2]"
argument of the iocost_ioc_vrate_adj trace entry defined in
include/trace/events/iocost.h leading to the following error:

  /tmp/stapAcz0G0/stap_c89c58b83cea1724e26395efa9ed4939_6321_aux_6.c:78:8:
  error: expected ‘;’, ‘,’ or ‘)’ before ‘*’ token
   , u32[]* __tracepoint_arg_missed_ppm

That argument type is indeed rather complex and hard to read. Looking
at block/blk-iocost.c. It is just a 2-entry u32 array. By simplifying
the argument to a simple "u32 *missed_ppm" and adjusting the trace
entry accordingly, the compilation error was gone.

Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02 08:48:53 +02:00
Greg Kroah-Hartman
5e713c48ff This is the 5.4.35 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl6hU54ACgkQONu9yGCS
 aT5/3BAAlSOFEbVYeiAjDQYfA5DvieeVN3qKk0HnErIPRm35UHqCYSMyEDiJ2c8E
 01V2aFpvAZDyj/pE/prBrUH5FnKyil9tPQrg/da2f54yMiXQvQ6iFdmH/N5Zp5eu
 oY6qFUo4jePTbmI/TBzz08XZ9B4VxccNRhSdF0dO4SInt3eC+vJho3dCXH8H3B7o
 cDf4uIXQqyGn6t9yQQlSVRYTCK1JMwkSVxCU7uMWS5TfJSN3EyZvMMfXyTCTmgIy
 13Vv3+nSHxGqgyAA3fsClCGGAeQyFGQXP28OqyzesPuYyi5z3nDKtgZcAVtvyw9I
 eDsfnOUrw76StiJwRfnKkbg8TBKDWn4N9VyLyBvjRvRovSzTJ31jKVBLhByKDJQt
 cnsi/Ttkm2CYmChozdJrm1Pfm6HH5etEXh6rq4sqeGLkpi+k1UiQgYlavJPOI3nz
 n6dMQEyeg1dmAIBXqgvSvGVfyZuRi37ApPHMHEY4klALbRaSj2Vu/pblyeRezIXL
 G5D7olchwI0X18khdoBYOT1+tmid1pDZ00WB6Iq5IKIjR5x8KBf5uMcvprAc3LsP
 mhGP9+MYXhWQ/GjHjA6TZq76qhYlEZBIHBarIaNjrl3IShLTQXzxAwS8rGtI5wZP
 fTlCc+FBg5w1LDiVcEYJHXR583jSgsFTd3qbtpeaaQyKcC/fkEk=
 =3/4K
 -----END PGP SIGNATURE-----

Merge 5.4.35 into android-5.4-stable

Changes in 5.4.35
	ext4: use non-movable memory for superblock readahead
	watchdog: sp805: fix restart handler
	xsk: Fix out of boundary write in __xsk_rcv_memcpy
	arm, bpf: Fix bugs with ALU64 {RSH, ARSH} BPF_K shift by 0
	arm, bpf: Fix offset overflow for BPF_MEM BPF_DW
	objtool: Fix switch table detection in .text.unlikely
	scsi: sg: add sg_remove_request in sg_common_write
	ALSA: hda: Honor PM disablement in PM freeze and thaw_noirq ops
	ARM: dts: imx6: Use gpc for FEC interrupt controller to fix wake on LAN.
	kbuild, btf: Fix dependencies for DEBUG_INFO_BTF
	netfilter: nf_tables: report EOPNOTSUPP on unsupported flags/object type
	irqchip/mbigen: Free msi_desc on device teardown
	ALSA: hda: Don't release card at firmware loading error
	xsk: Add missing check on user supplied headroom size
	of: unittest: kmemleak on changeset destroy
	of: unittest: kmemleak in of_unittest_platform_populate()
	of: unittest: kmemleak in of_unittest_overlay_high_level()
	of: overlay: kmemleak in dup_and_fixup_symbol_prop()
	x86/Hyper-V: Unload vmbus channel in hv panic callback
	x86/Hyper-V: Trigger crash enlightenment only once during system crash.
	x86/Hyper-V: Report crash register data or kmsg before running crash kernel
	x86/Hyper-V: Report crash register data when sysctl_record_panic_msg is not set
	x86/Hyper-V: Report crash data in die() when panic_on_oops is set
	afs: Fix missing XDR advance in xdr_decode_{AFS,YFS}FSFetchStatus()
	afs: Fix decoding of inline abort codes from version 1 status records
	afs: Fix rename operation status delivery
	afs: Fix afs_d_validate() to set the right directory version
	afs: Fix race between post-modification dir edit and readdir/d_revalidate
	block, bfq: turn put_queue into release_process_ref in __bfq_bic_change_cgroup
	block, bfq: make reparent_leaf_entity actually work only on leaf entities
	block, bfq: invoke flush_idle_tree after reparent_active_queues in pd_offline
	rbd: avoid a deadlock on header_rwsem when flushing notifies
	rbd: call rbd_dev_unprobe() after unwatching and flushing notifies
	x86/Hyper-V: Free hv_panic_page when fail to register kmsg dump
	drm/ttm: flush the fence on the bo after we individualize the reservation object
	clk: Don't cache errors from clk_ops::get_phase()
	clk: at91: usb: continue if clk_hw_round_rate() return zero
	net/mlx5e: Enforce setting of a single FEC mode
	f2fs: fix the panic in do_checkpoint()
	ARM: dts: rockchip: fix vqmmc-supply property name for rk3188-bqedison2qc
	arm64: dts: allwinner: a64: Fix display clock register range
	power: supply: bq27xxx_battery: Silence deferred-probe error
	clk: tegra: Fix Tegra PMC clock out parents
	arm64: tegra: Add PCIe endpoint controllers nodes for Tegra194
	arm64: tegra: Fix Tegra194 PCIe compatible string
	arm64: dts: clearfog-gt-8k: set gigabit PHY reset deassert delay
	soc: imx: gpc: fix power up sequencing
	dma-coherent: fix integer overflow in the reserved-memory dma allocation
	rtc: 88pm860x: fix possible race condition
	NFS: alloc_nfs_open_context() must use the file cred when available
	NFSv4/pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()
	NFSv4.2: error out when relink swapfile
	ARM: dts: rockchip: fix lvds-encoder ports subnode for rk3188-bqedison2qc
	KVM: PPC: Book3S HV: Fix H_CEDE return code for nested guests
	f2fs: fix to show norecovery mount option
	phy: uniphier-usb3ss: Add Pro5 support
	NFS: direct.c: Fix memory leak of dreq when nfs_get_lock_context fails
	f2fs: Fix mount failure due to SPO after a successful online resize FS
	f2fs: Add a new CP flag to help fsck fix resize SPO issues
	s390/cpuinfo: fix wrong output when CPU0 is offline
	hibernate: Allow uswsusp to write to swap
	btrfs: add RCU locks around block group initialization
	powerpc/prom_init: Pass the "os-term" message to hypervisor
	powerpc/maple: Fix declaration made after definition
	s390/cpum_sf: Fix wrong page count in error message
	ext4: do not commit super on read-only bdev
	um: ubd: Prevent buffer overrun on command completion
	cifs: Allocate encryption header through kmalloc
	mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
	drm/nouveau/svm: check for SVM initialized before migrating
	drm/nouveau/svm: fix vma range check for migration
	include/linux/swapops.h: correct guards for non_swap_entry()
	percpu_counter: fix a data race at vm_committed_as
	compiler.h: fix error in BUILD_BUG_ON() reporting
	KVM: s390: vsie: Fix possible race when shadowing region 3 tables
	drm/nouveau: workaround runpm fail by disabling PCI power management on certain intel bridges
	leds: core: Fix warning message when init_data
	x86: ACPI: fix CPU hotplug deadlock
	csky: Fixup cpu speculative execution to IO area
	drm/amdkfd: kfree the wrong pointer
	NFS: Fix memory leaks in nfs_pageio_stop_mirroring()
	csky: Fixup get wrong psr value from phyical reg
	f2fs: fix NULL pointer dereference in f2fs_write_begin()
	ACPICA: Fixes for acpiExec namespace init file
	um: falloc.h needs to be directly included for older libc
	drm/vc4: Fix HDMI mode validation
	iommu/virtio: Fix freeing of incomplete domains
	iommu/vt-d: Fix mm reference leak
	SUNRPC: fix krb5p mount to provide large enough buffer in rq_rcvsize
	ext2: fix empty body warnings when -Wextra is used
	iommu/vt-d: Silence RCU-list debugging warning in dmar_find_atsr()
	iommu/vt-d: Fix page request descriptor size
	ext2: fix debug reference to ext2_xattr_cache
	sunrpc: Fix gss_unwrap_resp_integ() again
	csky: Fixup init_fpu compile warning with __init
	power: supply: axp288_fuel_gauge: Broaden vendor check for Intel Compute Sticks.
	libnvdimm: Out of bounds read in __nd_ioctl()
	iommu/amd: Fix the configuration of GCR3 table root pointer
	f2fs: fix to wait all node page writeback
	drm/nouveau/gr/gp107,gp108: implement workaround for HW hanging during init
	net: dsa: bcm_sf2: Fix overflow checks
	dma-debug: fix displaying of dma allocation type
	fbdev: potential information leak in do_fb_ioctl()
	ARM: dts: sunxi: Fix DE2 clocks register range
	iio: si1133: read 24-bit signed integer for measurement
	fbmem: Adjust indentation in fb_prepare_logo and fb_blank
	tty: evh_bytechan: Fix out of bounds accesses
	locktorture: Print ratio of acquisitions, not failures
	mtd: rawnand: free the nand_device object
	mtd: spinand: Explicitly use MTD_OPS_RAW to write the bad block marker to OOB
	docs: Fix path to MTD command line partition parser
	mtd: lpddr: Fix a double free in probe()
	mtd: phram: fix a double free issue in error path
	KEYS: Don't write out to userspace while holding key semaphore
	bpf: fix buggy r0 retval refinement for tracing helpers
	bpf: Test_verifier, bpf_get_stack return value add <0
	bpf: Test_progs, add test to catch retval refine error handling
	bpf, test_verifier: switch bpf_get_stack's 0 s> r8 test
	Linux 5.4.35

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I702aba533097c8533c12561c7f1a51f3a96f6f09
2020-04-23 11:15:10 +02:00
Paolo Valente
a362482b23 block, bfq: invoke flush_idle_tree after reparent_active_queues in pd_offline
commit 4d38a87fbb77fb9ff2ff4e914162a8ae6453eff5 upstream.

In bfq_pd_offline(), the function bfq_flush_idle_tree() is invoked to
flush the rb tree that contains all idle entities belonging to the pd
(cgroup) being destroyed. In particular, bfq_flush_idle_tree() is
invoked before bfq_reparent_active_queues(). Yet the latter may happen
to add some entities to the idle tree. It happens if, in some of the
calls to bfq_bfqq_move() performed by bfq_reparent_active_queues(),
the queue to move is empty and gets expired.

This commit simply reverses the invocation order between
bfq_flush_idle_tree() and bfq_reparent_active_queues().

Tested-by: cki-project@redhat.com
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-23 10:36:26 +02:00
Paolo Valente
839b7cd1d8 block, bfq: make reparent_leaf_entity actually work only on leaf entities
commit 576682fa52cbd95deb3773449566274f206acc58 upstream.

bfq_reparent_leaf_entity() reparents the input leaf entity (a leaf
entity represents just a bfq_queue in an entity tree). Yet, the input
entity is guaranteed to always be a leaf entity only in two-level
entity trees. In this respect, because of the error fixed by
commit 14afc5936197 ("block, bfq: fix overwrite of bfq_group pointer
in bfq_find_set_group()"), all (wrongly collapsed) entity trees happened
to actually have only two levels. After the latter commit, this does not
hold any longer.

This commit fixes this problem by modifying
bfq_reparent_leaf_entity(), so that it searches an active leaf entity
down the path that stems from the input entity. Such a leaf entity is
guaranteed to exist when bfq_reparent_leaf_entity() is invoked.

Tested-by: cki-project@redhat.com
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-23 10:36:26 +02:00
Paolo Valente
ad749ca022 block, bfq: turn put_queue into release_process_ref in __bfq_bic_change_cgroup
commit c8997736650060594845e42c5d01d3118aec8d25 upstream.

A bfq_put_queue() may be invoked in __bfq_bic_change_cgroup(). The
goal of this put is to release a process reference to a bfq_queue. But
process-reference releases may trigger also some extra operation, and,
to this goal, are handled through bfq_release_process_ref(). So, turn
the invocation of bfq_put_queue() into an invocation of
bfq_release_process_ref().

Tested-by: cki-project@redhat.com
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-23 10:36:26 +02:00
Greg Kroah-Hartman
a9372c6b57 This is the 5.4.33 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl6ZbdIACgkQONu9yGCS
 aT5Jqw/7BZ639nTAAmz809yOF1JBhvBptRg9tBKYAfw62DzfZe5s9IZA6znIt0f0
 nlluLvnhHlDSpgycHNkFry5AkiCUpRQW6NY681xITm918w3BxsKX2pfCawojIOSw
 YBaTWoWqNFQQWlC18L1CWJmIvIktSCHXBMTVDpnvRv7sw5A4Oe/zarzVDNb0A6OJ
 ThaR7LAKJrUEDDLuCOuB/IrYCOpOzg2SkViFmlo4wmmhvCSi8PXvf3royrSFXxM/
 Y1bs67Hu/uqeHl8Y2RaMZpXf1aW9F31sooca4GD+UnVoWppjIOKRyuGLTrXKv7pw
 /goIzlE8wfJz5K0iQ4UcbXwwdY81L9UlMdVsmIWHHgxMjSp1J5mfQ5TLUC/VK3UO
 Ll9tCYBwH4FjzxNRJq7if8TDAfgPzyhw4BMchgXZWzW1oasl51T2uEye3KgFXQSb
 u6TwCx4KGS0w/Q81SKis83Pb0unHGanJOSCZxI1B44raf0ruCBpTYUc713pfegWT
 46YtwoorAK8N+GpFQA1tsTJvVclqCF5bHVE19TMvXV4UX/VTPtbIAUE7vnvcxcqO
 uh0b9Jfmd6Fcgh7VZQCH7CUYnsyJmGqj2kycB1p+T8UB6H+PCeuQBZBl8sJnf8oj
 d5NIzB7WWXBQuEG5XuPtxg6+ARMPEIpd2exEVn9ZOv5qhesBo04=
 =pjAq
 -----END PGP SIGNATURE-----

Merge 5.4.33 into android-5.4-stable

Changes in 5.4.33
	ARM: dts: sun8i-a83t-tbs-a711: HM5065 doesn't like such a high voltage
	bus: sunxi-rsb: Return correct data when mixing 16-bit and 8-bit reads
	ARM: dts: Fix dm814x Ethernet by changing to use rgmii-id mode
	bpf: Fix deadlock with rq_lock in bpf_send_signal()
	iwlwifi: mvm: Fix rate scale NSS configuration
	Input: tm2-touchkey - add support for Coreriver TC360 variant
	soc: fsl: dpio: register dpio irq handlers after dpio create
	rxrpc: Abstract out the calculation of whether there's Tx space
	rxrpc: Fix call interruptibility handling
	net: stmmac: platform: Fix misleading interrupt error msg
	net: vxge: fix wrong __VA_ARGS__ usage
	hinic: fix a bug of waitting for IO stopped
	hinic: fix the bug of clearing event queue
	hinic: fix out-of-order excution in arm cpu
	hinic: fix wrong para of wait_for_completion_timeout
	hinic: fix wrong value of MIN_SKB_LEN
	selftests/net: add definition for SOL_DCCP to fix compilation errors for old libc
	cxgb4/ptp: pass the sign of offset delta in FW CMD
	drm/scheduler: fix rare NULL ptr race
	cfg80211: Do not warn on same channel at the end of CSA
	qlcnic: Fix bad kzalloc null test
	i2c: st: fix missing struct parameter description
	i2c: pca-platform: Use platform_irq_get_optional
	media: rc: add keymap for Videostrong KII Pro
	cpufreq: imx6q: Fixes unwanted cpu overclocking on i.MX6ULL
	staging: wilc1000: avoid double unlocking of 'wilc->hif_cs' mutex
	media: venus: hfi_parser: Ignore HEVC encoding for V1
	firmware: arm_sdei: fix double-lock on hibernate with shared events
	null_blk: Fix the null_add_dev() error path
	null_blk: Handle null_add_dev() failures properly
	null_blk: fix spurious IO errors after failed past-wp access
	media: imx: imx7_mipi_csis: Power off the source when stopping streaming
	media: imx: imx7-media-csi: Fix video field handling
	xhci: bail out early if driver can't accress host in resume
	ACPI: EC: Do not clear boot_ec_is_ecdt in acpi_ec_add()
	x86: Don't let pgprot_modify() change the page encryption bit
	dma-mapping: Fix dma_pgprot() for unencrypted coherent pages
	block: keep bdi->io_pages in sync with max_sectors_kb for stacked devices
	debugfs: Check module state before warning in {full/open}_proxy_open()
	irqchip/versatile-fpga: Handle chained IRQs properly
	time/sched_clock: Expire timer in hardirq context
	media: allegro: fix type of gop_length in channel_create message
	sched: Avoid scale real weight down to zero
	selftests/x86/ptrace_syscall_32: Fix no-vDSO segfault
	PCI/switchtec: Fix init_completion race condition with poll_wait()
	block, bfq: move forward the getting of an extra ref in bfq_bfqq_move
	media: i2c: video-i2c: fix build errors due to 'imply hwmon'
	libata: Remove extra scsi_host_put() in ata_scsi_add_hosts()
	pstore/platform: fix potential mem leak if pstore_init_fs failed
	gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty
	gfs2: Don't demote a glock until its revokes are written
	cpufreq: imx6q: fix error handling
	x86/boot: Use unsigned comparison for addresses
	efi/x86: Ignore the memory attributes table on i386
	genirq/irqdomain: Check pointer in irq_domain_alloc_irqs_hierarchy()
	block: Fix use-after-free issue accessing struct io_cq
	media: i2c: ov5695: Fix power on and off sequences
	usb: dwc3: core: add support for disabling SS instances in park mode
	irqchip/gic-v4: Provide irq_retrigger to avoid circular locking dependency
	md: check arrays is suspended in mddev_detach before call quiesce operations
	firmware: fix a double abort case with fw_load_sysfs_fallback
	spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion
	locking/lockdep: Avoid recursion in lockdep_count_{for,back}ward_deps()
	block, bfq: fix use-after-free in bfq_idle_slice_timer_body
	btrfs: qgroup: ensure qgroup_rescan_running is only set when the worker is at least queued
	btrfs: remove a BUG_ON() from merge_reloc_roots()
	btrfs: restart relocate_tree_blocks properly
	btrfs: track reloc roots based on their commit root bytenr
	ASoC: fix regwmask
	ASoC: dapm: connect virtual mux with default value
	ASoC: dpcm: allow start or stop during pause for backend
	ASoC: topology: use name_prefix for new kcontrol
	usb: gadget: f_fs: Fix use after free issue as part of queue failure
	usb: gadget: composite: Inform controller driver of self-powered
	ALSA: usb-audio: Add mixer workaround for TRX40 and co
	ALSA: hda: Add driver blacklist
	ALSA: hda: Fix potential access overflow in beep helper
	ALSA: ice1724: Fix invalid access for enumerated ctl items
	ALSA: pcm: oss: Fix regression by buffer overflow fix
	ALSA: hda/realtek: Enable mute LED on an HP system
	ALSA: hda/realtek - a fake key event is triggered by running shutup
	ALSA: doc: Document PC Beep Hidden Register on Realtek ALC256
	ALSA: hda/realtek - Set principled PC Beep configuration for ALC256
	ALSA: hda/realtek - Remove now-unnecessary XPS 13 headphone noise fixups
	ALSA: hda/realtek - Add quirk for Lenovo Carbon X1 8th gen
	ALSA: hda/realtek - Add quirk for MSI GL63
	media: venus: firmware: Ignore secure call error on first resume
	media: hantro: Read be32 words starting at every fourth byte
	media: ti-vpe: cal: fix disable_irqs to only the intended target
	media: ti-vpe: cal: fix a kernel oops when unloading module
	seccomp: Add missing compat_ioctl for notify
	acpi/x86: ignore unspecified bit positions in the ACPI global lock field
	ACPICA: Allow acpi_any_gpe_status_set() to skip one GPE
	ACPI: PM: s2idle: Refine active GPEs check
	thermal: devfreq_cooling: inline all stubs for CONFIG_DEVFREQ_THERMAL=n
	nvmet-tcp: fix maxh2cdata icresp parameter
	nvme-fc: Revert "add module to ops template to allow module references"
	efi/x86: Add TPM related EFI tables to unencrypted mapping checks
	PCI: pciehp: Fix indefinite wait on sysfs requests
	PCI/ASPM: Clear the correct bits when enabling L1 substates
	PCI: Add boot interrupt quirk mechanism for Xeon chipsets
	PCI: qcom: Fix the fixup of PCI_VENDOR_ID_QCOM
	PCI: endpoint: Fix for concurrent memory allocation in OB address region
	sched/fair: Fix enqueue_task_fair warning
	tpm: Don't make log failures fatal
	tpm: tpm1_bios_measurements_next should increase position index
	tpm: tpm2_bios_measurements_next should increase position index
	KEYS: reaching the keys quotas correctly
	cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()
	genirq/debugfs: Add missing sanity checks to interrupt injection
	irqchip/versatile-fpga: Apply clear-mask earlier
	io_uring: remove bogus RLIMIT_NOFILE check in file registration
	pstore: pstore_ftrace_seq_next should increase position index
	MIPS/tlbex: Fix LDDIR usage in setup_pw() for Loongson-3
	MIPS: OCTEON: irq: Fix potential NULL pointer dereference
	PM / Domains: Allow no domain-idle-states DT property in genpd when parsing
	PM: sleep: wakeup: Skip wakeup_source_sysfs_remove() if device is not there
	ath9k: Handle txpower changes even when TPC is disabled
	signal: Extend exec_id to 64bits
	x86/tsc_msr: Use named struct initializers
	x86/tsc_msr: Fix MSR_FSB_FREQ mask for Cherry Trail devices
	x86/tsc_msr: Make MSR derived TSC frequency more accurate
	x86/entry/32: Add missing ASM_CLAC to general_protection entry
	platform/x86: asus-wmi: Support laptops where the first battery is named BATT
	KVM: nVMX: Properly handle userspace interrupt window request
	KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks
	KVM: s390: vsie: Fix delivery of addressing exceptions
	KVM: x86: Allocate new rmap and large page tracking when moving memslot
	KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
	KVM: x86: Gracefully handle __vmalloc() failure during VM allocation
	KVM: VMX: Add a trampoline to fix VMREAD error handling
	KVM: VMX: fix crash cleanup when KVM wasn't used
	smb3: fix performance regression with setting mtime
	CIFS: Fix bug which the return value by asynchronous read is error
	mtd: spinand: Stop using spinand->oobbuf for buffering bad block markers
	mtd: spinand: Do not erase the block before writing a bad block marker
	btrfs: Don't submit any btree write bio if the fs has errors
	Btrfs: fix crash during unmount due to race with delayed inode workers
	btrfs: reloc: clean dirty subvols if we fail to start a transaction
	btrfs: set update the uuid generation as soon as possible
	btrfs: drop block from cache on error in relocation
	btrfs: fix missing file extent item for hole after ranged fsync
	btrfs: unset reloc control if we fail to recover
	btrfs: fix missing semaphore unlock in btrfs_sync_file
	btrfs: use nofs allocations for running delayed items
	remoteproc: qcom_q6v5_mss: Don't reassign mpss region on shutdown
	remoteproc: qcom_q6v5_mss: Reload the mba region on coredump
	remoteproc: Fix NULL pointer dereference in rproc_virtio_notify
	crypto: rng - Fix a refcounting bug in crypto_rng_reset()
	crypto: mxs-dcp - fix scatterlist linearization for hash
	erofs: correct the remaining shrink objects
	io_uring: honor original task RLIMIT_FSIZE
	mmc: sdhci-of-esdhc: fix esdhc_reset() for different controller versions
	powerpc/pseries: Drop pointless static qualifier in vpa_debugfs_init()
	tools: gpio: Fix out-of-tree build regression
	net: qualcomm: rmnet: Allow configuration updates to existing devices
	arm64: dts: allwinner: h6: Fix PMU compatible
	sched/core: Remove duplicate assignment in sched_tick_remote()
	arm64: dts: allwinner: h5: Fix PMU compatible
	mm, memcg: do not high throttle allocators based on wraparound
	dm writecache: add cond_resched to avoid CPU hangs
	dm integrity: fix a crash with unusually large tag size
	dm verity fec: fix memory leak in verity_fec_dtr
	dm clone: Add overflow check for number of regions
	dm clone metadata: Fix return type of dm_clone_nr_of_hydrated_regions()
	XArray: Fix xas_pause for large multi-index entries
	xarray: Fix early termination of xas_for_each_marked
	crypto: caam/qi2 - fix chacha20 data size error
	crypto: caam - update xts sector size for large input length
	crypto: ccree - protect against empty or NULL scatterlists
	crypto: ccree - only try to map auth tag if needed
	crypto: ccree - dec auth tag size from cryptlen map
	scsi: zfcp: fix missing erp_lock in port recovery trigger for point-to-point
	scsi: ufs: fix Auto-Hibern8 error detection
	scsi: lpfc: Fix lpfc_io_buf resource leak in lpfc_get_scsi_buf_s4 error path
	ARM: dts: exynos: Fix polarity of the LCD SPI bus on UniversalC210 board
	arm64: dts: ti: k3-am65: Add clocks to dwc3 nodes
	arm64: armv8_deprecated: Fix undef_hook mask for thumb setend
	selftests: vm: drop dependencies on page flags from mlock2 tests
	selftests/vm: fix map_hugetlb length used for testing read and write
	selftests/powerpc: Add tlbie_test in .gitignore
	vfio: platform: Switch to platform_get_irq_optional()
	drm/i915/gem: Flush all the reloc_gpu batch
	drm/etnaviv: rework perfmon query infrastructure
	drm: Remove PageReserved manipulation from drm_pci_alloc
	drm/amdgpu/powerplay: using the FCLK DPM table to set the MCLK
	drm/amdgpu: unify fw_write_wait for new gfx9 asics
	powerpc/pseries: Avoid NULL pointer dereference when drmem is unavailable
	nfsd: fsnotify on rmdir under nfsd/clients/
	NFS: Fix use-after-free issues in nfs_pageio_add_request()
	NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()
	ext4: fix a data race at inode->i_blocks
	fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
	ocfs2: no need try to truncate file beyond i_size
	perf tools: Support Python 3.8+ in Makefile
	s390/diag: fix display of diagnose call statistics
	Input: i8042 - add Acer Aspire 5738z to nomux list
	ftrace/kprobe: Show the maxactive number on kprobe_events
	clk: ingenic/jz4770: Exit with error if CGU init failed
	clk: ingenic/TCU: Fix round_rate returning error
	kmod: make request_module() return an error when autoloading is disabled
	cpufreq: powernv: Fix use-after-free
	hfsplus: fix crash and filesystem corruption when deleting files
	libata: Return correct status in sata_pmp_eh_recover_pm() when ATA_DFLAG_DETACH is set
	ipmi: fix hung processes in __get_guid()
	xen/blkfront: fix memory allocation flags in blkfront_setup_indirect()
	powerpc/64/tm: Don't let userspace set regs->trap via sigreturn
	powerpc/fsl_booke: Avoid creating duplicate tlb1 entry
	powerpc/hash64/devmap: Use H_PAGE_THP_HUGE when setting up huge devmap PTE entries
	powerpc/xive: Use XIVE_BAD_IRQ instead of zero to catch non configured IPIs
	powerpc/64: Setup a paca before parsing device tree etc.
	powerpc/xive: Fix xmon support on the PowerNV platform
	powerpc/kprobes: Ignore traps that happened in real mode
	powerpc/64: Prevent stack protection in early boot
	scsi: mpt3sas: Fix kernel panic observed on soft HBA unplug
	powerpc: Make setjmp/longjmp signature standard
	arm64: Always force a branch protection mode when the compiler has one
	dm zoned: remove duplicate nr_rnd_zones increase in dmz_init_zone()
	dm clone: replace spin_lock_irqsave with spin_lock_irq
	dm clone: Fix handling of partial region discards
	dm clone: Add missing casts to prevent overflows and data corruption
	scsi: lpfc: Add registration for CPU Offline/Online events
	scsi: lpfc: Fix Fabric hostname registration if system hostname changes
	scsi: lpfc: Fix configuration of BB credit recovery in service parameters
	scsi: lpfc: Fix broken Credit Recovery after driver load
	Revert "drm/dp_mst: Remove VCPI while disabling topology mgr"
	drm/dp_mst: Fix clearing payload state on topology disable
	drm/amdgpu: fix gfx hang during suspend with video playback (v2)
	drm/i915/icl+: Don't enable DDI IO power on a TypeC port in TBT mode
	powerpc/kasan: Fix kasan_remap_early_shadow_ro()
	mmc: sdhci: Convert sdhci_set_timeout_irq() to non-static
	mmc: sdhci: Refactor sdhci_set_timeout()
	bpf: Fix tnum constraints for 32-bit comparisons
	mfd: dln2: Fix sanity checking for endpoints
	efi/x86: Fix the deletion of variables in mixed mode
	ASoC: stm32: sai: Add missing cleanup
	scsi: lpfc: fix inlining of lpfc_sli4_cleanup_poll_list()
	Linux 5.4.33

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6c37e2c64801a572781c46fc5883bcc74e6a7a1a
2020-04-17 11:26:58 +02:00
Zhiqiang Liu
b37de1b1e8 block, bfq: fix use-after-free in bfq_idle_slice_timer_body
[ Upstream commit 2f95fa5c955d0a9987ffdc3a095e2f4e62c5f2a9 ]

In bfq_idle_slice_timer func, bfqq = bfqd->in_service_queue is
not in bfqd-lock critical section. The bfqq, which is not
equal to NULL in bfq_idle_slice_timer, may be freed after passing
to bfq_idle_slice_timer_body. So we will access the freed memory.

In addition, considering the bfqq may be in race, we should
firstly check whether bfqq is in service before doing something
on it in bfq_idle_slice_timer_body func. If the bfqq in race is
not in service, it means the bfqq has been expired through
__bfq_bfqq_expire func, and wait_request flags has been cleared in
__bfq_bfqd_reset_in_service func. So we do not need to re-clear the
wait_request of bfqq which is not in service.

KASAN log is given as follows:
[13058.354613] ==================================================================
[13058.354640] BUG: KASAN: use-after-free in bfq_idle_slice_timer+0xac/0x290
[13058.354644] Read of size 8 at addr ffffa02cf3e63f78 by task fork13/19767
[13058.354646]
[13058.354655] CPU: 96 PID: 19767 Comm: fork13
[13058.354661] Call trace:
[13058.354667]  dump_backtrace+0x0/0x310
[13058.354672]  show_stack+0x28/0x38
[13058.354681]  dump_stack+0xd8/0x108
[13058.354687]  print_address_description+0x68/0x2d0
[13058.354690]  kasan_report+0x124/0x2e0
[13058.354697]  __asan_load8+0x88/0xb0
[13058.354702]  bfq_idle_slice_timer+0xac/0x290
[13058.354707]  __hrtimer_run_queues+0x298/0x8b8
[13058.354710]  hrtimer_interrupt+0x1b8/0x678
[13058.354716]  arch_timer_handler_phys+0x4c/0x78
[13058.354722]  handle_percpu_devid_irq+0xf0/0x558
[13058.354731]  generic_handle_irq+0x50/0x70
[13058.354735]  __handle_domain_irq+0x94/0x110
[13058.354739]  gic_handle_irq+0x8c/0x1b0
[13058.354742]  el1_irq+0xb8/0x140
[13058.354748]  do_wp_page+0x260/0xe28
[13058.354752]  __handle_mm_fault+0x8ec/0x9b0
[13058.354756]  handle_mm_fault+0x280/0x460
[13058.354762]  do_page_fault+0x3ec/0x890
[13058.354765]  do_mem_abort+0xc0/0x1b0
[13058.354768]  el0_da+0x24/0x28
[13058.354770]
[13058.354773] Allocated by task 19731:
[13058.354780]  kasan_kmalloc+0xe0/0x190
[13058.354784]  kasan_slab_alloc+0x14/0x20
[13058.354788]  kmem_cache_alloc_node+0x130/0x440
[13058.354793]  bfq_get_queue+0x138/0x858
[13058.354797]  bfq_get_bfqq_handle_split+0xd4/0x328
[13058.354801]  bfq_init_rq+0x1f4/0x1180
[13058.354806]  bfq_insert_requests+0x264/0x1c98
[13058.354811]  blk_mq_sched_insert_requests+0x1c4/0x488
[13058.354818]  blk_mq_flush_plug_list+0x2d4/0x6e0
[13058.354826]  blk_flush_plug_list+0x230/0x548
[13058.354830]  blk_finish_plug+0x60/0x80
[13058.354838]  read_pages+0xec/0x2c0
[13058.354842]  __do_page_cache_readahead+0x374/0x438
[13058.354846]  ondemand_readahead+0x24c/0x6b0
[13058.354851]  page_cache_sync_readahead+0x17c/0x2f8
[13058.354858]  generic_file_buffered_read+0x588/0xc58
[13058.354862]  generic_file_read_iter+0x1b4/0x278
[13058.354965]  ext4_file_read_iter+0xa8/0x1d8 [ext4]
[13058.354972]  __vfs_read+0x238/0x320
[13058.354976]  vfs_read+0xbc/0x1c0
[13058.354980]  ksys_read+0xdc/0x1b8
[13058.354984]  __arm64_sys_read+0x50/0x60
[13058.354990]  el0_svc_common+0xb4/0x1d8
[13058.354994]  el0_svc_handler+0x50/0xa8
[13058.354998]  el0_svc+0x8/0xc
[13058.354999]
[13058.355001] Freed by task 19731:
[13058.355007]  __kasan_slab_free+0x120/0x228
[13058.355010]  kasan_slab_free+0x10/0x18
[13058.355014]  kmem_cache_free+0x288/0x3f0
[13058.355018]  bfq_put_queue+0x134/0x208
[13058.355022]  bfq_exit_icq_bfqq+0x164/0x348
[13058.355026]  bfq_exit_icq+0x28/0x40
[13058.355030]  ioc_exit_icq+0xa0/0x150
[13058.355035]  put_io_context_active+0x250/0x438
[13058.355038]  exit_io_context+0xd0/0x138
[13058.355045]  do_exit+0x734/0xc58
[13058.355050]  do_group_exit+0x78/0x220
[13058.355054]  __wake_up_parent+0x0/0x50
[13058.355058]  el0_svc_common+0xb4/0x1d8
[13058.355062]  el0_svc_handler+0x50/0xa8
[13058.355066]  el0_svc+0x8/0xc
[13058.355067]
[13058.355071] The buggy address belongs to the object at ffffa02cf3e63e70#012 which belongs to the cache bfq_queue of size 464
[13058.355075] The buggy address is located 264 bytes inside of#012 464-byte region [ffffa02cf3e63e70, ffffa02cf3e64040)
[13058.355077] The buggy address belongs to the page:
[13058.355083] page:ffff7e80b3cf9800 count:1 mapcount:0 mapping:ffff802db5c90780 index:0xffffa02cf3e606f0 compound_mapcount: 0
[13058.366175] flags: 0x2ffffe0000008100(slab|head)
[13058.370781] raw: 2ffffe0000008100 ffff7e80b53b1408 ffffa02d730c1c90 ffff802db5c90780
[13058.370787] raw: ffffa02cf3e606f0 0000000000370023 00000001ffffffff 0000000000000000
[13058.370789] page dumped because: kasan: bad access detected
[13058.370791]
[13058.370792] Memory state around the buggy address:
[13058.370797]  ffffa02cf3e63e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb
[13058.370801]  ffffa02cf3e63e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[13058.370805] >ffffa02cf3e63f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[13058.370808]                                                                 ^
[13058.370811]  ffffa02cf3e63f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[13058.370815]  ffffa02cf3e64000: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[13058.370817] ==================================================================
[13058.370820] Disabling lock debugging due to kernel taint

Here, we directly pass the bfqd to bfq_idle_slice_timer_body func.
--
V2->V3: rewrite the comment as suggested by Paolo Valente
V1->V2: add one comment, and add Fixes and Reported-by tag.

Fixes: aee69d78d ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reported-by: Wang Wang <wangwang2@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: Feilong Lin <linfeilong@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17 10:50:05 +02:00
Sahitya Tummala
510b4e0695 block: Fix use-after-free issue accessing struct io_cq
[ Upstream commit 30a2da7b7e225ef6c87a660419ea04d3cef3f6a7 ]

There is a potential race between ioc_release_fn() and
ioc_clear_queue() as shown below, due to which below kernel
crash is observed. It also can result into use-after-free
issue.

context#1:				context#2:
ioc_release_fn()			__ioc_clear_queue() gets the same icq
->spin_lock(&ioc->lock);		->spin_lock(&ioc->lock);
->ioc_destroy_icq(icq);
  ->list_del_init(&icq->q_node);
  ->call_rcu(&icq->__rcu_head,
  	icq_free_icq_rcu);
->spin_unlock(&ioc->lock);
					->ioc_destroy_icq(icq);
					  ->hlist_del_init(&icq->ioc_node);
					  This results into below crash as this memory
					  is now used by icq->__rcu_head in context#1.
					  There is a chance that icq could be free'd
					  as well.

22150.386550:   <6> Unable to handle kernel write to read-only memory
at virtual address ffffffaa8d31ca50
...
Call trace:
22150.607350:   <2>  ioc_destroy_icq+0x44/0x110
22150.611202:   <2>  ioc_clear_queue+0xac/0x148
22150.615056:   <2>  blk_cleanup_queue+0x11c/0x1a0
22150.619174:   <2>  __scsi_remove_device+0xdc/0x128
22150.623465:   <2>  scsi_forget_host+0x2c/0x78
22150.627315:   <2>  scsi_remove_host+0x7c/0x2a0
22150.631257:   <2>  usb_stor_disconnect+0x74/0xc8
22150.635371:   <2>  usb_unbind_interface+0xc8/0x278
22150.639665:   <2>  device_release_driver_internal+0x198/0x250
22150.644897:   <2>  device_release_driver+0x24/0x30
22150.649176:   <2>  bus_remove_device+0xec/0x140
22150.653204:   <2>  device_del+0x270/0x460
22150.656712:   <2>  usb_disable_device+0x120/0x390
22150.660918:   <2>  usb_disconnect+0xf4/0x2e0
22150.664684:   <2>  hub_event+0xd70/0x17e8
22150.668197:   <2>  process_one_work+0x210/0x480
22150.672222:   <2>  worker_thread+0x32c/0x4c8

Fix this by adding a new ICQ_DESTROYED flag in ioc_destroy_icq() to
indicate this icq is once marked as destroyed. Also, ensure
__ioc_clear_queue() is accessing icq within rcu_read_lock/unlock so
that icq doesn't get free'd up while it is still using it.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Co-developed-by: Pradeep P V K <ppvk@codeaurora.org>
Signed-off-by: Pradeep P V K <ppvk@codeaurora.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17 10:50:04 +02:00
Paolo Valente
fb80a18584 block, bfq: move forward the getting of an extra ref in bfq_bfqq_move
[ Upstream commit fd1bb3ae54a9a2e0c42709de861c69aa146b8955 ]

Commit ecedd3d7e199 ("block, bfq: get extra ref to prevent a queue
from being freed during a group move") gets an extra reference to a
bfq_queue before possibly deactivating it (temporarily), in
bfq_bfqq_move(). This prevents the bfq_queue from disappearing before
being reactivated in its new group.

Yet, the bfq_queue may also be expired (i.e., its service may be
stopped) before the bfq_queue is deactivated. And also an expiration
may lead to a premature freeing. This commit fixes this issue by
simply moving forward the getting of the extra reference already
introduced by commit ecedd3d7e199 ("block, bfq: get extra ref to
prevent a queue from being freed during a group move").

Reported-by: cki-project@redhat.com
Tested-by: cki-project@redhat.com
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17 10:50:02 +02:00
Konstantin Khlebnikov
fd66df97dc block: keep bdi->io_pages in sync with max_sectors_kb for stacked devices
[ Upstream commit e74d93e96d721c4297f2a900ad0191890d2fc2b0 ]

Field bdi->io_pages added in commit 9491ae4aad ("mm: don't cap request
size based on read-ahead setting") removes unneeded split of read requests.

Stacked drivers do not call blk_queue_max_hw_sectors(). Instead they set
limits of their devices by blk_set_stacking_limits() + disk_stack_limits().
Field bio->io_pages stays zero until user set max_sectors_kb via sysfs.

This patch updates io_pages after merging limits in disk_stack_limits().

Commit c6d6e9b0f6 ("dm: do not allow readahead to limit IO size") fixed
the same problem for device-mapper devices, this one fixes MD RAIDs.

Fixes: 9491ae4aad ("mm: don't cap request size based on read-ahead setting")
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17 10:50:01 +02:00
Greg Kroah-Hartman
724ffa0096 This is the 5.4.32 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl6UJ1IACgkQONu9yGCS
 aT6i1A//RmLhD1Td+1pGWODgsMtfYD1FMB07D2uyFlX/NJe3jnhR9XGIKjFEtFSt
 bLXoQZlm07GPS7fsll7rE6Ydq0b1c5z59l+5pFCP4cc0ehM3Gca/V77ui0YBh9Mq
 /Jvk70qyE4en20qTia16cjozUjMreYPDdbqoR4rB3Lq+ALEEOwS0G4h2Nd4PTYGp
 YDXBwYn+O/f+CAl1arQeOSwpEThGGA4giSzBGaevq09xl2oIs4hAIWTpZ0WtugjR
 C4AXSEfl9Y/3OcYJm9KYVx4HqyunDhM3rY5ecCZXqeG4g9i5PZob9KisHuhs4nDD
 CBHd8ALTk0jo869MpizJ7nlcaGWPzBMSyxQeo1icq340KZjH/zWm7FY72VL+tBI0
 DSpPyP7zJmQESuZGRWfjLZkETH3edg6VI9233pB6OSfddYh7asrDVcw1jwMpr6Pf
 Y71aR7D9cYVyNPAP5AzQzdKUQlEPW1t4GhW6Cwxt7lbMZV18N73vYUxpl7IjSUa6
 6J5FHEIylnOFmpObzEC4Rj45Poy5ziI44/jmKMf7jmua9IAmHK2Dd6X5XjCRX40C
 Urf5it+wC5vkHVS6SW1tN4kbBhDfThHsAG71a3Y7kZeSpT6MrgneLEM9/BNk+csv
 gKTMukqZrgmR+zYTY78nbwM/XflqIqSwkF97GNvalyvT48Cokco=
 =Wb3/
 -----END PGP SIGNATURE-----

Merge 5.4.32 into android-5.4-stable

Changes in 5.4.32
	net: phy: realtek: fix handling of RTL8105e-integrated PHY
	cxgb4: fix MPS index overwrite when setting MAC address
	ipv6: don't auto-add link-local address to lag ports
	net: dsa: bcm_sf2: Do not register slave MDIO bus with OF
	net: dsa: bcm_sf2: Ensure correct sub-node is parsed
	net: dsa: mt7530: fix null pointer dereferencing in port5 setup
	net: phy: micrel: kszphy_resume(): add delay after genphy_resume() before accessing PHY registers
	net_sched: add a temporary refcnt for struct tcindex_data
	net_sched: fix a missing refcnt in tcindex_init()
	net: stmmac: dwmac1000: fix out-of-bounds mac address reg setting
	slcan: Don't transmit uninitialized stack data in padding
	tun: Don't put_page() for all negative return values from XDP program
	mlxsw: spectrum_flower: Do not stop at FLOW_ACTION_VLAN_MANGLE
	r8169: change back SG and TSO to be disabled by default
	s390: prevent leaking kernel address in BEAR
	random: always use batched entropy for get_random_u{32,64}
	usb: dwc3: gadget: Wrap around when skip TRBs
	uapi: rename ext2_swab() to swab() and share globally in swab.h
	slub: improve bit diffusion for freelist ptr obfuscation
	tools/accounting/getdelays.c: fix netlink attribute length
	hwrng: imx-rngc - fix an error path
	ACPI: PM: Add acpi_[un]register_wakeup_handler()
	platform/x86: intel_int0002_vgpio: Use acpi_register_wakeup_handler()
	ASoC: jz4740-i2s: Fix divider written at incorrect offset in register
	IB/hfi1: Call kobject_put() when kobject_init_and_add() fails
	IB/hfi1: Fix memory leaks in sysfs registration and unregistration
	IB/mlx5: Replace tunnel mpls capability bits for tunnel_offloads
	ARM: imx: Enable ARM_ERRATA_814220 for i.MX6UL and i.MX7D
	ARM: imx: only select ARM_ERRATA_814220 for ARMv7-A
	ceph: remove the extra slashes in the server path
	ceph: canonicalize server path in place
	include/uapi/linux/swab.h: fix userspace breakage, use __BITS_PER_LONG for swap
	RDMA/ucma: Put a lock around every call to the rdma_cm layer
	RDMA/cma: Teach lockdep about the order of rtnl and lock
	RDMA/siw: Fix passive connection establishment
	Bluetooth: RFCOMM: fix ODEBUG bug in rfcomm_dev_ioctl
	RDMA/cm: Update num_paths in cma_resolve_iboe_route error flow
	blk-mq: Keep set->nr_hw_queues and set->map[].nr_queues in sync
	fbcon: fix null-ptr-deref in fbcon_switch
	drm/i915: Fix ref->mutex deadlock in i915_active_wait()
	iommu/vt-d: Allow devices with RMRRs to use identity domain
	Linux 5.4.32

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I520b282d0cdebf3f80293c014d5cd6e88d956d55
2020-04-13 13:11:19 +02:00
Bart Van Assche
d4083258db blk-mq: Keep set->nr_hw_queues and set->map[].nr_queues in sync
commit 6e66b49392419f3fe134e1be583323ef75da1e4b upstream.

blk_mq_map_queues() and multiple .map_queues() implementations expect that
set->map[HCTX_TYPE_DEFAULT].nr_queues is set to the number of hardware
queues. Hence set .nr_queues before calling these functions. This patch
fixes the following kernel warning:

WARNING: CPU: 0 PID: 2501 at include/linux/cpumask.h:137
Call Trace:
 blk_mq_run_hw_queue+0x19d/0x350 block/blk-mq.c:1508
 blk_mq_run_hw_queues+0x112/0x1a0 block/blk-mq.c:1525
 blk_mq_requeue_work+0x502/0x780 block/blk-mq.c:775
 process_one_work+0x9af/0x1740 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x361/0x430 kernel/kthread.c:255

Fixes: ed76e329d7 ("blk-mq: abstract out queue map") # v5.0
Reported-by: syzbot+d44e1b26ce5c3e77458d@syzkaller.appspotmail.com
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-13 10:48:14 +02:00
Eric Biggers
0367acdde3 ANDROID: block: require drivers to declare supported crypto key type(s)
We need a way to tell which type of keys the inline crypto hardware
supports (standard, wrapped, or both), so that fallbacks can be used
when needed (either blk-crypto-fallback, or fscrypt fs-layer crypto).

We can't simply assume that

    keyslot_mgmt_ll_ops::derive_raw_secret == NULL

means only standard keys are supported and that

    keyslot_mgmt_ll_ops::derive_raw_secret != NULL

means that only wrapped keys are supported, because device-mapper
devices always implement this method.  Also, hardware might support both
types of keys.

Therefore, add a field keyslot_manager::features which contains a
bitmask of flags which indicate the supported types of keys.  Drivers
will need to fill this in.  This patch makes the UFS standard crypto
code set BLK_CRYPTO_FEATURE_STANDARD_KEYS, but UFS variant drivers may
need to set BLK_CRYPTO_FEATURE_WRAPPED_KEYS instead.

Then, make keyslot_manager_crypto_mode_supported() take the key type
into account.

Bug: 137270441
Bug: 151100202
Test: 'atest vts_kernel_encryption_test' on Pixel 4 with the
      inline crypto patches backported, and also on Cuttlefish.
Change-Id: Ied846c2767c1fd2f438792dcfd3649157e68b005
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-04-07 17:58:49 +00:00
Eric Biggers
6e8182d6c5 ANDROID: block: make blk_crypto_start_using_mode() properly check for support
If blk-crypto-fallback is needed but is disabled by kconfig, make
blk_crypto_start_using_mode() return an error rather than succeeding.
Use ENOPKG, which matches the error code used by fscrypt when crypto API
support is missing with fs-layer encryption.

Also, if blk-crypto-fallback is needed but the algorithm is missing from
the kernel's crypto API, change the error code from ENOENT to ENOPKG.

This is needed for VtsKernelEncryptionTest to pass on some devices.

Bug: 137270441
Bug: 151100202
Test: 'atest vts_kernel_encryption_test' on Pixel 4 with the
      inline crypto patches backported, and also on Cuttlefish.
Change-Id: Iedf00ca8e48c74a5d4c40b12712f38738a04ef11
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-04-07 17:58:39 +00:00
Greg Kroah-Hartman
2341be6d9d This is the 5.4.28 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl57B4gACgkQONu9yGCS
 aT6KWQ//ToF4D+fDl1Muf4xuT83HnXe1yA1XFlyC54ZmEaxGFnFMSAgAvitBR7HC
 GEczXlvYbbVJl646ynTmX/LFz2d+V0i2zGv2QKN3xfd8GtDrAq/s+Ffhneaskk1k
 gKkIUzOyBq0nEAq5vXbCT4LbQDYGLw8BxxvurLim/YywSy5sfnw+hKkYE7cVVSOa
 rTIdt5qnuL4pxD8VeCAakoU6SajoxfFqS8pX79LC1UPY+++OaVcJjyJvM4APw0Kr
 e2E1BaeZxxCyY57pQY2oqhjG3cCfIcmfln19JRzMCVNo9+J3MEjZI5EUqP/Zcjwz
 1V5FHqDmqMGfA9cn+CexDk79bTKW5+YOyYMEG2RJjV5alWJtvv3Wj6dRPVPpBhnJ
 O627IuIVGMsHuiEbjziKczaTtYYU5GTpJF+7COH0Jnud0q1w3/RjaJXspnb2Yozh
 L9BFqc4aonD+AyW2NXTxuuvc3hnD2YSVgLectm7LF/TbM2YFJVu4tounelFsGG6I
 CPH2VF0Tn+yNR2iWf8igvopvYPCjv+QFGXU6kFGxQxLQFTnXHqoO2sPF4awXx7Hv
 XF8LrJzPwissX5BbPyhUSIl0FEcmQi6UzSN1/6fpxVq+092OGVacMWpZvwqjUOV/
 3k/OrmcYsu7i2UUbms47YHAK0PRkL2ogKgxgcSO4aNZ7MfkXkm8=
 =kp1m
 -----END PGP SIGNATURE-----

Merge 5.4.28 into android-5.4

Changes in 5.4.28
	locks: fix a potential use-after-free problem when wakeup a waiter
	locks: reinstate locks_delete_block optimization
	spi: spi-omap2-mcspi: Support probe deferral for DMA channels
	drm/mediatek: Find the cursor plane instead of hard coding it
	phy: ti: gmii-sel: fix set of copy-paste errors
	phy: ti: gmii-sel: do not fail in case of gmii
	ARM: dts: dra7-l4: mark timer13-16 as pwm capable
	spi: qup: call spi_qup_pm_resume_runtime before suspending
	powerpc: Include .BTF section
	cifs: fix potential mismatch of UNC paths
	cifs: add missing mount option to /proc/mounts
	ARM: dts: dra7: Add "dma-ranges" property to PCIe RC DT nodes
	spi: pxa2xx: Add CS control clock quirk
	spi/zynqmp: remove entry that causes a cs glitch
	drm/exynos: dsi: propagate error value and silence meaningless warning
	drm/exynos: dsi: fix workaround for the legacy clock name
	drm/exynos: hdmi: don't leak enable HDMI_EN regulator if probe fails
	drivers/perf: fsl_imx8_ddr: Correct the CLEAR bit definition
	drivers/perf: arm_pmu_acpi: Fix incorrect checking of gicc pointer
	altera-stapl: altera_get_note: prevent write beyond end of 'key'
	dm bio record: save/restore bi_end_io and bi_integrity
	dm integrity: use dm_bio_record and dm_bio_restore
	riscv: avoid the PIC offset of static percpu data in module beyond 2G limits
	ASoC: stm32: sai: manage rebind issue
	spi: spi_register_controller(): free bus id on error paths
	riscv: Force flat memory model with no-mmu
	riscv: Fix range looking for kernel image memblock
	drm/amdgpu: clean wptr on wb when gpu recovery
	drm/amd/display: Clear link settings on MST disable connector
	drm/amd/display: fix dcc swath size calculations on dcn1
	xenbus: req->body should be updated before req->state
	xenbus: req->err should be updated before req->state
	block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group()
	parse-maintainers: Mark as executable
	binderfs: use refcount for binder control devices too
	Revert "drm/fbdev: Fallback to non tiled mode if all tiles not present"
	USB: Disable LPM on WD19's Realtek Hub
	usb: quirks: add NO_LPM quirk for RTL8153 based ethernet adapters
	USB: serial: option: add ME910G1 ECM composition 0x110b
	usb: host: xhci-plat: add a shutdown
	USB: serial: pl2303: add device-id for HP LD381
	usb: xhci: apply XHCI_SUSPEND_DELAY to AMD XHCI controller 1022:145c
	usb: typec: ucsi: displayport: Fix NULL pointer dereference
	usb: typec: ucsi: displayport: Fix a potential race during registration
	USB: cdc-acm: fix close_delay and closing_wait units in TIOCSSERIAL
	USB: cdc-acm: fix rounding error in TIOCSSERIAL
	ALSA: line6: Fix endless MIDI read loop
	ALSA: hda/realtek - Enable headset mic of Acer X2660G with ALC662
	ALSA: hda/realtek - Enable the headset of Acer N50-600 with ALC662
	ALSA: seq: virmidi: Fix running status after receiving sysex
	ALSA: seq: oss: Fix running status after receiving sysex
	ALSA: pcm: oss: Avoid plugin buffer overflow
	ALSA: pcm: oss: Remove WARNING from snd_pcm_plug_alloc() checks
	tty: fix compat TIOCGSERIAL leaking uninitialized memory
	tty: fix compat TIOCGSERIAL checking wrong function ptr
	iio: chemical: sps30: fix missing triggered buffer dependency
	iio: st_sensors: remap SMO8840 to LIS2DH12
	iio: trigger: stm32-timer: disable master mode when stopping
	iio: accel: adxl372: Set iio_chan BE
	iio: magnetometer: ak8974: Fix negative raw values in sysfs
	iio: adc: stm32-dfsdm: fix sleep in atomic context
	iio: adc: at91-sama5d2_adc: fix differential channels in triggered mode
	iio: light: vcnl4000: update sampling periods for vcnl4200
	iio: light: vcnl4000: update sampling periods for vcnl4040
	mmc: rtsx_pci: Fix support for speed-modes that relies on tuning
	mmc: sdhci-of-at91: fix cd-gpios for SAMA5D2
	mmc: sdhci-cadence: set SDHCI_QUIRK2_PRESET_VALUE_BROKEN for UniPhier
	CIFS: fiemap: do not return EINVAL if get nothing
	kbuild: Disable -Wpointer-to-enum-cast
	staging: rtl8188eu: Add device id for MERCUSYS MW150US v2
	staging: greybus: loopback_test: fix poll-mask build breakage
	staging/speakup: fix get_word non-space look-ahead
	intel_th: msu: Fix the unexpected state warning
	intel_th: Fix user-visible error codes
	intel_th: pci: Add Elkhart Lake CPU support
	modpost: move the namespace field in Module.symvers last
	rtc: max8907: add missing select REGMAP_IRQ
	arm64: compat: Fix syscall number of compat_clock_getres
	xhci: Do not open code __print_symbolic() in xhci trace events
	btrfs: fix log context list corruption after rename whiteout error
	drm/amd/amdgpu: Fix GPR read from debugfs (v2)
	drm/lease: fix WARNING in idr_destroy
	stm class: sys-t: Fix the use of time_after()
	memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
	mm, memcg: fix corruption on 64-bit divisor in memory.high throttling
	mm, memcg: throttle allocators based on ancestral memory.high
	mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
	mm: do not allow MADV_PAGEOUT for CoW pages
	epoll: fix possible lost wakeup on epoll_ctl() path
	mm: slub: be more careful about the double cmpxchg of freelist
	mm, slub: prevent kmalloc_node crashes and memory leaks
	page-flags: fix a crash at SetPageError(THP_SWAP)
	x86/mm: split vmalloc_sync_all()
	futex: Fix inode life-time issue
	futex: Unbreak futex hashing
	ALSA: hda/realtek: Fix pop noise on ALC225
	arm64: smp: fix smp_send_stop() behaviour
	arm64: smp: fix crash_smp_send_stop() behaviour
	nvmet-tcp: set MSG_MORE only if we actually have more to send
	drm/bridge: dw-hdmi: fix AVI frame colorimetry
	staging: greybus: loopback_test: fix potential path truncation
	staging: greybus: loopback_test: fix potential path truncations
	Linux 5.4.28

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5d9d15d6236d8ab7374205c6ceda7efa7a9abb70
2020-03-25 16:12:11 +01:00
Carlo Nonato
4db2f87e15 block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group()
[ Upstream commit 14afc59361976c0ba39e3a9589c3eaa43ebc7e1d ]

The bfq_find_set_group() function takes as input a blkcg (which represents
a cgroup) and retrieves the corresponding bfq_group, then it updates the
bfq internal group hierarchy (see comments inside the function for why
this is needed) and finally it returns the bfq_group.
In the hierarchy update cycle, the pointer holding the correct bfq_group
that has to be returned is mistakenly used to traverse the hierarchy
bottom to top, meaning that in each iteration it gets overwritten with the
parent of the current group. Since the update cycle stops at root's
children (depth = 2), the overwrite becomes a problem only if the blkcg
describes a cgroup at a hierarchy level deeper than that (depth > 2). In
this case the root's child that happens to be also an ancestor of the
correct bfq_group is returned. The main consequence is that processes
contained in a cgroup at depth greater than 2 are wrongly placed in the
group described above by BFQ.

This commits fixes this problem by using a different bfq_group pointer in
the update cycle in order to avoid the overwrite of the variable holding
the original group reference.

Reported-by: Kwon Je Oh <kwonje.oh2@gmail.com>
Signed-off-by: Carlo Nonato <carlo.nonato95@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-25 08:25:49 +01:00
Greg Kroah-Hartman
16ada3b38e This is the 5.4.27 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl51vt0ACgkQONu9yGCS
 aT7TGxAArvOqk911pxz4lhFG2MXnZMSvP/9/y92zJFlSGc3mX4ZdpCMLONNPkXmY
 8CpnXAP6q1Od4zatNSM+wTmbpszevyeIQusCqcEWhE9IkMKLUfkRrq4logaEE5lB
 8+Dw3FG8lB6SRJGoR42+8+HLo5fJtG33FyouBBB/jmeVbLyhkL36zyY0o29EcPVC
 YwVFEzwl8/DZWuWe99IXVsL3iA1Z4zDnr2zOxHmXp4tF+jOZ7Kc4JF8O4q7/x546
 721NPuuzBMqDIiL6u5idH2weE6e5V4KEGVxOlIr5fVB1CPQKOswHWArKvMaq2+u8
 iUoXI9Vng7Qb30+CBW2LXJ1Irt3IkUFJkCKSHj/2xD85xHeWPYBQ5VdcbiFkHsGT
 WMYupYn28fEi3pBVbWTAm+E8YDPR/gr/FzqopSeR0cuzWvupuuRDRtjEJyDc+VZx
 Vu2SvyMx9MEbwoFa7zEjc8unqeYD78a/m9ULh97NN//3FHnT479c4Tq6oA4+WGgr
 9zF8u5B9L5bwgTqBGk3OCNd82wsaGjFzzhsg/vPGj+CW4HIratOvISVEg5Harm2Y
 q5c3+V017n5+gVH4DcJ/fiSL11Vuwk537ZZzK6WjVkRmxN3+Ba7d1AWikwwFVtdm
 eaHbMbCtstwSE0wgsFLnw7ovA1650GMIk9kuwFuAUwEFM+Qnws0=
 =xHmK
 -----END PGP SIGNATURE-----

Merge 5.4.27 into android-5.4

Changes in 5.4.27
	netfilter: hashlimit: do not use indirect calls during gc
	netfilter: xt_hashlimit: unregister proc file before releasing mutex
	drm/amdgpu: Fix TLB invalidation request when using semaphore
	ACPI: watchdog: Allow disabling WDAT at boot
	HID: apple: Add support for recent firmware on Magic Keyboards
	ACPI: watchdog: Set default timeout in probe
	HID: i2c-hid: add Trekstor Surfbook E11B to descriptor override
	HID: hid-bigbenff: fix general protection fault caused by double kfree
	HID: hid-bigbenff: call hid_hw_stop() in case of error
	HID: hid-bigbenff: fix race condition for scheduled work during removal
	selftests/rseq: Fix out-of-tree compilation
	tracing: Fix number printing bug in print_synth_event()
	cfg80211: check reg_rule for NULL in handle_channel_custom()
	scsi: libfc: free response frame from GPN_ID
	net: usb: qmi_wwan: restore mtu min/max values after raw_ip switch
	net: ks8851-ml: Fix IRQ handling and locking
	mac80211: rx: avoid RCU list traversal under mutex
	net: ll_temac: Fix race condition causing TX hang
	net: ll_temac: Add more error handling of dma_map_single() calls
	net: ll_temac: Fix RX buffer descriptor handling on GFP_ATOMIC pressure
	net: ll_temac: Handle DMA halt condition caused by buffer underrun
	blk-mq: insert passthrough request into hctx->dispatch directly
	drm/amdgpu: fix memory leak during TDR test(v2)
	kbuild: add dtbs_check to PHONY
	kbuild: add dt_binding_check to PHONY in a correct place
	signal: avoid double atomic counter increments for user accounting
	slip: not call free_netdev before rtnl_unlock in slip_open
	net: phy: mscc: fix firmware paths
	hinic: fix a irq affinity bug
	hinic: fix a bug of setting hw_ioctxt
	hinic: fix a bug of rss configuration
	net: rmnet: fix NULL pointer dereference in rmnet_newlink()
	net: rmnet: fix NULL pointer dereference in rmnet_changelink()
	net: rmnet: fix suspicious RCU usage
	net: rmnet: remove rcu_read_lock in rmnet_force_unassociate_device()
	net: rmnet: do not allow to change mux id if mux id is duplicated
	net: rmnet: use upper/lower device infrastructure
	net: rmnet: fix bridge mode bugs
	net: rmnet: fix packet forwarding in rmnet bridge mode
	sfc: fix timestamp reconstruction at 16-bit rollover points
	jbd2: fix data races at struct journal_head
	blk-mq: insert flush request to the front of dispatch queue
	net: qrtr: fix len of skb_put_padto in qrtr_node_enqueue
	ARM: 8957/1: VDSO: Match ARMv8 timer in cntvct_functional()
	ARM: 8958/1: rename missed uaccess .fixup section
	mm: slub: add missing TID bump in kmem_cache_alloc_bulk()
	HID: google: add moonball USB id
	HID: add ALWAYS_POLL quirk to lenovo pixart mouse
	ARM: 8961/2: Fix Kbuild issue caused by per-task stack protector GCC plugin
	ipv4: ensure rcu_read_lock() in cipso_v4_error()
	Linux 5.4.27

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie4b2a2b56d2e6e6b1a657d8b8e3ed7ed0c8a2d6c
2020-03-21 10:07:42 +01:00
Ming Lei
235fb892d8 blk-mq: insert flush request to the front of dispatch queue
[ Upstream commit cc3200eac4c5eb11c3f34848a014d1f286316310 ]

commit 01e99aeca397 ("blk-mq: insert passthrough request into
hctx->dispatch directly") may change to add flush request to the tail
of dispatch by applying the 'add_head' parameter of
blk_mq_sched_insert_request.

Turns out this way causes performance regression on NCQ controller because
flush is non-NCQ command, which can't be queued when there is any in-flight
NCQ command. When adding flush rq to the front of hctx->dispatch, it is
easier to introduce extra time to flush rq's latency compared with adding
to the tail of dispatch queue because of S_SCHED_RESTART, then chance of
flush merge is increased, and less flush requests may be issued to
controller.

So always insert flush request to the front of dispatch queue just like
before applying commit 01e99aeca397 ("blk-mq: insert passthrough request
into hctx->dispatch directly").

Cc: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 01e99aeca397 ("blk-mq: insert passthrough request into hctx->dispatch directly")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-21 08:11:57 +01:00
Ming Lei
74c77d6a4e blk-mq: insert passthrough request into hctx->dispatch directly
[ Upstream commit 01e99aeca3979600302913cef3f89076786f32c8 ]

For some reason, device may be in one situation which can't handle
FS request, so STS_RESOURCE is always returned and the FS request
will be added to hctx->dispatch. However passthrough request may
be required at that time for fixing the problem. If passthrough
request is added to scheduler queue, there isn't any chance for
blk-mq to dispatch it given we prioritize requests in hctx->dispatch.
Then the FS IO request may never be completed, and IO hang is caused.

So passthrough request has to be added to hctx->dispatch directly
for fixing the IO hang.

Fix this issue by inserting passthrough request into hctx->dispatch
directly together withing adding FS request to the tail of
hctx->dispatch in blk_mq_dispatch_rq_list(). Actually we add FS request
to tail of hctx->dispatch at default, see blk_mq_request_bypass_insert().

Then it becomes consistent with original legacy IO request
path, in which passthrough request is always added to q->queue_head.

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-21 08:11:52 +01:00
Greg Kroah-Hartman
0d3cca0c7d This is the 5.4.26 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5xvRcACgkQONu9yGCS
 aT5GJhAAlahoiTrrDWQH5njl2uNVxOqZl2r6jiaX7+CrcPUg5tIznqUo9qht2s+V
 zCqUcuq9e1Q6WFeqHPx0qwaROaLppcKLuTqg4hrH2KPhuZ4yYjTfjFa8++Q7poPL
 7tQVYxC6r04IdoZnbAOrRduAoMFMgTq9WZCdOpY3860LJrelOPPUh6+N8YnzXTIs
 q0+6nBNtlMYAlQLbQI9nggOtxf6yBor4Y9mfgk1Spe8vbNyZBFCvtRTntlhJI5ME
 u+xictLY3dbvUNFeuJSlQZd22tqTvw2Gxc5Efy2GHjokNb7SKMEswnWy1K6BKP5b
 NZwLtn85mgCWbq7Eo4Bf5idA2EpsyeoJTr9bRmx5mWJ3hFcUaMj6bNEUrfybsZKg
 8/lbP5zM459NyLF+arCy5lNkNBridQt/vDciGiYDzHEVi88PJRNwL6nN01XK+ByU
 aCcQghNAzjMYdxDwotWGKKyTelC+Ck8FlJkIoIgEPYvvTRdaDk+BP/mHaH7KQNax
 AXbSND0MmtloxHgecvmJASU/628uhv8hxzjcCW+Gnzwk55nH2rwZ+VC4ciJgI/hj
 u5x1zaumj67WPqAZ7dZIBrJxpKfDMMFxTQFXLMgP+C+UjJviG4dUw1IVk0v5jTqx
 XG+Gk13ovv62QtKW7wTsuZVrjHXo1zB8OsQrDdKG+GXTkf53bP4=
 =1Qwi
 -----END PGP SIGNATURE-----

Merge 5.4.26 into android-5.4

Changes in 5.4.26
	virtio_balloon: Adjust label in virtballoon_probe
	ALSA: hda/realtek - More constifications
	ALSA: hda/realtek - Add Headset Mic supported for HP cPC
	ALSA: hda/realtek - Fixed one of HP ALC671 platform Headset Mic supported
	cgroup, netclassid: periodically release file_lock on classid updating
	gre: fix uninit-value in __iptunnel_pull_header
	inet_diag: return classid for all socket types
	ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interface
	ipvlan: add cond_resched_rcu() while processing muticast backlog
	ipvlan: do not add hardware address of master to its unicast filter list
	ipvlan: do not use cond_resched_rcu() in ipvlan_process_multicast()
	ipvlan: don't deref eth hdr before checking it's set
	macvlan: add cond_resched() during multicast processing
	net: dsa: fix phylink_start()/phylink_stop() calls
	net: dsa: mv88e6xxx: fix lockup on warm boot
	net: fec: validate the new settings in fec_enet_set_coalesce()
	net: hns3: fix a not link up issue when fibre port supports autoneg
	net/ipv6: use configured metric when add peer route
	netlink: Use netlink header as base to calculate bad attribute offset
	net: macsec: update SCI upon MAC address change.
	net: nfc: fix bounds checking bugs on "pipe"
	net/packet: tpacket_rcv: do not increment ring index on drop
	net: phy: bcm63xx: fix OOPS due to missing driver name
	net: stmmac: dwmac1000: Disable ACS if enhanced descs are not used
	net: systemport: fix index check to avoid an array out of bounds access
	r8152: check disconnect status after long sleep
	sfc: detach from cb_page in efx_copy_channel()
	slip: make slhc_compress() more robust against malicious packets
	taprio: Fix sending packets without dequeueing them
	bonding/alb: make sure arp header is pulled before accessing it
	bnxt_en: reinitialize IRQs when MTU is modified
	bnxt_en: fix error handling when flashing from file
	cgroup: memcg: net: do not associate sock with unrelated cgroup
	net: memcg: late association of sock to memcg
	net: memcg: fix lockdep splat in inet_csk_accept()
	devlink: validate length of param values
	devlink: validate length of region addr/len
	fib: add missing attribute validation for tun_id
	nl802154: add missing attribute validation
	nl802154: add missing attribute validation for dev_type
	can: add missing attribute validation for termination
	macsec: add missing attribute validation for port
	net: fq: add missing attribute validation for orphan mask
	net: taprio: add missing attribute validation for txtime delay
	team: add missing attribute validation for port ifindex
	team: add missing attribute validation for array index
	tipc: add missing attribute validation for MTU property
	nfc: add missing attribute validation for SE API
	nfc: add missing attribute validation for deactivate target
	nfc: add missing attribute validation for vendor subcommand
	net: phy: avoid clearing PHY interrupts twice in irq handler
	net: phy: fix MDIO bus PM PHY resuming
	net/ipv6: need update peer route when modify metric
	net/ipv6: remove the old peer route if change it to a new one
	selftests/net/fib_tests: update addr_metric_test for peer route testing
	net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed
	net: phy: Avoid multiple suspends
	cgroup: cgroup_procs_next should increase position index
	cgroup: Iterate tasks that did not finish do_exit()
	netfilter: nf_tables: fix infinite loop when expr is not available
	iwlwifi: mvm: Do not require PHY_SKU NVM section for 3168 devices
	virtio-blk: fix hw_queue stopped on arbitrary error
	iommu/vt-d: quirk_ioat_snb_local_iommu: replace WARN_TAINT with pr_warn + add_taint
	netfilter: nf_conntrack: ct_cpu_seq_next should increase position index
	netfilter: synproxy: synproxy_cpu_seq_next should increase position index
	netfilter: xt_recent: recent_seq_next should increase position index
	netfilter: x_tables: xt_mttg_seq_next should increase position index
	workqueue: don't use wq_select_unbound_cpu() for bound works
	drm/amd/display: remove duplicated assignment to grph_obj_type
	drm/i915: be more solid in checking the alignment
	drm/i915: Defer semaphore priority bumping to a workqueue
	mmc: sdhci-pci-gli: Enable MSI interrupt for GL975x
	pinctrl: falcon: fix syntax error
	ktest: Add timeout for ssh sync testing
	cifs_atomic_open(): fix double-put on late allocation failure
	gfs2_atomic_open(): fix O_EXCL|O_CREAT handling on cold dcache
	KVM: x86: clear stale x86_emulate_ctxt->intercept value
	KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAs
	ARC: define __ALIGN_STR and __ALIGN symbols for ARC
	fuse: fix stack use after return
	s390/dasd: fix data corruption for thin provisioned devices
	ipmi_si: Avoid spurious errors for optional IRQs
	blk-iocost: fix incorrect vtime comparison in iocg_is_idle()
	fscrypt: don't evict dirty inodes after removing key
	macintosh: windfarm: fix MODINFO regression
	x86/ioremap: Map EFI runtime services data as encrypted for SEV
	efi: Fix a race and a buffer overflow while reading efivars via sysfs
	efi: Add a sanity check to efivar_store_raw()
	i2c: designware-pci: Fix BUG_ON during device removal
	mt76: fix array overflow on receiving too many fragments for a packet
	perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag
	x86/mce: Fix logic and comments around MSR_PPIN_CTL
	iommu/dma: Fix MSI reservation allocation
	iommu/vt-d: dmar: replace WARN_TAINT with pr_warn + add_taint
	iommu/vt-d: Fix RCU list debugging warnings
	iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page
	batman-adv: Don't schedule OGM for disabled interface
	clk: imx8mn: Fix incorrect clock defines
	pinctrl: meson-gxl: fix GPIOX sdio pins
	pinctrl: imx: scu: Align imx sc msg structs to 4
	virtio_ring: Fix mem leak with vring_new_virtqueue()
	drm/i915/gvt: Fix dma-buf display blur issue on CFL
	pinctrl: core: Remove extra kref_get which blocks hogs being freed
	drm/i915/gvt: Fix unnecessary schedule timer when no vGPU exits
	driver code: clarify and fix platform device DMA mask allocation
	iommu/vt-d: Fix RCU-list bugs in intel_iommu_init()
	i2c: gpio: suppress error on probe defer
	nl80211: add missing attribute validation for critical protocol indication
	nl80211: add missing attribute validation for beacon report scanning
	nl80211: add missing attribute validation for channel switch
	perf bench futex-wake: Restore thread count default to online CPU count
	netfilter: cthelper: add missing attribute validation for cthelper
	netfilter: nft_payload: add missing attribute validation for payload csum flags
	netfilter: nft_tunnel: add missing attribute validation for tunnels
	netfilter: nf_tables: dump NFTA_CHAIN_FLAGS attribute
	netfilter: nft_chain_nat: inet family is missing module ownership
	iommu/vt-d: Fix the wrong printing in RHSA parsing
	iommu/vt-d: Ignore devices with out-of-spec domain number
	i2c: acpi: put device when verifying client fails
	iommu/amd: Fix IOMMU AVIC not properly update the is_run bit in IRTE
	ipv6: restrict IPV6_ADDRFORM operation
	net/smc: check for valid ib_client_data
	net/smc: cancel event worker during device removal
	Linux 5.4.26

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifacde9164f8ded01031a9e6a9c313d4fbcead25b
2020-03-18 08:19:15 +01:00
Tejun Heo
b7e54dd751 blk-iocost: fix incorrect vtime comparison in iocg_is_idle()
commit dcd6589b11d3b1e71f516a87a7b9646ed356b4c0 upstream.

vtimes may wrap and time_before/after64() should be used to determine
whether a given vtime is before or after another. iocg_is_idle() was
incorrectly using plain "<" comparison do determine whether done_vtime
is before vtime. Here, the only thing we're interested in is whether
done_vtime matches vtime which indicates that there's nothing in
flight. Let's test for inequality instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18 07:17:52 +01:00
Greg Kroah-Hartman
6d52041543 This is the 5.4.25 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5qJSMACgkQONu9yGCS
 aT6/Dw//Usg9m0cBB4Ip4fYxI0EVz8BgnVe9KSdt+71gM63QCOi1ZeTS0NDMUtO0
 MTsQSudUpfntrT8QHCmBwCZ5LlAAZvxDS9UOqnhkWbqNY5jGmUhH5u28RJL28dp2
 8wJY6zZKg+pfOWXd81slW86uN27QZvURNEthT81sN2ucxe5DXV1gs87FILSdMpXm
 I0Z3LpUoZDjpONeA6WTZqkDNA0J7Z9QjULx9/4LFi/gc0q1ApWC7FV1A9gpQHaBa
 w4kDWJCGqq3mNx8Hi9BHau50VUHX5tuKvpn9RcmSl9BBba25pE5h0EVIGo8Dlq+9
 T9hkVR5iXeMbFERnLm5iR0DjFHog/mOgAgUHSTTXB3BcdgIKWwUcc2gCcr2Y7KIK
 CD7l+kX1nWUk4yYre7zXiG/vO9ilYgeboc8C5Qdq3XR6zaO90+8NUbCOpa2+6yEF
 H7kugstb6l+iCJ1k8YJd0ORGOobl68+P79TLxAOFnkNGJRzuAoXmBH+xkqAugz1H
 YKKAbE+MzW75sre7PxU1g1uPOHxfMfd5e3uRtUU5OETJv0A2kTte8ay5rqLNbe7H
 QYqdfwTr2oFssnWKW5d/KdSopD5A/31/Kjkmzl6ED2xaLMEpA7zyed5p+G/Beu5s
 dkPlteya8wCQ1W/KtDJRhbCauoG/NyCKIeoQitHBJwMapcEo8ZU=
 =rDP8
 -----END PGP SIGNATURE-----

Merge 5.4.25 into android-5.4

Changes in 5.4.25
	block, bfq: get extra ref to prevent a queue from being freed during a group move
	block, bfq: do not insert oom queue into position tree
	ALSA: hda/realtek - Fix a regression for mute led on Lenovo Carbon X1
	net: dsa: bcm_sf2: Forcibly configure IMP port for 1Gb/sec
	net: stmmac: fix notifier registration
	dm thin metadata: fix lockdep complaint
	RDMA/core: Fix pkey and port assignment in get_new_pps
	RDMA/core: Fix use of logical OR in get_new_pps
	kbuild: fix 'No such file or directory' warning when cleaning
	kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic
	blktrace: fix dereference after null check
	ALSA: hda: do not override bus codec_mask in link_get()
	serial: ar933x_uart: set UART_CS_{RX,TX}_READY_ORIDE
	selftests: fix too long argument
	usb: gadget: composite: Support more than 500mA MaxPower
	usb: gadget: ffs: ffs_aio_cancel(): Save/restore IRQ flags
	usb: gadget: serial: fix Tx stall after buffer overflow
	habanalabs: halt the engines before hard-reset
	habanalabs: do not halt CoreSight during hard reset
	habanalabs: patched cb equals user cb in device memset
	drm/msm/mdp5: rate limit pp done timeout warnings
	drm: msm: Fix return type of dsi_mgr_connector_mode_valid for kCFI
	drm/modes: Make sure to parse valid rotation value from cmdline
	drm/modes: Allow DRM_MODE_ROTATE_0 when applying video mode parameters
	scsi: megaraid_sas: silence a warning
	drm/msm/dsi: save pll state before dsi host is powered off
	drm/msm/dsi/pll: call vco set rate explicitly
	selftests: forwarding: use proto icmp for {gretap, ip6gretap}_mac testing
	selftests: forwarding: vxlan_bridge_1d: fix tos value
	net: atlantic: check rpc result and wait for rpc address
	net: ks8851-ml: Remove 8-bit bus accessors
	net: ks8851-ml: Fix 16-bit data access
	net: ks8851-ml: Fix 16-bit IO operation
	net: ethernet: dm9000: Handle -EPROBE_DEFER in dm9000_parse_dt()
	watchdog: da9062: do not ping the hw during stop()
	s390/cio: cio_ignore_proc_seq_next should increase position index
	s390: make 'install' not depend on vmlinux
	efi: Only print errors about failing to get certs if EFI vars are found
	net/mlx5: DR, Fix matching on vport gvmi
	iommu/amd: Disable IOMMU on Stoney Ridge systems
	nvme/pci: Add sleep quirk for Samsung and Toshiba drives
	nvme-pci: Use single IRQ vector for old Apple models
	x86/boot/compressed: Don't declare __force_order in kaslr_64.c
	s390/qdio: fill SL with absolute addresses
	nvme: Fix uninitialized-variable warning
	ice: Don't tell the OS that link is going down
	x86/xen: Distribute switch variables for initialization
	net: thunderx: workaround BGX TX Underflow issue
	csky/mm: Fixup export invalid_pte_table symbol
	csky: Set regs->usp to kernel sp, when the exception is from kernel
	csky/smp: Fixup boot failed when CONFIG_SMP
	csky: Fixup ftrace modify panic
	csky: Fixup compile warning for three unimplemented syscalls
	arch/csky: fix some Kconfig typos
	selftests: forwarding: vxlan_bridge_1d: use more proper tos value
	firmware: imx: scu: Ensure sequential TX
	binder: prevent UAF for binderfs devices
	binder: prevent UAF for binderfs devices II
	ALSA: hda/realtek - Add Headset Mic supported
	ALSA: hda/realtek - Add Headset Button supported for ThinkPad X1
	ALSA: hda/realtek - Fix silent output on Gigabyte X570 Aorus Master
	ALSA: hda/realtek - Enable the headset of ASUS B9450FA with ALC294
	cifs: don't leak -EAGAIN for stat() during reconnect
	cifs: fix rename() by ensuring source handle opened with DELETE bit
	usb: storage: Add quirk for Samsung Fit flash
	usb: quirks: add NO_LPM quirk for Logitech Screen Share
	usb: dwc3: gadget: Update chain bit correctly when using sg list
	usb: cdns3: gadget: link trb should point to next request
	usb: cdns3: gadget: toggle cycle bit before reset endpoint
	usb: core: hub: fix unhandled return by employing a void function
	usb: core: hub: do error out if usb_autopm_get_interface() fails
	usb: core: port: do error out if usb_autopm_get_interface() fails
	vgacon: Fix a UAF in vgacon_invert_region
	mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa
	mm: fix possible PMD dirty bit lost in set_pmd_migration_entry()
	mm, hotplug: fix page online with DEBUG_PAGEALLOC compiled but not enabled
	fat: fix uninit-memory access for partial initialized inode
	btrfs: fix RAID direct I/O reads with alternate csums
	arm64: dts: socfpga: agilex: Fix gmac compatible
	arm: dts: dra76x: Fix mmc3 max-frequency
	tty:serial:mvebu-uart:fix a wrong return
	tty: serial: fsl_lpuart: free IDs allocated by IDA
	serial: 8250_exar: add support for ACCES cards
	vt: selection, close sel_buffer race
	vt: selection, push console lock down
	vt: selection, push sel_lock up
	media: hantro: Fix broken media controller links
	media: mc-entity.c: use & to check pad flags, not ==
	media: vicodec: process all 4 components for RGB32 formats
	media: v4l2-mem2mem.c: fix broken links
	perf intel-pt: Fix endless record after being terminated
	perf intel-bts: Fix endless record after being terminated
	perf cs-etm: Fix endless record after being terminated
	perf arm-spe: Fix endless record after being terminated
	spi: spidev: Fix CS polarity if GPIO descriptors are used
	x86/pkeys: Manually set X86_FEATURE_OSPKE to preserve existing changes
	s390/pci: Fix unexpected write combine on resource
	s390/mm: fix panic in gup_fast on large pud
	dmaengine: imx-sdma: fix context cache
	dmaengine: imx-sdma: Fix the event id check to include RX event for UART6
	dmaengine: tegra-apb: Fix use-after-free
	dmaengine: tegra-apb: Prevent race conditions of tasklet vs free list
	dm integrity: fix recalculation when moving from journal mode to bitmap mode
	dm integrity: fix a deadlock due to offloading to an incorrect workqueue
	dm integrity: fix invalid table returned due to argument count mismatch
	dm cache: fix a crash due to incorrect work item cancelling
	dm: report suspended device during destroy
	dm writecache: verify watermark during resume
	dm zoned: Fix reference counter initial value of chunk works
	dm: fix congested_fn for request-based device
	arm64: dts: meson-sm1-sei610: add missing interrupt-names
	ARM: dts: ls1021a: Restore MDIO compatible to gianfar
	spi: bcm63xx-hsspi: Really keep pll clk enabled
	drm/virtio: make resource id workaround runtime switchable.
	drm/virtio: fix resource id creation race
	ASoC: topology: Fix memleak in soc_tplg_link_elems_load()
	ASoC: topology: Fix memleak in soc_tplg_manifest_load()
	ASoC: SOF: Fix snd_sof_ipc_stream_posn()
	ASoC: intel: skl: Fix pin debug prints
	ASoC: intel: skl: Fix possible buffer overflow in debug outputs
	powerpc: define helpers to get L1 icache sizes
	powerpc: Convert flush_icache_range & friends to C
	powerpc/mm: Fix missing KUAP disable in flush_coherent_icache()
	ASoC: pcm: Fix possible buffer overflow in dpcm state sysfs output
	ASoC: pcm512x: Fix unbalanced regulator enable call in probe error path
	ASoC: Intel: Skylake: Fix available clock counter incrementation
	ASoC: dapm: Correct DAPM handling of active widgets during shutdown
	spi: atmel-quadspi: fix possible MMIO window size overrun
	drm/panfrost: Don't try to map on error faults
	drm: kirin: Revert "Fix for hikey620 display offset problem"
	drm/sun4i: Add separate DE3 VI layer formats
	drm/sun4i: Fix DE2 VI layer format support
	drm/sun4i: de2/de3: Remove unsupported VI layer formats
	drm/i915: Program MBUS with rmw during initialization
	drm/i915/selftests: Fix return in assert_mmap_offset()
	phy: mapphone-mdm6600: Fix timeouts by adding wake-up handling
	phy: mapphone-mdm6600: Fix write timeouts with shorter GPIO toggle interval
	ARM: dts: imx6: phycore-som: fix emmc supply
	arm64: dts: imx8qxp-mek: Remove unexisting Ethernet PHY
	firmware: imx: misc: Align imx sc msg structs to 4
	firmware: imx: scu-pd: Align imx sc msg structs to 4
	firmware: imx: Align imx_sc_msg_req_cpu_start to 4
	soc: imx-scu: Align imx sc msg structs to 4
	Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"
	RDMA/rw: Fix error flow during RDMA context initialization
	RDMA/nldev: Fix crash when set a QP to a new counter but QPN is missing
	RDMA/siw: Fix failure handling during device creation
	RDMA/iwcm: Fix iwcm work deallocation
	RDMA/core: Fix protection fault in ib_mr_pool_destroy
	regulator: stm32-vrefbuf: fix a possible overshoot when re-enabling
	RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()
	IB/hfi1, qib: Ensure RCU is locked when accessing list
	ARM: imx: build v7_cpu_resume() unconditionally
	ARM: dts: am437x-idk-evm: Fix incorrect OPP node names
	ARM: dts: dra7xx-clocks: Fixup IPU1 mux clock parent source
	ARM: dts: imx7-colibri: Fix frequency for sd/mmc
	hwmon: (adt7462) Fix an error return in ADT7462_REG_VOLT()
	dma-buf: free dmabuf->name in dma_buf_release()
	dmaengine: coh901318: Fix a double lock bug in dma_tc_handle()
	arm64: dts: meson: fix gxm-khadas-vim2 wifi
	bus: ti-sysc: Fix 1-wire reset quirk
	EDAC/synopsys: Do not print an error with back-to-back snprintf() calls
	powerpc: fix hardware PMU exception bug on PowerVM compatibility mode systems
	efi/x86: Align GUIDs to their size in the mixed mode runtime wrapper
	efi/x86: Handle by-ref arguments covering multiple pages in mixed mode
	efi: READ_ONCE rng seed size before munmap
	block, bfq: get a ref to a group when adding it to a service tree
	block, bfq: remove ifdefs from around gets/puts of bfq groups
	csky: Implement copy_thread_tls
	drm/virtio: module_param_named() requires linux/moduleparam.h
	Linux 5.4.25

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8ba29f273c7a2b02bfa54593f7a9087c34607cd5
2020-03-12 17:09:04 +01:00
Paolo Valente
e28c9b3caf block, bfq: remove ifdefs from around gets/puts of bfq groups
commit 4d8340d0d4d90e7ca367d18ec16c2fefa89a339c upstream.

ifdefs around gets and puts of bfq groups reduce readability, remove them.

Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-12 13:00:32 +01:00
Paolo Valente
92ed51e651 block, bfq: get a ref to a group when adding it to a service tree
commit db37a34c563bf4692b36990ae89005c031385e52 upstream.

BFQ schedules generic entities, which may represent either bfq_queues
or groups of bfq_queues. When an entity is inserted into a service
tree, a reference must be taken, to make sure that the entity does not
disappear while still referred in the tree. Unfortunately, such a
reference is mistakenly taken only if the entity represents a
bfq_queue. This commit takes a reference also in case the entity
represents a group.

Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Chris Evich <cevich@redhat.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-12 13:00:32 +01:00
Paolo Valente
63f42809f0 block, bfq: do not insert oom queue into position tree
[ Upstream commit 32c59e3a9a5a0b180dd015755d6d18ca31e55935 ]

BFQ maintains an ordered list, implemented with an RB tree, of
head-request positions of non-empty bfq_queues. This position tree,
inherited from CFQ, is used to find bfq_queues that contain I/O close
to each other. BFQ merges these bfq_queues into a single shared queue,
if this boosts throughput on the device at hand.

There is however a special-purpose bfq_queue that does not participate
in queue merging, the oom bfq_queue. Yet, also this bfq_queue could be
wrongly added to the position tree. So bfqq_find_close() could return
the oom bfq_queue, which is a source of further troubles in an
out-of-memory situation. This commit prevents the oom bfq_queue from
being inserted into the position tree.

Tested-by: Patrick Dung <patdung100@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-12 13:00:08 +01:00
Paolo Valente
594fca1d04 block, bfq: get extra ref to prevent a queue from being freed during a group move
[ Upstream commit ecedd3d7e19911ab8fe42f17b77c0a30fe7f4db3 ]

In bfq_bfqq_move(), the bfq_queue, say Q, to be moved to a new group
may happen to be deactivated in the scheduling data structures of the
source group (and then activated in the destination group). If Q is
referred only by the data structures in the source group when the
deactivation happens, then Q is freed upon the deactivation.

This commit addresses this issue by getting an extra reference before
the possible deactivation, and releasing this extra reference after Q
has been moved.

Tested-by: Chris Evich <cevich@redhat.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-12 13:00:08 +01:00
Barani Muthukumaran
3ab1972984 ANDROID: block: Prevent crypto fallback for wrapped keys
blk-crypto-fallback does not support wrapped keys, hence
prevent falling back when program_key fails. Add 'is_hw_wrapped'
flag to blk-crypto-key to mention if the key is wrapped
when the key is initialized.

Bug: 147209885

Test: Validate FBE, simulate a failure in the underlying blk
      device and ensure the call fails without falling back
      to blk-crypto-fallback.

Change-Id: I8bc301ca1ac9e55ba6ab622e8325486916b45c56
Signed-off-by: Barani Muthukumaran <bmuthuku@codeaurora.org>
2020-02-25 19:27:51 +00:00
Greg Kroah-Hartman
835bd1de9c This is the 5.4.22 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5TfSAACgkQONu9yGCS
 aT4I8w//SU+w9Tj8Crpt1BI7Lk2AiTGvyZtX0wGd53vzFKGy+Wi1Oba1ybB+xyYw
 UgMJJpoOgp9gTatRgjDl0vO/7U7vZckigPpog3pSW+xq2JW0kTWGS2z04hUjWKkG
 W4l3sAGwHRv7MTBbpjECDSHv+6x6ZqlWcVodpkHqLNmGxR0mYuiB6Zu8QuCu1bl0
 K0SAlt+yd0laUt2bU3wpEqBwGXHepz+IqsqcYp78sAeytT8ds9ZfPxKv98CvLlXs
 VLVr87UqZy3Hkl6IWKGrmdhWbTZE+3AyjKnxlA8PovA0ET5xO/IFPLHVhVX+or+5
 UFp/1qvacr+EIu8CKvftc2n1CflaRXIn/QNpwdemh94mi/2TqiXiqAUu1EiW56vg
 /PUH8G72Q26AiWSmD3WRr09ohTu4hfz6fIDKV60qmdVe4AUffLw0SnBEE0VFA3/S
 lVKZeXKkePeMlHcTyRDQ6+/y49yjfq2exdrjetypOwRa1emHxj/YsfdnEWYfwT53
 sikMLjP4XA7v5rsDr9LJTwQL/V/7euu1Hr3lSGpRv8vmePprvfmivTLcY5tgvOTC
 GZ51Em+CxJ+W4vCJKHuM7i0nUvf2Knn5lBidq4KsvLRUuZ31mSXSfSn4bW6Gl/Jm
 RZPDC71MqT/FMtfuQLlVNqIw2umC1buNa5SwZ8GhJG6za4gU4FU=
 =L+e0
 -----END PGP SIGNATURE-----

Merge 5.4.22 into android-5.4

Changes in 5.4.22
	core: Don't skip generic XDP program execution for cloned SKBs
	enic: prevent waking up stopped tx queues over watchdog reset
	net/smc: fix leak of kernel memory to user space
	net: dsa: tag_qca: Make sure there is headroom for tag
	net/sched: matchall: add missing validation of TCA_MATCHALL_FLAGS
	net/sched: flower: add missing validation of TCA_FLOWER_FLAGS
	drm/gma500: Fixup fbdev stolen size usage evaluation
	ath10k: Fix qmi init error handling
	wil6210: fix break that is never reached because of zero'ing of a retry counter
	drm/qxl: Complete exception handling in qxl_device_init()
	rcu/nocb: Fix dump_tree hierarchy print always active
	rcu: Fix missed wakeup of exp_wq waiters
	rcu: Fix data-race due to atomic_t copy-by-value
	f2fs: preallocate DIO blocks when forcing buffered_io
	f2fs: call f2fs_balance_fs outside of locked page
	media: meson: add missing allocation failure check on new_buf
	clk: meson: pll: Fix by 0 division in __pll_params_to_rate()
	cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order
	brcmfmac: Fix memory leak in brcmf_p2p_create_p2pdev()
	brcmfmac: Fix use after free in brcmf_sdio_readframes()
	PCI: Fix pci_add_dma_alias() bitmask size
	drm/amd/display: Map ODM memory correctly when doing ODM combine
	leds: pca963x: Fix open-drain initialization
	ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAIT
	ALSA: ctl: allow TLV read operation for callback type of element in locked case
	gianfar: Fix TX timestamping with a stacked DSA driver
	pinctrl: sh-pfc: sh7264: Fix CAN function GPIOs
	printk: fix exclusive_console replaying
	drm/mipi_dbi: Fix off-by-one bugs in mipi_dbi_blank()
	drm/msm/adreno: fix zap vs no-zap handling
	pxa168fb: Fix the function used to release some memory in an error handling path
	media: ov5640: Fix check for PLL1 exceeding max allowed rate
	media: i2c: mt9v032: fix enum mbus codes and frame sizes
	media: sun4i-csi: Deal with DRAM offset
	media: sun4i-csi: Fix data sampling polarity handling
	media: sun4i-csi: Fix [HV]sync polarity handling
	clk: at91: sam9x60: fix programmable clock prescaler
	powerpc/powernv/iov: Ensure the pdn for VFs always contains a valid PE number
	clk: meson: meson8b: make the CCF use the glitch-free mali mux
	gpio: gpio-grgpio: fix possible sleep-in-atomic-context bugs in grgpio_irq_map/unmap()
	iommu/vt-d: Fix off-by-one in PASID allocation
	x86/fpu: Deactivate FPU state after failure during state load
	char/random: silence a lockdep splat with printk()
	media: sti: bdisp: fix a possible sleep-in-atomic-context bug in bdisp_device_run()
	kernel/module: Fix memleak in module_add_modinfo_attrs()
	IB/core: Let IB core distribute cache update events
	pinctrl: baytrail: Do not clear IRQ flags on direct-irq enabled pins
	efi/x86: Map the entire EFI vendor string before copying it
	MIPS: Loongson: Fix potential NULL dereference in loongson3_platform_init()
	sparc: Add .exit.data section.
	net: ethernet: ixp4xx: Standard module init
	raid6/test: fix a compilation error
	uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()
	drm/amdgpu/sriov: workaround on rev_id for Navi12 under sriov
	spi: fsl-lpspi: fix only one cs-gpio working
	drm/nouveau/nouveau: fix incorrect sizeof on args.src an args.dst
	usb: gadget: udc: fix possible sleep-in-atomic-context bugs in gr_probe()
	usb: dwc2: Fix IN FIFO allocation
	clocksource/drivers/bcm2835_timer: Fix memory leak of timer
	drm/amd/display: Clear state after exiting fixed active VRR state
	kselftest: Minimise dependency of get_size on C library interfaces
	jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal
	ext4: fix deadlock allocating bio_post_read_ctx from mempool
	clk: ti: dra7: fix parent for gmac_clkctrl
	x86/sysfb: Fix check for bad VRAM size
	pwm: omap-dmtimer: Simplify error handling
	udf: Allow writing to 'Rewritable' partitions
	dmaengine: fsl-qdma: fix duplicated argument to &&
	wan/hdlc_x25: fix skb handling
	s390/pci: Fix possible deadlock in recover_store()
	powerpc/iov: Move VF pdev fixup into pcibios_fixup_iov()
	tracing: Fix tracing_stat return values in error handling paths
	tracing: Fix very unlikely race of registering two stat tracers
	ARM: 8952/1: Disable kmemleak on XIP kernels
	ext4, jbd2: ensure panic when aborting with zero errno
	ath10k: Correct the DMA direction for management tx buffers
	rtw88: fix rate mask for 1SS chip
	brcmfmac: sdio: Fix OOB interrupt initialization on brcm43362
	selftests: settings: tests can be in subsubdirs
	rtc: i2c/spi: Avoid inclusion of REGMAP support when not needed
	drm/amd/display: Retrain dongles when SINK_COUNT becomes non-zero
	tracing: Simplify assignment parsing for hist triggers
	nbd: add a flush_workqueue in nbd_start_device
	KVM: s390: ENOTSUPP -> EOPNOTSUPP fixups
	Btrfs: keep pages dirty when using btrfs_writepage_fixup_worker
	drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store
	block, bfq: do not plug I/O for bfq_queues with no proc refs
	kconfig: fix broken dependency in randconfig-generated .config
	clk: qcom: Don't overwrite 'cfg' in clk_rcg2_dfs_populate_freq()
	clk: qcom: rcg2: Don't crash if our parent can't be found; return an error
	drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode
	bpf, sockhash: Synchronize_rcu before free'ing map
	drm/amdgpu: remove 4 set but not used variable in amdgpu_atombios_get_connector_info_from_object_table
	ath10k: correct the tlv len of ath10k_wmi_tlv_op_gen_config_pno_start
	drm/amdgpu: Ensure ret is always initialized when using SOC15_WAIT_ON_RREG
	drm/panel: simple: Add Logic PD Type 28 display support
	arm64: dts: rockchip: Fix NanoPC-T4 cooling maps
	modules: lockdep: Suppress suspicious RCU usage warning
	ASoC: intel: sof_rt5682: Add quirk for number of HDMI DAI's
	ASoC: intel: sof_rt5682: Add support for tgl-max98357a-rt5682
	regulator: rk808: Lower log level on optional GPIOs being not available
	net/wan/fsl_ucc_hdlc: reject muram offsets above 64K
	NFC: port100: Convert cpu_to_le16(le16_to_cpu(E1) + E2) to use le16_add_cpu().
	arm64: dts: allwinner: H6: Add PMU mode
	arm64: dts: allwinner: H5: Add PMU node
	arm: dts: allwinner: H3: Add PMU node
	opp: Free static OPPs on errors while adding them
	selinux: ensure we cleanup the internal AVC counters on error in avc_insert()
	arm64: dts: qcom: msm8996: Disable USB2 PHY suspend by core
	padata: validate cpumask without removed CPU during offline
	clk: imx: Add correct failure handling for clk based helpers
	ARM: exynos_defconfig: Bring back explicitly wanted options
	ARM: dts: imx6: rdu2: Disable WP for USDHC2 and USDHC3
	ARM: dts: imx6: rdu2: Limit USBH1 to Full Speed
	bus: ti-sysc: Implement quirk handling for CLKDM_NOAUTO
	PCI: iproc: Apply quirk_paxc_bridge() for module as well as built-in
	media: cx23885: Add support for AVerMedia CE310B
	PCI: Add generic quirk for increasing D3hot delay
	PCI: Increase D3 delay for AMD Ryzen5/7 XHCI controllers
	Revert "nfp: abm: fix memory leak in nfp_abm_u32_knode_replace"
	gpu/drm: ingenic: Avoid null pointer deference in plane atomic update
	selftests/net: make so_txtime more robust to timer variance
	media: v4l2-device.h: Explicitly compare grp{id,mask} to zero in v4l2_device macros
	reiserfs: Fix spurious unlock in reiserfs_fill_super() error handling
	samples/bpf: Set -fno-stack-protector when building BPF programs
	r8169: check that Realtek PHY driver module is loaded
	fore200e: Fix incorrect checks of NULL pointer dereference
	netfilter: nft_tunnel: add the missing ERSPAN_VERSION nla_policy
	ALSA: usx2y: Adjust indentation in snd_usX2Y_hwdep_dsp_status
	PCI: Add nr_devfns parameter to pci_add_dma_alias()
	PCI: Add DMA alias quirk for PLX PEX NTB
	b43legacy: Fix -Wcast-function-type
	ipw2x00: Fix -Wcast-function-type
	iwlegacy: Fix -Wcast-function-type
	rtlwifi: rtl_pci: Fix -Wcast-function-type
	orinoco: avoid assertion in case of NULL pointer
	drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV
	clk: qcom: smd: Add missing bimc clock
	ACPICA: Disassembler: create buffer fields in ACPI_PARSE_LOAD_PASS1
	nfsd: Clone should commit src file metadata too
	scsi: ufs: Complete pending requests in host reset and restore path
	scsi: aic7xxx: Adjust indentation in ahc_find_syncrate
	crypto: inside-secure - add unspecified HAS_IOMEM dependency
	drm/mediatek: handle events when enabling/disabling crtc
	clk: renesas: rcar-gen3: Allow changing the RPC[D2] clocks
	ARM: dts: r8a7779: Add device node for ARM global timer
	selinux: ensure we cleanup the internal AVC counters on error in avc_update()
	scsi: lpfc: Fix: Rework setting of fdmi symbolic node name registration
	arm64: dts: qcom: db845c: Enable ath10k 8bit host-cap quirk
	iommu/amd: Check feature support bit before accessing MSI capability registers
	iommu/amd: Only support x2APIC with IVHD type 11h/40h
	iommu/iova: Silence warnings under memory pressure
	clk: actually call the clock init before any other callback of the clock
	dmaengine: Store module owner in dma_device struct
	dmaengine: imx-sdma: Fix memory leak
	bpf: Print error message for bpftool cgroup show
	net: phy: realtek: add logging for the RGMII TX delay configuration
	crypto: chtls - Fixed memory leak
	x86/vdso: Provide missing include file
	PM / devfreq: exynos-ppmu: Fix excessive stack usage
	PM / devfreq: rk3399_dmc: Add COMPILE_TEST and HAVE_ARM_SMCCC dependency
	drm/fbdev: Fallback to non tiled mode if all tiles not present
	pinctrl: sh-pfc: sh7269: Fix CAN function GPIOs
	reset: uniphier: Add SCSSI reset control for each channel
	ASoC: soc-topology: fix endianness issues
	fbdev: fix numbering of fbcon options
	RDMA/rxe: Fix error type of mmap_offset
	clk: sunxi-ng: add mux and pll notifiers for A64 CPU clock
	ALSA: sh: Fix unused variable warnings
	clk: Use parent node pointer during registration if necessary
	clk: uniphier: Add SCSSI clock gate for each channel
	ALSA: hda/realtek - Apply mic mute LED quirk for Dell E7xx laptops, too
	ALSA: sh: Fix compile warning wrt const
	net: phy: fixed_phy: fix use-after-free when checking link GPIO
	tools lib api fs: Fix gcc9 stringop-truncation compilation error
	vfio/spapr/nvlink2: Skip unpinning pages on error exit
	ASoC: Intel: sof_rt5682: Ignore the speaker amp when there isn't one.
	ACPI: button: Add DMI quirk for Razer Blade Stealth 13 late 2019 lid switch
	iommu/vt-d: Match CPU and IOMMU paging mode
	iommu/vt-d: Avoid sending invalid page response
	drm/amdkfd: Fix permissions of hang_hws
	mlx5: work around high stack usage with gcc
	RDMA/hns: Avoid printing address of mtt page
	drm: remove the newline for CRC source name.
	usb: dwc3: use proper initializers for property entries
	ARM: dts: stm32: Add power-supply for DSI panel on stm32f469-disco
	usbip: Fix unsafe unaligned pointer usage
	udf: Fix free space reporting for metadata and virtual partitions
	drm/mediatek: Add gamma property according to hardware capability
	staging: rtl8188: avoid excessive stack usage
	IB/hfi1: Add software counter for ctxt0 seq drop
	IB/hfi1: Add RcvShortLengthErrCnt to hfi1stats
	soc/tegra: fuse: Correct straps' address for older Tegra124 device trees
	efi/x86: Don't panic or BUG() on non-critical error conditions
	rcu: Use WRITE_ONCE() for assignments to ->pprev for hlist_nulls
	Input: edt-ft5x06 - work around first register access error
	bnxt: Detach page from page pool before sending up the stack
	x86/nmi: Remove irq_work from the long duration NMI handler
	wan: ixp4xx_hss: fix compile-testing on 64-bit
	clocksource: davinci: only enable clockevents once tim34 is initialized
	arm64: dts: rockchip: fix dwmmc clock name for px30
	arm64: dts: rockchip: add reg property to brcmf sub-nodes
	ARM: dts: rockchip: add reg property to brcmf sub node for rk3188-bqedison2qc
	ALSA: usb-audio: Add boot quirk for MOTU M Series
	ASoC: atmel: fix build error with CONFIG_SND_ATMEL_SOC_DMA=m
	raid6/test: fix a compilation warning
	tty: synclinkmp: Adjust indentation in several functions
	tty: synclink_gt: Adjust indentation in several functions
	misc: xilinx_sdfec: fix xsdfec_poll()'s return type
	visorbus: fix uninitialized variable access
	driver core: platform: Prevent resouce overflow from causing infinite loops
	driver core: Print device when resources present in really_probe()
	ASoC: SOF: Intel: hda-dai: fix compilation warning in pcm_prepare
	bpf: Return -EBADRQC for invalid map type in __bpf_tx_xdp_map
	vme: bridges: reduce stack usage
	drm/nouveau/secboot/gm20b: initialize pointer in gm20b_secboot_new()
	drm/nouveau/gr/gk20a,gm200-: add terminators to method lists read from fw
	drm/nouveau: Fix copy-paste error in nouveau_fence_wait_uevent_handler
	drm/nouveau/drm/ttm: Remove set but not used variable 'mem'
	drm/nouveau/fault/gv100-: fix memory leak on module unload
	dm thin: don't allow changing data device during thin-pool reload
	gpiolib: Set lockdep class for hierarchical irq domains
	drm/vmwgfx: prevent memory leak in vmw_cmdbuf_res_add
	perf/imx_ddr: Fix cpu hotplug state cleanup
	usb: musb: omap2430: Get rid of musb .set_vbus for omap2430 glue
	kbuild: remove *.tmp file when filechk fails
	iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE
	ALSA: usb-audio: unlock on error in probe
	f2fs: set I_LINKABLE early to avoid wrong access by vfs
	f2fs: free sysfs kobject
	scsi: ufs: pass device information to apply_dev_quirks
	scsi: ufs-mediatek: add apply_dev_quirks variant operation
	scsi: iscsi: Don't destroy session if there are outstanding connections
	crypto: essiv - fix AEAD capitalization and preposition use in help text
	ALSA: usb-audio: add implicit fb quirk for MOTU M Series
	RDMA/mlx5: Don't fake udata for kernel path
	arm64: lse: fix LSE atomics with LLVM's integrated assembler
	arm64: fix alternatives with LLVM's integrated assembler
	drm/amd/display: fixup DML dependencies
	EDAC/sifive: Fix return value check in ecc_register()
	KVM: PPC: Remove set but not used variable 'ra', 'rs', 'rt'
	arm64: dts: ti: k3-j721e-main: Add missing power-domains for smmu
	sched/core: Fix size of rq::uclamp initialization
	sched/topology: Assert non-NUMA topology masks don't (partially) overlap
	perf/x86/amd: Constrain Large Increment per Cycle events
	watchdog/softlockup: Enforce that timestamp is valid on boot
	debugobjects: Fix various data races
	ASoC: SOF: Intel: hda: Fix SKL dai count
	regulator: vctrl-regulator: Avoid deadlock getting and setting the voltage
	f2fs: fix memleak of kobject
	x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd
	pwm: omap-dmtimer: Remove PWM chip in .remove before making it unfunctional
	cmd64x: potential buffer overflow in cmd64x_program_timings()
	ide: serverworks: potential overflow in svwks_set_pio_mode()
	pwm: Remove set but not set variable 'pwm'
	btrfs: fix possible NULL-pointer dereference in integrity checks
	btrfs: safely advance counter when looking up bio csums
	btrfs: device stats, log when stats are zeroed
	module: avoid setting info->name early in case we can fall back to info->mod->name
	remoteproc: Initialize rproc_class before use
	regulator: core: Fix exported symbols to the exported GPL version
	irqchip/mbigen: Set driver .suppress_bind_attrs to avoid remove problems
	ALSA: hda/hdmi - add retry logic to parse_intel_hdmi()
	spi: spi-fsl-qspi: Ensure width is respected in spi-mem operations
	kbuild: use -S instead of -E for precise cc-option test in Kconfig
	objtool: Fix ARCH=x86_64 build error
	x86/decoder: Add TEST opcode to Group3-2
	s390: adjust -mpacked-stack support check for clang 10
	s390/ftrace: generate traced function stack frame
	driver core: platform: fix u32 greater or equal to zero comparison
	bpf, btf: Always output invariant hit in pahole DWARF to BTF transform
	ALSA: hda - Add docking station support for Lenovo Thinkpad T420s
	sunrpc: Fix potential leaks in sunrpc_cache_unhash()
	drm/nouveau/mmu: fix comptag memory leak
	powerpc/sriov: Remove VF eeh_dev state when disabling SR-IOV
	media: uvcvideo: Add a quirk to force GEO GC6500 Camera bits-per-pixel value
	btrfs: separate definition of assertion failure handlers
	btrfs: Fix split-brain handling when changing FSID to metadata uuid
	bcache: cached_dev_free needs to put the sb page
	bcache: rework error unwinding in register_bcache
	bcache: fix use-after-free in register_bcache()
	iommu/vt-d: Remove unnecessary WARN_ON_ONCE()
	alarmtimer: Make alarmtimer platform device child of RTC device
	selftests: bpf: Reset global state between reuseport test runs
	jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record
	jbd2: make sure ESHUTDOWN to be recorded in the journal superblock
	powerpc/pseries/lparcfg: Fix display of Maximum Memory
	selftests/eeh: Bump EEH wait time to 60s
	ARM: 8951/1: Fix Kexec compilation issue.
	ALSA: usb-audio: add quirks for Line6 Helix devices fw>=2.82
	hostap: Adjust indentation in prism2_hostapd_add_sta
	rtw88: fix potential NULL skb access in TX ISR
	iwlegacy: ensure loop counter addr does not wrap and cause an infinite loop
	cifs: fix unitialized variable poential problem with network I/O cache lock patch
	cifs: Fix mount options set in automount
	cifs: fix NULL dereference in match_prepath
	bpf: map_seq_next should always increase position index
	powerpc/mm: Don't log user reads to 0xffffffff
	ceph: check availability of mds cluster on mount after wait timeout
	rbd: work around -Wuninitialized warning
	drm/amd/display: do not allocate display_mode_lib unnecessarily
	irqchip/gic-v3: Only provision redistributors that are enabled in ACPI
	drm/nouveau/disp/nv50-: prevent oops when no channel method map provided
	char: hpet: Fix out-of-bounds read bug
	ftrace: fpid_next() should increase position index
	trigger_next should increase position index
	radeon: insert 10ms sleep in dce5_crtc_load_lut
	powerpc: Do not consider weak unresolved symbol relocations as bad
	btrfs: do not do delalloc reservation under page lock
	ocfs2: make local header paths relative to C files
	ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans()
	lib/scatterlist.c: adjust indentation in __sg_alloc_table
	reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()
	bcache: fix memory corruption in bch_cache_accounting_clear()
	bcache: explicity type cast in bset_bkey_last()
	bcache: fix incorrect data type usage in btree_flush_write()
	irqchip/gic-v3-its: Reference to its_invall_cmd descriptor when building INVALL
	nvmet: Pass lockdep expression to RCU lists
	nvme-pci: remove nvmeq->tags
	iwlwifi: mvm: Fix thermal zone registration
	iwlwifi: mvm: Check the sta is not NULL in iwl_mvm_cfg_he_sta()
	asm-generic/tlb: add missing CONFIG symbol
	microblaze: Prevent the overflow of the start
	brd: check and limit max_part par
	drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_latency
	drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_voltage
	NFS: Fix memory leaks
	help_next should increase position index
	i40e: Relax i40e_xsk_wakeup's return value when PF is busy
	cifs: log warning message (once) if out of disk space
	virtio_balloon: prevent pfn array overflow
	fuse: don't overflow LLONG_MAX with end offset
	mlxsw: spectrum_dpipe: Add missing error path
	s390/pci: Recover handle in clp_set_pci_fn()
	drm/amdgpu/display: handle multiple numbers of fclks in dcn_calcs.c (v2)
	bcache: properly initialize 'path' and 'err' in register_bcache()
	rtc: Kconfig: select REGMAP_I2C when necessary
	Linux 5.4.22

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iaeb3945493ecc81a0ae90ef87b19ceb2caf48164
2020-02-24 09:16:10 +01:00
Paolo Valente
39a7082195 block, bfq: do not plug I/O for bfq_queues with no proc refs
[ Upstream commit f718b093277df582fbf8775548a4f163e664d282 ]

Commit 478de3380c ("block, bfq: deschedule empty bfq_queues not
referred by any process") fixed commit 3726112ec7 ("block, bfq:
re-schedule empty queues if they deserve I/O plugging") by
descheduling an empty bfq_queue when it remains with not process
reference. Yet, this still left a case uncovered: an empty bfq_queue
with not process reference that remains in service. This happens for
an in-service sync bfq_queue that is deemed to deserve I/O-dispatch
plugging when it remains empty. Yet no new requests will arrive for
such a bfq_queue if no process sends requests to it any longer. Even
worse, the bfq_queue may happen to be prematurely freed while still in
service (because there may remain no reference to it any longer).

This commit solves this problem by preventing I/O dispatch from being
plugged for the in-service bfq_queue, if the latter has no process
reference (the bfq_queue is then prevented from remaining in service).

Fixes: 3726112ec7 ("block, bfq: re-schedule empty queues if they deserve I/O plugging")
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reported-by: Patrick Dung <patdung100@gmail.com>
Tested-by: Patrick Dung <patdung100@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:36:31 +01:00
Eric Biggers
e124323186 ANDROID: ufs, block: fix crypto power management and move into block layer
The call to pm_runtime_get_sync() in ufshcd_program_key() can deadlock
because it waits for the UFS controller to be resumed, but it can itself
be reached while resuming the UFS controller via:

- ufshcd_runtime_resume()
  - ufshcd_resume()
    - ufshcd_reset_and_restore()
      - ufshcd_host_reset_and_restore()
        - ufshcd_hba_enable()
          - ufshcd_hba_execute_hce()
            - ufshcd_hba_start()
              - ufshcd_crypto_enable()
                - keyslot_manager_reprogram_all_keys()
                  - ufshcd_crypto_keyslot_program()
                    - ufshcd_program_key()

But pm_runtime_get_sync() *is* needed when evicting a key.  Also, on
pre-4.20 kernels it's needed when programming a keyslot for a bio since
the block layer used to resume the device in a different place.

Thus, it's hard for drivers to know what to do in .keyslot_program() and
.keyslot_evict().  In old kernels it may even be impossible unless we
were to pass more information down from the keyslot_manager.

There's also another possible deadlock: keyslot programming and eviction
take ksm->lock for write and then resume the device, which may result in
ksm->lock being taken again via the above call stack.  To fix this, we
should resume the device before taking ksm->lock.

Fix these problems by moving to a better design where the block layer
(namely, the keyslot manager) handles runtime power management instead
of drivers.  This is analogous to the block layer's existing runtime
power management support (blk-pm), which handles resuming devices when
bios are submitted to them so that drivers don't need to handle it.

Test: Tested on coral with:
        echo 5 > /sys/bus/platform/devices/1d84000.ufshc/rpm_lvl
        sleep 30
        touch /data && sync  # hangs before this fix
  Also verified via kvm-xfstests that blk-crypto-fallback continues
  to work both with and without CONFIG_PM=y.

Bug: 137270441
Bug: 149368295
Change-Id: I6bc9fb81854afe7edf490d71796ee68a61f7cbc8
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-02-21 16:00:25 +00:00
Ram Muthiah
1f15efbf93 ANDROID: modularize BLK_MQ_VIRTIO
Bug: 139431025
Test: Treehugger
Change-Id: Iba414b03a9ee1544036c4153bf52e8c8edbb9c17
Signed-off-by: Ram Muthiah <rammuthiah@google.com>
(cherry picked from commit f1bdff93debe56dbafc0684fcfbfcaddd599be7b)
2020-01-31 16:27:30 -08:00
Greg Kroah-Hartman
33c8a1b2d0 This is the 5.4.15 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4tVVcACgkQONu9yGCS
 aT6IZQ/+J/hKVxK9S0E4nfHy8IC87wRnjmIBsjnZ8jx9+KAhYyHsL5iUL5U0YQPj
 O1ZYO2Yly8DzzU1RLwkMgZ+eGYBnNuSGtZN/v9IQrQYrV77F7fNM0S59f/ucQJLh
 lAMbaAbttR05bb48YieZm1HksoRsHmFEg0LsUbQqjm74CWJ+/JA+bZcdnTi9iiJm
 HELavBOM5NoO/g8Iuh0Xn5Y3W1mOTv3lG7Vn51TynUtJjlyJaaO9cVxDJzDBLabO
 SKYqH5X2yCBmKw3rH6F4KTDXAiM+v+EzvDwM12aEvG0TkkPEwNcFrkA4hgDFXUWi
 QEe24R/UP4J2W/jAH46VaeEELo0cNLzt0e9sVi6BsxtkTaf/KknxE93PSOyY40TF
 CM/nMJAlVv5KYmhQYPa9ZTEoUBNGcAVjsI2Pi7t86oLsFtaN6Sb1BvJTdHPwLA5Z
 OIi64ZBLy3jWHC4We3ajXI+PD6qlbzyTrjAE6Se5Zfmy05m936XNAfMup4mFMoBv
 MDEAG0f5XyyAXwARugq46xTlfjI1QO6XOnufxzFCaFETbtr+yYvmdmzWE1I+qyst
 Xugd94gchuWVH62YPbf+r9H2FpoHZjAroQHTV3hJ+pt/tJqYCcvISG2uv2pJePvm
 oRt/DO9CA2N5ls0z7WC55Kk746E5NSgsLmF4nktphnshqZR5VFs=
 =iz+j
 -----END PGP SIGNATURE-----

Merge 5.4.15 into android-5.4

Changes in 5.4.15
	drm/i915: Fix pid leak with banned clients
	libbpf: Fix compatibility for kernels without need_wakeup
	libbpf: Fix memory leak/double free issue
	libbpf: Fix potential overflow issue
	libbpf: Fix another potential overflow issue in bpf_prog_linfo
	libbpf: Make btf__resolve_size logic always check size error condition
	bpf: Force .BTF section start to zero when dumping from vmlinux
	samples: bpf: update map definition to new syntax BTF-defined map
	samples/bpf: Fix broken xdp_rxq_info due to map order assumptions
	ARM: dts: logicpd-torpedo-37xx-devkit-28: Reference new DRM panel
	ARM: OMAP2+: Add missing put_device() call in omapdss_init_of()
	xfs: Sanity check flags of Q_XQUOTARM call
	i2c: stm32f7: rework slave_id allocation
	i2c: i2c-stm32f7: fix 10-bits check in slave free id search loop
	mfd: intel-lpss: Add default I2C device properties for Gemini Lake
	SUNRPC: Fix svcauth_gss_proxy_init()
	SUNRPC: Fix backchannel latency metrics
	powerpc/security: Fix debugfs data leak on 32-bit
	powerpc/pseries: Enable support for ibm,drc-info property
	powerpc/kasan: Fix boot failure with RELOCATABLE && FSL_BOOKE
	powerpc/archrandom: fix arch_get_random_seed_int()
	tipc: reduce sensitive to retransmit failures
	tipc: update mon's self addr when node addr generated
	tipc: fix potential memory leak in __tipc_sendmsg()
	tipc: fix wrong socket reference counter after tipc_sk_timeout() returns
	tipc: fix wrong timeout input for tipc_wait_for_cond()
	net/mlx5e: Fix free peer_flow when refcount is 0
	phy: lantiq: vrx200-pcie: fix error return code in ltq_vrx200_pcie_phy_power_on()
	net: phy: broadcom: Fix RGMII delays configuration for BCM54210E
	phy: ti: gmii-sel: fix mac tx internal delay for rgmii-rxid
	mt76: mt76u: fix endpoint definition order
	mt7601u: fix bbp version check in mt7601u_wait_bbp_ready
	ice: fix stack leakage
	s390/pkey: fix memory leak within _copy_apqns_from_user()
	nfsd: depend on CRYPTO_MD5 for legacy client tracking
	crypto: amcc - restore CRYPTO_AES dependency
	crypto: sun4i-ss - fix big endian issues
	perf map: No need to adjust the long name of modules
	leds: tlc591xx: update the maximum brightness
	soc/tegra: pmc: Fix crashes for hierarchical interrupts
	soc: qcom: llcc: Name regmaps to avoid collisions
	soc: renesas: Add missing check for non-zero product register address
	soc: aspeed: Fix snoop_file_poll()'s return type
	watchdog: sprd: Fix the incorrect pointer getting from driver data
	ipmi: Fix memory leak in __ipmi_bmc_register
	sched/core: Further clarify sched_class::set_next_task()
	gpiolib: No need to call gpiochip_remove_pin_ranges() twice
	rtw88: fix beaconing mode rsvd_page memory violation issue
	rtw88: fix error handling when setup efuse info
	drm/panfrost: Add missing check for pfdev->regulator
	drm: panel-lvds: Potential Oops in probe error handling
	drm/amdgpu: remove excess function parameter description
	hwrng: omap3-rom - Fix missing clock by probing with device tree
	dpaa2-eth: Fix minor bug in ethtool stats reporting
	drm/rockchip: Round up _before_ giving to the clock framework
	software node: Get reference to parent swnode in get_parent op
	PCI: mobiveil: Fix csr_read()/write() build issue
	drm: rcar_lvds: Fix color mismatches on R-Car H2 ES2.0 and later
	net: netsec: Correct dma sync for XDP_TX frames
	ACPI: platform: Unregister stale platform devices
	pwm: sun4i: Fix incorrect calculation of duty_cycle/period
	regulator: bd70528: Add MODULE_ALIAS to allow module auto loading
	drm/amdgpu/vi: silence an uninitialized variable warning
	power: supply: bd70528: Add MODULE_ALIAS to allow module auto loading
	firmware: imx: Remove call to devm_of_platform_populate
	libbpf: Don't use kernel-side u32 type in xsk.c
	rcu: Fix uninitialized variable in nocb_gp_wait()
	dpaa_eth: perform DMA unmapping before read
	dpaa_eth: avoid timestamp read on error paths
	scsi: ufs: delete redundant function ufshcd_def_desc_sizes()
	net: openvswitch: don't unlock mutex when changing the user_features fails
	hv_netvsc: flag software created hash value
	rt2800: remove errornous duplicate condition
	net: neigh: use long type to store jiffies delta
	net: axienet: Fix error return code in axienet_probe()
	selftests: gen_kselftest_tar.sh: Do not clobber kselftest/
	rtc: bd70528: fix module alias to autoload module
	packet: fix data-race in fanout_flow_is_huge()
	i2c: stm32f7: report dma error during probe
	kselftests: cgroup: Avoid the reuse of fd after it is deallocated
	firmware: arm_scmi: Fix doorbell ring logic for !CONFIG_64BIT
	mmc: sdio: fix wl1251 vendor id
	mmc: core: fix wl1251 sdio quirks
	tee: optee: Fix dynamic shm pool allocations
	tee: optee: fix device enumeration error handling
	workqueue: Add RCU annotation for pwq list walk
	SUNRPC: Fix another issue with MIC buffer space
	sched/cpufreq: Move the cfs_rq_util_change() call to cpufreq_update_util()
	mt76: mt76u: rely on usb_interface instead of usb_dev
	dma-direct: don't check swiotlb=force in dma_direct_map_resource
	afs: Remove set but not used variables 'before', 'after'
	dmaengine: ti: edma: fix missed failure handling
	drm/radeon: fix bad DMA from INTERRUPT_CNTL2
	xdp: Fix cleanup on map free for devmap_hash map type
	platform/chrome: wilco_ec: fix use after free issue
	block: fix memleak of bio integrity data
	s390/qeth: fix dangling IO buffers after halt/clear
	net-sysfs: Call dev_hold always in netdev_queue_add_kobject
	gpio: aspeed: avoid return type warning
	phy/rockchip: inno-hdmi: round clock rate down to closest 1000 Hz
	optee: Fix multi page dynamic shm pool alloc
	Linux 5.4.15

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I28b2a19657d40804406dc0e7c266296ce8768eb7
2020-01-26 11:14:46 +01:00
Justin Tee
ccbc5d03c2 block: fix memleak of bio integrity data
[ Upstream commit ece841abbed2da71fa10710c687c9ce9efb6bf69 ]

7c20f11680 ("bio-integrity: stop abusing bi_end_io") moves
bio_integrity_free from bio_uninit() to bio_integrity_verify_fn()
and bio_endio(). This way looks wrong because bio may be freed
without calling bio_endio(), for example, blk_rq_unprep_clone() is
called from dm_mq_queue_rq() when the underlying queue of dm-mpath
is busy.

So memory leak of bio integrity data is caused by commit 7c20f11680.

Fixes this issue by re-adding bio_integrity_free() to bio_uninit().

Fixes: 7c20f11680 ("bio-integrity: stop abusing bi_end_io")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by Justin Tee <justin.tee@broadcom.com>

Add commit log, and simplify/fix the original patch wroten by Justin.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-26 10:01:09 +01:00
Eric Biggers
5515324258 ANDROID: dm: add dm-default-key target for metadata encryption
Add a device-mapper target "dm-default-key" which assigns an encryption
key to bios that aren't for the contents of an encrypted file.

This ensures that all blocks on-disk will be encrypted with some key,
without the performance hit of file contents being encrypted twice when
fscrypt (File-Based Encryption) is used.

It is only appropriate to use dm-default-key when key configuration is
tightly controlled, like it is in Android, such that all fscrypt keys
are at least as hard to compromise as the default key.

Compared to the original version of dm-default-key, this has been
modified to use the new vendor-independent inline encryption framework
(which works even when no inline encryption hardware is present), the
table syntax has been changed to match dm-crypt, and support for
specifying Adiantum encryption has been added.  These changes also mean
that dm-default-key now always explicitly specifies the DUN (the IV).

Also, to handle f2fs moving blocks of encrypted files around without the
key, and to handle ext4 and f2fs filesystems mounted without
'-o inlinecrypt', the mapping logic is no longer "set a key on the bio
if it doesn't have one already", but rather "set a key on the bio unless
the bio has the bi_skip_dm_default_key flag set".  Filesystems set this
flag on *all* bios for encrypted file contents, regardless of whether
they are encrypting/decrypting the file using inline encryption or the
traditional filesystem-layer encryption, or moving the raw data.

For the bi_skip_dm_default_key flag, a new field in struct bio is used
rather than a bit in bi_opf so that fscrypt_set_bio_crypt_ctx() can set
the flag, minimizing the changes needed to filesystems.  (bi_opf is
usually overwritten after fscrypt_set_bio_crypt_ctx() is called.)

Bug: 137270441
Bug: 147814592
Change-Id: I69c9cd1e968ccf990e4ad96e5115b662237f5095
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-24 10:52:54 -08:00
Eric Biggers
695cfeb088 ANDROID: dm: add support for passing through inline crypto support
Update the device-mapper core to support exposing the inline crypto
support of the underlying device(s) through the device-mapper device.

This works by creating a "passthrough keyslot manager" for the dm
device, which declares support for the set of (crypto_mode,
data_unit_size) combos which all the underlying devices support.  When a
supported combo is used, the bio cloning code handles cloning the crypto
context to the bios for all the underlying devices.  When an unsupported
combo is used, the blk-crypto fallback is used as usual.

Crypto support on each underlying device is ignored unless the
corresponding dm target opts into exposing it.  This is needed because
for inline crypto to semantically operate on the original bio, the data
must not be transformed by the dm target.  Thus, targets like dm-linear
can expose crypto support of the underlying device, but targets like
dm-crypt can't.  (dm-crypt could use inline crypto itself, though.)

When a key is evicted from the dm device, it is evicted from all
underlying devices.

Bug: 137270441
Bug: 147814592
Change-Id: If28b574f2e28268db5eb9f325d4cf8f96cb63e3f
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-24 10:52:54 -08:00
Satya Tangirala
1fc06baadc ANDROID: block: Introduce passthrough keyslot manager
The regular keyslot manager is designed for devices that have a small
number of keyslots that need to be programmed with keys ahead of time,
and bios that are sent to the device need to be tagged with a keyslot
index.

Some inline encryption hardware may not have any limitations on the
number of keyslot, and may instead allow each bio to be tagged with
a raw key, data unit number, etc. rather than a pre-programmed keyslot's
index. These devices don't need any sort of keyslot management, and it's
better for these devices not to have to allocate a regular keyslot
manager with some fixed number of keyslots. These devices can instead
set up a passthrough keyslot manager in their request queue, which
require less resources than regular keyslot managers, as they simply
do no-ops when trying to program keys into slots.

Separately, the device mapper may map over devices that have inline
encryption hardware, and it wants to pass the key along to the
underlying hardware. While the DM layer can expose inline encryption
capabilities by setting up a regular keyslot manager with some fixed
number of keyslots in the dm device's request queue, this only wastes
memory since the keys programmed into the dm device's request queue
will never be used. Instead, it's better to set up a passthrough
keyslot manager for dm devices.

Bug: 137270441
Bug: 147814592
Change-Id: I6d91e83e86a73b0d6066873c8a9117cf2c089234
Signed-off-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-24 10:52:54 -08:00
Eric Biggers
9f8ee7b47c ANDROID: block: export symbols needed for modules to use inline crypto
Export the blk-crypto symbols needed for modules to use inline crypto.

These would have already been exported, except that so far they've only
been used by fs/crypto/, which is no longer modular.

Bug: 137270441
Bug: 147814592
Change-Id: I64bf98aecabe891c188b30dd50124aacb1e008ca
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-24 10:52:54 -08:00
Eric Biggers
8aaaa67f9b ANDROID: block: fix some inline crypto bugs
While we're waiting for v7 of the inline crypto patchset, fix some bugs
that made it into the v6 patchset, including one that caused bios with
an encryption context to never be merged, and one that could cause
non-contiguous pages to incorrectly added to a bio.

Bug: 137270441
Change-Id: I3911fcd6c76b5c9063b86d6af6267ad990a46718
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-24 10:52:54 -08:00
Greg Kroah-Hartman
ea962facf5 This is the 5.4.14 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4pSdUACgkQONu9yGCS
 aT7wnRAAqw7JuC+cQMI29GiTfX/+V6BHFUGp0b7EoFWg1VVqiFqEHOf4QETS2Qy1
 3ZAMIBCICMgJD03WUK95tju4byNhpAVc5HDy3ir4E0ZGE6VprgbkzmVWsJhdVd0E
 y0whXgRj2mM0eXwCLcLz2qcruE/AxmL84rlnwZ5BwgxbOPq/xRzQQV8R3XUu2bos
 1felFUzOLhfJKtZ5ig9ToJGCorc7jOU+cXrvUSvntOFGFqsEdmIfcQMoqE2cXExC
 WQIk/JDNGnyHlX7I55P0/OBuSv4SV5JzGZjGkkS/Orqg5/KtaOMh0dAcvp25L3jY
 b2bNDDnKvDEd2eb3GQNb4X7pqy8SFD5WLQH2otJjpZl6nHqOG2xCZCnrVtvf7toC
 kJ0ocNIeGU6OsFzAwXATh3hk6ZFlJYkQarHjvgpETkx+WBcw9yw7DFW9Pa9khkMV
 OWKqfGrqVyV1bTXTfZPHOHm+eBVmqTLPeoK4XeholrnBrPa50pBjexv+uN+UsQnT
 QQxTkDpdfFv4KIkH/qgwDh/gIJ/xtjsI9JTK+70aFbSq1HrXNeoZTYof8iLOggY8
 wh/aWQGf0PVC3ffgqxd7TF6CxiVn4vaP3/FfcBQVMuTnYK03BScdAEOqBCRLvLK6
 509VzsCc1N7qa3vwRTTWxr0u1aCDDgDx5iqmTXW46sYxzjfwnY4=
 =qNcT
 -----END PGP SIGNATURE-----

Merge 5.4.14 into android-5.4

Changes in 5.4.14
	ARM: dts: meson8: fix the size of the PMU registers
	clk: qcom: gcc-sdm845: Add missing flag to votable GDSCs
	soc: amlogic: meson-ee-pwrc: propagate PD provider registration errors
	soc: amlogic: meson-ee-pwrc: propagate errors from pm_genpd_init()
	dt-bindings: reset: meson8b: fix duplicate reset IDs
	ARM: dts: imx6q-dhcom: fix rtc compatible
	arm64: dts: ls1028a: fix endian setting for dcfg
	arm64: dts: imx8mm: Change SDMA1 ahb clock for imx8mm
	bus: ti-sysc: Fix iterating over clocks
	clk: Don't try to enable critical clocks if prepare failed
	Revert "gpio: thunderx: Switch to GPIOLIB_IRQCHIP"
	arm64: dts: imx8mq-librem5-devkit: use correct interrupt for the magnetometer
	ASoC: msm8916-wcd-digital: Reset RX interpolation path after use
	ASoC: stm32: sai: fix possible circular locking
	ASoC: stm32: dfsdm: fix 16 bits record
	ASoC: msm8916-wcd-analog: Fix selected events for MIC BIAS External1
	ASoC: msm8916-wcd-analog: Fix MIC BIAS Internal1
	ARM: OMAP2+: Fix ti_sysc_find_one_clockdomain to check for to_clk_hw_omap
	ARM: dts: imx7ulp: fix reg of cpu node
	ARM: dts: imx6q-dhcom: Fix SGTL5000 VDDIO regulator connection
	ASoC: Intel: bytcht_es8316: Fix Irbis NB41 netbook quirk
	ALSA: dice: fix fallback from protocol extension into limited functionality
	ALSA: seq: Fix racy access for queue timer in proc read
	ALSA: firewire-tascam: fix corruption due to spin lock without restoration in SoftIRQ context
	ALSA: usb-audio: fix sync-ep altsetting sanity check
	arm64: dts: allwinner: a64: olinuxino: Fix SDIO supply regulator
	arm64: dts: allwinner: a64: olinuxino: Fix eMMC supply regulator
	arm64: dts: agilex/stratix10: fix pmu interrupt numbers
	Fix built-in early-load Intel microcode alignment
	clk: sunxi-ng: r40: Allow setting parent rate for external clock outputs
	block: fix an integer overflow in logical block size
	fuse: fix fuse_send_readpages() in the syncronous read case
	io_uring: only allow submit from owning task
	cpuidle: teo: Fix intervals[] array indexing bug
	ARM: dts: am571x-idk: Fix gpios property to have the correct gpio number
	ARM: davinci: select CONFIG_RESET_CONTROLLER
	perf: Correctly handle failed perf_get_aux_event()
	iio: adc: ad7124: Fix DT channel configuration
	iio: imu: st_lsm6dsx: Fix selection of ST_LSM6DS3_ID
	iio: light: vcnl4000: Fix scale for vcnl4040
	iio: chemical: pms7003: fix unmet triggered buffer dependency
	iio: buffer: align the size of scan bytes to size of the largest element
	USB: serial: simple: Add Motorola Solutions TETRA MTP3xxx and MTP85xx
	USB: serial: option: Add support for Quectel RM500Q
	USB: serial: opticon: fix control-message timeouts
	USB: serial: option: add support for Quectel RM500Q in QDL mode
	USB: serial: suppress driver bind attributes
	USB: serial: ch341: handle unbound port at reset_resume
	USB: serial: io_edgeport: handle unbound ports on URB completion
	USB: serial: io_edgeport: add missing active-port sanity check
	USB: serial: keyspan: handle unbound ports
	USB: serial: quatech2: handle unbound ports
	staging: comedi: ni_routes: fix null dereference in ni_find_route_source()
	staging: comedi: ni_routes: allow partial routing information
	scsi: fnic: fix invalid stack access
	scsi: mptfusion: Fix double fetch bug in ioctl
	ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()
	mtd: rawnand: gpmi: Fix suspend/resume problem
	mtd: rawnand: gpmi: Restore nfc timing setup after suspend/resume
	usb: core: hub: Improved device recognition on remote wakeup
	cpu/SMT: Fix x86 link error without CONFIG_SYSFS
	x86/resctrl: Fix an imbalance in domain_remove_cpu()
	x86/CPU/AMD: Ensure clearing of SME/SEV features is maintained
	locking/rwsem: Fix kernel crash when spinning on RWSEM_OWNER_UNKNOWN
	perf/x86/intel/uncore: Fix missing marker for snr_uncore_imc_freerunning_events
	x86/efistub: Disable paging at mixed mode entry
	s390/zcrypt: Fix CCA cipher key gen with clear key value function
	scsi: storvsc: Correctly set number of hardware queues for IDE disk
	mtd: spi-nor: Fix selection of 4-byte addressing opcodes on Spansion
	drm/i915: Add missing include file <linux/math64.h>
	x86/resctrl: Fix potential memory leak
	efi/earlycon: Fix write-combine mapping on x86
	s390/setup: Fix secure ipl message
	clk: samsung: exynos5420: Keep top G3D clocks enabled
	perf hists: Fix variable name's inconsistency in hists__for_each() macro
	locking/lockdep: Fix buffer overrun problem in stack_trace[]
	perf report: Fix incorrectly added dimensions as switch perf data file
	mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment
	mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment
	mm: memcg/slab: fix percpu slab vmstats flushing
	mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid
	mm, debug_pagealloc: don't rely on static keys too early
	btrfs: rework arguments of btrfs_unlink_subvol
	btrfs: fix invalid removal of root ref
	btrfs: do not delete mismatched root refs
	btrfs: relocation: fix reloc_root lifespan and access
	btrfs: fix memory leak in qgroup accounting
	btrfs: check rw_devices, not num_devices for balance
	Btrfs: always copy scrub arguments back to user space
	mm/memory_hotplug: don't free usage map when removing a re-added early section
	mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio()
	mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATE
	ARM: dts: imx6qdl-sabresd: Remove incorrect power supply assignment
	ARM: dts: imx6sx-sdb: Remove incorrect power supply assignment
	ARM: dts: imx6sl-evk: Remove incorrect power supply assignment
	ARM: dts: imx6sll-evk: Remove incorrect power supply assignment
	ARM: dts: imx6q-icore-mipi: Use 1.5 version of i.Core MX6DL
	ARM: dts: imx7: Fix Toradex Colibri iMX7S 256MB NAND flash support
	net: stmmac: 16KB buffer must be 16 byte aligned
	net: stmmac: Enable 16KB buffer size
	reset: Fix {of,devm}_reset_control_array_get kerneldoc return types
	tipc: fix potential hanging after b/rcast changing
	tipc: fix retrans failure due to wrong destination
	net: fix kernel-doc warning in <linux/netdevice.h>
	block: Fix the type of 'sts' in bsg_queue_rq()
	drm/amd/display: Reorder detect_edp_sink_caps before link settings read.
	bpf: Fix incorrect verifier simulation of ARSH under ALU32
	bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loop
	bpf: Sockmap, ensure sock lock held during tear down
	bpf: Sockmap/tls, push write_space updates through ulp updates
	bpf: Sockmap, skmsg helper overestimates push, pull, and pop bounds
	bpf: Sockmap/tls, msg_push_data may leave end mark in place
	bpf: Sockmap/tls, tls_sw can create a plaintext buf > encrypt buf
	bpf: Sockmap/tls, skmsg can have wrapped skmsg that needs extra chaining
	bpf: Sockmap/tls, fix pop data with SK_DROP return code
	i2c: tegra: Fix suspending in active runtime PM state
	i2c: tegra: Properly disable runtime PM on driver's probe error
	cfg80211: fix deadlocks in autodisconnect work
	cfg80211: fix memory leak in nl80211_probe_mesh_link
	cfg80211: fix memory leak in cfg80211_cqm_rssi_update
	cfg80211: fix page refcount issue in A-MSDU decap
	bpf/sockmap: Read psock ingress_msg before sk_receive_queue
	i2c: iop3xx: Fix memory leak in probe error path
	netfilter: fix a use-after-free in mtype_destroy()
	netfilter: arp_tables: init netns pointer in xt_tgdtor_param struct
	netfilter: nat: fix ICMP header corruption on ICMP errors
	netfilter: nft_tunnel: fix null-attribute check
	netfilter: nft_tunnel: ERSPAN_VERSION must not be null
	netfilter: nf_tables: remove WARN and add NLA_STRING upper limits
	netfilter: nf_tables: store transaction list locally while requesting module
	netfilter: nf_tables: fix flowtable list del corruption
	NFC: pn533: fix bulk-message timeout
	net: bpf: Don't leak time wait and request sockets
	bpftool: Fix printing incorrect pointer in btf_dump_ptr
	batman-adv: Fix DAT candidate selection on little endian systems
	macvlan: use skb_reset_mac_header() in macvlan_queue_xmit()
	hv_netvsc: Fix memory leak when removing rndis device
	net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()
	net: dsa: tag_qca: fix doubled Tx statistics
	net: hns3: pad the short frame before sending to the hardware
	net: hns: fix soft lockup when there is not enough memory
	net: phy: dp83867: Set FORCE_LINK_GOOD to default after reset
	net/sched: act_ife: initalize ife->metalist earlier
	net: usb: lan78xx: limit size of local TSO packets
	net/wan/fsl_ucc_hdlc: fix out of bounds write on array utdm_info
	ptp: free ptp device pin descriptors properly
	r8152: add missing endpoint sanity check
	tcp: fix marked lost packets not being retransmitted
	bnxt_en: Fix NTUPLE firmware command failures.
	bnxt_en: Fix ipv6 RFS filter matching logic.
	bnxt_en: Do not treat DSN (Digital Serial Number) read failure as fatal.
	net: ethernet: ave: Avoid lockdep warning
	net: systemport: Fixed queue mapping in internal ring map
	net: dsa: sja1105: Don't error out on disabled ports with no phy-mode
	net: dsa: tag_gswip: fix typo in tagger name
	net: sched: act_ctinfo: fix memory leak
	net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec
	i40e: prevent memory leak in i40e_setup_macvlans
	drm/amdgpu: allow direct upload save restore list for raven2
	sh_eth: check sh_eth_cpu_data::dual_port when dumping registers
	mlxsw: spectrum: Do not modify cloned SKBs during xmit
	mlxsw: spectrum: Wipe xstats.backlog of down ports
	mlxsw: spectrum_qdisc: Include MC TCs in Qdisc counters
	net: stmmac: selftests: Make it work in Synopsys AXS101 boards
	net: stmmac: selftests: Mark as fail when received VLAN ID != expected
	selftests: mlxsw: qos_mc_aware: Fix mausezahn invocation
	net: stmmac: selftests: Update status when disabling RSS
	net: stmmac: tc: Do not setup flower filtering if RSS is enabled
	devlink: Wait longer before warning about unset port type
	xen/blkfront: Adjust indentation in xlvbd_alloc_gendisk
	dt-bindings: Add missing 'properties' keyword enclosing 'snps,tso'
	tcp: refine rule to allow EPOLLOUT generation under mem pressure
	irqchip: Place CONFIG_SIFIVE_PLIC into the menu
	arm64: dts: qcom: msm8998: Disable coresight by default
	cw1200: Fix a signedness bug in cw1200_load_firmware()
	arm64: dts: meson: axg: fix audio fifo reg size
	arm64: dts: meson: g12: fix audio fifo reg size
	arm64: dts: meson-gxl-s905x-khadas-vim: fix gpio-keys-polled node
	arm64: dts: renesas: r8a77970: Fix PWM3
	arm64: dts: marvell: Add AP806-dual missing CPU clocks
	cfg80211: check for set_wiphy_params
	tick/sched: Annotate lockless access to last_jiffies_update
	arm64: dts: marvell: Fix CP110 NAND controller node multi-line comment alignment
	arm64: dts: renesas: r8a774a1: Remove audio port node
	arm64: dts: imx8mm-evk: Assigned clocks for audio plls
	arm64: dts: qcom: sdm845-cheza: delete zap-shader
	ARM: dts: imx6ul-kontron-n6310-s: Disable the snvs-poweroff driver
	arm64: dts: allwinner: a64: Re-add PMU node
	ARM: dts: dra7: fix cpsw mdio fck clock
	arm64: dts: juno: Fix UART frequency
	ARM: dts: Fix sgx sysconfig register for omap4
	Revert "arm64: dts: juno: add dma-ranges property"
	mtd: devices: fix mchp23k256 read and write
	mtd: cfi_cmdset_0002: only check errors when ready in cfi_check_err_status()
	mtd: cfi_cmdset_0002: fix delayed error detection on HyperFlash
	um: Don't trace irqflags during shutdown
	um: virtio_uml: Disallow modular build
	reiserfs: fix handling of -EOPNOTSUPP in reiserfs_for_each_xattr
	scsi: esas2r: unlock on error in esas2r_nvram_read_direct()
	scsi: hisi_sas: Don't create debugfs dump folder twice
	scsi: hisi_sas: Set the BIST init value before enabling BIST
	scsi: qla4xxx: fix double free bug
	scsi: bnx2i: fix potential use after free
	scsi: target: core: Fix a pr_debug() argument
	scsi: lpfc: fix: Coverity: lpfc_get_scsi_buf_s3(): Null pointer dereferences
	scsi: hisi_sas: Return directly if init hardware failed
	scsi: scsi_transport_sas: Fix memory leak when removing devices
	scsi: qla2xxx: Fix qla2x00_request_irqs() for MSI
	scsi: qla2xxx: fix rports not being mark as lost in sync fabric scan
	scsi: core: scsi_trace: Use get_unaligned_be*()
	scsi: lpfc: Fix list corruption detected in lpfc_put_sgl_per_hdwq
	scsi: lpfc: Fix hdwq sgl locks and irq handling
	scsi: lpfc: Fix a kernel warning triggered by lpfc_get_sgl_per_hdwq()
	rtw88: fix potential read outside array boundary
	perf probe: Fix wrong address verification
	perf script: Allow --time with --reltime
	clk: sprd: Use IS_ERR() to validate the return value of syscon_regmap_lookup_by_phandle()
	clk: imx7ulp: Correct system clock source option #7
	clk: imx7ulp: Correct DDR clock mux options
	regulator: ab8500: Remove SYSCLKREQ from enum ab8505_regulator_id
	hwmon: (pmbus/ibm-cffps) Switch LEDs to blocking brightness call
	hwmon: (pmbus/ibm-cffps) Fix LED blink behavior
	perf script: Fix --reltime with --time
	scsi: lpfc: use hdwq assigned cpu for allocation
	Linux 5.4.14

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I400bdf3be682df698c2477fbf869d5ad8ce300b5
2020-01-23 08:37:43 +01:00
Bart Van Assche
7ecc610a30 block: Fix the type of 'sts' in bsg_queue_rq()
commit c44a4edb20938c85b64a256661443039f5bffdea upstream.

This patch fixes the following sparse warnings:

block/bsg-lib.c:269:19: warning: incorrect type in initializer (different base types)
block/bsg-lib.c:269:19:    expected int sts
block/bsg-lib.c:269:19:    got restricted blk_status_t [usertype]
block/bsg-lib.c:286:16: warning: incorrect type in return expression (different base types)
block/bsg-lib.c:286:16:    expected restricted blk_status_t
block/bsg-lib.c:286:16:    got int [assigned] sts

Cc: Martin Wilck <mwilck@suse.com>
Fixes: d46fe2cb2d ("block: drop device references in bsg_queue_rq()")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23 08:22:44 +01:00
Mikulas Patocka
6eed26e35c block: fix an integer overflow in logical block size
commit ad6bf88a6c19a39fb3b0045d78ea880325dfcf15 upstream.

Logical block size has type unsigned short. That means that it can be at
most 32768. However, there are architectures that can run with 64k pages
(for example arm64) and on these architectures, it may be possible to
create block devices with 64k block size.

For exmaple (run this on an architecture with 64k pages):

Mount will fail with this error because it tries to read the superblock using 2-sector
access:
  device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
  EXT4-fs (dm-0): unable to read superblock

This patch changes the logical block size from unsigned short to unsigned
int to avoid the overflow.

Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23 08:22:32 +01:00
Barani Muthukumaran
3e8cceb730 ANDROID: block: add KSM op to derive software secret from wrapped key
Some inline encryption hardware supports protecting the keys in hardware
and only exposing wrapped keys to software.  To use this capability,
userspace must provide a hardware-wrapped key rather than a raw key.

However, users of inline encryption in the kernel won't necessarily use
the user-specified key directly for inline encryption.  E.g. with
fscrypt with IV_INO_LBLK_64 policies, each user-provided key is used to
derive a file contents encryption key, filenames encryption key, and key
identifier.  Since inline encryption can only be used with file
contents, if the user were to provide a wrapped key there would
(naively) be no way to encrypt filenames or derive the key identifier.

This problem is solved by designing the hardware to internally use the
unwrapped key as input to a KDF from which multiple cryptographically
isolated keys can be derived, including both the inline crypto key (not
exposed to software) and a secret that *is* exposed to software.

Add a function to the keyslot manager to allow upper layers to request
this software secret from a hardware-wrapped key.

Bug: 147209885

Change-Id: Iffb05b297b7ba3f3e865e798e4bb73aef4e6ba19
Co-developed-by: Gaurav Kashyap <gaurkash@codeaurora.org>
Signed-off-by: Gaurav Kashyap <gaurkash@codeaurora.org>
Signed-off-by: Barani Muthukumaran <bmuthuku@codeaurora.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-22 04:31:52 +00:00
Barani Muthukumaran
e25d82c5a0 ANDROID: block: provide key size as input to inline crypto APIs
Currently, blk-crypto uses the algorithm to determine the size of keys.
However, some inline encryption hardware supports protecting keys from
software by wrapping the storage keys with an ephemeral key.  Since
these wrapped keys are not of a fixed size, add the capability to
provide the key size when initializing a blk_crypto_key, and update the
keyslot manager to take size into account when comparing keys.

Bug: 147209885

Change-Id: Ibc2e4807e0a0ec473e72d131e0f20abb3a6033b5
Co-developed-by: Gaurav Kashyap <gaurkash@codeaurora.org>
Signed-off-by: Gaurav Kashyap <gaurkash@codeaurora.org>
Signed-off-by: Barani Muthukumaran <bmuthuku@codeaurora.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-22 04:31:41 +00:00
Greg Kroah-Hartman
b0b02162a4 This is the 5.4.13 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4iAaQACgkQONu9yGCS
 aT5vIg/+Lj4wdF3UuUWonHdWBhnfG2FKCWFTYJKPpFXFRMltAa27XKns/CvR8CBW
 9ztOH928CR8K9BS7HbfGtsgOEOVzILb4+akco5UhrTH93dc2T6RwSDiMpaULgeIF
 x/n834yNlsHs1NSmjjuimBe1j4NcZwPnnNVGKmFojkv04QPsFjP6HCp7PR2/PMXP
 CVO5JBXqMYtMRprY0xkpAGCStqVZPF6uwfTPrKRgaOCTpkKsqBEFJbwqOoqGQWou
 fQPOmEFjw+e9rIKzJgou6k4YGrWITcpNnUMdxavCszcQFTeUnY1vpLTiVxyZC1E3
 R+7ulfe+/zoQvWIer9H85ySLuOjSmmXb5CM9Fc0WLSsvKmTKfUNe/g5Cce+rngPY
 x/+tIBvXgFSoGR4oO5dEHhXn9Hzqr0OHbZy1dLKY1RU4NzxLsAtR2DH4ps25I4ux
 Ty2P0kYwm5Sz43MspnFAPTaU5kC3qHVNMjanbb5I7xGF2m0HZmh0zRHBC50DqP4Y
 nmLUklpX4EGVAYGb94YZMa3ugksSvie2SLgk838UQG+lGqaQoxAyAeRmDdyR1zE7
 GHlkNxWj8cbkBsPDSYt6Wvrt+7+e8Bbk5Y/fM5+j02h6ehs9wqOaQ985CfvrrYix
 RyGc7pWt1FPL7Kqv/CtbDieglS/P0BMPPGYX2rfidk6i+0knWaE=
 =53PP
 -----END PGP SIGNATURE-----

Merge 5.4.13 into android-5.4

Changes in 5.4.13
	HID: hidraw, uhid: Always report EPOLLOUT
	rtc: mt6397: fix alarm register overwrite
	phy: mapphone-mdm6600: Fix uninitialized status value regression
	RDMA/bnxt_re: Avoid freeing MR resources if dereg fails
	RDMA/bnxt_re: Fix Send Work Entry state check while polling completions
	IB/hfi1: Don't cancel unused work item
	mtd: rawnand: stm32_fmc2: avoid to lock the CPU bus
	i2c: bcm2835: Store pointer to bus clock
	ASoC: SOF: imx8: fix memory allocation failure check on priv->pd_dev
	ASoC: soc-core: Set dpcm_playback / dpcm_capture
	ASoC: stm32: spdifrx: fix inconsistent lock state
	ASoC: stm32: spdifrx: fix race condition in irq handler
	ASoC: stm32: spdifrx: fix input pin state management
	pinctrl: lochnagar: select GPIOLIB
	netfilter: nft_flow_offload: fix underflow in flowtable reference counter
	ASoC: SOF: imx8: Fix dsp_box offset
	mtd: onenand: omap2: Pass correct flags for prep_dma_memcpy
	gpio: zynq: Fix for bug in zynq_gpio_restore_context API
	pinctrl: meson: Fix wrong shift value when get drive-strength
	selftests: loopback.sh: skip this test if the driver does not support
	iommu/vt-d: Unlink device if failed to add to group
	iommu: Remove device link to group on failure
	bpf: cgroup: prevent out-of-order release of cgroup bpf
	fs: move guard_bio_eod() after bio_set_op_attrs
	scsi: mpt3sas: Fix double free in attach error handling
	gpio: Fix error message on out-of-range GPIO in lookup table
	PM / devfreq: tegra: Add COMMON_CLK dependency
	PCI: amlogic: Fix probed clock names
	drm/tegra: Fix ordering of cleanup code
	hsr: add hsr root debugfs directory
	hsr: rename debugfs file when interface name is changed
	hsr: reset network header when supervision frame is created
	s390/qeth: fix qdio teardown after early init error
	s390/qeth: fix false reporting of VNIC CHAR config failure
	s390/qeth: Fix vnicc_is_in_use if rx_bcast not set
	s390/qeth: vnicc Fix init to default
	s390/qeth: fix initialization on old HW
	cifs: Adjust indentation in smb2_open_file
	scsi: smartpqi: Update attribute name to `driver_version`
	MAINTAINERS: Append missed file to the database
	ath9k: use iowrite32 over __raw_writel
	can: j1939: fix address claim code example
	dt-bindings: reset: Fix brcmstb-reset example
	reset: brcmstb: Remove resource checks
	afs: Fix missing cell comparison in afs_test_super()
	perf vendor events s390: Remove name from L1D_RO_EXCL_WRITES description
	syscalls/x86: Wire up COMPAT_SYSCALL_DEFINE0
	syscalls/x86: Use COMPAT_SYSCALL_DEFINE0 for IA32 (rt_)sigreturn
	syscalls/x86: Use the correct function type for sys_ni_syscall
	syscalls/x86: Fix function types in COND_SYSCALL
	hsr: fix slab-out-of-bounds Read in hsr_debugfs_rename()
	btrfs: simplify inode locking for RWF_NOWAIT
	netfilter: nf_tables_offload: release flow_rule on error from commit path
	netfilter: nft_meta: use 64-bit time arithmetic
	ASoC: dt-bindings: mt8183: add missing update
	ASoC: simple_card_utils.h: Add missing include
	ASoC: fsl_esai: Add spin lock to protect reset, stop and start
	ASoC: SOF: Intel: Broadwell: clarify mutual exclusion with legacy driver
	ASoC: core: Fix compile warning with CONFIG_DEBUG_FS=n
	ASoC: rsnd: fix DALIGN register for SSIU
	RDMA/hns: Prevent undefined behavior in hns_roce_set_user_sq_size()
	RDMA/hns: remove a redundant le16_to_cpu
	RDMA/hns: Modify return value of restrack functions
	RDMA/counter: Prevent QP counter manual binding in auto mode
	RDMA/siw: Fix port number endianness in a debug message
	RDMA/hns: Fix build error again
	RDMA/hns: Release qp resources when failed to destroy qp
	xprtrdma: Add unique trace points for posting Local Invalidate WRs
	xprtrdma: Connection becomes unstable after a reconnect
	xprtrdma: Fix MR list handling
	xprtrdma: Close window between waking RPC senders and posting Receives
	RDMA/hns: Fix to support 64K page for srq
	RDMA/hns: Bugfix for qpc/cqc timer configuration
	rdma: Remove nes ABI header
	RDMA/mlx5: Return proper error value
	RDMA/srpt: Report the SCSI residual to the initiator
	uaccess: Add non-pagefault user-space write function
	bpf: Make use of probe_user_write in probe write helper
	bpf: skmsg, fix potential psock NULL pointer dereference
	bpf: Support pre-2.25-binutils objcopy for vmlinux BTF
	libbpf: Fix Makefile' libbpf symbol mismatch diagnostic
	afs: Fix use-after-loss-of-ref
	afs: Fix afs_lookup() to not clobber the version on a new dentry
	keys: Fix request_key() cache
	scsi: enclosure: Fix stale device oops with hot replug
	scsi: sd: Clear sdkp->protection_type if disk is reformatted without PI
	platform/mellanox: fix potential deadlock in the tmfifo driver
	platform/x86: asus-wmi: Fix keyboard brightness cannot be set to 0
	platform/x86: GPD pocket fan: Use default values when wrong modparams are given
	asm-generic/nds32: don't redefine cacheflush primitives
	Documentation/ABI: Fix documentation inconsistency for mlxreg-io sysfs interfaces
	Documentation/ABI: Add missed attribute for mlxreg-io sysfs interfaces
	xprtrdma: Fix create_qp crash on device unload
	xprtrdma: Fix completion wait during device removal
	xprtrdma: Fix oops in Receive handler after device removal
	dm: add dm-clone to the documentation index
	scsi: ufs: Give an unique ID to each ufs-bsg
	crypto: cavium/nitrox - fix firmware assignment to AE cores
	crypto: hisilicon - select NEED_SG_DMA_LENGTH in qm Kconfig
	crypto: arm64/aes-neonbs - add return value of skcipher_walk_done() in __xts_crypt()
	crypto: virtio - implement missing support for output IVs
	crypto: algif_skcipher - Use chunksize instead of blocksize
	crypto: geode-aes - convert to skcipher API and make thread-safe
	NFSv2: Fix a typo in encode_sattr()
	nfsd: Fix cld_net->cn_tfm initialization
	nfsd: v4 support requires CRYPTO_SHA256
	NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process()
	NFSv4.x: Drop the slot if nfs4_delegreturn_prepare waits for layoutreturn
	iio: imu: st_lsm6dsx: fix gyro gain definitions for LSM9DS1
	iio: imu: adis16480: assign bias value only if operation succeeded
	mei: fix modalias documentation
	clk: meson: axg-audio: fix regmap last register
	clk: samsung: exynos5420: Preserve CPU clocks configuration during suspend/resume
	clk: Fix memory leak in clk_unregister()
	dmaengine: dw: platform: Mark 'hclk' clock optional
	clk: imx: pll14xx: Fix quick switch of S/K parameter
	rsi: fix potential null dereference in rsi_probe()
	affs: fix a memory leak in affs_remount
	pinctl: ti: iodelay: fix error checking on pinctrl_count_index_with_args call
	pinctrl: sh-pfc: Fix PINMUX_IPSR_PHYS() to set GPSR
	pinctrl: sh-pfc: Do not use platform_get_irq() to count interrupts
	pinctrl: lewisburg: Update pin list according to v1.1v6
	PCI: pciehp: Do not disable interrupt twice on suspend
	Revert "drm/virtio: switch virtio_gpu_wait_ioctl() to gem helper."
	drm/amdgpu: cleanup creating BOs at fixed location (v2)
	drm/amdgpu/discovery: reserve discovery data at the top of VRAM
	scsi: sd: enable compat ioctls for sed-opal
	arm64: dts: apq8096-db820c: Increase load on l21 for SDCARD
	gfs2: add compat_ioctl support
	af_unix: add compat_ioctl support
	compat_ioctl: handle SIOCOUTQNSD
	PCI: aardvark: Use LTSSM state to build link training flag
	PCI: aardvark: Fix PCI_EXP_RTCTL register configuration
	PCI: dwc: Fix find_next_bit() usage
	PCI: Fix missing bridge dma_ranges resource list cleanup
	PCI/PM: Clear PCIe PME Status even for legacy power management
	tools: PCI: Fix fd leakage
	PCI/PTM: Remove spurious "d" from granularity message
	powerpc/powernv: Disable native PCIe port management
	MIPS: PCI: remember nasid changed by set interrupt affinity
	MIPS: Loongson: Fix return value of loongson_hwmon_init
	MIPS: SGI-IP27: Fix crash, when CPUs are disabled via nr_cpus parameter
	tty: serial: imx: use the sg count from dma_map_sg
	tty: serial: pch_uart: correct usage of dma_unmap_sg
	ARM: 8943/1: Fix topology setup in case of CPU hotplug for CONFIG_SCHED_MC
	media: ov6650: Fix incorrect use of JPEG colorspace
	media: ov6650: Fix some format attributes not under control
	media: ov6650: Fix .get_fmt() V4L2_SUBDEV_FORMAT_TRY support
	media: ov6650: Fix default format not applied on device probe
	media: rcar-vin: Fix incorrect return statement in rvin_try_format()
	media: hantro: h264: Fix the frame_num wraparound case
	media: v4l: cadence: Fix how unsued lanes are handled in 'csi2rx_start()'
	media: exynos4-is: Fix recursive locking in isp_video_release()
	media: coda: fix deadlock between decoder picture run and start command
	media: cedrus: Use correct H264 8x8 scaling list
	media: hantro: Do not reorder H264 scaling list
	media: aspeed-video: Fix memory leaks in aspeed_video_probe
	media: hantro: Set H264 FIELDPIC_FLAG_E flag correctly
	iommu/mediatek: Correct the flush_iotlb_all callback
	iommu/mediatek: Add a new tlb_lock for tlb_flush
	memory: mtk-smi: Add PM suspend and resume ops
	Revert "ubifs: Fix memory leak bug in alloc_ubifs_info() error path"
	ubifs: Fixed missed le64_to_cpu() in journal
	ubifs: do_kill_orphans: Fix a memory leak bug
	spi: sprd: Fix the incorrect SPI register
	mtd: spi-nor: fix silent truncation in spi_nor_read()
	mtd: spi-nor: fix silent truncation in spi_nor_read_raw()
	spi: pxa2xx: Set controller->max_transfer_size in dma mode
	spi: atmel: fix handling of cs_change set on non-last xfer
	spi: rspi: Use platform_get_irq_byname_optional() for optional irqs
	spi: lpspi: fix memory leak in fsl_lpspi_probe
	iwlwifi: mvm: consider ieee80211 station max amsdu value
	rtlwifi: Remove unnecessary NULL check in rtl_regd_init
	iwlwifi: mvm: fix support for single antenna diversity
	sch_cake: Add missing NLA policy entry TCA_CAKE_SPLIT_GSO
	f2fs: fix potential overflow
	NFSD fixing possible null pointer derefering in copy offload
	rtc: msm6242: Fix reading of 10-hour digit
	rtc: brcmstb-waketimer: add missed clk_disable_unprepare
	rtc: bd70528: Add MODULE ALIAS to autoload module
	gpio: mpc8xxx: Add platform device to gpiochip->parent
	scsi: libcxgbi: fix NULL pointer dereference in cxgbi_device_destroy()
	scsi: target/iblock: Fix protection error with blocks greater than 512B
	selftests: firmware: Fix it to do root uid check and skip
	rseq/selftests: Turn off timeout setting
	riscv: export flush_icache_all to modules
	mips: cacheinfo: report shared CPU map
	mips: Fix gettimeofday() in the vdso library
	tomoyo: Suppress RCU warning at list_for_each_entry_rcu().
	MIPS: Prevent link failure with kcov instrumentation
	drm/arm/mali: make malidp_mw_connector_helper_funcs static
	rxrpc: Unlock new call in rxrpc_new_incoming_call() rather than the caller
	rxrpc: Don't take call->user_mutex in rxrpc_new_incoming_call()
	rxrpc: Fix missing security check on incoming calls
	dmaengine: k3dma: Avoid null pointer traversal
	s390/qeth: lock the card while changing its hsuid
	ioat: ioat_alloc_ring() failure handling.
	drm/amdgpu: enable gfxoff for raven1 refresh
	media: intel-ipu3: Align struct ipu3_uapi_awb_fr_config_s to 32 bytes
	kbuild/deb-pkg: annotate libelf-dev dependency as :native
	hexagon: parenthesize registers in asm predicates
	hexagon: work around compiler crash
	ocfs2: call journal flush to mark journal as empty after journal recovery when mount
	Linux 5.4.13

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I90734cd9d80f000e05a8109a529916ae641cdede
2020-01-17 23:38:39 +01:00
Ming Lei
3fe209c843 fs: move guard_bio_eod() after bio_set_op_attrs
commit 83c9c547168e8b914ea6398430473a4de68c52cc upstream.

Commit 85a8ce62c2ea ("block: add bio_truncate to fix guard_bio_eod")
adds bio_truncate() for handling bio EOD. However, bio_truncate()
doesn't use the passed 'op' parameter from guard_bio_eod's callers.

So bio_trunacate() may retrieve wrong 'op', and zering pages may
not be done for READ bio.

Fixes this issue by moving guard_bio_eod() after bio_set_op_attrs()
in submit_bh_wbc() so that bio_truncate() can always retrieve correct
op info.

Meantime remove the 'op' parameter from guard_bio_eod() because it isn't
used any more.

Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Fixes: 85a8ce62c2ea ("block: add bio_truncate to fix guard_bio_eod")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Fold in kerneldoc and bio_op() change.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-17 19:48:21 +01:00
Jaegeuk Kim
57df3030e1 Merge remote-tracking branch 'aosp/upstream-f2fs-stable-linux-5.4.y' into android-5.4
Merged in v5.5-rc1.

* aosp/upstream-f2fs-stable-linux-5.4.y:
  docs: fs-verity: mention statx() support
  f2fs: support STATX_ATTR_VERITY
  ext4: support STATX_ATTR_VERITY
  statx: define STATX_ATTR_VERITY
  docs: fs-verity: document first supported kernel version
  f2fs: add support for IV_INO_LBLK_64 encryption policies
  ext4: add support for IV_INO_LBLK_64 encryption policies
  fscrypt: add support for IV_INO_LBLK_64 policies
  fscrypt: avoid data race on fscrypt_mode::logged_impl_name
  fscrypt: zeroize fscrypt_info before freeing
  fscrypt: remove struct fscrypt_ctx
  fscrypt: invoke crypto API for ESSIV handling
  null_blk: remove unused variable warning on !CONFIG_BLK_DEV_ZONED
  block: set the zone size in blk_revalidate_disk_zones atomically
  block: don't handle bio based drivers in blk_revalidate_disk_zones
  null_blk: cleanup null_gendisk_register
  null_blk: fix zone size paramter check
  block: allocate the zone bitmaps lazily
  block: replace seq_zones_bitmap with conv_zones_bitmap
  block: simplify blkdev_nr_zones
  block: remove the empty line at the end of blk-zoned.c
  scsi: sd_zbc: Improve report zones error printout
  scsi: sd_zbc: Remove set but not used variable 'buflen'
  block: rework zone reporting
  scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()
  null_blk: clean up report zones
  null_blk: clean up the block device operations
  null_blk: return fixed zoned reads > write pointer
  scsi: sd_zbc: add zone open, close, and finish support
  block: Remove partition support for zoned block devices
  block: Simplify report zones execution
  block: cleanup the !zoned case in blk_revalidate_disk_zones
  block: Enhance blk_revalidate_disk_zones()
  block: add zone open, close and finish ioctl support
  block: add zone open, close and finish operations
  block: Simplify REQ_OP_ZONE_RESET_ALL handling
  block: Remove REQ_OP_ZONE_RESET plugging
  f2fs: stop GC when the victim becomes fully valid
  f2fs: expose main_blkaddr in sysfs
  f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()
  f2fs: Fix deadlock in f2fs_gc() context during atomic files handling
  f2fs: show f2fs instance in printk_ratelimited
  f2fs: fix potential overflow
  f2fs: fix to update dir's i_pino during cross_rename
  f2fs: support aligned pinned file
  f2fs: avoid kernel panic on corruption test
  f2fs: fix wrong description in document
  f2fs: cache global IPU bio
  f2fs: fix to avoid memory leakage in f2fs_listxattr
  f2fs: check total_segments from devices in raw_super
  f2fs: update multi-dev metadata in resize_fs
  f2fs: mark recovery flag correctly in read_raw_super_block()
  f2fs: fix to update time in lazytime mode

Change-Id: I9325127228fb82b67f064ce8b3bc8d40ac76e65b
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2020-01-15 16:41:51 -08:00
Satya Tangirala
b7b3af9614 BACKPORT: FROMLIST: Update Inline Encryption from v5 to v6 of patch series
Changes v5 => v6:
 - Blk-crypto's kernel crypto API fallback is no longer restricted to
   8-byte DUNs. It's also now separately configurable from blk-crypto, and
   can be disabled entirely, while still allowing the kernel to use inline
   encryption hardware. Further, struct bio_crypt_ctx takes up less space,
   and no longer contains the information needed by the crypto API
   fallback - the fallback allocates the required memory when necessary.
 - Blk-crypto now supports all file content encryption modes supported by
   fscrypt.
 - Fixed bio merging logic in blk-merge.c
 - Fscrypt now supports inline encryption with the direct key policy, since
   blk-crypto now has support for larger DUNs.
 - Keyslot manager now uses a hashtable to lookup which keyslot contains
   any particular key (thanks Eric!)
 - Fscrypt support for inline encryption now handles filesystems with
   multiple underlying block devices (thanks Eric!)
 - Numerous cleanups

Bug: 137270441
Test: refer to I26376479ee38259b8c35732cb3a1d7e15f9b05a3
Change-Id: I13e2e327e0b4784b394cb1e7cf32a04856d95f01
Link: https://lore.kernel.org/linux-block/20191218145136.172774-1-satyat@google.com/
Signed-off-by: Satya Tangirala <satyat@google.com>
2020-01-13 18:46:44 +00:00
Greg Kroah-Hartman
fde6e0c654 This is the 5.4.11 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4bAaMACgkQONu9yGCS
 aT6WThAApG5Lt+rOIIbb0JsTgiqzRs/5VkxQLDsDkn8QXMDDX44eY+cW3XLA+3nv
 UAU7wXraFCq7SznsQADHj4edAQN/urNXcIlLCIfoWCuq4Yk6DIgHNDZcC5a1PPiz
 ri96mrxHq24XDbRcXZFN4usC6Q1Q40W/N2NyZ7gIfCsYOeiaZzNhvw1Sh6Pkajb+
 jKe9Yzjolj2XJxrNgJxfsTJLnCEPwoQ/QoBIp1ffHqhhCjR/7tHm611Pj0q260Fj
 H6OGZaRNMoc4I+2dQXsYUfyPH5aMwx2/Nym4FNHye9LaoQl07m+uR0LqmytPQ0GL
 j6mQuMv+kdeVOOXO+zRJH8A2yq4mwvr80s15myhG9HvAzmcGAvagsCl19yy9/fJx
 6M2Sn8qDwJXRaxTc1e7figXkTZu5+sX7th3sUk0KbCHZ+UkJiCjXpJDgBK/HQkC3
 EsVFZGeIBySbWk2yYKzQkb4ZA32qbzUKW88Rjago3BOV96WHfnAhJDQPssDJqlcs
 cgK+UTQOJb9U1V+Kd4Z8uhlCeboaRj4yOFt2EGxkK2sqJse05eaTN0GPbP/3X6Be
 TyD17Cnv18Ltk2qf2DXanJSlrCUcHEfEDoQQTqJATxV4NLzTcwmVAscsv1aRmcot
 ii1ZTwqi04MLgaNla+6tqqZ/VufUtWVIbN73q2UdU8zZh5PHJ18=
 =cKx6
 -----END PGP SIGNATURE-----

Merge 5.4.11 into android-5.4

Changes in 5.4.11
	USB: dummy-hcd: use usb_urb_dir_in instead of usb_pipein
	bpf: Fix passing modified ctx to ld/abs/ind instruction
	ASoC: rt5682: fix i2c arbitration lost issue
	spi: pxa2xx: Add support for Intel Jasper Lake
	regulator: fix use after free issue
	ASoC: max98090: fix possible race conditions
	spi: fsl: Fix GPIO descriptor support
	gpio: Handle counting of Freescale chipselects
	spi: fsl: Handle the single hardwired chipselect case
	locking/spinlock/debug: Fix various data races
	netfilter: ctnetlink: netns exit must wait for callbacks
	x86/intel: Disable HPET on Intel Ice Lake platforms
	netfilter: nf_tables_offload: Check for the NETDEV_UNREGISTER event
	mwifiex: Fix heap overflow in mmwifiex_process_tdls_action_frame()
	libtraceevent: Fix lib installation with O=
	libtraceevent: Copy pkg-config file to output folder when using O=
	regulator: core: fix regulator_register() error paths to properly release rdev
	x86/efi: Update e820 with reserved EFI boot services data to fix kexec breakage
	ASoC: Intel: bytcr_rt5640: Update quirk for Teclast X89
	selftests: netfilter: use randomized netns names
	efi/gop: Return EFI_NOT_FOUND if there are no usable GOPs
	efi/gop: Return EFI_SUCCESS if a usable GOP was found
	efi/gop: Fix memory leak in __gop_query32/64()
	efi/earlycon: Remap entire framebuffer after page initialization
	ARM: dts: imx6ul: imx6ul-14x14-evk.dtsi: Fix SPI NOR probing
	ARM: vexpress: Set-up shared OPP table instead of individual for each CPU
	netfilter: uapi: Avoid undefined left-shift in xt_sctp.h
	netfilter: nft_set_rbtree: bogus lookup/get on consecutive elements in named sets
	netfilter: nf_tables: validate NFT_SET_ELEM_INTERVAL_END
	netfilter: nf_tables: validate NFT_DATA_VALUE after nft_data_init()
	netfilter: nf_tables: skip module reference count bump on object updates
	netfilter: nf_tables_offload: return EOPNOTSUPP if rule specifies no actions
	ARM: dts: BCM5301X: Fix MDIO node address/size cells
	selftests/ftrace: Fix to check the existence of set_ftrace_filter
	selftests/ftrace: Fix ftrace test cases to check unsupported
	selftests/ftrace: Do not to use absolute debugfs path
	selftests/ftrace: Fix multiple kprobe testcase
	selftests: safesetid: Move link library to LDLIBS
	selftests: safesetid: Check the return value of setuid/setgid
	selftests: safesetid: Fix Makefile to set correct test program
	ARM: exynos_defconfig: Restore debugfs support
	ARM: dts: Cygnus: Fix MDIO node address/size cells
	spi: spi-cavium-thunderx: Add missing pci_release_regions()
	reset: Do not register resource data for missing resets
	ASoC: topology: Check return value for snd_soc_add_dai_link()
	ASoC: topology: Check return value for soc_tplg_pcm_create()
	ASoC: SOF: loader: snd_sof_fw_parse_ext_data log warning on unknown header
	ASoC: SOF: Intel: split cht and byt debug window sizes
	ARM: dts: am335x-sancloud-bbe: fix phy mode
	ARM: omap2plus_defconfig: Add back DEBUG_FS
	ARM: dts: bcm283x: Fix critical trip point
	arm64: dts: ls1028a: fix typo in TMU calibration data
	bpf, riscv: Limit to 33 tail calls
	bpf, mips: Limit to 33 tail calls
	bpftool: Don't crash on missing jited insns or ksyms
	perf metricgroup: Fix printing event names of metric group with multiple events
	perf header: Fix false warning when there are no duplicate cache entries
	spi: spi-ti-qspi: Fix a bug when accessing non default CS
	ARM: dts: am437x-gp/epos-evm: fix panel compatible
	kselftest/runner: Print new line in print of timeout log
	kselftest: Support old perl versions
	samples: bpf: Replace symbol compare of trace_event
	samples: bpf: fix syscall_tp due to unused syscall
	arm64: dts: ls1028a: fix reboot node
	ARM: imx_v6_v7_defconfig: Explicitly restore CONFIG_DEBUG_FS
	pinctrl: aspeed-g6: Fix LPC/eSPI mux configuration
	bus: ti-sysc: Fix missing reset delay handling
	clk: walk orphan list on clock provider registration
	mac80211: fix TID field in monitor mode transmit
	cfg80211: fix double-free after changing network namespace
	pinctrl: pinmux: fix a possible null pointer in pinmux_can_be_used_for_gpio
	powerpc: Ensure that swiotlb buffer is allocated from low memory
	btrfs: Fix error messages in qgroup_rescan_init
	Btrfs: fix cloning range with a hole when using the NO_HOLES feature
	powerpc/vcpu: Assume dedicated processors as non-preempt
	powerpc/spinlocks: Include correct header for static key
	btrfs: handle error in btrfs_cache_block_group
	Btrfs: fix hole extent items with a zero size after range cloning
	ocxl: Fix potential memory leak on context creation
	bpf: Clear skb->tstamp in bpf_redirect when necessary
	habanalabs: rate limit error msg on waiting for CS
	habanalabs: remove variable 'val' set but not used
	bnx2x: Do not handle requests from VFs after parity
	bnx2x: Fix logic to get total no. of PFs per engine
	cxgb4: Fix kernel panic while accessing sge_info
	net: usb: lan78xx: Fix error message format specifier
	parisc: fix compilation when KEXEC=n and KEXEC_FILE=y
	parisc: add missing __init annotation
	rfkill: Fix incorrect check to avoid NULL pointer dereference
	ASoC: wm8962: fix lambda value
	regulator: rn5t618: fix module aliases
	spi: nxp-fspi: Ensure width is respected in spi-mem operations
	clk: at91: fix possible deadlock
	staging: axis-fifo: add unspecified HAS_IOMEM dependency
	iommu/iova: Init the struct iova to fix the possible memleak
	kconfig: don't crash on NULL expressions in expr_eq()
	scripts: package: mkdebian: add missing rsync dependency
	perf/x86: Fix potential out-of-bounds access
	perf/x86/intel: Fix PT PMI handling
	sched/psi: Fix sampling error and rare div0 crashes with cgroups and high uptime
	psi: Fix a division error in psi poll()
	usb: typec: fusb302: Fix an undefined reference to 'extcon_get_state'
	block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT
	fs: avoid softlockups in s_inodes iterators
	fs: call fsnotify_sb_delete after evict_inodes
	perf/smmuv3: Remove the leftover put_cpu() in error path
	iommu/dma: Relax locking in iommu_dma_prepare_msi()
	io_uring: don't wait when under-submitting
	clk: Move clk_core_reparent_orphans() under CONFIG_OF
	net: stmmac: selftests: Needs to check the number of Multicast regs
	net: stmmac: Determine earlier the size of RX buffer
	net: stmmac: Do not accept invalid MTU values
	net: stmmac: xgmac: Clear previous RX buffer size
	net: stmmac: RX buffer size must be 16 byte aligned
	net: stmmac: Always arm TX Timer at end of transmission start
	s390/purgatory: do not build purgatory with kcov, kasan and friends
	drm/exynos: gsc: add missed component_del
	tpm/tpm_ftpm_tee: add shutdown call back
	xsk: Add rcu_read_lock around the XSK wakeup
	net/mlx5e: Fix concurrency issues between config flow and XSK
	net/i40e: Fix concurrency issues between config flow and XSK
	net/ixgbe: Fix concurrency issues between config flow and XSK
	platform/x86: pcengines-apuv2: fix simswap GPIO assignment
	arm64: cpu_errata: Add Hisilicon TSV110 to spectre-v2 safe list
	block: Fix a lockdep complaint triggered by request queue flushing
	s390/dasd/cio: Interpret ccw_device_get_mdc return value correctly
	s390/dasd: fix memleak in path handling error case
	block: fix memleak when __blk_rq_map_user_iov() is failed
	parisc: Fix compiler warnings in debug_core.c
	sbitmap: only queue kyber's wait callback if not already active
	s390/qeth: handle error due to unsupported transport mode
	s390/qeth: fix promiscuous mode after reset
	s390/qeth: don't return -ENOTSUPP to userspace
	llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
	hv_netvsc: Fix unwanted rx_table reset
	selftests: pmtu: fix init mtu value in description
	tracing: Do not create directories if lockdown is in affect
	gtp: fix bad unlock balance in gtp_encap_enable_socket
	macvlan: do not assume mac_header is set in macvlan_broadcast()
	net: dsa: mv88e6xxx: Preserve priority when setting CPU port.
	net: freescale: fec: Fix ethtool -d runtime PM
	net: stmmac: dwmac-sun8i: Allow all RGMII modes
	net: stmmac: dwmac-sunxi: Allow all RGMII modes
	net: stmmac: Fixed link does not need MDIO Bus
	net: usb: lan78xx: fix possible skb leak
	pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM
	sch_cake: avoid possible divide by zero in cake_enqueue()
	sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY
	tcp: fix "old stuff" D-SACK causing SACK to be treated as D-SACK
	vxlan: fix tos value before xmit
	mlxsw: spectrum_qdisc: Ignore grafting of invisible FIFO
	net: sch_prio: When ungrafting, replace with FIFO
	vlan: fix memory leak in vlan_dev_set_egress_priority
	vlan: vlan_changelink() should propagate errors
	macb: Don't unregister clks unconditionally
	net/mlx5: Move devlink registration before interfaces load
	net: dsa: mv88e6xxx: force cmode write on 6141/6341
	net/mlx5e: Always print health reporter message to dmesg
	net/mlx5: DR, No need for atomic refcount for internal SW steering resources
	net/mlx5e: Fix hairpin RSS table size
	net/mlx5: DR, Init lists that are used in rule's member
	usb: dwc3: gadget: Fix request complete check
	USB: core: fix check for duplicate endpoints
	USB: serial: option: add Telit ME910G1 0x110a composition
	usb: missing parentheses in USE_NEW_SCHEME
	Linux 5.4.11

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Idb9985bebc97203fa305f881fd98a62ac08e66d9
2020-01-12 15:36:52 +01:00
Yang Yingliang
3a1cba8768 block: fix memleak when __blk_rq_map_user_iov() is failed
[ Upstream commit 3b7995a98ad76da5597b488fa84aa5a56d43b608 ]

When I doing fuzzy test, get the memleak report:

BUG: memory leak
unreferenced object 0xffff88837af80000 (size 4096):
  comm "memleak", pid 3557, jiffies 4294817681 (age 112.499s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    20 00 00 00 10 01 00 00 00 00 00 00 01 00 00 00   ...............
  backtrace:
    [<000000001c894df8>] bio_alloc_bioset+0x393/0x590
    [<000000008b139a3c>] bio_copy_user_iov+0x300/0xcd0
    [<00000000a998bd8c>] blk_rq_map_user_iov+0x2f1/0x5f0
    [<000000005ceb7f05>] blk_rq_map_user+0xf2/0x160
    [<000000006454da92>] sg_common_write.isra.21+0x1094/0x1870
    [<00000000064bb208>] sg_write.part.25+0x5d9/0x950
    [<000000004fc670f6>] sg_write+0x5f/0x8c
    [<00000000b0d05c7b>] __vfs_write+0x7c/0x100
    [<000000008e177714>] vfs_write+0x1c3/0x500
    [<0000000087d23f34>] ksys_write+0xf9/0x200
    [<000000002c8dbc9d>] do_syscall_64+0x9f/0x4f0
    [<00000000678d8e9a>] entry_SYSCALL_64_after_hwframe+0x49/0xbe

If __blk_rq_map_user_iov() is failed in blk_rq_map_user_iov(),
the bio(s) which is allocated before this failing will leak. The
refcount of the bio(s) is init to 1 and increased to 2 by calling
bio_get(), but __blk_rq_unmap_user() only decrease it to 1, so
the bio cannot be freed. Fix it by calling blk_rq_unmap_user().

Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-12 12:21:43 +01:00
Bart Van Assche
f7cc2f988f block: Fix a lockdep complaint triggered by request queue flushing
[ Upstream commit b3c6a59975415bde29cfd76ff1ab008edbf614a9 ]

Avoid that running test nvme/012 from the blktests suite triggers the
following false positive lockdep complaint:

============================================
WARNING: possible recursive locking detected
5.0.0-rc3-xfstests-00015-g1236f7d60242 #841 Not tainted
--------------------------------------------
ksoftirqd/1/16 is trying to acquire lock:
000000000282032e (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

but task is already holding lock:
00000000cbadcbc2 (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&fq->mq_flush_lock)->rlock);
  lock(&(&fq->mq_flush_lock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

1 lock held by ksoftirqd/1/16:
 #0: 00000000cbadcbc2 (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

stack backtrace:
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3-xfstests-00015-g1236f7d60242 #841
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 dump_stack+0x67/0x90
 __lock_acquire.cold.45+0x2b4/0x313
 lock_acquire+0x98/0x160
 _raw_spin_lock_irqsave+0x3b/0x80
 flush_end_io+0x4e/0x1d0
 blk_mq_complete_request+0x76/0x110
 nvmet_req_complete+0x15/0x110 [nvmet]
 nvmet_bio_done+0x27/0x50 [nvmet]
 blk_update_request+0xd7/0x2d0
 blk_mq_end_request+0x1a/0x100
 blk_flush_complete_seq+0xe5/0x350
 flush_end_io+0x12f/0x1d0
 blk_done_softirq+0x9f/0xd0
 __do_softirq+0xca/0x440
 run_ksoftirqd+0x24/0x50
 smpboot_thread_fn+0x113/0x1e0
 kthread+0x121/0x140
 ret_from_fork+0x3a/0x50

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-12 12:21:42 +01:00
Roman Penyaev
2ac95310fe block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT
[ Upstream commit c58c1f83436b501d45d4050fd1296d71a9760bcb ]

Non-mq devs do not honor REQ_NOWAIT so give a chance to the caller to repeat
request gracefully on -EAGAIN error.

The problem is well reproduced using io_uring:

   mkfs.ext4 /dev/ram0
   mount /dev/ram0 /mnt

   # Preallocate a file
   dd if=/dev/zero of=/mnt/file bs=1M count=1

   # Start fio with io_uring and get -EIO
   fio --rw=write --ioengine=io_uring --size=1M --direct=1 --name=job --filename=/mnt/file

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-12 12:21:37 +01:00
Greg Kroah-Hartman
813bf83282 This is the 5.4.9 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4W8EgACgkQONu9yGCS
 aT4szA//fqXI1OQ3xcCt5s9MYZYYa6IpX/VZ0H7lNC/7pkJzccKo+aSer7ppEn4o
 ND8sHNx/lhfZorhvLdqJK4PLThC+fXmXnLvFOzqvZeUVyesnv9zlhd/5JNu18Fvc
 RNjcIRIAHFwanZLAw8uft1DIZXcZ8wNkAAugn/WQV3FN/TG+FsrDzWYnmbBhRIQS
 XC/2jSlFpMTKoExNzEdbduG0XH5plWeE+AdY3a+DQsOBUO2XrAuk5HTEByM1jzPV
 W7U9vMqvw3OyrERcA0lmjs37Waw1e0qzfUaa8Bman5Uc0StOTq0UwschX21SB5yP
 MvbAKhqaKtSff7b4lNrDP9Kj1O/lH84WPSn/aao9D083m/ZYdkkd4AWMlS480lL5
 oJ28tFbgwLayIqDbwCggHluTsNUdQSTwahVbnp4GMqxfjWrApdLPCqloSb+x9JCF
 9pWJf3awI53mA864pH/uOM7pDOz5/c/oJ4QzVmOmR48dsddorY+gPcwk+YpElJcZ
 +xCBQDN5JkNC7lwqu2lvaoq/5cMC5lO/v6aeTfsYCRVnlNY12TY8z352zzMZfCKG
 GRkNvDqWZ5ZmQ+LblWRVbgdGxU42wIYXUS1jUdFd+5DRzz17+ZKUy7YbLNmZMcpY
 UyiM2Ij7X7HsNGrYDKFq0lZPw6k7v3FshvMwQ8C6dNk+l3o9oCA=
 =M+hs
 -----END PGP SIGNATURE-----

Merge 5.4.9 into android-5.4

Changes in 5.4.9
	drm/mcde: dsi: Fix invalid pointer dereference if panel cannot be found
	nvme_fc: add module to ops template to allow module references
	nvme-fc: fix double-free scenarios on hw queues
	drm/amdgpu: add check before enabling/disabling broadcast mode
	drm/amdgpu: add header line for power profile on Arcturus
	drm/amdgpu: add cache flush workaround to gfx8 emit_fence
	drm/amd/display: Map DSC resources 1-to-1 if numbers of OPPs and DSCs are equal
	drm/amd/display: Fixed kernel panic when booting with DP-to-HDMI dongle
	drm/amd/display: Change the delay time before enabling FEC
	drm/amd/display: Reset steer fifo before unblanking the stream
	drm/amd/display: update dispclk and dppclk vco frequency
	nvme/pci: Fix write and poll queue types
	nvme/pci: Fix read queue count
	iio: st_accel: Fix unused variable warning
	iio: adc: max9611: Fix too short conversion time delay
	PM / devfreq: Fix devfreq_notifier_call returning errno
	PM / devfreq: Set scaling_max_freq to max on OPP notifier error
	PM / devfreq: Don't fail devfreq_dev_release if not in list
	afs: Fix afs_find_server lookups for ipv4 peers
	afs: Fix SELinux setting security label on /afs
	RDMA/cma: add missed unregister_pernet_subsys in init failure
	rxe: correctly calculate iCRC for unaligned payloads
	scsi: lpfc: Fix memory leak on lpfc_bsg_write_ebuf_set func
	scsi: qla2xxx: Use explicit LOGO in target mode
	scsi: qla2xxx: Drop superfluous INIT_WORK of del_work
	scsi: qla2xxx: Don't call qlt_async_event twice
	scsi: qla2xxx: Fix PLOGI payload and ELS IOCB dump length
	scsi: qla2xxx: Configure local loop for N2N target
	scsi: qla2xxx: Send Notify ACK after N2N PLOGI
	scsi: qla2xxx: Don't defer relogin unconditonally
	scsi: qla2xxx: Ignore PORT UPDATE after N2N PLOGI
	scsi: iscsi: qla4xxx: fix double free in probe
	scsi: libsas: stop discovering if oob mode is disconnected
	scsi: iscsi: Avoid potential deadlock in iscsi_if_rx func
	staging/wlan-ng: add CRC32 dependency in Kconfig
	drm/nouveau: Move the declaration of struct nouveau_conn_atom up a bit
	drm/nouveau: Fix drm-core using atomic code-paths on pre-nv50 hardware
	drm/nouveau/kms/nv50-: fix panel scaling
	usb: gadget: fix wrong endpoint desc
	net: make socket read/write_iter() honor IOCB_NOWAIT
	afs: Fix mountpoint parsing
	afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP
	raid5: need to set STRIPE_HANDLE for batch head
	md: raid1: check rdev before reference in raid1_sync_request func
	s390/cpum_sf: Adjust sampling interval to avoid hitting sample limits
	s390/cpum_sf: Avoid SBD overflow condition in irq handler
	RDMA/counter: Prevent auto-binding a QP which are not tracked with res
	IB/mlx4: Follow mirror sequence of device add during device removal
	IB/mlx5: Fix steering rule of drop and count
	xen-blkback: prevent premature module unload
	xen/balloon: fix ballooned page accounting without hotplug enabled
	PM / hibernate: memory_bm_find_bit(): Tighten node optimisation
	ALSA: hda/realtek - Add Bass Speaker and fixed dac for bass speaker
	ALSA: hda/realtek - Enable the bass speaker of ASUS UX431FLC
	PCI: Add a helper to check Power Resource Requirements _PR3 existence
	ALSA: hda: Allow HDA to be runtime suspended when dGPU is not bound to a driver
	PCI: Fix missing inline for pci_pr3_present()
	ALSA: hda - fixup for the bass speaker on Lenovo Carbon X1 7th gen
	tcp: fix data-race in tcp_recvmsg()
	shmem: pin the file in shmem_fault() if mmap_sem is dropped
	taskstats: fix data-race
	ALSA: hda - Downgrade error message for single-cmd fallback
	netfilter: nft_tproxy: Fix port selector on Big Endian
	block: add bio_truncate to fix guard_bio_eod
	mm: drop mmap_sem before calling balance_dirty_pages() in write fault
	ALSA: ice1724: Fix sleep-in-atomic in Infrasonic Quartet support code
	ALSA: usb-audio: fix set_format altsetting sanity check
	ALSA: usb-audio: set the interface format after resume on Dell WD19
	ALSA: hda - Apply sync-write workaround to old Intel platforms, too
	ALSA: hda/realtek - Add headset Mic no shutup for ALC283
	drm/sun4i: hdmi: Remove duplicate cleanup calls
	drm/amdgpu/smu: add metrics table lock
	drm/amdgpu/smu: add metrics table lock for arcturus (v2)
	drm/amdgpu/smu: add metrics table lock for navi (v2)
	drm/amdgpu/smu: add metrics table lock for vega20 (v2)
	MIPS: BPF: Disable MIPS32 eBPF JIT
	MIPS: BPF: eBPF JIT: check for MIPS ISA compliance in Kconfig
	MIPS: Avoid VDSO ABI breakage due to global register variable
	media: pulse8-cec: fix lost cec_transmit_attempt_done() call
	media: cec: CEC 2.0-only bcast messages were ignored
	media: cec: avoid decrementing transmit_queue_sz if it is 0
	media: cec: check 'transmit_in_progress', not 'transmitting'
	mm/memory_hotplug: shrink zones when offlining memory
	mm/zsmalloc.c: fix the migrated zspage statistics.
	memcg: account security cred as well to kmemcg
	mm: move_pages: return valid node id in status if the page is already on the target node
	mm/oom: fix pgtables units mismatch in Killed process message
	ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less
	pstore/ram: Write new dumps to start of recycled zones
	pstore/ram: Fix error-path memory leak in persistent_ram_new() callers
	gcc-plugins: make it possible to disable CONFIG_GCC_PLUGINS again
	locks: print unsigned ino in /proc/locks
	selftests/seccomp: Zero out seccomp_notif
	seccomp: Check that seccomp_notif is zeroed out by the user
	samples/seccomp: Zero out members based on seccomp_notif_sizes
	selftests/seccomp: Catch garbage on SECCOMP_IOCTL_NOTIF_RECV
	dmaengine: Fix access to uninitialized dma_slave_caps
	dmaengine: dma-jz4780: Also break descriptor chains on JZ4725B
	Btrfs: fix infinite loop during nocow writeback due to race
	compat_ioctl: block: handle Persistent Reservations
	compat_ioctl: block: handle BLKREPORTZONE/BLKRESETZONE
	compat_ioctl: block: handle BLKGETZONESZ/BLKGETNRZONES
	bpf: Fix precision tracking for unbounded scalars
	ata: libahci_platform: Export again ahci_platform_<en/dis>able_phys()
	ata: ahci_brcm: Fix AHCI resources management
	ata: ahci_brcm: Add missing clock management during recovery
	ata: ahci_brcm: BCM7425 AHCI requires AHCI_HFLAG_DELAY_ENGINE
	libata: Fix retrieving of active qcs
	gpio: xtensa: fix driver build
	gpiolib: fix up emulated open drain outputs
	clocksource: riscv: add notrace to riscv_sched_clock
	riscv: ftrace: correct the condition logic in function graph tracer
	rseq/selftests: Fix: Namespace gettid() for compatibility with glibc 2.30
	tracing: Fix lock inversion in trace_event_enable_tgid_record()
	tracing: Avoid memory leak in process_system_preds()
	tracing: Have the histogram compare functions convert to u64 first
	tracing: Fix endianness bug in histogram trigger
	samples/trace_printk: Wait for IRQ work to finish
	io_uring: use current task creds instead of allocating a new one
	mm/gup: fix memory leak in __gup_benchmark_ioctl
	apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock
	dmaengine: virt-dma: Fix access after free in vchan_complete()
	gen_initramfs_list.sh: fix 'bad variable name' error
	ALSA: cs4236: fix error return comparison of an unsigned integer
	ALSA: pcm: Yet another missing check of non-cached buffer type
	ALSA: firewire-motu: Correct a typo in the clock proc string
	scsi: lpfc: Fix rpi release when deleting vport
	exit: panic before exit_mm() on global init exit
	arm64: Revert support for execute-only user mappings
	ftrace: Avoid potential division by zero in function profiler
	spi: spi-fsl-dspi: Fix 16-bit word order in 32-bit XSPI mode
	drm/msm: include linux/sched/task.h
	PM / devfreq: Check NULL governor in available_governors_show
	sunrpc: fix crash when cache_head become valid before update
	arm64: dts: qcom: msm8998-clamshell: Remove retention idle state
	nfsd4: fix up replay_matches_cache()
	powerpc: Chunk calls to flush_dcache_range in arch_*_memory
	HID: i2c-hid: Reset ALPS touchpads on resume
	net/sched: annotate lockless accesses to qdisc->empty
	kernel/module.c: wakeup processes in module_wq on module unload
	ACPI: sysfs: Change ACPI_MASKABLE_GPE_MAX to 0x100
	perf callchain: Fix segfault in thread__resolve_callchain_sample()
	iommu/vt-d: Remove incorrect PSI capability check
	of: overlay: add_changeset_property() memory leak
	cifs: Fix potential softlockups while refreshing DFS cache
	firmware: arm_scmi: Avoid double free in error flow
	xfs: don't check for AG deadlock for realtime files in bunmapi
	platform/x86: pmc_atom: Add Siemens CONNECT X300 to critclk_systems DMI table
	netfilter: nf_queue: enqueue skbs with NULL dst
	net, sysctl: Fix compiler warning when only cBPF is present
	watchdog: tqmx86_wdt: Fix build error
	regulator: axp20x: Fix axp20x_set_ramp_delay
	regulator: bd70528: Remove .set_ramp_delay for bd70528_ldo_ops
	spi: uniphier: Fix FIFO threshold
	regulator: axp20x: Fix AXP22x ELDO2 regulator enable bitmask
	powerpc/mm: Mark get_slice_psize() & slice_addr_is_low() as notrace
	Bluetooth: btusb: fix PM leak in error case of setup
	Bluetooth: delete a stray unlock
	Bluetooth: Fix memory leak in hci_connect_le_scan
	arm64: dts: meson-gxl-s905x-khadas-vim: fix uart_A bluetooth node
	arm64: dts: meson-gxm-khadas-vim2: fix uart_A bluetooth node
	media: flexcop-usb: ensure -EIO is returned on error condition
	regulator: ab8500: Remove AB8505 USB regulator
	media: usb: fix memory leak in af9005_identify_state
	dt-bindings: clock: renesas: rcar-usb2-clock-sel: Fix typo in example
	arm64: dts: meson: odroid-c2: Disable usb_otg bus to avoid power failed warning
	phy: renesas: rcar-gen3-usb2: Use platform_get_irq_optional() for optional irq
	tty: serial: msm_serial: Fix lockup for sysrq and oops
	cifs: Fix lookup of root ses in DFS referral cache
	fs: cifs: Fix atime update check vs mtime
	fix compat handling of FICLONERANGE, FIDEDUPERANGE and FS_IOC_FIEMAP
	ath9k_htc: Modify byte order for an error message
	ath9k_htc: Discard undersized packets
	drm/i915/execlists: Fix annotation for decoupling virtual request
	xfs: periodically yield scrub threads to the scheduler
	net: add annotations on hh->hh_len lockless accesses
	ubifs: ubifs_tnc_start_commit: Fix OOB in layout_in_gaps
	btrfs: get rid of unique workqueue helper functions
	Btrfs: only associate the locked page with one async_chunk struct
	s390/smp: fix physical to logical CPU map for SMT
	mm/sparse.c: mark populate_section_memmap as __meminit
	xen/blkback: Avoid unmapping unmapped grant pages
	lib/ubsan: don't serialize UBSAN report
	efi: Don't attempt to map RCI2 config table if it doesn't exist
	perf/x86/intel/bts: Fix the use of page_private()
	net: annotate lockless accesses to sk->sk_pacing_shift
	hsr: avoid debugfs warning message when module is remove
	hsr: fix error handling routine in hsr_dev_finalize()
	hsr: fix a race condition in node list insertion and deletion
	mm/hugetlb: defer freeing of huge pages if in non-task context
	Linux 5.4.9

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8eebcdac421faf74f70af8e8666abfdcdc45c86b
2020-01-09 16:00:18 +01:00
Arnd Bergmann
247aca0b6b compat_ioctl: block: handle BLKGETZONESZ/BLKGETNRZONES
commit 21d37340912d74b1222d43c11aa9dd0687162573 upstream.

These were added to blkdev_ioctl() in v4.20 but not blkdev_compat_ioctl,
so add them now.

Cc: <stable@vger.kernel.org> # v4.20+
Fixes: 72cd87576d ("block: Introduce BLKGETZONESZ ioctl")
Fixes: 65e4e3eee8 ("block: Introduce BLKGETNRZONES ioctl")
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:19:58 +01:00
Arnd Bergmann
17d3c07aab compat_ioctl: block: handle BLKREPORTZONE/BLKRESETZONE
commit 673bdf8ce0a387ef585c13b69a2676096c6edfe9 upstream.

These were added to blkdev_ioctl() but not blkdev_compat_ioctl,
so add them now.

Cc: <stable@vger.kernel.org> # v4.10+
Fixes: 3ed05a987e ("blk-zoned: implement ioctls")
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:19:58 +01:00
Arnd Bergmann
755d02fcf8 compat_ioctl: block: handle Persistent Reservations
commit b2c0fcd28772f99236d261509bcd242135677965 upstream.

These were added to blkdev_ioctl() in linux-5.5 but not
blkdev_compat_ioctl, so add them now.

Cc: <stable@vger.kernel.org> # v4.4+
Fixes: bbd3e06436 ("block: add an API for Persistent Reservations")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Fold in followup patch from Arnd with missing pr.h header include.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-09 10:19:58 +01:00
Ming Lei
943cd69efa block: add bio_truncate to fix guard_bio_eod
[ Upstream commit 85a8ce62c2eabe28b9d76ca4eecf37922402df93 ]

Some filesystem, such as vfat, may send bio which crosses device boundary,
and the worse thing is that the IO request starting within device boundaries
can contain more than one segment past EOD.

Commit dce30ca9e3 ("fs: fix guard_bio_eod to check for real EOD errors")
tries to fix this issue by returning -EIO for this situation. However,
this way lets fs user code lose chance to handle -EIO, then sync_inodes_sb()
may hang for ever.

Also the current truncating on last segment is dangerous by updating the
last bvec, given bvec table becomes not immutable any more, and fs bio
users may not retrieve the truncated pages via bio_for_each_segment_all() in
its .end_io callback.

Fixes this issue by supporting multi-segment truncating. And the
approach is simpler:

- just update bio size since block layer can make correct bvec with
the updated bio size. Then bvec table becomes really immutable.

- zero all truncated segments for read bio

Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Fixed-by: dce30ca9e3 ("fs: fix guard_bio_eod to check for real EOD errors")
Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-09 10:19:54 +01:00
Christoph Hellwig
93eab050d3 block: set the zone size in blk_revalidate_disk_zones atomically
The current zone revalidation code has a major problem in that it
doesn't update the zone size and q->nr_zones atomically, leading
to a short window where an out of bounds access to the zone arrays
is possible.

To fix this move the setting of the zone size into the crticial
sections blk_revalidate_disk_zones so that it gets updated together
with the zone bitmaps and q->nr_zones.  This also slightly simplifies
the caller as it deducts the zone size from the report_zones.

This change also allows to check for a power of two zone size in generic
code.

Reported-by: Hans Holmberg <hans@owltronix.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 16:02:50 -08:00
Christoph Hellwig
43ea7a9a2d block: don't handle bio based drivers in blk_revalidate_disk_zones
bio based drivers only need to update q->nr_zones.  Do that manually
instead of overloading blk_revalidate_disk_zones to keep that function
simpler for the next round of changes that will rely even more on the
request based functionality.

Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 16:02:50 -08:00
Christoph Hellwig
290fc473cc block: allocate the zone bitmaps lazily
Allocate the conventional zone bitmap and the sequential zone locking
bitmap only when we find a zone of the respective type.  This avoids
wasting memory on the conventional zone bitmap for devices that only
have sequential zones, and will also prepare for other future changes.

Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 16:02:49 -08:00
Christoph Hellwig
a062da74fe block: replace seq_zones_bitmap with conv_zones_bitmap
Invert the meaning of seq_zones_bitmap by keeping a bitmap of
conventional zones.  This allows not having a bitmap for devices
that do not have conventional zones.

Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 16:02:49 -08:00
Christoph Hellwig
a354b75ed5 block: simplify blkdev_nr_zones
Simplify the arguments to blkdev_nr_zones by passing a gendisk instead
of the block_device and capacity.  This also removes the need for
__blkdev_nr_zones as all callers are outside the fast path and can
deal with the additional branch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 16:02:49 -08:00
Christoph Hellwig
9d7cf94280 block: remove the empty line at the end of blk-zoned.c
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 16:02:48 -08:00
Christoph Hellwig
202b0716c3 block: rework zone reporting
Avoid the need to allocate a potentially large array of struct blk_zone
in the block layer by switching the ->report_zones method interface to
a callback model. Now the caller simply supplies a callback that is
executed on each reported zone, and private data for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 16:02:47 -08:00
Damien Le Moal
18cfc4d87f block: Remove partition support for zoned block devices
No known partitioning tool supports zoned block devices, especially the
host managed flavor with strong sequential write constraints.
Furthermore, there are also no known user nor use cases for partitioned
zoned block devices.

This patch removes partition device creation for zoned block devices,
which allows simplifying the processing of zone commands for zoned
block devices. A warning is added if a partition table is found on the
device.

For report zones operations no zone sector information remapping is
necessary anymore, simplifying the code. Of note is that remapping of
zone reports for DM targets is still necessary as done by
dm_remap_zone_report().

Similarly, remaping of a zone reset bio is not necessary anymore.
Testing for the applicability of the zone reset all request also becomes
simpler and only needs to check that the number of sectors of the
requested zone range is equal to the disk capacity.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 16:02:46 -08:00
Damien Le Moal
345c506d20 block: Simplify report zones execution
All kernel users of blkdev_report_zones() as well as applications use
through ioctl(BLKZONEREPORT) expect to potentially get less zone
descriptors than requested. As such, the use of the internal report
zones command execution loop implemented by blk_report_zones() is
not necessary and can even be harmful to performance by causing the
execution of inefficient small zones report command to service the
reminder of a requested zone array.

This patch removes blk_report_zones(), simplifying the code. Also
remove a now incorrect comment in dm_blk_report_zones().

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Javier Gonzalez <javier@javigon.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 16:02:45 -08:00
Christoph Hellwig
c193df7366 block: cleanup the !zoned case in blk_revalidate_disk_zones
blk_revalidate_disk_zones is never called for non-zoned devices.  Just
return early and warn instead of trying to handle this case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 16:02:45 -08:00
Damien Le Moal
71cf2b3aea block: Enhance blk_revalidate_disk_zones()
For ZBC and ZAC zoned devices, the scsi driver revalidation processing
implemented by sd_revalidate_disk() includes a call to
sd_zbc_read_zones() which executes a full disk zone report used to
check that all zones of the disk are the same size. This processing is
followed by a call to blk_revalidate_disk_zones(), used to initialize
the device request queue zone bitmaps (zone type and zone write lock
bitmaps). To do so, blk_revalidate_disk_zones() also executes a full
device zone report to obtain zone types. As a result, the entire
zoned block device revalidation process includes two full device zone
report.

By moving the zone size checks into blk_revalidate_disk_zones(), this
process can be optimized to a single full device zone report, leading to
shorter device scan and revalidation times. This patch implements this
optimization, reducing the original full device zone report implemented
in sd_zbc_check_zones() to a single, small, report zones command
execution to obtain the size of the first zone of the device. Checks
whether all zones of the device are the same size as the first zone
size are moved to the generic blk_check_zone() function called from
blk_revalidate_disk_zones().

This optimization also has the following benefits:
1) fewer memory allocations in the scsi layer during disk revalidation
   as the potentailly large buffer for zone report execution is not
   needed.
2) Implement zone checks in a generic manner, reducing the burden on
   device driver which only need to obtain the zone size and check that
   this size is a power of 2 number of LBAs. Any new type of zoned
   block device will benefit from this.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 16:02:45 -08:00
Ajay Joshi
85154041a8 block: add zone open, close and finish ioctl support
Introduce three new ioctl commands BLKOPENZONE, BLKCLOSEZONE and
BLKFINISHZONE to allow applications to control the condition of zones
on a zoned block device through the execution of the REQ_OP_ZONE_OPEN,
REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH operations.

Contains contributions from Matias Bjorling, Hans Holmberg,
Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig.

Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 16:02:45 -08:00
Ajay Joshi
60793393f3 block: add zone open, close and finish operations
Zoned block devices (ZBC and ZAC devices) allow an explicit control
over the condition (state) of zones. The operations allowed are:
* Open a zone: Transition to open condition to indicate that a zone will
  actively be written
* Close a zone: Transition to closed condition to release the drive
  resources used for writing to a zone
* Finish a zone: Transition an open or closed zone to the full
  condition to prevent write operations

To enable this control for in-kernel zoned block device users, define
the new request operations REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE
and REQ_OP_ZONE_FINISH as well as the generic function
blkdev_zone_mgmt() for submitting these operations on a range of zones.
This results in blkdev_reset_zones() removal and replacement with this
new zone magement function. Users of blkdev_reset_zones() (f2fs and
dm-zoned) are updated accordingly.

Contains contributions from Matias Bjorling, Hans Holmberg,
Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig.

Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 16:02:44 -08:00
Damien Le Moal
7aa2b3379c block: Simplify REQ_OP_ZONE_RESET_ALL handling
There is no need for the function __blkdev_reset_all_zones() as
REQ_OP_ZONE_RESET_ALL can be handled directly in blkdev_reset_zones()
bio loop with an early break from the loop. This patch removes this
function and modifies blkdev_reset_zones(), simplifying the code.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 16:02:44 -08:00
Damien Le Moal
bdf770eb9c block: Remove REQ_OP_ZONE_RESET plugging
REQ_OP_ZONE_RESET operations cannot be merged as these bios and requests
do not have a size and are never sequential due to the zone start sector
position required for their execution. As a result, there is no point in
using a plug around blkdev_reset_zones() bio issuing loop. This patch
removes this unnecessary plugging.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 16:02:44 -08:00
Greg Kroah-Hartman
861433ef01 This is the 5.4.7 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4LbVwACgkQONu9yGCS
 aT6zdhAAkTwLWNfzyk1cSPKzWdguZWoqAuddmIeCUaDmyPwI6TE5a2J8IfZ7upYU
 4U2J4nO4I9WxVuTUgJpE0wnDidkvL6U7YfbMbqjVkxAfbFboxN8dJYDNTISehK2A
 WgpIpJadhj1M8Akxq0MLxuCMg11UJU2PP5Tc7K5aKgiVVneDodupjY8Ddksuw8SZ
 5Mus33uOjpCvtxt3GZIRgAdduhL3s3h2Vp+dyAzV2eBvSGwd9Rz4/p0OGZytH780
 oywFYzIU7CxtI7pxIQKBxegb4incjWnlRpP3Dk80CNXHzcuU6WGXARoHjgXmWYcu
 b9hX+/fM76qxbxojM1vI9QVuwy5uB++4NMVsX3e0xFxdvkTo2+Y/vbO3sBMmAy5i
 L0S0sftTuE6bg1XCWoeFbLUaXIWF7g0Xbc91VP7Wv5VolpIrwZsVHJNT4Nf8KHM4
 DuRLmrANhU7ax2E26Bbt17/otCFvjyeQj5fggw/1rkEN8fJSY1YU/SWOxizDY6GZ
 S3ovivhqhLIaDPzW+qmphYRGnBDkTq7HxHal8eJoy/cgvFxBOYAbfXiHuPuNP4Kj
 zbIThujSlbI0gNGymoHH8EoVOJeNcK8L2PsilIZdlPDWi45v5tqYIgiYIA7mqqu+
 6O2plNGWbYK2+ARPkCJ08XTdDFSm+B6Cm0+KFvsdjuQ6xNkhMwc=
 =HvEl
 -----END PGP SIGNATURE-----

Merge 5.4.7 into android-5.4

Changes in 5.4.7
	af_packet: set defaule value for tmo
	fjes: fix missed check in fjes_acpi_add
	mod_devicetable: fix PHY module format
	net: dst: Force 4-byte alignment of dst_metrics
	net: gemini: Fix memory leak in gmac_setup_txqs
	net: hisilicon: Fix a BUG trigered by wrong bytes_compl
	net: nfc: nci: fix a possible sleep-in-atomic-context bug in nci_uart_tty_receive()
	net: phy: ensure that phy IDs are correctly typed
	net: qlogic: Fix error paths in ql_alloc_large_buffers()
	net-sysfs: Call dev_hold always in rx_queue_add_kobject
	net: usb: lan78xx: Fix suspend/resume PHY register access error
	nfp: flower: fix stats id allocation
	qede: Disable hardware gro when xdp prog is installed
	qede: Fix multicast mac configuration
	sctp: fix memleak on err handling of stream initialization
	sctp: fully initialize v4 addr in some functions
	selftests: forwarding: Delete IPv6 address at the end
	neighbour: remove neigh_cleanup() method
	bonding: fix bond_neigh_init()
	net: ena: fix default tx interrupt moderation interval
	net: ena: fix issues in setting interrupt moderation params in ethtool
	dpaa2-ptp: fix double free of the ptp_qoriq IRQ
	mlxsw: spectrum_router: Remove unlikely user-triggerable warning
	net: ethernet: ti: davinci_cpdma: fix warning "device driver frees DMA memory with different size"
	net: stmmac: platform: Fix MDIO init for platforms without PHY
	net: dsa: b53: Fix egress flooding settings
	NFC: nxp-nci: Fix probing without ACPI
	btrfs: don't double lock the subvol_sem for rename exchange
	btrfs: do not call synchronize_srcu() in inode_tree_del
	Btrfs: make tree checker detect checksum items with overlapping ranges
	btrfs: return error pointer from alloc_test_extent_buffer
	Btrfs: fix missing data checksums after replaying a log tree
	btrfs: send: remove WARN_ON for readonly mount
	btrfs: abort transaction after failed inode updates in create_subvol
	btrfs: skip log replay on orphaned roots
	btrfs: do not leak reloc root if we fail to read the fs root
	btrfs: handle ENOENT in btrfs_uuid_tree_iterate
	Btrfs: fix removal logic of the tree mod log that leads to use-after-free issues
	ALSA: pcm: Avoid possible info leaks from PCM stream buffers
	ALSA: hda/ca0132 - Keep power on during processing DSP response
	ALSA: hda/ca0132 - Avoid endless loop
	ALSA: hda/ca0132 - Fix work handling in delayed HP detection
	drm/vc4/vc4_hdmi: fill in connector info
	drm/virtio: switch virtio_gpu_wait_ioctl() to gem helper.
	drm: mst: Fix query_payload ack reply struct
	drm/mipi-dbi: fix a loop in debugfs code
	drm/panel: Add missing drm_panel_init() in panel drivers
	drm: exynos: exynos_hdmi: use cec_notifier_conn_(un)register
	drm: Use EOPNOTSUPP, not ENOTSUPP
	drm/amd/display: verify stream link before link test
	drm/bridge: analogix-anx78xx: silence -EPROBE_DEFER warnings
	drm/amd/display: OTC underflow fix
	iio: max31856: add missing of_node and parent references to iio_dev
	iio: light: bh1750: Resolve compiler warning and make code more readable
	drm/amdgpu/sriov: add ring_stop before ring_create in psp v11 code
	drm/amdgpu: grab the id mgr lock while accessing passid_mapping
	drm/ttm: return -EBUSY on pipelining with no_gpu_wait (v2)
	drm/amd/display: Rebuild mapped resources after pipe split
	ath10k: add cleanup in ath10k_sta_state()
	drm/amd/display: Handle virtual signal type in disable_link()
	ath10k: Check if station exists before forwarding tx airtime report
	spi: Add call to spi_slave_abort() function when spidev driver is released
	drm/meson: vclk: use the correct G12A frac max value
	staging: rtl8192u: fix multiple memory leaks on error path
	staging: rtl8188eu: fix possible null dereference
	rtlwifi: prevent memory leak in rtl_usb_probe
	libertas: fix a potential NULL pointer dereference
	Revert "pinctrl: sh-pfc: r8a77990: Fix MOD_SEL1 bit30 when using SSI_SCK2 and SSI_WS2"
	Revert "pinctrl: sh-pfc: r8a77990: Fix MOD_SEL1 bit31 when using SIM0_D"
	ath10k: fix backtrace on coredump
	IB/iser: bound protection_sg size by data_sg size
	drm/komeda: Workaround for broken FLIP_COMPLETE timestamps
	spi: gpio: prevent memory leak in spi_gpio_probe
	media: am437x-vpfe: Setting STD to current value is not an error
	media: cedrus: fill in bus_info for media device
	media: seco-cec: Add a missing 'release_region()' in an error handling path
	media: vim2m: Fix abort issue
	media: vim2m: Fix BUG_ON in vim2m_device_release()
	media: max2175: Fix build error without CONFIG_REGMAP_I2C
	media: ov6650: Fix control handler not freed on init error
	media: i2c: ov2659: fix s_stream return value
	media: ov6650: Fix crop rectangle alignment not passed back
	media: i2c: ov2659: Fix missing 720p register config
	media: ov6650: Fix stored frame format not in sync with hardware
	media: ov6650: Fix stored crop rectangle not in sync with hardware
	tools/power/cpupower: Fix initializer override in hsw_ext_cstates
	media: venus: core: Fix msm8996 frequency table
	ath10k: fix offchannel tx failure when no ath10k_mac_tx_frm_has_freq
	media: vimc: Fix gpf in rmmod path when stream is active
	drm/amd/display: Set number of pipes to 1 if the second pipe was disabled
	pinctrl: devicetree: Avoid taking direct reference to device name string
	drm/sun4i: dsi: Fix TCON DRQ set bits
	drm/amdkfd: fix a potential NULL pointer dereference (v2)
	x86/math-emu: Check __copy_from_user() result
	drm/amd/powerplay: A workaround to GPU RESET on APU
	selftests/bpf: Correct path to include msg + path
	drm/amd/display: set minimum abm backlight level
	media: venus: Fix occasionally failures to suspend
	rtw88: fix NSS of hw_cap
	drm/amd/display: fix struct init in update_bounding_box
	usb: renesas_usbhs: add suspend event support in gadget mode
	crypto: aegis128-neon - use Clang compatible cflags for ARM
	hwrng: omap3-rom - Call clk_disable_unprepare() on exit only if not idled
	regulator: max8907: Fix the usage of uninitialized variable in max8907_regulator_probe()
	tools/memory-model: Fix data race detection for unordered store and load
	media: flexcop-usb: fix NULL-ptr deref in flexcop_usb_transfer_init()
	media: cec-funcs.h: add status_req checks
	media: meson/ao-cec: move cec_notifier_cec_adap_register after hw setup
	drm/bridge: dw-hdmi: Refuse DDC/CI transfers on the internal I2C controller
	samples: pktgen: fix proc_cmd command result check logic
	block: Fix writeback throttling W=1 compiler warnings
	drm/amdkfd: Fix MQD size calculation
	MIPS: futex: Emit Loongson3 sync workarounds within asm
	mwifiex: pcie: Fix memory leak in mwifiex_pcie_init_evt_ring
	drm/drm_vblank: Change EINVAL by the correct errno
	selftests/bpf: Fix btf_dump padding test case
	libbpf: Fix struct end padding in btf_dump
	libbpf: Fix passing uninitialized bytes to setsockopt
	net/smc: increase device refcount for added link group
	team: call RCU read lock when walking the port_list
	media: cx88: Fix some error handling path in 'cx8800_initdev()'
	crypto: inside-secure - Fix a maybe-uninitialized warning
	crypto: aegis128/simd - build 32-bit ARM for v8 architecture explicitly
	misc: fastrpc: fix memory leak from miscdev->name
	ASoC: SOF: enable sync_write in hdac_bus
	media: ti-vpe: vpe: Fix Motion Vector vpdma stride
	media: ti-vpe: vpe: fix a v4l2-compliance warning about invalid pixel format
	media: ti-vpe: vpe: fix a v4l2-compliance failure about frame sequence number
	media: ti-vpe: vpe: Make sure YUYV is set as default format
	media: ti-vpe: vpe: fix a v4l2-compliance failure causing a kernel panic
	media: ti-vpe: vpe: ensure buffers are cleaned up properly in abort cases
	drm/amd/display: Properly round nominal frequency for SPD
	drm/amd/display: wait for set pipe mcp command completion
	media: ti-vpe: vpe: fix a v4l2-compliance failure about invalid sizeimage
	drm/amd/display: add new active dongle to existent w/a
	syscalls/x86: Use the correct function type in SYSCALL_DEFINE0
	drm/amd/display: Fix dongle_caps containing stale information.
	extcon: sm5502: Reset registers during initialization
	drm/amd/display: Program DWB watermarks from correct state
	x86/mm: Use the correct function type for native_set_fixmap()
	ath10k: Correct error handling of dma_map_single()
	rtw88: coex: Set 4 slot mode for A2DP
	drm/bridge: dw-hdmi: Restore audio when setting a mode
	perf test: Report failure for mmap events
	perf report: Add warning when libunwind not compiled in
	perf test: Avoid infinite loop for task exit case
	perf vendor events arm64: Fix Hisi hip08 DDRC PMU eventname
	usb: usbfs: Suppress problematic bind and unbind uevents.
	drm/amd/powerplay: avoid disabling ECC if RAS is enabled for VEGA20
	iio: adc: max1027: Reset the device at probe time
	Bluetooth: btusb: avoid unused function warning
	Bluetooth: missed cpu_to_le16 conversion in hci_init4_req
	Bluetooth: Workaround directed advertising bug in Broadcom controllers
	Bluetooth: hci_core: fix init for HCI_USER_CHANNEL
	bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()
	x86/mce: Lower throttling MCE messages' priority to warning
	drm/amd/display: enable hostvm based on roimmu active for dcn2.1
	drm/amd/display: fix header for RN clk mgr
	drm/amdgpu: fix amdgpu trace event print string format error
	staging: iio: ad9834: add a check for devm_clk_get
	power: supply: cpcap-battery: Check voltage before orderly_poweroff
	perf tests: Disable bp_signal testing for arm64
	selftests/bpf: Make a copy of subtest name
	net: hns3: log and clear hardware error after reset complete
	RDMA/hns: Fix wrong parameters when initial mtt of srq->idx_que
	drm/gma500: fix memory disclosures due to uninitialized bytes
	ASoC: soc-pcm: fixup dpcm_prune_paths() loop continue
	rtl8xxxu: fix RTL8723BU connection failure issue after warm reboot
	RDMA/siw: Fix SQ/RQ drain logic
	ipmi: Don't allow device module unload when in use
	x86/ioapic: Prevent inconsistent state when moving an interrupt
	media: cedrus: Fix undefined shift with a SHIFT_AND_MASK_BITS macro
	media: aspeed: set hsync and vsync polarities to normal before starting mode detection
	drm/nouveau: Don't grab runtime PM refs for HPD IRQs
	media: ov6650: Fix stored frame interval not in sync with hardware
	media: ad5820: Define entity function
	media: ov5640: Make 2592x1944 mode only available at 15 fps
	media: st-mipid02: add a check for devm_gpiod_get_optional
	media: imx7-mipi-csis: Add a check for devm_regulator_get
	media: aspeed: clear garbage interrupts
	media: smiapp: Register sensor after enabling runtime PM on the device
	md: no longer compare spare disk superblock events in super_load
	staging: wilc1000: potential corruption in wilc_parse_join_bss_param()
	md/bitmap: avoid race window between md_bitmap_resize and bitmap_file_clear_bit
	drm: Don't free jobs in wait_event_interruptible()
	EDAC/amd64: Set grain per DIMM
	arm64: psci: Reduce the waiting time for cpu_psci_cpu_kill()
	drm/amd/display: setting the DIG_MODE to the correct value.
	i40e: initialize ITRN registers with correct values
	drm/amd/display: correctly populate dpp refclk in fpga
	i40e: Wrong 'Advertised FEC modes' after set FEC to AUTO
	net: phy: dp83867: enable robust auto-mdix
	drm/tegra: sor: Use correct SOR index on Tegra210
	regulator: core: Release coupled_rdevs on regulator_init_coupling() error
	ubsan, x86: Annotate and allow __ubsan_handle_shift_out_of_bounds() in uaccess regions
	spi: sprd: adi: Add missing lock protection when rebooting
	ACPI: button: Add DMI quirk for Medion Akoya E2215T
	RDMA/qedr: Fix memory leak in user qp and mr
	RDMA/hns: Fix memory leak on 'context' on error return path
	RDMA/qedr: Fix srqs xarray initialization
	RDMA/core: Set DMA parameters correctly
	staging: wilc1000: check if device is initialzied before changing vif
	gpu: host1x: Allocate gather copy for host1x
	net: dsa: LAN9303: select REGMAP when LAN9303 enable
	phy: renesas: phy-rcar-gen2: Fix the array off by one warning
	phy: qcom-usb-hs: Fix extcon double register after power cycle
	s390/time: ensure get_clock_monotonic() returns monotonic values
	s390: add error handling to perf_callchain_kernel
	s390/mm: add mm_pxd_folded() checks to pxd_free()
	net: hns3: add struct netdev_queue debug info for TX timeout
	libata: Ensure ata_port probe has completed before detach
	loop: fix no-unmap write-zeroes request behavior
	net/mlx5e: Verify that rule has at least one fwd/drop action
	pinctrl: sh-pfc: sh7734: Fix duplicate TCLK1_B
	ALSA: bebob: expand sleep just after breaking connections for protocol version 1
	iio: dln2-adc: fix iio_triggered_buffer_postenable() position
	libbpf: Fix error handling in bpf_map__reuse_fd()
	Bluetooth: Fix advertising duplicated flags
	ALSA: pcm: Fix missing check of the new non-cached buffer type
	spi: sifive: disable clk when probe fails and remove
	ASoC: SOF: imx: fix reverse CONFIG_SND_SOC_SOF_OF dependency
	pinctrl: qcom: sc7180: Add missing tile info in SDC_QDSD_PINGROUP/UFS_RESET
	pinctrl: amd: fix __iomem annotation in amd_gpio_irq_handler()
	ixgbe: protect TX timestamping from API misuse
	cpufreq: sun50i: Fix CPU speed bin detection
	media: rcar_drif: fix a memory disclosure
	media: v4l2-core: fix touch support in v4l_g_fmt
	nvme: introduce "Command Aborted By host" status code
	media: staging/imx: Use a shorter name for driver
	nvmem: imx-ocotp: reset error status on probe
	nvmem: core: fix nvmem_cell_write inline function
	ASoC: SOF: topology: set trigger order for FE DAI link
	media: vivid: media_device_cleanup was called too early
	spi: dw: Fix Designware SPI loopback
	bnx2x: Fix PF-VF communication over multi-cos queues.
	spi: img-spfi: fix potential double release
	ALSA: timer: Limit max amount of slave instances
	RDMA/core: Fix return code when modify_port isn't supported
	drm: msm: a6xx: fix debug bus register configuration
	rtlwifi: fix memory leak in rtl92c_set_fw_rsvdpagepkt()
	perf probe: Fix to find range-only function instance
	perf cs-etm: Fix definition of macro TO_CS_QUEUE_NR
	perf probe: Fix to list probe event with correct line number
	perf jevents: Fix resource leak in process_mapfile() and main()
	perf probe: Walk function lines in lexical blocks
	perf probe: Fix to probe an inline function which has no entry pc
	perf probe: Fix to show ranges of variables in functions without entry_pc
	perf probe: Fix to show inlined function callsite without entry_pc
	libsubcmd: Use -O0 with DEBUG=1
	perf probe: Fix to probe a function which has no entry pc
	perf tools: Fix cross compile for ARM64
	perf tools: Splice events onto evlist even on error
	drm/amdgpu: disallow direct upload save restore list from gfx driver
	drm/amd/powerplay: fix struct init in renoir_print_clk_levels
	drm/amdgpu: fix potential double drop fence reference
	ice: Check for null pointer dereference when setting rings
	xen/gntdev: Use select for DMA_SHARED_BUFFER
	perf parse: If pmu configuration fails free terms
	perf probe: Skip overlapped location on searching variables
	net: avoid potential false sharing in neighbor related code
	perf probe: Return a better scope DIE if there is no best scope
	perf probe: Fix to show calling lines of inlined functions
	perf probe: Skip end-of-sequence and non statement lines
	perf probe: Filter out instances except for inlined subroutine and subprogram
	libbpf: Fix negative FD close() in xsk_setup_xdp_prog()
	s390/bpf: Use kvcalloc for addrs array
	cgroup: freezer: don't change task and cgroups status unnecessarily
	selftests: proc: Make va_max 1MB
	drm/amdgpu: Avoid accidental thread reactivation.
	media: exynos4-is: fix wrong mdev and v4l2 dev order in error path
	ath10k: fix get invalid tx rate for Mesh metric
	fsi: core: Fix small accesses and unaligned offsets via sysfs
	selftests: net: Fix printf format warnings on arm
	media: pvrusb2: Fix oops on tear-down when radio support is not present
	soundwire: intel: fix PDI/stream mapping for Bulk
	crypto: atmel - Fix authenc support when it is set to m
	ice: delay less
	media: si470x-i2c: add missed operations in remove
	media: cedrus: Use helpers to access capture queue
	media: v4l2-ctrl: Lock main_hdl on operations of requests_queued.
	iio: cros_ec_baro: set info_mask_shared_by_all_available field
	EDAC/ghes: Fix grain calculation
	media: vicodec: media_device_cleanup was called too early
	media: vim2m: media_device_cleanup was called too early
	spi: pxa2xx: Add missed security checks
	ASoC: rt5677: Mark reg RT5677_PWR_ANLG2 as volatile
	iio: dac: ad5446: Add support for new AD5600 DAC
	bpf, testing: Workaround a verifier failure for test_progs
	ASoC: Intel: kbl_rt5663_rt5514_max98927: Add dmic format constraint
	net: dsa: sja1105: Disallow management xmit during switch reset
	r8169: respect EEE user setting when restarting network
	s390/disassembler: don't hide instruction addresses
	net: ethernet: ti: Add dependency for TI_DAVINCI_EMAC
	nvme: Discard workaround for non-conformant devices
	parport: load lowlevel driver if ports not found
	bcache: fix static checker warning in bcache_device_free()
	cpufreq: Register drivers only after CPU devices have been registered
	qtnfmac: fix debugfs support for multiple cards
	qtnfmac: fix invalid channel information output
	x86/crash: Add a forward declaration of struct kimage
	qtnfmac: fix using skb after free
	RDMA/efa: Clear the admin command buffer prior to its submission
	tracing: use kvcalloc for tgid_map array allocation
	MIPS: ralink: enable PCI support only if driver for mt7621 SoC is selected
	tracing/kprobe: Check whether the non-suffixed symbol is notrace
	bcache: fix deadlock in bcache_allocator
	iwlwifi: mvm: fix unaligned read of rx_pkt_status
	ASoC: wm8904: fix regcache handling
	regulator: core: Let boot-on regulators be powered off
	spi: tegra20-slink: add missed clk_unprepare
	tun: fix data-race in gro_normal_list()
	xhci-pci: Allow host runtime PM as default also for Intel Ice Lake xHCI
	crypto: virtio - deal with unsupported input sizes
	mmc: tmio: Add MMC_CAP_ERASE to allow erase/discard/trim requests
	btrfs: don't prematurely free work in end_workqueue_fn()
	btrfs: don't prematurely free work in run_ordered_work()
	sched/uclamp: Fix overzealous type replacement
	ASoC: wm2200: add missed operations in remove and probe failure
	spi: st-ssc4: add missed pm_runtime_disable
	ASoC: wm5100: add missed pm_runtime_disable
	perf/core: Fix the mlock accounting, again
	selftests, bpf: Fix test_tc_tunnel hanging
	selftests, bpf: Workaround an alu32 sub-register spilling issue
	bnxt_en: Return proper error code for non-existent NVM variable
	net: phy: avoid matching all-ones clause 45 PHY IDs
	firmware_loader: Fix labels with comma for builtin firmware
	ASoC: Intel: bytcr_rt5640: Update quirk for Acer Switch 10 SW5-012 2-in-1
	x86/insn: Add some Intel instructions to the opcode map
	net-af_xdp: Use correct number of channels from ethtool
	brcmfmac: remove monitor interface when detaching
	perf session: Fix decompression of PERF_RECORD_COMPRESSED records
	perf probe: Fix to show function entry line as probe-able
	s390/crypto: Fix unsigned variable compared with zero
	s390/kasan: support memcpy_real with TRACE_IRQFLAGS
	bnxt_en: Improve RX buffer error handling.
	iwlwifi: check kasprintf() return value
	fbtft: Make sure string is NULL terminated
	ASoC: soc-pcm: check symmetry before hw_params
	net: ethernet: ti: ale: clean ale tbl on init and intf restart
	mt76: fix possible out-of-bound access in mt7615_fill_txs/mt7603_fill_txs
	s390/cpumf: Adjust registration of s390 PMU device drivers
	crypto: sun4i-ss - Fix 64-bit size_t warnings
	crypto: sun4i-ss - Fix 64-bit size_t warnings on sun4i-ss-hash.c
	mac80211: consider QoS Null frames for STA_NULLFUNC_ACKED
	crypto: vmx - Avoid weird build failures
	libtraceevent: Fix memory leakage in copy_filter_type
	mips: fix build when "48 bits virtual memory" is enabled
	drm/amdgpu: fix bad DMA from INTERRUPT_CNTL2
	ice: Only disable VF state when freeing each VF resources
	ice: Fix setting coalesce to handle DCB configuration
	net: phy: initialise phydev speed and duplex sanely
	tools, bpf: Fix build for 'make -s tools/bpf O=<dir>'
	RDMA/bnxt_re: Fix missing le16_to_cpu
	RDMA/bnxt_re: Fix stat push into dma buffer on gen p5 devices
	bpf: Provide better register bounds after jmp32 instructions
	RDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 series
	ibmvnic: Fix completion structure initialization
	net: wireless: intel: iwlwifi: fix GRO_NORMAL packet stalling
	MIPS: futex: Restore \n after sync instructions
	btrfs: don't prematurely free work in reada_start_machine_worker()
	btrfs: don't prematurely free work in scrub_missing_raid56_worker()
	Revert "mmc: sdhci: Fix incorrect switch to HS mode"
	mmc: mediatek: fix CMD_TA to 2 for MT8173 HS200/HS400 mode
	tpm_tis: reserve chip for duration of tpm_tis_core_init
	tpm: fix invalid locking in NONBLOCKING mode
	iommu: fix KASAN use-after-free in iommu_insert_resv_region
	iommu: set group default domain before creating direct mappings
	iommu/vt-d: Fix dmar pte read access not set error
	iommu/vt-d: Set ISA bridge reserved region as relaxable
	iommu/vt-d: Allocate reserved region for ISA with correct permission
	can: xilinx_can: Fix missing Rx can packets on CANFD2.0
	can: m_can: tcan4x5x: add required delay after reset
	can: j1939: j1939_sk_bind(): take priv after lock is held
	can: flexcan: fix possible deadlock and out-of-order reception after wakeup
	can: flexcan: poll MCR_LPM_ACK instead of GPR ACK for stop mode acknowledgment
	can: kvaser_usb: kvaser_usb_leaf: Fix some info-leaks to USB devices
	selftests: net: tls: remove recv_rcvbuf test
	spi: dw: Correct handling of native chipselect
	spi: cadence: Correct handling of native chipselect
	usb: xhci: Fix build warning seen with CONFIG_PM=n
	drm/amdgpu: fix uninitialized variable pasid_mapping_needed
	ath10k: Revert "ath10k: add cleanup in ath10k_sta_state()"
	RDMA/siw: Fix post_recv QP state locking
	md: avoid invalid memory access for array sb->dev_roles
	s390/ftrace: fix endless recursion in function_graph tracer
	ARM: dts: Fix vcsi regulator to be always-on for droid4 to prevent hangs
	can: flexcan: add low power enter/exit acknowledgment helper
	usbip: Fix receive error in vhci-hcd when using scatter-gather
	usbip: Fix error path of vhci_recv_ret_submit()
	spi: fsl: don't map irq during probe
	spi: fsl: use platform_get_irq() instead of of_irq_to_resource()
	efi/memreserve: Register reservations as 'reserved' in /proc/iomem
	cpufreq: Avoid leaving stale IRQ work items during CPU offline
	KEYS: asymmetric: return ENOMEM if akcipher_request_alloc() fails
	mm: vmscan: protect shrinker idr replace with CONFIG_MEMCG
	USB: EHCI: Do not return -EPIPE when hub is disconnected
	intel_th: pci: Add Comet Lake PCH-V support
	intel_th: pci: Add Elkhart Lake SOC support
	intel_th: Fix freeing IRQs
	intel_th: msu: Fix window switching without windows
	platform/x86: hp-wmi: Make buffer for HPWMI_FEATURE2_QUERY 128 bytes
	staging: comedi: gsc_hpdi: check dma_alloc_coherent() return value
	tty/serial: atmel: fix out of range clock divider handling
	serial: sprd: Add clearing break interrupt operation
	pinctrl: baytrail: Really serialize all register accesses
	clk: imx: clk-imx7ulp: Add missing sentinel of ulp_div_table
	clk: imx: clk-composite-8m: add lock to gate/mux
	clk: imx: pll14xx: fix clk_pll14xx_wait_lock
	ext4: fix ext4_empty_dir() for directories with holes
	ext4: check for directory entries too close to block end
	ext4: unlock on error in ext4_expand_extra_isize()
	ext4: validate the debug_want_extra_isize mount option at parse time
	iocost: over-budget forced IOs should schedule async delay
	KVM: PPC: Book3S HV: Fix regression on big endian hosts
	kvm: x86: Host feature SSBD doesn't imply guest feature SPEC_CTRL_SSBD
	kvm: x86: Host feature SSBD doesn't imply guest feature AMD_SSBD
	KVM: arm/arm64: Properly handle faulting of device mappings
	KVM: arm64: Ensure 'params' is initialised when looking up sys register
	x86/intel: Disable HPET on Intel Coffee Lake H platforms
	x86/MCE/AMD: Do not use rdmsr_safe_on_cpu() in smca_configure()
	x86/MCE/AMD: Allow Reserved types to be overwritten in smca_banks[]
	x86/mce: Fix possibly incorrect severity calculation on AMD
	powerpc/vcpu: Assume dedicated processors as non-preempt
	powerpc/irq: fix stack overflow verification
	ocxl: Fix concurrent AFU open and device removal
	mmc: sdhci-msm: Correct the offset and value for DDR_CONFIG register
	mmc: sdhci-of-esdhc: Revert "mmc: sdhci-of-esdhc: add erratum A-009204 support"
	mmc: sdhci: Update the tuning failed messages to pr_debug level
	mmc: sdhci-of-esdhc: fix P2020 errata handling
	mmc: sdhci: Workaround broken command queuing on Intel GLK
	mmc: sdhci: Add a quirk for broken command queuing
	nbd: fix shutdown and recv work deadlock v2
	iwlwifi: pcie: move power gating workaround earlier in the flow
	Linux 5.4.7

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3585238149235bf73bb453e25861d9a6b9193dfa
2019-12-31 17:56:13 +01:00
Tejun Heo
377a8744c3 iocost: over-budget forced IOs should schedule async delay
commit d7bd15a138aef3be227818aad9c501e43c89c8c5 upstream.

When over-budget IOs are force-issued through root cgroup,
iocg_kick_delay() adjusts the async delay accordingly but doesn't
actually schedule async throttle for the issuing task.  This bug is
pretty well masked because sooner or later the offending threads are
gonna get directly throttled on regular IOs or have async delay
scheduled by mem_cgroup_throttle_swaprate().

However, it can affect control quality on filesystem metadata heavy
operations.  Let's fix it by invoking blkcg_schedule_throttle() when
iocg_kick_delay() says async delay is needed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org
Reported-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-31 16:46:19 +01:00
Greg Kroah-Hartman
63dd580f09 This is the 5.4.6 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl397mMACgkQONu9yGCS
 aT4pqxAAu5o3/xNNdKcfR6GkobAEHq/62MuBpQmqt3Ii8G9JSleASQmsYyZZVHI6
 JiMrmWiULJ1T2R8lUz8Li2PumIvmPSaW6H9VjiaaGz6U8XSJZzrxhgdNkibz4uzJ
 JhNaikLo4mWz3p/O7OkvFoHjMGmgc3m2rjuukMOUokQCQFYMrKms8FjFzKHk/87H
 RfvIHtwcvL6aVv3BzHXC8od3mC69uE56+zdB9lBYPeH38EjShtcd58jsnLYusSnM
 BqFChcYqHC5WHZ8WfWcLmcmgkaKotoxzhgX/YUpXo1pScfxht3rHqE0h/XB/aPFR
 U8V7gT1af0cbumB7bHDKje7ROyPQUoxrrsZtYYct0DS831sAFTfiTac4NlyElnIv
 PgknutRdK3FwD+GjI5BCHveGyiucKQhet+qTglzg8J942PQjdy7Bl8tIV2smqHrA
 0/I66jcusr1G7leD/QjAc8Sli6tX9LJslW/SJXlgdxdfQ9bptRnl8aOfh3ZQILOS
 lqCm297APCi4bVS4YFrLg6kc+FPMtAW7+oIXarbhNtxgW1P9vuNOhS0c7qRH/51i
 j6WPjqfJA0itcKI3bZUlaZR6kzKhkgTnvy6phIxMpaQxdkNh+vuBvLzK/dP9o0Hv
 sgZ+fcAZjqzAKjOFJ1kXBNv9AN2mGKN9cw+6zvngcybMv9aKUWk=
 =V0vt
 -----END PGP SIGNATURE-----

Merge 5.4.6 into android-5.4

Changes in 5.4.6
	USB: Fix incorrect DMA allocations for local memory pool drivers
	mmc: block: Make card_busy_detect() a bit more generic
	mmc: block: Add CMD13 polling for MMC IOCTLS with R1B response
	mmc: core: Drop check for mmc_card_is_removable() in mmc_rescan()
	mmc: core: Re-work HW reset for SDIO cards
	PCI/switchtec: Read all 64 bits of part_event_bitmap
	PCI/PM: Always return devices to D0 when thawing
	PCI: pciehp: Avoid returning prematurely from sysfs requests
	PCI: Fix Intel ACS quirk UPDCR register address
	PCI/MSI: Fix incorrect MSI-X masking on resume
	PCI: Do not use bus number zero from EA capability
	PCI: rcar: Fix missing MACCTLR register setting in initialization sequence
	PCI: Apply Cavium ACS quirk to ThunderX2 and ThunderX3
	PM / QoS: Redefine FREQ_QOS_MAX_DEFAULT_VALUE to S32_MAX
	block: fix "check bi_size overflow before merge"
	xtensa: use MEMBLOCK_ALLOC_ANYWHERE for KASAN shadow map
	gfs2: Multi-block allocations in gfs2_page_mkwrite
	gfs2: fix glock reference problem in gfs2_trans_remove_revoke
	xtensa: fix TLB sanity checker
	xtensa: fix syscall_set_return_value
	rpmsg: glink: Set tail pointer to 0 at end of FIFO
	rpmsg: glink: Fix reuse intents memory leak issue
	rpmsg: glink: Fix use after free in open_ack TIMEOUT case
	rpmsg: glink: Put an extra reference during cleanup
	rpmsg: glink: Fix rpmsg_register_device err handling
	rpmsg: glink: Don't send pending rx_done during remove
	rpmsg: glink: Free pending deferred work on remove
	cifs: smbd: Return -EAGAIN when transport is reconnecting
	cifs: smbd: Only queue work for error recovery on memory registration
	cifs: smbd: Add messages on RDMA session destroy and reconnection
	cifs: smbd: Return -EINVAL when the number of iovs exceeds SMBDIRECT_MAX_SGE
	cifs: smbd: Return -ECONNABORTED when trasnport is not in connected state
	cifs: Don't display RDMA transport on reconnect
	CIFS: Respect O_SYNC and O_DIRECT flags during reconnect
	CIFS: Close open handle after interrupted close
	CIFS: Do not miss cancelled OPEN responses
	CIFS: Fix NULL pointer dereference in mid callback
	cifs: Fix retrieval of DFS referrals in cifs_mount()
	ARM: dts: s3c64xx: Fix init order of clock providers
	ARM: tegra: Fix FLOW_CTLR_HALT register clobbering by tegra_resume()
	vfio/pci: call irq_bypass_unregister_producer() before freeing irq
	dma-buf: Fix memory leak in sync_file_merge()
	drm/panfrost: Fix a race in panfrost_ioctl_madvise()
	drm/panfrost: Fix a BO leak in panfrost_ioctl_mmap_bo()
	drm/panfrost: Fix a race in panfrost_gem_free_object()
	drm/mgag200: Extract device type from flags
	drm/mgag200: Store flags from PCI driver data in device structure
	drm/mgag200: Add workaround for HW that does not support 'startadd'
	drm/mgag200: Flag all G200 SE A machines as broken wrt <startadd>
	drm: meson: venc: cvbs: fix CVBS mode matching
	dm mpath: remove harmful bio-based optimization
	dm btree: increase rebalance threshold in __rebalance2()
	dm clone metadata: Track exact changes per transaction
	dm clone metadata: Use a two phase commit
	dm clone: Flush destination device before committing metadata
	dm thin metadata: Add support for a pre-commit callback
	dm thin: Flush data device before committing metadata
	scsi: ufs: Disable autohibern8 feature in Cadence UFS
	scsi: iscsi: Fix a potential deadlock in the timeout handler
	scsi: qla2xxx: Ignore NULL pointer in tcm_qla2xxx_free_mcmd
	scsi: qla2xxx: Initialize free_work before flushing it
	scsi: qla2xxx: Added support for MPI and PEP regions for ISP28XX
	scsi: qla2xxx: Change discovery state before PLOGI
	scsi: qla2xxx: Correctly retrieve and interpret active flash region
	scsi: qla2xxx: Fix incorrect SFUB length used for Secure Flash Update MB Cmd
	drm/nouveau/kms/nv50-: Call outp_atomic_check_view() before handling PBN
	drm/nouveau/kms/nv50-: Store the bpc we're using in nv50_head_atom
	drm/nouveau/kms/nv50-: Limit MST BPC to 8
	drm/i915/fbc: Disable fbc by default on all glk+
	drm/radeon: fix r1xx/r2xx register checker for POT textures
	drm/dp_mst: Correct the bug in drm_dp_update_payload_part1()
	drm/amd/display: re-enable wait in pipelock, but add timeout
	drm/amd/display: add default clocks if not able to fetch them
	drm/amdgpu: initialize vm_inv_eng0_sem for gfxhub and mmhub
	drm/amdgpu: invalidate mmhub semaphore workaround in gmc9/gmc10
	drm/amdgpu/gfx10: explicitly wait for cp idle after halt/unhalt
	drm/amdgpu/gfx10: re-init clear state buffer after gpu reset
	drm/i915/gvt: Fix cmd length check for MI_ATOMIC
	drm/amdgpu: avoid using invalidate semaphore for picasso
	drm/amdgpu: add invalidate semaphore limit for SRIOV and picasso in gmc9
	ALSA: hda: Fix regression by strip mask fix
	Linux 5.4.6

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ieb34254e2c00cee10d0012aaaf8fef323280c04f
2019-12-21 11:19:45 +01:00
Andreas Gruenbacher
06ad673b6c block: fix "check bi_size overflow before merge"
commit cc90bc68422318eb8e75b15cd74bc8d538a7df29 upstream.

This partially reverts commit e3a5d8e386.

Commit e3a5d8e386 ("check bi_size overflow before merge") adds a bio_full
check to __bio_try_merge_page.  This will cause __bio_try_merge_page to fail
when the last bi_io_vec has been reached.  Instead, what we want here is only
the bi_size overflow check.

Fixes: e3a5d8e386 ("block: check bi_size overflow before merge")
Cc: stable@vger.kernel.org # v5.4+
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-21 11:04:32 +01:00
Greg Kroah-Hartman
7dc1159904 This is the 5.4.4 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl35Ju4ACgkQONu9yGCS
 aT7IMQ//Vs+kE7VCDeHfzNu4ivh3+X0bleYWo/OPxNQRFx7IL4Lb2jVjWrMnAGld
 u8H31nVaL28bhJAcQJg+O/OD3AXWQ8nzRKixfVfwd669Zv1ggaLM8MW9xuTBwmQs
 ciFrQ4GirfjvUP0VdkpsSLp+bbi61nkqGB4WHgX3gL+8LWxtpE8nFlh+8P+3TT9Z
 h4BuYQHhGF8JyhOFKhfWbWAFUoHLsR7Aa7TPOeQPXGfzWBaSfISA/UVo6N7QGpVw
 4DP5rc4Wb+tzNAJHkkgzbSDzuM6L6BHrO3SQCisd0Viw9XYO9M0U9NVbExtu9x9l
 M/sqE61cX1AxUmjstdgca37vyHCntJ+wMT7WRnOBVJfodZDjB/6wzPz8rWIZrSX+
 +uXSJF8Xpp5FuvB0td0CQoyUsnrMDnwWbjnMnlVXY6mPcgD5fgB9dTDJN3FEQ8t6
 MixN3JBy1KseY9ihe9cINaR0CBKqyogOwbGrqqw3XJNWzpJ5xgxhLuSc2PyuDLwZ
 bDaj7K9Ukx4n0QzXWvW5pZTQi+yNVc7DUVOKFDXTf2xR2/KvEVNW25gL7oBxrTTu
 q5gw4bBkO0KsbgofTMWUyUfO3aYhDqc7H4yfRRbc/Q08hiJHORFC2jhLXG9oubNN
 QrZ4AxJvzRSYl5vwrVpHW9vnlPe5SuJvRbkMN8164togc9BMNbc=
 =qUmE
 -----END PGP SIGNATURE-----

Merge 5.4.4 into android-5.4

Changes in 5.4.4
	usb: gadget: configfs: Fix missing spin_lock_init()
	usb: gadget: pch_udc: fix use after free
	nvme: Namepace identification descriptor list is optional
	Revert "nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T"
	scsi: lpfc: Fix bad ndlp ptr in xri aborted handling
	scsi: zfcp: trace channel log even for FCP command responses
	scsi: qla2xxx: Do command completion on abort timeout
	scsi: qla2xxx: Fix driver unload hang
	scsi: qla2xxx: Fix double scsi_done for abort path
	scsi: qla2xxx: Fix memory leak when sending I/O fails
	compat_ioctl: add compat_ptr_ioctl()
	ceph: fix compat_ioctl for ceph_dir_operations
	media: venus: remove invalid compat_ioctl32 handler
	USB: uas: honor flag to avoid CAPACITY16
	USB: uas: heed CAPACITY_HEURISTICS
	USB: documentation: flags on usb-storage versus UAS
	usb: Allow USB device to be warm reset in suspended state
	usb: host: xhci-tegra: Correct phy enable sequence
	binder: fix incorrect calculation for num_valid
	staging: exfat: fix multiple definition error of `rename_file'
	staging: rtl8188eu: fix interface sanity check
	staging: rtl8712: fix interface sanity check
	staging: vchiq: call unregister_chrdev_region() when driver registration fails
	staging: gigaset: fix general protection fault on probe
	staging: gigaset: fix illegal free on probe errors
	staging: gigaset: add endpoint-type sanity check
	usb: xhci: only set D3hot for pci device
	xhci: Fix memory leak in xhci_add_in_port()
	xhci: fix USB3 device initiated resume race with roothub autosuspend
	xhci: Increase STS_HALT timeout in xhci_suspend()
	xhci: handle some XHCI_TRUST_TX_LENGTH quirks cases as default behaviour.
	xhci: make sure interrupts are restored to correct state
	interconnect: qcom: sdm845: Walk the list safely on node removal
	interconnect: qcom: qcs404: Walk the list safely on node removal
	usb: common: usb-conn-gpio: Don't log an error on probe deferral
	ARM: dts: pandora-common: define wl1251 as child node of mmc3
	iio: adis16480: Add debugfs_reg_access entry
	iio: imu: st_lsm6dsx: fix ODR check in st_lsm6dsx_write_raw
	iio: adis16480: Fix scales factors
	iio: humidity: hdc100x: fix IIO_HUMIDITYRELATIVE channel reporting
	iio: imu: inv_mpu6050: fix temperature reporting using bad unit
	iio: adc: ad7606: fix reading unnecessary data from device
	iio: adc: ad7124: Enable internal reference
	USB: atm: ueagle-atm: add missing endpoint check
	USB: idmouse: fix interface sanity checks
	USB: serial: io_edgeport: fix epic endpoint lookup
	usb: roles: fix a potential use after free
	USB: adutux: fix interface sanity check
	usb: core: urb: fix URB structure initialization function
	usb: mon: Fix a deadlock in usbmon between mmap and read
	tpm: add check after commands attribs tab allocation
	tpm: Switch to platform_get_irq_optional()
	EDAC/altera: Use fast register IO for S10 IRQs
	brcmfmac: disable PCIe interrupts before bus reset
	mtd: spear_smi: Fix Write Burst mode
	mtd: rawnand: Change calculating of position page containing BBM
	virt_wifi: fix use-after-free in virt_wifi_newlink()
	virtio-balloon: fix managed page counts when migrating pages between zones
	usb: dwc3: pci: add ID for the Intel Comet Lake -H variant
	usb: dwc3: gadget: Fix logical condition
	usb: dwc3: gadget: Clear started flag for non-IOC
	usb: dwc3: ep0: Clear started flag on completion
	phy: renesas: rcar-gen3-usb2: Fix sysfs interface of "role"
	usb: typec: fix use after free in typec_register_port()
	iwlwifi: pcie: fix support for transmitting SKBs with fraglist
	btrfs: check page->mapping when loading free space cache
	btrfs: use btrfs_block_group_cache_done in update_block_group
	btrfs: use refcount_inc_not_zero in kill_all_nodes
	Btrfs: fix metadata space leak on fixup worker failure to set range as delalloc
	Btrfs: fix negative subv_writers counter and data space leak after buffered write
	btrfs: Avoid getting stuck during cyclic writebacks
	btrfs: Remove btrfs_bio::flags member
	Btrfs: send, skip backreference walking for extents with many references
	btrfs: record all roots for rename exchange on a subvol
	rtlwifi: rtl8192de: Fix missing code to retrieve RX buffer address
	rtlwifi: rtl8192de: Fix missing callback that tests for hw release of buffer
	rtlwifi: rtl8192de: Fix missing enable interrupt flag
	lib: raid6: fix awk build warnings
	ovl: fix lookup failure on multi lower squashfs
	ovl: fix corner case of non-unique st_dev;st_ino
	ovl: relax WARN_ON() on rename to self
	hwrng: omap - Fix RNG wait loop timeout
	dm writecache: handle REQ_FUA
	dm zoned: reduce overhead of backing device checks
	workqueue: Fix spurious sanity check failures in destroy_workqueue()
	workqueue: Fix pwq ref leak in rescuer_thread()
	ASoC: rt5645: Fixed buddy jack support.
	ASoC: rt5645: Fixed typo for buddy jack support.
	ASoC: Jack: Fix NULL pointer dereference in snd_soc_jack_report
	ASoC: fsl_audmix: Add spin lock to protect tdms
	md: improve handling of bio with REQ_PREFLUSH in md_flush_request()
	blk-mq: avoid sysfs buffer overflow with too many CPU cores
	cgroup: pids: use atomic64_t for pids->limit
	wil6210: check len before memcpy() calls
	ar5523: check NULL before memcpy() in ar5523_cmd()
	s390/mm: properly clear _PAGE_NOEXEC bit when it is not supported
	media: hantro: Fix s_fmt for dynamic resolution changes
	media: hantro: Fix motion vectors usage condition
	media: hantro: Fix picture order count table enable
	media: vimc: sen: remove unused kthread_sen field
	media: bdisp: fix memleak on release
	media: radio: wl1273: fix interrupt masking on release
	media: cec.h: CEC_OP_REC_FLAG_ values were swapped
	cpuidle: Do not unset the driver if it is there already
	cpuidle: teo: Ignore disabled idle states that are too deep
	cpuidle: teo: Rename local variable in teo_select()
	cpuidle: teo: Consider hits and misses metrics of disabled states
	cpuidle: teo: Fix "early hits" handling for disabled idle states
	cpuidle: use first valid target residency as poll time
	erofs: zero out when listxattr is called with no xattr
	perf tests: Fix out of bounds memory access
	drm/panfrost: Open/close the perfcnt BO
	powerpc/perf: Disable trace_imc pmu
	intel_th: Fix a double put_device() in error path
	intel_th: pci: Add Ice Lake CPU support
	intel_th: pci: Add Tiger Lake CPU support
	PM / devfreq: Lock devfreq in trans_stat_show
	cpufreq: powernv: fix stack bloat and hard limit on number of CPUs
	ALSA: fireface: fix return value in error path of isochronous resources reservation
	ALSA: oxfw: fix return value in error path of isochronous resources reservation
	ALSA: hda/realtek - Line-out jack doesn't work on a Dell AIO
	ACPI / utils: Move acpi_dev_get_first_match_dev() under CONFIG_ACPI
	ACPI: LPSS: Add LNXVIDEO -> BYT I2C7 to lpss_device_links
	ACPI: LPSS: Add LNXVIDEO -> BYT I2C1 to lpss_device_links
	ACPI: LPSS: Add dmi quirk for skipping _DEP check for some device-links
	ACPI / hotplug / PCI: Allocate resources directly under the non-hotplug bridge
	ACPI: OSL: only free map once in osl.c
	ACPI: bus: Fix NULL pointer check in acpi_bus_get_private_data()
	ACPI: EC: Rework flushing of pending work
	ACPI: PM: Avoid attaching ACPI PM domain to certain devices
	pinctrl: rza2: Fix gpio name typos
	pinctrl: armada-37xx: Fix irq mask access in armada_37xx_irq_set_type()
	pinctrl: samsung: Add of_node_put() before return in error path
	pinctrl: samsung: Fix device node refcount leaks in Exynos wakeup controller init
	pinctrl: samsung: Fix device node refcount leaks in S3C24xx wakeup controller init
	pinctrl: samsung: Fix device node refcount leaks in init code
	pinctrl: samsung: Fix device node refcount leaks in S3C64xx wakeup controller init
	mmc: host: omap_hsmmc: add code for special init of wl1251 to get rid of pandora_wl1251_init_card
	ARM: dts: omap3-tao3530: Fix incorrect MMC card detection GPIO polarity
	RDMA/core: Fix ib_dma_max_seg_size()
	ppdev: fix PPGETTIME/PPSETTIME ioctls
	stm class: Lose the protocol driver when dropping its reference
	coresight: Serialize enabling/disabling a link device.
	powerpc: Allow 64bit VDSO __kernel_sync_dicache to work across ranges >4GB
	powerpc/xive: Prevent page fault issues in the machine crash handler
	powerpc: Allow flush_icache_range to work across ranges >4GB
	powerpc/xive: Skip ioremap() of ESB pages for LSI interrupts
	video/hdmi: Fix AVI bar unpack
	quota: Check that quota is not dirty before release
	ext2: check err when partial != NULL
	seccomp: avoid overflow in implicit constant conversion
	quota: fix livelock in dquot_writeback_dquots
	ext4: Fix credit estimate for final inode freeing
	reiserfs: fix extended attributes on the root directory
	scsi: qla2xxx: Fix SRB leak on switch command timeout
	scsi: qla2xxx: Fix a dma_pool_free() call
	Revert "scsi: qla2xxx: Fix memory leak when sending I/O fails"
	iio: ad7949: kill pointless "readback"-handling code
	iio: ad7949: fix channels mixups
	omap: pdata-quirks: revert pandora specific gpiod additions
	omap: pdata-quirks: remove openpandora quirks for mmc3 and wl1251
	powerpc: Avoid clang warnings around setjmp and longjmp
	powerpc: Fix vDSO clock_getres()
	mm, memfd: fix COW issue on MAP_PRIVATE and F_SEAL_FUTURE_WRITE mappings
	mfd: rk808: Fix RK818 ID template
	mm: memcg/slab: wait for !root kmem_cache refcnt killing on root kmem_cache destruction
	ext4: work around deleting a file with i_nlink == 0 safely
	firmware: qcom: scm: Ensure 'a0' status code is treated as signed
	s390/smp,vdso: fix ASCE handling
	s390/kaslr: store KASLR offset for early dumps
	mm/shmem.c: cast the type of unmap_start to u64
	powerpc: Define arch_is_kernel_initmem_freed() for lockdep
	USB: dummy-hcd: increase max number of devices to 32
	rtc: disable uie before setting time and enable after
	splice: only read in as much information as there is pipe buffer space
	ext4: fix a bug in ext4_wait_for_tail_page_commit
	ext4: fix leak of quota reservations
	blk-mq: make sure that line break can be printed
	workqueue: Fix missing kfree(rescuer) in destroy_workqueue()
	r8169: fix rtl_hw_jumbo_disable for RTL8168evl
	EDAC/ghes: Do not warn when incrementing refcount on 0
	Linux 5.4.4

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8949a5fb2fbd836ce34907e70906e3aeb8a58b7c
2019-12-17 20:10:28 +01:00
Ming Lei
e13c3c2196 blk-mq: make sure that line break can be printed
commit d2c9be89f8ebe7ebcc97676ac40f8dec1cf9b43a upstream.

8962842ca5ab ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
avoids sysfs buffer overflow, and reserves one character for line break.
However, the last snprintf() doesn't get correct 'size' parameter passed
in, so fixed it.

Fixes: 8962842ca5ab ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17 19:56:53 +01:00
Ming Lei
285b073489 blk-mq: avoid sysfs buffer overflow with too many CPU cores
commit 8962842ca5abdcf98e22ab3b2b45a103f0408b95 upstream.

It is reported that sysfs buffer overflow can be triggered if the system
has too many CPU cores(>841 on 4K PAGE_SIZE) when showing CPUs of
hctx via /sys/block/$DEV/mq/$N/cpu_list.

Use snprintf to avoid the potential buffer overflow.

This version doesn't change the attribute format, and simply stops
showing CPU numbers if the buffer is going to overflow.

Cc: stable@vger.kernel.org
Fixes: 676141e48af7("blk-mq: don't dump CPU -> hw queue map on driver load")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17 19:56:14 +01:00
Greg Kroah-Hartman
ad5859c6ae Linux 5.4-rc8
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl3RzgkeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGN18H/0JZbfIpy8/4Irol
 0va7Aj2fBi1a5oxfqYsMKN0u3GKbN3OV9tQ+7w1eBNGvL72TGadgVTzTY+Im7A9U
 UjboAc7jDPCG+YhIwXFufMiIAq5jDIj6h0LDas7ALsMfsnI/RhTwgNtLTAkyI3dH
 YV/6ljFULwueJHCxzmrYbd1x39PScj3kCNL2pOe6On7rXMKOemY/nbbYYISxY30E
 GMgKApSS+li7VuSqgrKoq5Qaox26LyR2wrXB1ij4pqEJ9xgbnKRLdHuvXZnE+/5p
 46EMirt+yeSkltW3d2/9MoCHaA76ESzWMMDijLx7tPgoTc3RB3/3ZLsm3rYVH+cR
 cRlNNSk=
 =0+Cg
 -----END PGP SIGNATURE-----

Merge 5.4-rc8 into android-mainline

Linux 5.4-rc8

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1f55e5d34dc78ddb064910ce1e1b7a7b5b39aaba
2019-11-18 08:31:11 +01:00
Jiufei Xue
8b37bc277f iocost: check active_list of all the ancestors in iocg_activate()
There is a bug that checking the same active_list over and over again
in iocg_activate(). The intention of the code was checking whether all
the ancestors and self have already been activated. So fix it.

Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-14 13:56:54 -07:00
Paolo Valente
478de3380c block, bfq: deschedule empty bfq_queues not referred by any process
Since commit 3726112ec7 ("block, bfq: re-schedule empty queues if
they deserve I/O plugging"), to prevent the service guarantees of a
bfq_queue from being violated, the bfq_queue may be left busy, i.e.,
scheduled for service, even if empty (see comments in
__bfq_bfqq_expire() for details). But, if no process will send
requests to the bfq_queue any longer, then there is no point in
keeping the bfq_queue scheduled for service.

In addition, keeping the bfq_queue scheduled for service, but with no
process reference any longer, may cause the bfq_queue to be freed when
descheduled from service. But this is assumed to never happen, and
causes a UAF if it happens. This, in turn, caused crashes [1, 2].

This commit fixes this issue by descheduling an empty bfq_queue when
it remains with not process reference.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1767539
[2] https://bugzilla.kernel.org/show_bug.cgi?id=205447

Fixes: 3726112ec7 ("block, bfq: re-schedule empty queues if they deserve I/O plugging")
Reported-by: Chris Evich <cevich@redhat.com>
Reported-by: Patrick Dung <patdung100@gmail.com>
Reported-by: Thorsten Schubert <tschubert@bafh.org>
Tested-by: Thorsten Schubert <tschubert@bafh.org>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-14 07:00:54 -07:00
Junichi Nomura
e3a5d8e386 block: check bi_size overflow before merge
__bio_try_merge_page() may merge a page to bio without bio_full() check
and cause bi_size overflow.

The overflow typically ends up with sd_init_command() warning on zero
segment request with call trace like this:

    ------------[ cut here ]------------
    WARNING: CPU: 2 PID: 1986 at drivers/scsi/scsi_lib.c:1025 scsi_init_io+0x156/0x180
    CPU: 2 PID: 1986 Comm: kworker/2:1H Kdump: loaded Not tainted 5.4.0-rc7 #1
    Workqueue: kblockd blk_mq_run_work_fn
    RIP: 0010:scsi_init_io+0x156/0x180
    RSP: 0018:ffffa11487663bf0 EFLAGS: 00010246
    RAX: 00000000002be0a0 RBX: ffff8e6e9ff30118 RCX: 0000000000000000
    RDX: 00000000ffffffe1 RSI: 0000000000000000 RDI: ffff8e6e9ff30118
    RBP: ffffa11487663c18 R08: ffffa11487663d28 R09: ffff8e6e9ff30150
    R10: 0000000000000001 R11: 0000000000000000 R12: ffff8e6e9ff30000
    R13: 0000000000000001 R14: ffff8e74a1cf1800 R15: ffff8e6e9ff30000
    FS:  0000000000000000(0000) GS:ffff8e6ea7680000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fff18cf0fe8 CR3: 0000000659f0a001 CR4: 00000000001606e0
    Call Trace:
     sd_init_command+0x326/0xb40 [sd_mod]
     scsi_queue_rq+0x502/0xaa0
     ? blk_mq_get_driver_tag+0xe7/0x120
     blk_mq_dispatch_rq_list+0x256/0x5a0
     ? elv_rb_del+0x24/0x30
     ? deadline_remove_request+0x7b/0xc0
     blk_mq_do_dispatch_sched+0xa3/0x140
     blk_mq_sched_dispatch_requests+0xfb/0x170
     __blk_mq_run_hw_queue+0x81/0x130
     blk_mq_run_work_fn+0x1b/0x20
     process_one_work+0x179/0x390
     worker_thread+0x4f/0x3e0
     kthread+0x105/0x140
     ? max_active_store+0x80/0x80
     ? kthread_bind+0x20/0x20
     ret_from_fork+0x35/0x40
    ---[ end trace f9036abf5af4a4d3 ]---
    blk_update_request: I/O error, dev sdd, sector 2875552 op 0x1:(WRITE) flags 0x0 phys_seg 0 prio class 0
    XFS (sdd1): writeback error on sector 2875552

__bio_try_merge_page() should check the overflow before actually doing
merge.

Fixes: 07173c3ec2 ("block: enable multipage bvecs")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-12 07:26:27 -07:00
Greg Kroah-Hartman
682d8bf784 Linux 5.4-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl3IqJQeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGOiUH+gOEDwid5OODaFAd
 CggXugdFIlBZefKqGVNW5sjgX8pxFWHXuEMC8iNb6QXtQZdFrI6LFf9hhUDmzQtm
 6y1LPxxEiTZjObMEsBNylb7tyzgujFHcAlp0Zro3w/HLCqmYTSP3FF46i2u6KZfL
 XhkpM4X7R7qxlfpdhlfESv/ElRGocZe6SwXfC7pcPo5flFcmkdu9ijqhNd/6CZ/h
 Nf9rTsD/wEDVUelFbgVN+LJzlaB0tsyc4Zbof07n8OsFZjhdEOop8gfM/kTBLcyY
 6bh66SfDScdsNnC/l8csbPjSZRx+i+nQs67DyhGNnsSAFgHBZdC4Tb/2mDCwhCLR
 dUvuYZc=
 =1N6F
 -----END PGP SIGNATURE-----

Merge tag 'v5.4-rc7' into android-mainline

Linux 5.4-rc7

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I505207a0a6f68ccc3519d7f190d8faf25d9d479a
2019-11-11 06:10:55 +01:00
Tejun Heo
b0814361a2 blkcg: make blkcg_print_stat() print stats only for online blkgs
blkcg_print_stat() iterates blkgs under RCU and doesn't test whether
the blkg is online.  This can call into pd_stat_fn() on a pd which is
still being initialized leading to an oops.

The heaviest operation - recursively summing up rwstat counters - is
already done while holding the queue_lock.  Expand queue_lock to cover
the other operations and skip the blkg if it isn't online yet.  The
online state is protected by both blkcg and queue locks, so this
guarantees that only online blkgs are processed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Roman Gushchin <guro@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Fixes: 903d23f0a3 ("blk-cgroup: allow controllers to output their own stats")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-06 17:08:38 -07:00
Greg Kroah-Hartman
2a71bdee3f Linux 5.4-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl2/T6oeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGB7kH/2asrl7RylHPQfsI
 pkjPPq32MHLq6Zmr8y0N0hyJCm1Dxu8qCIktqrEV5vsbKdv5pb8EMdQRlhJUtR8e
 TZdYDGHiL0sFb2HYiEvL0qD9BNeI+Kftw/kUffVXRzyMWex/f5S6mW5QNTuv9SQT
 Zfa+sXreFPCCyd3jhQFRyguogaCXBmTYvO6glmc96Yi4nA1URtIxNXhXumoklElF
 8Ka7UqtoJk2nPns+oV9I5xohghgJHHjA3A96WURku1UdO9dRoHiyS05RjnijBxsk
 ffenk09qbGvnvvgP93Q23CoTO8ndIm12ZL8C9jX49CS21j5SVG0PLPhS08f70vEf
 h5K/OtE=
 =nmDP
 -----END PGP SIGNATURE-----

Merge 5.4-rc6 into android-mainline

Linux 5.4-rc6

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I211f7159a46cf2a3dbb18afe56777cae1c13ac73
2019-11-04 06:51:47 +01:00
Dan Carpenter
41591a51f0 iocost: don't nest spin_lock_irq in ioc_weight_write()
This code causes a static analysis warning:

    block/blk-iocost.c:2113 ioc_weight_write() error: double lock 'irq'

We disable IRQs in blkg_conf_prep() and re-enable them in
blkg_conf_finish().  IRQ disable/enable should not be nested because
that means the IRQs will be enabled at the first unlock instead of the
second one.

Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-31 11:40:57 -06:00
Satya Tangirala
600e29fce0 FROMLIST: block: blk-crypto for Inline Encryption
We introduce blk-crypto, which manages programming keyslots for struct
bios. With blk-crypto, filesystems only need to call bio_crypt_set_ctx with
the encryption key, algorithm and data_unit_num; they don't have to worry
about getting a keyslot for each encryption context, as blk-crypto handles
that. Blk-crypto also makes it possible for layered devices like device
mapper to make use of inline encryption hardware.

Blk-crypto delegates crypto operations to inline encryption hardware when
available, and also contains a software fallback to the kernel crypto API.
For more details, refer to Documentation/block/inline-encryption.rst.

Bug: 137270441
Test: tested as series; see Ie1b77f7615d6a7a60fdc9105c7ab2200d17636a8
Change-Id: I7df59fef0c1e90043b1899c5a95973e23afac0c5
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11214731/
2019-10-30 21:59:38 +00:00
Satya Tangirala
d85bf5e127 ANDROID: block: Fix bio_crypt_should_process WARN_ON
bio_crypt_should_process would WARN that the bio did not have a
keyslot in any keyslot manager even when we were on the decrypt path
of blk-crypto, which is a bug. The WARN is now conditional on the
caller being responible for handling encryption rather than blk-crypto
(i.e. the WARN happens only if this function return true).

Bug: 137270441
Test: tested as series; see Ie1b77f7615d6a7a60fdc9105c7ab2200d17636a8
Change-Id: I01aa7a04a5ab9c9e579bde8a06d095916d880e2c
Signed-off-by: Satya Tangirala <satyat@google.com>
2019-10-30 21:57:08 +00:00
Satya Tangirala
10f1b43cbc FROMLIST: block: Add encryption context to struct bio
We must have some way of letting a storage device driver know what
encryption context it should use for en/decrypting a request. However,
it's the filesystem/fscrypt that knows about and manages encryption
contexts. As such, when the filesystem layer submits a bio to the block
layer, and this bio eventually reaches a device driver with support for
inline encryption, the device driver will need to have been told the
encryption context for that bio.

We want to communicate the encryption context from the filesystem layer
to the storage device along with the bio, when the bio is submitted to the
block layer. To do this, we add a struct bio_crypt_ctx to struct bio, which
can represent an encryption context (note that we can't use the bi_private
field in struct bio to do this because that field does not function to pass
information across layers in the storage stack). We also introduce various
functions to manipulate the bio_crypt_ctx and make the bio/request merging
logic aware of the bio_crypt_ctx.

Bug: 137270441
Test: tested as series; see Ie1b77f7615d6a7a60fdc9105c7ab2200d17636a8
Change-Id: I479de9ec13758f1978b34d897e6956e680caeb92
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11214719/
2019-10-30 14:14:59 -07:00
Satya Tangirala
aac6c3decd FROMLIST: block: Keyslot Manager for Inline Encryption
Inline Encryption hardware allows software to specify an encryption context
(an encryption key, crypto algorithm, data unit num, data unit size, etc.)
along with a data transfer request to a storage device, and the inline
encryption hardware will use that context to en/decrypt the data. The
inline encryption hardware is part of the storage device, and it
conceptually sits on the data path between system memory and the storage
device.

Inline Encryption hardware implementations often function around the
concept of "keyslots". These implementations often have a limited number
of "keyslots", each of which can hold an encryption context (we say that
an encryption context can be "programmed" into a keyslot). Requests made
to the storage device may have a keyslot associated with them, and the
inline encryption hardware will en/decrypt the data in the requests using
the encryption context programmed into that associated keyslot. As
keyslots are limited, and programming keys may be expensive in many
implementations, and multiple requests may use exactly the same encryption
contexts, we introduce a Keyslot Manager to efficiently manage keyslots.
The keyslot manager also functions as the interface that upper layers will
use to program keys into inline encryption hardware. For more information
on the Keyslot Manager, refer to documentation found in
block/keyslot-manager.c and linux/keyslot-manager.h.

Bug: 137270441
Test: tested as series; see Ie1b77f7615d6a7a60fdc9105c7ab2200d17636a8
Change-Id: Iea1ee5a7eec46cb50d33cf1e2d20dfb7335af4ed
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11214713/
2019-10-30 13:15:53 -07:00
Tejun Heo
307f4065b9 blk-rq-qos: fix first node deletion of rq_qos_del()
rq_qos_del() incorrectly assigns the node being deleted to the head if
it was the first on the list in the !prev path.  Fix it by iterating
with ** instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Fixes: a79050434b ("blk-rq-qos: refactor out common elements of blk-wbt")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-15 10:13:13 -06:00
Tejun Heo
9d179b8654 blkcg: Fix multiple bugs in blkcg_activate_policy()
blkcg_activate_policy() has the following bugs.

* cf09a8ee19 ("blkcg: pass @q and @blkcg into
  blkcg_pol_alloc_pd_fn()") added @blkcg to ->pd_alloc_fn(); however,
  blkcg_activate_policy() ends up using pd's allocated for the root
  blkcg for all preallocations, so ->pd_init_fn() for non-root blkcgs
  can be passed in pd's which are allocated for the root blkcg.

  For blk-iocost, this means that ->pd_init_fn() can write beyond the
  end of the allocated object as it determines the length of the flex
  array at the end based on the blkcg's nesting level.

* Each pd is initialized as they get allocated.  If alloc fails, the
  policy will get freed with pd's initialized on it.

* After the above partial failure, the partial pds are not freed.

This patch fixes all the above issues by

* Restructuring blkcg_activate_policy() so that alloc and init passes
  are separate.  Init takes place only after all allocs succeeded and
  on failure all allocated pds are freed.

* Unifying and fixing the cleanup of the remaining pd_prealloc.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: cf09a8ee19 ("blkcg: pass @q and @blkcg into blkcg_pol_alloc_pd_fn()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-15 10:13:00 -06:00
Damien Le Moal
7a7c5e715e block: Fix elv_support_iosched()
A BIO based request queue does not have a tag_set, which prevent testing
for the flag BLK_MQ_F_NO_SCHED indicating that the queue does not
require an elevator. This leads to an incorrect initialization of a
default elevator in some cases such as BIO based null_blk
(queue_mode == BIO) with zoned mode enabled as the default elevator in
this case is mq-deadline instead of "none".

Fix this by testing for a NULL queue mq_ops field which indicates that
the queue is BIO based and should not have an elevator.

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-14 13:54:09 -06:00
Harshad Shirwadkar
b84477d3eb blk-wbt: fix performance regression in wbt scale_up/scale_down
scale_up wakes up waiters after scaling up. But after scaling max, it
should not wake up more waiters as waiters will not have anything to
do. This patch fixes this by making scale_up (and also scale_down)
return when threshold is reached.

This bug causes increased fdatasync latency when fdatasync and dd
conv=sync are performed in parallel on 4.19 compared to 4.14. This
bug was introduced during refactoring of blk-wbt code.

Fixes: a79050434b ("blk-rq-qos: refactor out common elements of blk-wbt")
Cc: stable@vger.kernel.org
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-06 09:26:41 -06:00
Randy Dunlap
a9eb49c964 block: sed-opal: fix sparse warning: convert __be64 data
sparse warns about incorrect type when using __be64 data.
It is not being converted to CPU-endian but it should be.

Fixes these sparse warnings:

../block/sed-opal.c:375:20: warning: incorrect type in assignment (different base types)
../block/sed-opal.c:375:20:    expected unsigned long long [usertype] align
../block/sed-opal.c:375:20:    got restricted __be64 const [usertype] alignment_granularity
../block/sed-opal.c:376:25: warning: incorrect type in assignment (different base types)
../block/sed-opal.c:376:25:    expected unsigned long long [usertype] lowest_lba
../block/sed-opal.c:376:25:    got restricted __be64 const [usertype] lowest_aligned_lba

Fixes: 455a7b238c ("block: Add Sed-opal library")
Cc: Scott Bauer <scott.bauer@intel.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: linux-block@vger.kernel.org
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-03 14:21:32 -06:00
Randy Dunlap
dc30102565 block: sed-opal: fix sparse warning: obsolete array init.
Fix sparse warning: (missing '=')
../block/sed-opal.c:133:17: warning: obsolete array initializer, use C99 syntax

Fixes: ff91064ea3 ("block: sed-opal: check size of shadow mbr")
Cc: linux-block@vger.kernel.org
Cc: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
Cc: David Kozub <zub@linux.fjfi.cvut.cz>
Reviewed-by: Scott Bauer <sbauer@plzdonthack.me>
Reviewed-by:  Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-03 14:21:30 -06:00
Ming Lei
3154df262d blk-mq: apply normal plugging for HDD
Some HDD drive may expose multiple hardware queues, such as MegraRaid.
Let's apply the normal plugging for such devices because sequential IO
may benefit a lot from plug merging.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-27 11:40:21 -06:00
Ming Lei
a12de1d42d blk-mq: honor IO scheduler for multiqueue devices
If a device is using multiple queues, the IO scheduler may be bypassed.
This may hurt performance for some slow MQ devices, and it also breaks
zoned devices which depend on mq-deadline for respecting the write order
in one zone.

Don't bypass io scheduler if we have one setup.

This patch can double sequential write performance basically on MQ
scsi_debug when mq-deadline is applied.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-27 11:38:28 -06:00
Yufen Yu
8d6996630c block: fix null pointer dereference in blk_mq_rq_timed_out()
We got a null pointer deference BUG_ON in blk_mq_rq_timed_out()
as following:

[  108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
[  108.827059] PGD 0 P4D 0
[  108.827313] Oops: 0000 [#1] SMP PTI
[  108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
[  108.829503] Workqueue: kblockd blk_mq_timeout_work
[  108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
[  108.838191] Call Trace:
[  108.838406]  bt_iter+0x74/0x80
[  108.838665]  blk_mq_queue_tag_busy_iter+0x204/0x450
[  108.839074]  ? __switch_to_asm+0x34/0x70
[  108.839405]  ? blk_mq_stop_hw_queue+0x40/0x40
[  108.839823]  ? blk_mq_stop_hw_queue+0x40/0x40
[  108.840273]  ? syscall_return_via_sysret+0xf/0x7f
[  108.840732]  blk_mq_timeout_work+0x74/0x200
[  108.841151]  process_one_work+0x297/0x680
[  108.841550]  worker_thread+0x29c/0x6f0
[  108.841926]  ? rescuer_thread+0x580/0x580
[  108.842344]  kthread+0x16a/0x1a0
[  108.842666]  ? kthread_flush_work+0x170/0x170
[  108.843100]  ret_from_fork+0x35/0x40

The bug is caused by the race between timeout handle and completion for
flush request.

When timeout handle function blk_mq_rq_timed_out() try to read
'req->q->mq_ops', the 'req' have completed and reinitiated by next
flush request, which would call blk_rq_init() to clear 'req' as 0.

After commit 12f5b93145 ("blk-mq: Remove generation seqeunce"),
normal requests lifetime are protected by refcount. Until 'rq->ref'
drop to zero, the request can really be free. Thus, these requests
cannot been reused before timeout handle finish.

However, flush request has defined .end_io and rq->end_io() is still
called even if 'rq->ref' doesn't drop to zero. After that, the 'flush_rq'
can be reused by the next flush request handle, resulting in null
pointer deference BUG ON.

We fix this problem by covering flush request with 'rq->ref'.
If the refcount is not zero, flush_end_io() return and wait the
last holder recall it. To record the request status, we add a new
entry 'rq_status', which will be used in flush_end_io().

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: stable@vger.kernel.org # v4.18+
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>

-------
v2:
 - move rq_status from struct request to struct blk_flush_queue
v3:
 - remove unnecessary '{}' pair.
v4:
 - let spinlock to protect 'fq->rq_status'
v5:
 - move rq_status after flush_running_idx member of struct blk_flush_queue
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-27 07:01:25 -06:00
Yufen Yu
2af2783f2e rq-qos: get rid of redundant wbt_update_limits()
We have updated limits after calling wbt_set_min_lat(). No need to
update again.

Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-27 01:13:10 -06:00
Tejun Heo
7afcccafa5 iocost: bump up default latency targets for hard disks
The default hard disk param sets latency targets at 50ms.  As the
default target percentiles are zero, these don't directly regulate
vrate; however, they're still used to calculate the period length -
100ms in this case.

This is excessively low.  A SATA drive with QD32 saturated with random
IOs can easily reach avg completion latency of several hundred msecs.
A period duration which is substantially lower than avg completion
latency can lead to wildly fluctuating vrate.

Let's bump up the default latency targets to 250ms so that the period
duration is sufficiently long.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-26 01:12:01 -06:00
Tejun Heo
7cd806a9a9 iocost: improve nr_lagging handling
Some IOs may span multiple periods.  As latencies are collected on
completion, the inbetween periods won't register them and may
incorrectly decide to increase vrate.  nr_lagging tracks these IOs to
avoid those situations.  Currently, whenever there are IOs which are
spanning from the previous period, busy_level is reset to 0 if
negative thus suppressing vrate increase.

This has the following two problems.

* When latency target percentiles aren't set, vrate adjustment should
  only be governed by queue depth depletion; however, the current code
  keeps nr_lagging active which pulls in latency results and can keep
  down vrate unexpectedly.

* When lagging condition is detected, it resets the entire negative
  busy_level.  This turned out to be way too aggressive on some
  devices which sometimes experience extended latencies on a small
  subset of commands.  In addition, a lagging IO will be accounted as
  latency target miss on completion anyway and resetting busy_level
  amplifies its impact unnecessarily.

This patch fixes the above two problems by disabling nr_lagging
counting when latency target percentiles aren't set and blocking vrate
increases when there are lagging IOs while leaving busy_level as-is.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-26 01:12:00 -06:00
Tejun Heo
25d41e4aad iocost: better trace vrate changes
vrate_adj tracepoint traces vrate changes; however, it does so only
when busy_level is non-zero.  busy_level turning to zero can sometimes
be as interesting an event.  This patch also enables vrate_adj
tracepoint on other vrate related events - busy_level changes and
non-zero nr_lagging.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-26 01:11:58 -06:00
Ming Lei
b89f625e28 block: don't release queue's sysfs lock during switching elevator
cecf5d87ff ("block: split .sysfs_lock into two locks") starts to
release & acquire sysfs_lock before registering/un-registering elevator
queue during switching elevator for avoiding potential deadlock from
showing & storing 'queue/iosched' attributes and removing elevator's
kobject.

Turns out there isn't such deadlock because 'q->sysfs_lock' isn't
required in .show & .store of queue/iosched's attributes, and just
elevator's sysfs lock is acquired in elv_iosched_store() and
elv_iosched_show(). So it is safe to hold queue's sysfs lock when
registering/un-registering elevator queue.

The biggest issue is that commit cecf5d87ff assumes that concurrent
write on 'queue/scheduler' can't happen. However, this assumption isn't
true, because kernfs_fop_write() only guarantees that concurrent write
aren't called on the same open file, but the write could be from
different open on the file. So we can't release & re-acquire queue's
sysfs lock during switching elevator, otherwise use-after-free on
elevator could be triggered.

Fixes the issue by not releasing queue's sysfs lock during switching
elevator.

Fixes: cecf5d87ff ("block: split .sysfs_lock into two locks")
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-26 00:45:51 -06:00
Ming Lei
284b94be19 blk-mq: move lockdep_assert_held() into elevator_exit
Commit c48dac137a ("block: don't hold q->sysfs_lock in elevator_init_mq")
removes q->sysfs_lock from elevator_init_mq(), but forgot to deal with
lockdep_assert_held() called in blk_mq_sched_free_requests() which is
run in failure path of elevator_init_mq().

blk_mq_sched_free_requests() is called in the following 3 functions:

	elevator_init_mq()
	elevator_exit()
	blk_cleanup_queue()

In blk_cleanup_queue(), blk_mq_sched_free_requests() is followed exactly
by 'mutex_lock(&q->sysfs_lock)'.

So moving the lockdep_assert_held() from blk_mq_sched_free_requests()
into elevator_exit() for fixing the report by syzbot.

Reported-by: syzbot+da3b7677bb913dc1b737@syzkaller.appspotmail.com
Fixed: c48dac137a ("block: don't hold q->sysfs_lock in elevator_init_mq")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-26 00:45:05 -06:00
Linus Torvalds
2e959dd87a for-5.4/post-2019-09-24
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl2J8xQQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpujgD/94s9GGKN8JShxCpT0YNuWyyFF5gNlaimQU
 RSGAwnv2YUgEGNSUOPpcaj5FAYhTfYzbqoHlE+jytA2U5KXTOhc5Z85QV+TY4HPs
 I03xczYuYD/uX0QuF00zU2+6eV3lETELPiBARbfEQdHfm72iwurweHzlh4dfhbxW
 P7UA/cKixXWF2CH9wg5347Ll93nD24f2pi8BUyLJi/xpdlaRrN11Ii8AzNlRmq52
 VRxURuogl98W89F6EV2VhPGFgUEYHY2Ot7II2OqqV+jmjHDQW9y5hximzINOqkxs
 bQwo5J+WrDSPoqwl8+db2k7QQjAl1XKDAHmCwz+7J/BoOgZj8/M1FMBwzita+5x+
 UqxEYe7k+2G3w2zuhBrq03BypU8pwqFep/QI0cCCPaHs4J5QnkVOScEqd6iV/C3T
 FPvMvqDf7MrElghj4Qa2IZlh/CgqmLG5NUEz8E40cXkdiP+E+eK9ZY2Uwx2XhBrm
 7Gl+SpG5DxWqqJeRNVWjFwM4p5L+01NtwDbTjZ1rsf+mCW5cNsy/L9B4UpPz4HxW
 coAs0y/Ce+ZhCopIXZ4jLDBoTG9yoVg8EcyfaHKD2Zz0mUFxa2xm+LvXKeT49qqx
 xuodpKD3fiuM7h9Xgv+cDsmn8Rr8gSeXEGV7qzpudmkxbp6IVg/yG5hC/dM921GR
 EVrRtUIwdw==
 =aAPP
 -----END PGP SIGNATURE-----

Merge tag 'for-5.4/post-2019-09-24' of git://git.kernel.dk/linux-block

Pull more block updates from Jens Axboe:
 "Some later additions that weren't quite done for the first pull
  request, and also a few fixes that have arrived since.

  This contains:

   - Kill silly pktcdvd warning on attempting to register a non-scsi
     passthrough device (me)

   - Use symbolic constants for the block t10 protection types, and
     switch to handling it in core rather than in the drivers (Max)

   - libahci platform missing node put fix (Nishka)

   - Small series of fixes for BFQ (Paolo)

   - Fix possible nbd crash (Xiubo)"

* tag 'for-5.4/post-2019-09-24' of git://git.kernel.dk/linux-block:
  block: drop device references in bsg_queue_rq()
  block: t10-pi: fix -Wswitch warning
  pktcdvd: remove warning on attempting to register non-passthrough dev
  ata: libahci_platform: Add of_node_put() before loop exit
  nbd: fix possible page fault for nbd disk
  nbd: rename the runtime flags as NBD_RT_ prefixed
  block, bfq: push up injection only after setting service time
  block, bfq: increase update frequency of inject limit
  block, bfq: reduce upper bound for inject limit to max_rq_in_driver+1
  block, bfq: update inject limit only after injection occurred
  block: centralize PI remapping logic to the block layer
  block: use symbolic constants for t10_pi type
2019-09-24 16:31:50 -07:00
Martin Wilck
d46fe2cb2d block: drop device references in bsg_queue_rq()
Make sure that bsg_queue_rq() calls put_device() if an error is
encountered after get_device() was successful.

Fixes: cd2f076f1d ("bsg: convert to use blk-mq")
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-23 11:17:24 -06:00
Max Gurtovoy
be21683e48 block: t10-pi: fix -Wswitch warning
Changing the switch() statement to symbolic constants made the compiler
(at least clang-9, did not check gcc) notice that there is one enum value
that is not handled here:

block/t10-pi.c:62:11: error: enumeration value 'T10_PI_TYPE0_PROTECTION'
not handled in switch [-Werror,-Wswitch]

Add a BUG_ON statement if we ever get to t10_pi_verify function with
TYPE0 and replace the switch() statement with if/else clause for the
valid types.

Fixes: 9b2061b1a262 ("block: use symbolic constants for t10_pi type")
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-23 08:05:19 -06:00
Linus Torvalds
671df18953 dma-mapping updates for 5.4:
- add dma-mapping and block layer helpers to take care of IOMMU
    merging for mmc plus subsequent fixups (Yoshihiro Shimoda)
  - rework handling of the pgprot bits for remapping (me)
  - take care of the dma direct infrastructure for swiotlb-xen (me)
  - improve the dma noncoherent remapping infrastructure (me)
  - better defaults for ->mmap, ->get_sgtable and ->get_required_mask (me)
  - cleanup mmaping of coherent DMA allocations (me)
  - various misc cleanups (Andy Shevchenko, me)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl2CSucLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPfrhAAgXZA/EdFPvkkCoDrmgtf3XkudX9gajeCd9g4NZy6
 ZBQElTVvm4S0sQj7IXgALnMumDMbbTibW5SQLX5GwQDe+XXBpZ8ajpAnJAXc8a5T
 qaFQ4SInr4CgBZf9nZKDkbSBZ1Tu3AQm1c0QI8riRCkrVTuX4L06xpCef4Yh4mgO
 rwWEjIioYpQiKZMmu98riXh3ZNfFG3mVJRhKt8B6XJbBgnUnjDOPYGgaUwp6CU20
 tFBKL2GaaV0vdLJ5wYhIGXT4DJ8tp9T5n3IYGZv1Ux889RaZEHlCrMxzelYeDbCT
 KhZbhcSECGnddsh73t/UX7/KhytuqnfKa9n+Xo6AWuA47xO4c36quOOcTk9M0vE5
 TfGDmewgL6WIv4lzokpRn5EkfDhyL33j8eYJrJ8e0ldcOhSQIFk4ciXnf2stWi6O
 JrlzzzSid+zXxu48iTfoPdnMr7psTpiMvvRvKfEeMp2FX9Fg6EdMzJYLTEl+COHB
 0WwNacZmY3P01+b5EZXEgqKEZevIIdmPKbyM9rPtTjz8BjBwkABHTpN3fWbVBf7/
 Ax6OPYyW40xp1fnJuzn89m3pdOxn88FpDdOaeLz892Zd+Qpnro1ayulnFspVtqGM
 mGbzA9whILvXNRpWBSQrvr2IjqMRjbBxX3BVACl3MMpOChgkpp5iANNfSDjCftSF
 Zu8=
 =/wGv
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.4' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - add dma-mapping and block layer helpers to take care of IOMMU merging
   for mmc plus subsequent fixups (Yoshihiro Shimoda)

 - rework handling of the pgprot bits for remapping (me)

 - take care of the dma direct infrastructure for swiotlb-xen (me)

 - improve the dma noncoherent remapping infrastructure (me)

 - better defaults for ->mmap, ->get_sgtable and ->get_required_mask
   (me)

 - cleanup mmaping of coherent DMA allocations (me)

 - various misc cleanups (Andy Shevchenko, me)

* tag 'dma-mapping-5.4' of git://git.infradead.org/users/hch/dma-mapping: (41 commits)
  mmc: renesas_sdhi_internal_dmac: Add MMC_CAP2_MERGE_CAPABLE
  mmc: queue: Fix bigger segments usage
  arm64: use asm-generic/dma-mapping.h
  swiotlb-xen: merge xen_unmap_single into xen_swiotlb_unmap_page
  swiotlb-xen: simplify cache maintainance
  swiotlb-xen: use the same foreign page check everywhere
  swiotlb-xen: remove xen_swiotlb_dma_mmap and xen_swiotlb_dma_get_sgtable
  xen: remove the exports for xen_{create,destroy}_contiguous_region
  xen/arm: remove xen_dma_ops
  xen/arm: simplify dma_cache_maint
  xen/arm: use dev_is_dma_coherent
  xen/arm: consolidate page-coherent.h
  xen/arm: use dma-noncoherent.h calls for xen-swiotlb cache maintainance
  arm: remove wrappers for the generic dma remap helpers
  dma-mapping: introduce a dma_common_find_pages helper
  dma-mapping: always use VM_DMA_COHERENT for generic DMA remap
  vmalloc: lift the arm flag for coherent mappings to common code
  dma-mapping: provide a better default ->get_required_mask
  dma-mapping: remove the dma_declare_coherent_memory export
  remoteproc: don't allow modular build
  ...
2019-09-19 13:27:23 -07:00
Paolo Valente
58494c980f block, bfq: push up injection only after setting service time
If equal to 0, the injection limit for a bfq_queue is pushed to 1
after a first sample of the total service time of the I/O requests of
the queue is computed (to allow injection to start). Yet, because of a
mistake in the branch that performs this action, the push may happen
also in some other case. This commit fixes this issue.

Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-17 20:03:49 -06:00
Paolo Valente
17c3d26602 block, bfq: increase update frequency of inject limit
The update period of the injection limit has been tentatively set to
100 ms, to reduce fluctuations. This value however proved to cause,
occasionally, the limit to be decremented for some bfq_queue only
after the queue underwent excessive injection for a lot of time. This
commit reduces the period to 10 ms.

Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-17 20:03:49 -06:00
Paolo Valente
c1e0a18228 block, bfq: reduce upper bound for inject limit to max_rq_in_driver+1
Upon an increment attempt of the injection limit, the latter is
constrained not to become higher than twice the maximum number
max_rq_in_driver of I/O requests that have happened to be in service
in the drive. This high bound allows the injection limit to grow
beyond max_rq_in_driver, which may then cause max_rq_in_driver itself
to grow.

However, since the limit is incremented by only one unit at a time,
there is no need for such a high bound, and just max_rq_in_driver+1 is
enough.

Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-17 20:03:49 -06:00
Paolo Valente
23ed570acc block, bfq: update inject limit only after injection occurred
BFQ updates the injection limit of each bfq_queue as a function of how
much the limit inflates the service times experienced by the I/O
requests of the queue. So only service times affected by injection
must be taken into account. Unfortunately, in the current
implementation of this update scheme, the service time of an I/O
request rq not affected by injection may happen to be considered in
the following case: there is no I/O request in service when rq
arrives.

This commit fixes this issue by making sure that only service times
affected by injection are considered for updating the injection
limit. In particular, the service time of an I/O request rq is now
considered only if at least one of the following two conditions holds:
- the destination bfq_queue for rq underwent injection before rq
arrival, and there is still I/O in service in the drive on rq arrival
(the service of such unfinished I/O may delay the service of rq);
- injection occurs between the arrival and the completion time of rq.

Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-17 20:03:49 -06:00
Max Gurtovoy
54d4e6ab91 block: centralize PI remapping logic to the block layer
Currently t10_pi_prepare/t10_pi_complete functions are called during the
NVMe and SCSi layers command preparetion/completion, but their actual
place should be the block layer since T10-PI is a general data integrity
feature that is used by block storage protocols. Introduce .prepare_fn
and .complete_fn callbacks within the integrity profile that each type
can implement according to its needs.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>

Fixed to not call queue integrity functions if BLK_DEV_INTEGRITY
isn't defined in the config.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-17 20:03:49 -06:00
Max Gurtovoy
5eaed68dd3 block: use symbolic constants for t10_pi type
Replace all hard-coded values with T10_PI_TYPES to make the code more
readable.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-17 20:03:49 -06:00
Linus Torvalds
7ad67ca553 for-5.4/block-2019-09-16
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl1/no0QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmo9EACFXMbdNmEEUMyRSdOkVLlr7ZlTyQi1tLpB
 YESDPxdBfybzpi0qa8JSaysGIfvSkSjmSAqBqrWPmASOSOL6CK4bbA4fTYbgPplk
 XeHUdgGiG34oCQUn8Xil5reYaTm7I6LQWnWTpVa5fIhAyUYaGJL+987ykoGmpQmB
 Dvf3YSc+8H0RTp9PCMVd6UCGPkZbVlLImGad3PF5ULvTEaE4RCXC2aiAgh0p1l5A
 J2CkRZ+/mio3zN2O4YN7VdPGfr1Wo1iZ834xbIGLegv1miHXagFk7jwTcC7zIt5t
 oSnJnqIg3iCe7SpWt4Bkzw/zy/2UqaspifbCMgw8vychlViVRUHFO5h85Yboo7kQ
 OMLEQPcwjm6dTHv5h1iXF9LW1O7NoiYmmgvApU9uOo1HUrl1X7PZ3JEfUsVHxkOO
 T4D5igf0Krsl1eAbiwEUQzy7vFZ8PlRHqrHgK+fkyotzHu1BJR7OQkYygEfGFOB/
 EfMxplGDpmibYGuWCwDX2bPAmLV3SPUQENReHrfPJRDt5TD1UkFpVGv/PLLhbr0p
 cLYI78DKpDSigBpVMmwq5nTYpnex33eyDTTA8C0sakcsdzdmU5qv30y3wm4nTiep
 f6gZo6IMXwRg/rCgVVrd9SKQAr/8wEzVlsDW3qyi2pVT8sHIgm0tFv7paihXGdDV
 xsKgmTrQQQ==
 =Qt+h
 -----END PGP SIGNATURE-----

Merge tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - Two NVMe pull requests:
     - ana log parse fix from Anton
     - nvme quirks support for Apple devices from Ben
     - fix missing bio completion tracing for multipath stack devices
       from Hannes and Mikhail
     - IP TOS settings for nvme rdma and tcp transports from Israel
     - rq_dma_dir cleanups from Israel
     - tracing for Get LBA Status command from Minwoo
     - Some nvme-tcp cleanups from Minwoo, Potnuri and Myself
     - Some consolidation between the fabrics transports for handling
       the CAP register
     - reset race with ns scanning fix for fabrics (move fabrics
       commands to a dedicated request queue with a different lifetime
       from the admin request queue)."
     - controller reset and namespace scan races fixes
     - nvme discovery log change uevent support
     - naming improvements from Keith
     - multiple discovery controllers reject fix from James
     - some regular cleanups from various people

 - Series fixing (and re-fixing) null_blk debug printing and nr_devices
   checks (André)

 - A few pull requests from Song, with fixes from Andy, Guoqing,
   Guilherme, Neil, Nigel, and Yufen.

 - REQ_OP_ZONE_RESET_ALL support (Chaitanya)

 - Bio merge handling unification (Christoph)

 - Pick default elevator correctly for devices with special needs
   (Damien)

 - Block stats fixes (Hou)

 - Timeout and support devices nbd fixes (Mike)

 - Series fixing races around elevator switching and device add/remove
   (Ming)

 - sed-opal cleanups (Revanth)

 - Per device weight support for BFQ (Fam)

 - Support for blk-iocost, a new model that can properly account cost of
   IO workloads. (Tejun)

 - blk-cgroup writeback fixes (Tejun)

 - paride queue init fixes (zhengbin)

 - blk_set_runtime_active() cleanup (Stanley)

 - Block segment mapping optimizations (Bart)

 - lightnvm fixes (Hans/Minwoo/YueHaibing)

 - Various little fixes and cleanups

* tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits)
  null_blk: format pr_* logs with pr_fmt
  null_blk: match the type of parameter nr_devices
  null_blk: do not fail the module load with zero devices
  block: also check RQF_STATS in blk_mq_need_time_stamp()
  block: make rq sector size accessible for block stats
  bfq: Fix bfq linkage error
  raid5: use bio_end_sector in r5_next_bio
  raid5: remove STRIPE_OPS_REQ_PENDING
  md: add feature flag MD_FEATURE_RAID0_LAYOUT
  md/raid0: avoid RAID0 data corruption due to layout confusion.
  raid5: don't set STRIPE_HANDLE to stripe which is in batch list
  raid5: don't increment read_errors on EILSEQ return
  nvmet: fix a wrong error status returned in error log page
  nvme: send discovery log page change events to userspace
  nvme: add uevent variables for controller devices
  nvme: enable aen regardless of the presence of I/O queues
  nvme-fabrics: allow discovery subsystems accept a kato
  nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery()
  nvme: Remove redundant assignment of cq vector
  nvme: Assign subsys instance from first ctrl
  ...
2019-09-17 16:57:47 -07:00
Linus Torvalds
7f2444d38f Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core timer updates from Thomas Gleixner:
 "Timers and timekeeping updates:

   - A large overhaul of the posix CPU timer code which is a preparation
     for moving the CPU timer expiry out into task work so it can be
     properly accounted on the task/process.

     An update to the bogus permission checks will come later during the
     merge window as feedback was not complete before heading of for
     travel.

   - Switch the timerqueue code to use cached rbtrees and get rid of the
     homebrewn caching of the leftmost node.

   - Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a
     single function

   - Implement the separation of hrtimers to be forced to expire in hard
     interrupt context even when PREEMPT_RT is enabled and mark the
     affected timers accordingly.

   - Implement a mechanism for hrtimers and the timer wheel to protect
     RT against priority inversion and live lock issues when a (hr)timer
     which should be canceled is currently executing the callback.
     Instead of infinitely spinning, the task which tries to cancel the
     timer blocks on a per cpu base expiry lock which is held and
     released by the (hr)timer expiry code.

   - Enable the Hyper-V TSC page based sched_clock for Hyper-V guests
     resulting in faster access to timekeeping functions.

   - Updates to various clocksource/clockevent drivers and their device
     tree bindings.

   - The usual small improvements all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
  posix-cpu-timers: Fix permission check regression
  posix-cpu-timers: Always clear head pointer on dequeue
  hrtimer: Add a missing bracket and hide `migration_base' on !SMP
  posix-cpu-timers: Make expiry_active check actually work correctly
  posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
  tick: Mark sched_timer to expire in hard interrupt context
  hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
  x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
  posix-cpu-timers: Utilize timerqueue for storage
  posix-cpu-timers: Move state tracking to struct posix_cputimers
  posix-cpu-timers: Deduplicate rlimit handling
  posix-cpu-timers: Remove pointless comparisons
  posix-cpu-timers: Get rid of 64bit divisions
  posix-cpu-timers: Consolidate timer expiry further
  posix-cpu-timers: Get rid of zero checks
  rlimit: Rewrite non-sensical RLIMIT_CPU comment
  posix-cpu-timers: Respect INFINITY for hard RTTIME limit
  posix-cpu-timers: Switch thread group sampling to array
  posix-cpu-timers: Restructure expiry array
  posix-cpu-timers: Remove cputime_expires
  ...
2019-09-17 12:35:15 -07:00
Hou Tao
9a91b05bba block: also check RQF_STATS in blk_mq_need_time_stamp()
In __blk_mq_end_request() if block stats needs update, we should
ensure now is valid instead of 0 even when iostat is disabled.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-15 16:02:10 -06:00
Hou Tao
3d24430694 block: make rq sector size accessible for block stats
Currently rq->data_len will be decreased by partial completion or
zeroed by completion, so when blk_stat_add() is invoked, data_len
will be zero and there will never be samples in poll_cb because
blk_mq_poll_stats_bkt() will return -1 if data_len is zero.

We could move blk_stat_add() back to __blk_mq_complete_request(),
but that would make the effort of trying to call ktime_get_ns()
once in vain. Instead we can reuse throtl_size field, and use
it for both block stats and block throttle, and adjust the
logic in blk_mq_poll_stats_bkt() accordingly.

Fixes: 4bc6339a58 ("block: move blk_stat_add() to __blk_mq_end_request()")
Tested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-15 16:02:08 -06:00
Pavel Begunkov
89f3b6d62f bfq: Fix bfq linkage error
Since commit 795fe54c2a ("bfq: Add per-device weight"), bfq uses
blkg_conf_prep() and blkg_conf_finish(), which are not exported. So, it
causes linkage error if bfq compiled as a module.

Fixes: 795fe54c2a ("bfq: Add per-device weight")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-14 11:33:43 -06:00
Ming Lei
0a67b5a926 block: fix race between switching elevator and removing queues
cecf5d87ff ("block: split .sysfs_lock into two locks") starts to
release & actuire sysfs_lock again during switching elevator. So it
isn't enough to prevent switching elevator from happening by simply
clearing QUEUE_FLAG_REGISTERED with holding sysfs_lock, because
in-progress switch still can move on after re-acquiring the lock,
meantime the flag of QUEUE_FLAG_REGISTERED won't get checked.

Fixes this issue by checking 'q->elevator' directly & locklessly after
q->kobj is removed in blk_unregister_queue(), this way is safe because
q->elevator can't be changed at that time.

Fixes: cecf5d87ff ("block: split .sysfs_lock into two locks")
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-12 07:13:22 -06:00
Stanley Chu
8a15b4d7cd block: bypass blk_set_runtime_active for uninitialized q->dev
Some devices may skip blk_pm_runtime_init() and have null pointer
in its request_queue->dev. For example, SCSI devices of UFS Well-Known
LUNs.

Currently the null pointer is checked by the user of
blk_set_runtime_active(), i.e., scsi_dev_type_resume(). It is better to
check it by blk_set_runtime_active() itself instead of by its users.

Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-12 07:11:56 -06:00
Tejun Heo
7c1ee704a1 iocost_monitor: Report debt
Report debt and rename del_ms row to delay for consistency.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 12:31:39 -06:00
Tejun Heo
e1518f63f2 blk-iocost: Don't let merges push vtime into the future
Merges have the same problem that forced-bios had which is fixed by
the previous patch.  The cost of a merge is calculated at the time of
issue and force-advances vtime into the future.  Until global vtime
catches up, how the cgroup's hweight changes in the meantime doesn't
matter and it often leads to situations where the cost is calculated
at one hweight and paid at a very different one.  See the previous
patch for more details.

Fix it by never advancing vtime into the future for merges.  If budget
is available, vtime is advanced.  Otherwise, the cost is charged as
debt.

This brings merge cost handling in line with issue cost handling in
ioc_rqos_throttle().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 12:31:39 -06:00
Tejun Heo
36a524814f blk-iocost: Account force-charged overage in absolute vtime
Currently, when a bio needs to be force-charged and there isn't enough
budget, vtime is simply pushed into the future.  This means that the
cost of the whole bio is scaled using the current hweight and then
charged immediately.  Until the global vtime advances beyond this
future vtime, the cgroup won't be allowed to issue normal IOs.

This is incorrect and can lead to, for example, exploding vrate or
extended stalls if vrate range is constrained.  Consider the following
scenario.

1. A cgroup with a very low hweight runs out of budget.

2. A storm of swap-out happens on it.  All of them are scaled
   according to the current low hweight and charged to vtime pushing
   it to a far future.

3. All other cgroups go idle and now the above cgroup has access to
   the whole device.  However, because vtime is already wound using
   the past low hweight, what its current hweight is doesn't matter
   until global vtime catches up to the local vtime.

4. As a result, either vrate gets ramped up extremely or the IOs stall
   while the underlying device is idle.

This is because the hweight the overage is calculated at is different
from the hweight that it's being paid at.

Fix it by remembering the overage in absoulte vtime and continuously
paying with the actual budget according to the current hweight at each
period.

Note that non-forced bios which wait already remembers the cost in
absolute vtime.  This brings forced-bio accounting in line.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 12:31:39 -06:00
Tejun Heo
e036c4caba blk-iocost: Fix incorrect operation order during iocg free
ioc_pd_free() first cancels the hrtimers and then deactivates the
iocg.  However, the iocg timer can run inbetween and reschedule the
hrtimers which will end up running after the iocg is freed leading to
crashes like the following.

  general protection fault: 0000 [#1] SMP
  ...
  RIP: 0010:iocg_kick_delay+0xbe/0x1b0
  RSP: 0018:ffffc90003598ea0 EFLAGS: 00010046
  RAX: 1cee00fd69512b54 RBX: ffff8881bba48400 RCX: 00000000000003e8
  RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881bba48400
  RBP: 0000000000004e20 R08: 0000000000000002 R09: 00000000000003e8
  R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90003598ef0
  R13: 00979f3810ad461f R14: ffff8881bba4b400 R15: 25439f950d26e1d1
  FS:  0000000000000000(0000) GS:ffff88885f800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f64328c7e40 CR3: 0000000002409005 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <IRQ>
   iocg_delay_timer_fn+0x3d/0x60
   __hrtimer_run_queues+0xfe/0x270
   hrtimer_interrupt+0xf4/0x210
   smp_apic_timer_interrupt+0x5e/0x120
   apic_timer_interrupt+0xf/0x20
   </IRQ>

Fix it by canceling hrtimers after deactivating the iocg.

Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 12:17:04 -06:00
Fam Zheng
795fe54c2a bfq: Add per-device weight
This adds to BFQ the missing per-device weight interfaces:
blkio.bfq.weight_device on legacy and io.bfq.weight on unified. The
implementation pretty closely resembles what we had in CFQ and the parsing code
is basically reused.

Tests
=====

Using two cgroups and three block devices, having weights setup as:

Cgroup          test1           test2
============================================
default         100             500
sda             500             100
sdb             default         default
sdc             200             200

cgroup v1 runs
--------------

    sda.test1.out:   READ: bw=913MiB/s
    sda.test2.out:   READ: bw=183MiB/s

    sdb.test1.out:   READ: bw=213MiB/s
    sdb.test2.out:   READ: bw=1054MiB/s

    sdc.test1.out:   READ: bw=650MiB/s
    sdc.test2.out:   READ: bw=650MiB/s

cgroup v2 runs
--------------

    sda.test1.out:   READ: bw=915MiB/s
    sda.test2.out:   READ: bw=184MiB/s

    sdb.test1.out:   READ: bw=216MiB/s
    sdb.test2.out:   READ: bw=1069MiB/s

    sdc.test1.out:   READ: bw=621MiB/s
    sdc.test2.out:   READ: bw=622MiB/s

Signed-off-by: Fam Zheng <zhengfeiran@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Paolo Valente <paolo.valente@linaro.org>

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-06 14:33:52 -06:00
Fam Zheng
5ff047e328 bfq: Extract bfq_group_set_weight from bfq_io_set_weight_legacy
This function will be useful when we update weight from the soon-coming
per-device interface.

Signed-off-by: Fam Zheng <zhengfeiran@bytedance.com>
Reviewed-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-06 14:33:50 -06:00
Fam Zheng
e9d3c866bf bfq: Fix the missing barrier in __bfq_entity_update_weight_prio
The comment of bfq_group_set_weight says the reading of prio_changed
should happen before the reading of weight, but a memory barrier is
missing here. Add it now, to match the smp_wmb() there.

Signed-off-by: Fam Zheng <zhengfeiran@bytedance.com>
Reviewed-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-06 14:33:48 -06:00
Jens Axboe
a26142559c block: fix elevator_get_by_features()
The lookup logic is broken - 'e' will never be NULL, even if the
list is empty. Maintain lookup hit in a separate variable instead.

Fixes: a0958ba7fc ("block: Improve default elevator selection")
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-06 07:02:31 -06:00
Damien Le Moal
737eb78e82 block: Delay default elevator initialization
When elevator_init_mq() is called from blk_mq_init_allocated_queue(),
the only information known about the device is the number of hardware
queues as the block device scan by the device driver is not completed
yet for most drivers. The device type and elevator required features
are not set yet, preventing to correctly select the default elevator
most suitable for the device.

This currently affects all multi-queue zoned block devices which default
to the "none" elevator instead of the required "mq-deadline" elevator.
These drives currently include host-managed SMR disks connected to a
smartpqi HBA and null_blk block devices with zoned mode enabled.
Upcoming NVMe Zoned Namespace devices will also be affected.

Fix this by adding the boolean elevator_init argument to
blk_mq_init_allocated_queue() to control the execution of
elevator_init_mq(). Two cases exist:
1) elevator_init = false is used for calls to
   blk_mq_init_allocated_queue() within blk_mq_init_queue(). In this
   case, a call to elevator_init_mq() is added to __device_add_disk(),
   resulting in the delayed initialization of the queue elevator
   after the device driver finished probing the device information. This
   effectively allows elevator_init_mq() access to more information
   about the device.
2) elevator_init = true preserves the current behavior of initializing
   the elevator directly from blk_mq_init_allocated_queue(). This case
   is used for the special request based DM devices where the device
   gendisk is created before the queue initialization and device
   information (e.g. queue limits) is already known when the queue
   initialization is executed.

Additionally, to make sure that the elevator initialization is never
done while requests are in-flight (there should be none when the device
driver calls device_add_disk()), freeze and quiesce the device request
queue before calling blk_mq_init_sched() in elevator_init_mq().

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-05 19:52:34 -06:00
Damien Le Moal
a0958ba7fc block: Improve default elevator selection
For block devices that do not specify required features, preserve the
current default elevator selection (mq-deadline for single queue
devices, none for multi-queue devices). However, for devices specifying
required features (e.g. zoned block devices ELEVATOR_F_ZBD_SEQ_WRITE
feature), select the first available elevator providing the required
features.

In all cases, default to "none" if no elevator is available or if the
initialization of the default elevator fails.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-05 19:52:34 -06:00
Damien Le Moal
68c43f133a block: Introduce elevator features
Introduce the definition of elevator features through the
elevator_features flags in the elevator_type structure. Each flag can
represent a feature supported by an elevator. The first feature defined
by this patch is support for zoned block device sequential write
constraint with the flag ELEVATOR_F_ZBD_SEQ_WRITE, which is implemented
by the mq-deadline elevator using zone write locking.

Other possible features are IO priorities, write hints, latency targets
or single-LUN dual-actuator disks (for which the elevator could maintain
one LBA ordered list per actuator).

The required_elevator_features field is also added to the request_queue
structure to allow a device driver to specify elevator feature flags
that an elevator must support for the correct operation of the device
(e.g. device drivers for zoned block devices can have the
ELEVATOR_F_ZBD_SEQ_WRITE flag as a required feature).
The helper function blk_queue_required_elevator_features() is
defined for setting this new field.

With these two new fields in place, the elevator functions
elevator_match() and elevator_find() are modified to allow a user to set
only an elevator with a set of features that satisfies the device
required features. Elevators not matching the device requirements are
not shown in the device sysfs queue/scheduler file to prevent their use.

The "none" elevator can always be selected as before.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-05 19:52:33 -06:00
Damien Le Moal
954b4a5ce4 block: Change elevator_init_mq() to always succeed
If the default elevator chosen is mq-deadline, elevator_init_mq() may
return an error if mq-deadline initialization fails, leading to
blk_mq_init_allocated_queue() returning an error, which in turn will
cause the block device initialization to fail and the device not being
exposed.

Instead of taking such extreme measure, handle mq-deadline
initialization failures in the same manner as when mq-deadline is not
available (no module to load), that is, default to the "none" scheduler.
With this change, elevator_init_mq() return type can be changed to void.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-05 19:52:33 -06:00
Damien Le Moal
61db437d1c block: Cleanup elevator_init_mq() use
Instead of checking a queue tag_set BLK_MQ_F_NO_SCHED flag before
calling elevator_init_mq() to make sure that the queue supports IO
scheduling, use the elevator.c function elv_support_iosched() in
elevator_init_mq(). This does not introduce any functional change but
ensure that elevator_init_mq() does the right thing based on the queue
settings.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-05 19:52:33 -06:00
Marcos Paulo de Souza
85c0a037dc block: elevator.c: Remove now unused elevator= argument
Since the inclusion of blk-mq, elevator argument was not being
considered anymore, and it's utility died long with the legacy IO path,
now removed too.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>

Fold with doc removal patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-03 08:02:53 -06:00
Damien Le Moal
cb8acabbe3 block: mq-deadline: Fix queue restart handling
Commit 7211aef86f ("block: mq-deadline: Fix write completion
handling") added a call to blk_mq_sched_mark_restart_hctx() in
dd_dispatch_request() to make sure that write request dispatching does
not stall when all target zones are locked. This fix left a subtle race
when a write completion happens during a dispatch execution on another
CPU:

CPU 0: Dispatch			CPU1: write completion

dd_dispatch_request()
    lock(&dd->lock);
    ...
    lock(&dd->zone_lock);	dd_finish_request()
    rq = find request		lock(&dd->zone_lock);
    unlock(&dd->zone_lock);
    				zone write unlock
				unlock(&dd->zone_lock);
				...
				__blk_mq_free_request
                                      check restart flag (not set)
				      -> queue not run
    ...
    if (!rq && have writes)
        blk_mq_sched_mark_restart_hctx()
    unlock(&dd->lock)

Since the dispatch context finishes after the write request completion
handling, marking the queue as needing a restart is not seen from
__blk_mq_free_request() and blk_mq_sched_restart() not executed leading
to the dispatch stall under 100% write workloads.

Fix this by moving the call to blk_mq_sched_mark_restart_hctx() from
dd_dispatch_request() into dd_finish_request() under the zone lock to
ensure full mutual exclusion between write request dispatch selection
and zone unlock on write request completion.

Fixes: 7211aef86f ("block: mq-deadline: Fix write completion handling")
Cc: stable@vger.kernel.org
Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-03 07:59:51 -06:00
Yoshihiro Shimoda
45147fb522 block: add a helper function to merge the segments
This patch adds a helper function whether a queue can merge
the segments by the DMA MAP layer (e.g. via IOMMU).

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-09-03 08:32:50 +02:00
Tejun Heo
e916ad29d9 blkcg: add missing NULL check in ioc_cpd_alloc()
ioc_cpd_alloc() forgot to check NULL return from kzalloc().  Add it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-30 07:16:19 -06:00
Tejun Heo
3532e72272 blkcg: fix missing free on error path of blk_iocost_init()
blk_iocost_init() forgot to free its percpu stat on the error path.
Fix it.

Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Reported-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-29 09:59:14 -06:00
Tejun Heo
8504dea783 blkcg: add tools/cgroup/iocost_coef_gen.py
Add a script which can be used to generate device-specific iocost
linear model coefficients.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:17 -06:00
Tejun Heo
6954ff185e blkcg: add tools/cgroup/iocost_monitor.py
Instead of mucking with debugfs and ->pd_stat(), add drgn based
monitoring script.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:14 -06:00
Tejun Heo
7caa47151a blkcg: implement blk-iocost
This patchset implements IO cost model based work-conserving
proportional controller.

While io.latency provides the capability to comprehensively prioritize
and protect IOs depending on the cgroups, its protection is binary -
the lowest latency target cgroup which is suffering is protected at
the cost of all others.  In many use cases including stacking multiple
workload containers in a single system, it's necessary to distribute
IO capacity with better granularity.

One challenge of controlling IO resources is the lack of trivially
observable cost metric.  The most common metrics - bandwidth and iops
- can be off by orders of magnitude depending on the device type and
IO pattern.  However, the cost isn't a complete mystery.  Given
several key attributes, we can make fairly reliable predictions on how
expensive a given stream of IOs would be, at least compared to other
IO patterns.

The function which determines the cost of a given IO is the IO cost
model for the device.  This controller distributes IO capacity based
on the costs estimated by such model.  The more accurate the cost
model the better but the controller adapts based on IO completion
latency and as long as the relative costs across differents IO
patterns are consistent and sensible, it'll adapt to the actual
performance of the device.

Currently, the only implemented cost model is a simple linear one with
a few sets of default parameters for different classes of device.
This covers most common devices reasonably well.  All the
infrastructure to tune and add different cost models is already in
place and a later patch will also allow using bpf progs for cost
models.

Please see the top comment in blk-iocost.c and documentation for
more details.

v2: Rebased on top of RQ_ALLOC_TIME changes and folded in Rik's fix
    for a divide-by-zero bug in current_hweight() triggered by zero
    inuse_sum.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andy Newell <newella@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:12 -06:00
Tejun Heo
6f816b4b74 blk-mq: add optional request->alloc_time_ns
There are currently two start time timestamps - start_time_ns and
io_start_time_ns.  The former marks the request allocation and and the
second issue-to-device time.  The planned io.weight controller needs
to measure the total time bios take to execute after it leaves rq_qos
including the time spent waiting for request to become available,
which can easily dominate on saturated devices.

This patch adds request->alloc_time_ns which records when the request
allocation attempt started.  As it isn't used for the usual stats,
make it optional behind CONFIG_BLK_RQ_ALLOC_TIME and
QUEUE_FLAG_RQ_ALLOC_TIME so that it can be compiled out when there are
no users and it's active only on queues which need it even when
compiled in.

v2: s/pre_start_time/alloc_time/ and add CONFIG_BLK_RQ_ALLOC_TIME
    gating as suggested by Jens.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:10 -06:00
Tejun Heo
beab17fc2a blkcg: s/RQ_QOS_CGROUP/RQ_QOS_LATENCY/
io.weight is gonna be another rq_qos cgroup mechanism.  Let's rename
RQ_QOS_CGROUP which is being used by io.latency to RQ_QOS_LATENCY in
preparation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:08 -06:00
Tejun Heo
9677a3e01f block/rq_qos: implement rq_qos_ops->queue_depth_changed()
wbt already gets queue depth changed notification through
wbt_set_queue_depth().  Generalize it into
rq_qos_ops->queue_depth_changed() so that other rq_qos policies can
easily hook into the events too.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:07 -06:00
Tejun Heo
d3e65ffff6 block/rq_qos: add rq_qos_merge()
Add a merge hook for rq_qos.  This will be used by io.weight.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:05 -06:00
Tejun Heo
015d254cb0 blkcg: separate blkcg_conf_get_disk() out of blkg_conf_prep()
Separate out blkcg_conf_get_disk() so that it can be used by blkcg
policy interface file input parsers before the policy is actually
enabled.  This doesn't introduce any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:04 -06:00
Tejun Heo
86a5bba5c2 blkcg: make ->cpd_init_fn() optional
For policies which can do enough initialization from ->cpd_alloc_fn(),
make ->cpd_init_fn() optional.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:03 -06:00
Tejun Heo
cf09a8ee19 blkcg: pass @q and @blkcg into blkcg_pol_alloc_pd_fn()
Instead of @node, pass in @q and @blkcg so that the alloc function has
more context.  This doesn't cause any behavior change and will be used
by io.weight implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28 21:17:01 -06:00
Ming Lei
cecf5d87ff block: split .sysfs_lock into two locks
The kernfs built-in lock of 'kn->count' is held in sysfs .show/.store
path. Meantime, inside block's .show/.store callback, q->sysfs_lock is
required.

However, when mq & iosched kobjects are removed via
blk_mq_unregister_dev() & elv_unregister_queue(), q->sysfs_lock is held
too. This way causes AB-BA lock because the kernfs built-in lock of
'kn-count' is required inside kobject_del() too, see the lockdep warning[1].

On the other hand, it isn't necessary to acquire q->sysfs_lock for
both blk_mq_unregister_dev() & elv_unregister_queue() because
clearing REGISTERED flag prevents storing to 'queue/scheduler'
from being happened. Also sysfs write(store) is exclusive, so no
necessary to hold the lock for elv_unregister_queue() when it is
called in switching elevator path.

So split .sysfs_lock into two: one is still named as .sysfs_lock for
covering sync .store, the other one is named as .sysfs_dir_lock
for covering kobjects and related status change.

sysfs itself can handle the race between add/remove kobjects and
showing/storing attributes under kobjects. For switching scheduler
via storing to 'queue/scheduler', we use the queue flag of
QUEUE_FLAG_REGISTERED with .sysfs_lock for avoiding the race, then
we can avoid to hold .sysfs_lock during removing/adding kobjects.

[1]  lockdep warning
    ======================================================
    WARNING: possible circular locking dependency detected
    5.3.0-rc3-00044-g73277fc75ea0 #1380 Not tainted
    ------------------------------------------------------
    rmmod/777 is trying to acquire lock:
    00000000ac50e981 (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x59/0x72

    but task is already holding lock:
    00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&q->sysfs_lock){+.+.}:
           __lock_acquire+0x95f/0xa2f
           lock_acquire+0x1b4/0x1e8
           __mutex_lock+0x14a/0xa9b
           blk_mq_hw_sysfs_show+0x63/0xb6
           sysfs_kf_seq_show+0x11f/0x196
           seq_read+0x2cd/0x5f2
           vfs_read+0xc7/0x18c
           ksys_read+0xc4/0x13e
           do_syscall_64+0xa7/0x295
           entry_SYSCALL_64_after_hwframe+0x49/0xbe

    -> #0 (kn->count#202){++++}:
           check_prev_add+0x5d2/0xc45
           validate_chain+0xed3/0xf94
           __lock_acquire+0x95f/0xa2f
           lock_acquire+0x1b4/0x1e8
           __kernfs_remove+0x237/0x40b
           kernfs_remove_by_name_ns+0x59/0x72
           remove_files+0x61/0x96
           sysfs_remove_group+0x81/0xa4
           sysfs_remove_groups+0x3b/0x44
           kobject_del+0x44/0x94
           blk_mq_unregister_dev+0x83/0xdd
           blk_unregister_queue+0xa0/0x10b
           del_gendisk+0x259/0x3fa
           null_del_dev+0x8b/0x1c3 [null_blk]
           null_exit+0x5c/0x95 [null_blk]
           __se_sys_delete_module+0x204/0x337
           do_syscall_64+0xa7/0x295
           entry_SYSCALL_64_after_hwframe+0x49/0xbe

    other info that might help us debug this:

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(&q->sysfs_lock);
                                   lock(kn->count#202);
                                   lock(&q->sysfs_lock);
      lock(kn->count#202);

     *** DEADLOCK ***

    2 locks held by rmmod/777:
     #0: 00000000e69bd9de (&lock){+.+.}, at: null_exit+0x2e/0x95 [null_blk]
     #1: 00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b

    stack backtrace:
    CPU: 0 PID: 777 Comm: rmmod Not tainted 5.3.0-rc3-00044-g73277fc75ea0 #1380
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4
    Call Trace:
     dump_stack+0x9a/0xe6
     check_noncircular+0x207/0x251
     ? print_circular_bug+0x32a/0x32a
     ? find_usage_backwards+0x84/0xb0
     check_prev_add+0x5d2/0xc45
     validate_chain+0xed3/0xf94
     ? check_prev_add+0xc45/0xc45
     ? mark_lock+0x11b/0x804
     ? check_usage_forwards+0x1ca/0x1ca
     __lock_acquire+0x95f/0xa2f
     lock_acquire+0x1b4/0x1e8
     ? kernfs_remove_by_name_ns+0x59/0x72
     __kernfs_remove+0x237/0x40b
     ? kernfs_remove_by_name_ns+0x59/0x72
     ? kernfs_next_descendant_post+0x7d/0x7d
     ? strlen+0x10/0x23
     ? strcmp+0x22/0x44
     kernfs_remove_by_name_ns+0x59/0x72
     remove_files+0x61/0x96
     sysfs_remove_group+0x81/0xa4
     sysfs_remove_groups+0x3b/0x44
     kobject_del+0x44/0x94
     blk_mq_unregister_dev+0x83/0xdd
     blk_unregister_queue+0xa0/0x10b
     del_gendisk+0x259/0x3fa
     ? disk_events_poll_msecs_store+0x12b/0x12b
     ? check_flags+0x1ea/0x204
     ? mark_held_locks+0x1f/0x7a
     null_del_dev+0x8b/0x1c3 [null_blk]
     null_exit+0x5c/0x95 [null_blk]
     __se_sys_delete_module+0x204/0x337
     ? free_module+0x39f/0x39f
     ? blkcg_maybe_throttle_current+0x8a/0x718
     ? rwlock_bug+0x62/0x62
     ? __blkcg_punt_bio_submit+0xd0/0xd0
     ? trace_hardirqs_on_thunk+0x1a/0x20
     ? mark_held_locks+0x1f/0x7a
     ? do_syscall_64+0x4c/0x295
     do_syscall_64+0xa7/0x295
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7fb696cdbe6b
    Code: 73 01 c3 48 8b 0d 1d 20 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 008
    RSP: 002b:00007ffec9588788 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
    RAX: ffffffffffffffda RBX: 0000559e589137c0 RCX: 00007fb696cdbe6b
    RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559e58913828
    RBP: 0000000000000000 R08: 00007ffec9587701 R09: 0000000000000000
    R10: 00007fb696d4eae0 R11: 0000000000000206 R12: 00007ffec95889b0
    R13: 00007ffec95896b3 R14: 0000559e58913260 R15: 0000559e589137c0

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27 10:40:20 -06:00
Ming Lei
58c898ba37 block: add helper for checking if queue is registered
There are 4 users which check if queue is registered, so add one helper
to check it.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27 10:40:20 -06:00
Ming Lei
c6ba933358 blk-mq: don't hold q->sysfs_lock in blk_mq_map_swqueue
blk_mq_map_swqueue() is called from blk_mq_init_allocated_queue()
and blk_mq_update_nr_hw_queues(). For the former caller, the kobject
isn't exposed to userspace yet. For the latter caller, hctx sysfs entries
and debugfs are un-registered before updating nr_hw_queues.

On the other hand, commit 2f8f1336a4 ("blk-mq: always free hctx after
request queue is freed") moves freeing hctx into queue's release
handler, so there won't be race with queue release path too.

So don't hold q->sysfs_lock in blk_mq_map_swqueue().

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27 10:40:20 -06:00
Ming Lei
c48dac137a block: don't hold q->sysfs_lock in elevator_init_mq
The original comment says:

	q->sysfs_lock must be held to provide mutual exclusion between
	elevator_switch() and here.

Which is simply wrong. elevator_init_mq() is only called from
blk_mq_init_allocated_queue, which is always called before the request
queue is registered via blk_register_queue(), for dm-rq or normal rq
based driver. However, queue's kobject is only exposed and added to sysfs
in blk_register_queue(). So there isn't such race between elevator_switch()
and elevator_init_mq().

So avoid to hold q->sysfs_lock in elevator_init_mq().

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27 10:40:20 -06:00
Bart Van Assche
9685b22702 block: Remove blk_mq_register_dev()
This function has no callers. Hence remove it.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27 10:40:19 -06:00
Christoph Hellwig
d1916c86cc block: move same page handling from __bio_add_pc_page to the callers
Hiding page refcount manipulation inside a low-level bio helper is
somewhat awkward.  Instead return the same page information to the
callers, where it fits in much better.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-22 07:14:39 -06:00
Christoph Hellwig
384209cd5b block: create a bio_try_merge_pc_page helper
Passsthrough bio handling should be the same as normal bio handling,
except that we need to take hardware limitations into account.  Thus
use the common try_merge implementation after checking the hardware
limits.  This changes behavior in that we now also check segment
and dma boundary settings for same page merges, which is a little
more work but has no effect as those need to be larger than the
page size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-22 07:14:38 -06:00
Christoph Hellwig
320ea869a1 block: improve the gap check in __bio_add_pc_page
If we can add more data into an existing segment we do not create a gap
per definition, so move the check for a gap after the attempt to merge
into the segment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-22 07:14:36 -06:00
Revanth Rajashekar
238bdcdf5d block: sed-opal: Removed duplicate OPAL_METHOD_LENGTH definition
The original commit adding the sed-opal library by mistake added two
definitions of OPAL_METHOD_LENGTH, remove one of them.

Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Reviewed-by: Scott Bauer <sbauer@plzdonthack.me>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-20 09:34:49 -06:00
Revanth Rajashekar
89c6cc2cab block: sed-opal: Remove always false conditional statement
In the function 'response_parse', num_entries will never be 0 as
slen is checked for 0. Hence, the condition 'if (num_entries == 0)'
can never be true.

Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Reviewed-by: Scott Bauer <sbauer@plzdonthack.me>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-20 09:33:21 -06:00
Revanth Rajashekar
5cc23ed75b block: sed-opal: Add/remove spaces
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Reviewed-by: Scott Bauer <sbauer@plzdonthack.me>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-20 09:33:20 -06:00
Junxiao Bi
988721db93 block: remove struct request_queue queue_head
The dispatch list is not used any more, as the legacy block IO stack
has been removed.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-19 08:55:10 -06:00
Jens Axboe
7b6620d7db block: remove REQ_NOWAIT_INLINE
We had a few issues with this code, and there's still a problem around
how we deal with error handling for chained/split bios. For now, just
revert the code and we'll try again with a thoroug solution. This
reverts commits:

e15c2ffa10 ("block: fix O_DIRECT error handling for bio fragments")
0eb6ddfb86 ("block: Fix __blkdev_direct_IO() for bio fragments")
6a43074e2f ("block: properly handle IOCB_NOWAIT for async O_DIRECT IO")
893a1c9720 ("blk-mq: allow REQ_NOWAIT to return an error inline")

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-15 11:09:16 -06:00
Johannes Weiner
b8e24a9300 block: annotate refault stalls from IO submission
psi tracks the time tasks wait for refaulting pages to become
uptodate, but it does not track the time spent submitting the IO. The
submission part can be significant if backing storage is contended or
when cgroup throttling (io.latency) is in effect - a lot of time is
spent in submit_bio(). In that case, we underreport memory pressure.

Annotate submit_bio() to account submission time as memory stall when
the bio is reading userspace workingset pages.

Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-14 08:50:01 -06:00
zhengbin
73d9c8d4c0 blk-mq: Fix memory leak in blk_mq_init_allocated_queue error handling
If blk_mq_init_allocated_queue->elevator_init_mq fails, need to release
the previously requested resources.

Fixes: d348499138 ("blk-mq-sched: allow setting of default IO scheduler")
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-12 08:15:36 -06:00
zhengbin
e26cc08265 blk-mq: move cancel of requeue_work to the front of blk_exit_queue
blk_exit_queue will free elevator_data, while blk_mq_requeue_work
will access it. Move cancel of requeue_work to the front of
blk_exit_queue to avoid use-after-free.

blk_exit_queue                blk_mq_requeue_work
  __elevator_exit               blk_mq_run_hw_queues
    blk_mq_exit_sched             blk_mq_run_hw_queue
      dd_exit_queue                 blk_mq_hctx_has_pending
        kfree(elevator_data)          blk_mq_sched_has_work
                                        dd_has_work

Fixes: fbc2a15e34 ("blk-mq: move cancel of requeue_work into blk_mq_release")
Cc: stable@vger.kernel.org
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-12 08:14:11 -06:00
Paolo Valente
fd03177c33 block, bfq: handle NULL return value by bfq_init_rq()
As reported in [1], the call bfq_init_rq(rq) may return NULL in case
of OOM (in particular, if rq->elv.icq is NULL because memory
allocation failed in failed in ioc_create_icq()).

This commit handles this circumstance.

[1] https://lkml.org/lkml/2019/7/22/824

Cc: Hsin-Yi Wang <hsinyi@google.com>
Cc: Nicolas Boichat <drinkcat@chromium.org>
Cc: Doug Anderson <dianders@chromium.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Hsin-Yi Wang <hsinyi@google.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-08 07:31:50 -06:00
Paolo Valente
3f758e844a block, bfq: move update of waker and woken list to queue freeing
Since commit 13a857a4c4 ("block, bfq: detect wakers and
unconditionally inject their I/O"), every bfq_queue has a pointer to a
waker bfq_queue and a list of the bfq_queues it may wake. In this
respect, when a bfq_queue, say Q, remains with no I/O source attached
to it, Q cannot be woken by any other bfq_queue, and cannot wake any
other bfq_queue. Then Q must be removed from the woken list of its
possible waker bfq_queue, and all bfq_queues in the woken list of Q
must stop having a waker bfq_queue.

Q remains with no I/O source in two cases: when the last process
associated with Q exits or when such a process gets associated with a
different bfq_queue. Unfortunately, commit 13a857a4c4 ("block, bfq:
detect wakers and unconditionally inject their I/O") performed the
above updates only in the first case.

This commit fixes this bug by moving these updates to when Q gets
freed. This is a simple and safe way to handle all cases, as both the
above events, process exit and re-association, lead to Q being freed
soon, and because dangling references would come out only after Q gets
freed (if no update were performed).

Fixes: 13a857a4c4 ("block, bfq: detect wakers and unconditionally inject their I/O")
Reported-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-08 07:30:52 -06:00
Paolo Valente
08d383a749 block, bfq: reset last_completed_rq_bfqq if the pointed queue is freed
Since commit 13a857a4c4 ("block, bfq: detect wakers and
unconditionally inject their I/O"), BFQ stores, in a per-device
pointer last_completed_rq_bfqq, the last bfq_queue that had an I/O
request completed. If some bfq_queue receives new I/O right after the
last request of last_completed_rq_bfqq has been completed, then
last_completed_rq_bfqq may be a waker bfq_queue.

But if the bfq_queue last_completed_rq_bfqq points to is freed, then
last_completed_rq_bfqq becomes a dangling reference. This commit
resets last_completed_rq_bfqq if the pointed bfq_queue is freed.

Fixes: 13a857a4c4 ("block, bfq: detect wakers and unconditionally inject their I/O")
Reported-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-08 07:30:50 -06:00
Hans Holmberg
00ec4f3039 block: stop exporting bio_map_kern
Now that there no module users left of bio_map_kern, stop exporting the
symbol.

Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hans Holmberg <hans@owltronix.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-06 08:20:10 -06:00
Ming Lei
556f36e90d blk-mq: balance mapping between present CPUs and queues
Spread queues among present CPUs first, then building mapping on other
non-present CPUs.

So we can minimize count of dead queues which are mapped by un-present
CPUs only. Then bad IO performance can be avoided by unbalanced mapping
between present CPUs and queues.

The similar policy has been applied on Managed IRQ affinity.

Cc: Yi Zhang <yi.zhang@redhat.com>
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-04 21:43:12 -06:00
Chaitanya Kulkarni
6e33dbf280 blk-zoned: implement REQ_OP_ZONE_RESET_ALL
This implements REQ_OP_ZONE_RESET_ALL as a special case of the block
device zone reset operations where we just simply issue bio with the
newly introduced req op.

We issue this req op when the number of sectors is equal to the device's
partition's number of sectors and device has no partitions.

We also add support so that blk_op_str() can print the new reset-all
zone operation.

This patch also adds a generic make request check for newly
introduced REQ_OP_ZONE_RESET_ALL req_opf. We simply return error
when queue is zoned and reset-all flag is not set for
REQ_OP_ZONE_RESET_ALL.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-04 21:41:29 -06:00
Bart Van Assche
67ed8b7386 block: Fix a comment in blk_cleanup_queue()
Change a reference to the legacy block layer into a reference to blk-mq.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-04 21:41:29 -06:00
Bart Van Assche
9cc5169cd4 block: Improve physical block alignment of split bios
Consider the following example:
* The logical block size is 4 KB.
* The physical block size is 8 KB.
* max_sectors equals (16 KB >> 9) sectors.
* A non-aligned 4 KB and an aligned 64 KB bio are merged into a single
  non-aligned 68 KB bio.

The current behavior is to split such a bio into (16 KB + 16 KB + 16 KB
+ 16 KB + 4 KB). The start of none of these five bio's is aligned to a
physical block boundary.

This patch ensures that such a bio is split into four aligned and
one non-aligned bio instead of being split into five non-aligned bios.
This improves performance because most block devices can handle aligned
requests faster than non-aligned requests.

Since the physical block size is larger than or equal to the logical
block size, this patch preserves the guarantee that the returned
value is a multiple of the logical block size.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-04 21:41:29 -06:00
Bart Van Assche
708b25b344 block: Simplify blk_bio_segment_split()
Move the max_sectors check into bvec_split_segs() such that a single
call to that function can do all the necessary checks. This patch
optimizes the fast path further, namely if a bvec fits in a page.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-04 21:41:29 -06:00
Bart Van Assche
ff9811b3cf block: Simplify bvec_split_segs()
Simplify this function by by removing two if-tests. Other than requiring
that the @sectors pointer is not NULL, this patch does not change the
behavior of bvec_split_segs().

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-04 21:41:29 -06:00
Bart Van Assche
dad7758459 block: Document the bio splitting functions
Since what the bio splitting functions do is nontrivial, document these
functions.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-04 21:41:29 -06:00
Bart Van Assche
af2c68fe94 block: Declare several function pointer arguments 'const'
Make it clear to the compiler and also to humans that the functions
that query request queue properties do not modify any member of the
request_queue data structure.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-04 21:41:29 -06:00
Ming Lei
a87ccce0b5 blk-mq: remove blk_mq_complete_request_sync
blk_mq_tagset_wait_completed_request() has been applied for waiting
for completed request's fn, so not necessary to use
blk_mq_complete_request_sync() any more.

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-04 21:41:29 -06:00
Ming Lei
f9934a80f9 blk-mq: introduce blk_mq_tagset_wait_completed_request()
blk-mq may schedule to call queue's complete function on remote CPU via
IPI, but doesn't provide any way to synchronize the request's complete
fn. The current queue freeze interface can't provide the synchonization
because aborted requests stay at blk-mq queues during EH.

In some driver's EH(such as NVMe), hardware queue's resource may be freed &
re-allocated. If the completed request's complete fn is run finally after the
hardware queue's resource is released, kernel crash will be triggered.

Prepare for fixing this kind of issue by introducing
blk_mq_tagset_wait_completed_request().

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-04 21:41:29 -06:00
Ming Lei
aa306ab703 blk-mq: introduce blk_mq_request_completed()
NVMe needs this function to decide if one request to be aborted has
been completed in normal IO path already.

So introduce it.

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-04 21:41:29 -06:00
Thomas Gleixner
9dd8813ed9 hrtimer/treewide: Use hrtimer_sleeper_start_expires()
hrtimer_sleepers will gain a scheduling class dependent treatment on
PREEMPT_RT. Use the new hrtimer_sleeper_start_expires() function to make
that possible.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-08-01 17:43:16 +02:00
Sebastian Andrzej Siewior
dbc1625fc9 hrtimer: Consolidate hrtimer_init() + hrtimer_init_sleeper() calls
hrtimer_init_sleeper() calls require prior initialisation of the hrtimer
object which is embedded into the hrtimer_sleeper.

Combine the initialization and spare a function call. Fixup all call sites.

This is also a preparatory change for PREEMPT_RT to do hrtimer sleeper
specific initializations of the embedded hrtimer without modifying any of
the call sites.

No functional change.

[ anna-maria: Minor cleanups ]
[ tglx: Adopted to the removal of the task argument of
  	hrtimer_init_sleeper() and trivial polishing.
	Folded a fix from Stephen Rothwell for the vsoc code ]

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185752.887468908@linutronix.de
2019-08-01 17:43:15 +02:00
Thomas Gleixner
b744948725 hrtimer: Remove task argument from hrtimer_init_sleeper()
All callers hand in 'current' and that's the only task pointer which
actually makes sense. Remove the task argument and set current in the
function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185752.791885290@linutronix.de
2019-07-30 23:57:51 +02:00
Linus Torvalds
5168afe6ef for-linus-20190726-2
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl07oAsQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpiDOD/wLARyG0xGhavA7WQFjtCSyASJdfH5wz/zE
 lCpYWTEuD5fZlrVLg3wyKdJKq2g+nqtHxo8I1z7CvdJmky9hRlHqWmJqCVzS3hzq
 dkskY8ZlXDzDxvVOs+3QgNrhZrZF+ScnmcSsO2+UHmmdlXZiyWRA5pgzgI9AA77o
 HbF4izj86jhlKnuKgtNB9RoNbUEnmxBCksi+4+lU1zEM416/l0aXzBO+l3bYW0mn
 td8/oruXLB3v49dMqo/Xy10/+6PYsvVs+6gvT2sCCr6pMyM5XwWyZzEdRB448Nem
 1f+trlxCmLo3f81YsQOkMYD0dRnmNHGTk2BOazkJd7ZytKpjSUMgIkk9ty7R27kR
 Ct1cfxqfp6IyEwIwVmGpo7HW486wytmuq0WZGsqc2G0Cg23QIRE4/HVQypMUemgk
 RGCx5CBJLGZHqrHMzTGhU31hY5XPcd5dBd9W/UdloFP3ta3jkd2sFqzzevqtQhfV
 Mbva3YJCujQObWFJd7+L+LRrW1mLFecnKJZYKetOvDQ48gAfy7OQePZRTmcM0YhH
 VGbj8dRnXLmF1b+4KBnPni7xBLKN14zdvm7jnFViGjCG9CESHm3Gv2/4ZHqaj9kq
 gK7Ze9cSCXwk2R235m0/DucCUiFOLcvpcFEgtW+h+0jio7NEN/3NIZ00gepKyuhp
 S7GZhmFfRw==
 =iI4A
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20190726-2' of git://git.kernel.dk/linux-block

Pull block DMA segment fix from Jens Axboe:
 "Here's the virtual boundary segment size fix"

* tag 'for-linus-20190726-2' of git://git.kernel.dk/linux-block:
  block: fix max segment size handling in blk_queue_virt_boundary
2019-07-26 19:20:34 -07:00
Christoph Hellwig
c6c84f78e2 block: fix max segment size handling in blk_queue_virt_boundary
We should only set the max segment size to unlimited if we actually
have a virt boundary.  Otherwise we accidentally clear that limit
when called from the SCSI midlayer, which always calls
blk_queue_virt_boundary, even if that mask is 0.

Fixes: 7ad388d8e4 ("scsi: core: add a host / host template field for the virt boundary")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-26 12:50:57 -06:00
Linus Torvalds
0441281965 for-linus-20190726
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl07DGAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplf5EADOOvOdsz9N/Iw8ZHHHJXCqKR26zZv75G1z
 0h1PGC7p0JZQbYFo0Zo7mjiRBGlg6tlXc2d4Gyl94XJKDwjeYTcFDvbvERdYa+MH
 d2RiFkAfR967Ri4fb+FP5L3mYOQdMJ/zk0xCDHLv/DcxeFLa5a9EJS1+vBSR+AcB
 0JpJWuHypGqGmbTaL0z9q2pmx0mgA1ERlWQtkMLrsEr2Vqg/rrjGwe2bGFY00lXc
 vKtFkpfugKc4zVAPSzC1YZgojfDDpGNEA4QMtxMsEH4hqyMpHhrtUedNY5QrjC0B
 p9h6aPXXYr2KhGP0grrEytzaYUOzK2crK5h+q+1vu6nOgx2EgmnLM9tBu/LuRH1j
 uUzKJOa3/AE+bU7uZEsaUerTBsHrgEBa1x8G92obYRnjgW3aCD2CaSbjjBhNxTZ4
 1dXyr0DTHFXZmfcfWja5tO26JTPzjwVOrwiRyU0S727UsdVJupoHiYLr5fwaDfgn
 /Du2I/XWvFtflm5i0ND0sdcX1yRlFiGZ9e45z1QFaFmcteKKWzRBDlC6mQzI/lw3
 oc583mhDR3tRtJxow+wn6AuMUehFRh8wj0UhL/MEMjLW8GiqXU5aRtanT+22Xz4L
 saNDQieeEnV7raMYXMP0qIhkJtrNASmJQos+MOJAEGOWcS2ePIUUio2kSXie+071
 BphJd2RamQ==
 =HIzH
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20190726' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Several io_uring fixes/improvements:
     - Blocking fix for O_DIRECT (me)
     - Latter page slowness for registered buffers (me)
     - Fix poll hang under certain conditions (me)
     - Defer sequence check fix for wrapped rings (Zhengyuan)
     - Mismatch in async inc/dec accounting (Zhengyuan)
     - Memory ordering issue that could cause stall (Zhengyuan)
      - Track sequential defer in bytes, not pages (Zhengyuan)

 - NVMe pull request from Christoph

 - Set of hang fixes for wbt (Josef)

 - Redundant error message kill for libahci (Ding)

 - Remove unused blk_mq_sched_started_request() and related ops (Marcos)

 - drbd dynamic alloc shash descriptor to reduce stack use (Arnd)

 - blkcg ->pd_stat() non-debug print (Tejun)

 - bcache memory leak fix (Wei)

 - Comment fix (Akinobu)

 - BFQ perf regression fix (Paolo)

* tag 'for-linus-20190726' of git://git.kernel.dk/linux-block: (24 commits)
  io_uring: ensure ->list is initialized for poll commands
  Revert "nvme-pci: don't create a read hctx mapping without read queues"
  nvme: fix multipath crash when ANA is deactivated
  nvme: fix memory leak caused by incorrect subsystem free
  nvme: ignore subnqn for ADATA SX6000LNP
  drbd: dynamically allocate shash descriptor
  block: blk-mq: Remove blk_mq_sched_started_request and started_request
  bcache: fix possible memory leak in bch_cached_dev_run()
  io_uring: track io length in async_list based on bytes
  io_uring: don't use iov_iter_advance() for fixed buffers
  block: properly handle IOCB_NOWAIT for async O_DIRECT IO
  blk-mq: allow REQ_NOWAIT to return an error inline
  io_uring: add a memory barrier before atomic_read
  rq-qos: use a mb for got_token
  rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule
  rq-qos: don't reset has_sleepers on spurious wakeups
  rq-qos: fix missed wake-ups in rq_qos_throttle
  wait: add wq_has_single_sleeper helper
  block, bfq: check also in-flight I/O in dispatch plugging
  block: fix sysfs module parameters directory path in comment
  ...
2019-07-26 10:32:12 -07:00
Marcos Paulo de Souza
327fe1d42b block: blk-mq: Remove blk_mq_sched_started_request and started_request
blk_mq_sched_completed_request is a function that checks if the elevator
related to the request has started_request implemented, but currently, none of
the available IO schedulers implement started_request, so remove both.

Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-23 07:25:09 -06:00
Jens Axboe
893a1c9720 blk-mq: allow REQ_NOWAIT to return an error inline
By default, if a caller sets REQ_NOWAIT and we need to block, we'll
return -EAGAIN through the bio->bi_end_io() callback. For some use
cases, this makes it hard to use.

Allow a caller to ask for inline return of errors related to
blocking by also setting REQ_NOWAIT_INLINE.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-21 21:46:23 -06:00
Josef Bacik
ac38297f70 rq-qos: use a mb for got_token
Oleg noticed that our checking of data.got_token is unsafe in the
cleanup case, and should really use a memory barrier.  Use a wmb on the
write side, and a rmb() on the read side.  We don't need one in the main
loop since we're saved by set_current_state().

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-18 10:20:14 -06:00
Josef Bacik
d14a9b389a rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule
In case we get a spurious wakeup we need to make sure to re-set
ourselves to TASK_UNINTERRUPTIBLE so we don't busy wait.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-18 10:20:13 -06:00
Josef Bacik
64e7ea875e rq-qos: don't reset has_sleepers on spurious wakeups
If we raced with somebody else getting an inflight counter we could fail
to get an inflight counter with no sleepers on the list, and thus need
to go to sleep.  In this case has_sleepers should be true because we are
now relying on the waker to get our inflight counter for us.  And in the
case of spurious wakeups we'd still want this to be the case.  So set
has_sleepers to true if we went to sleep to make sure we're woken up the
proper way.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-18 10:20:13 -06:00
Josef Bacik
545fbd0775 rq-qos: fix missed wake-ups in rq_qos_throttle
We saw a hang in production with WBT where there was only one waiter in
the throttle path and no outstanding IO.  This is because of the
has_sleepers optimization that is used to make sure we don't steal an
inflight counter for new submitters when there are people already on the
list.

We can race with our check to see if the waitqueue has any waiters (this
is done locklessly) and the time we actually add ourselves to the
waitqueue.  If this happens we'll go to sleep and never be woken up
because nobody is doing IO to wake us up.

Fix this by checking if the waitqueue has a single sleeper on the list
after we add ourselves, that way we have an uptodate view of the list.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-18 10:20:13 -06:00
Paolo Valente
b5e02b484d block, bfq: check also in-flight I/O in dispatch plugging
Consider a sync bfq_queue Q that remains empty while in service, and
suppose that, when this happens, there is a fair amount of already
in-flight I/O not belonging to Q. In such a situation, I/O dispatching
may need to be plugged (until new I/O arrives for Q), for the
following reason.

The drive may decide to serve in-flight non-Q's I/O requests before
Q's ones, thereby delaying the arrival of new I/O requests for Q
(recall that Q is sync). If I/O-dispatching is not plugged, then,
while Q remains empty, a basically uncontrolled amount of I/O from
other queues may be dispatched too, possibly causing the service of
Q's I/O to be delayed even longer in the drive. This problem gets more
and more serious as the speed and the queue depth of the drive grow,
because, as these two quantities grow, the probability to find no
queue busy but many requests in flight grows too.

If Q has the same weight and priority as the other queues, then the
above delay is unlikely to cause any issue, because all queues tend to
undergo the same treatment. So, since not plugging I/O dispatching is
convenient for throughput, it is better not to plug. Things change in
case Q has a higher weight or priority than some other queue, because
Q's service guarantees may simply be violated. For this reason,
commit 1de0c4cd9e ("block, bfq: reduce idling only in symmetric
scenarios") does plug I/O in such an asymmetric scenario. Plugging
minimizes the delay induced by already in-flight I/O, and enables Q to
recover the bandwidth it may lose because of this delay.

Yet the above commit does not cover the case of weight-raised queues,
for efficiency concerns. For weight-raised queues, I/O-dispatch
plugging is activated simply if not all bfq_queues are
weight-raised. But this check does not handle the case of in-flight
requests, because a bfq_queue may become non busy *before* all its
in-flight requests are completed.

This commit performs I/O-dispatch plugging for weight-raised queues if
there are some in-flight requests.

As a practical example of the resulting recover of control, under
write load on a Samsung SSD 970 PRO, gnome-terminal starts in 1.5
seconds after this fix, against 15 seconds before the fix (as a
reference, gnome-terminal takes about 35 seconds to start with any of
the other I/O schedulers).

Fixes: 1de0c4cd9e ("block, bfq: reduce idling only in symmetric scenarios")
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-18 07:22:15 -06:00
Linus Torvalds
c309b6f242 docs conversion for v5.3-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl0tpocACgkQCF8+vY7k
 4RWoxA//b/fmDXP3WPzrjjSmpyB9ml0/epKzPbT5S2j0lftqKBmet29k+PCjVrTx
 Nq2QauehY9ug5h8UMVUCmzPr95F0tSIGRoqk1vrn7z0K3q6k1SHrtvqbY1Bgb2Uk
 Qvh2YFU4fQLJg8WAbExCjxCdbdmBKQVGKTwCtM+tP5OMxwAFOmQrjGaUaKCKIIA2
 7Wzrx8CpSji+bJ3uK/d36c+4M9oDly5eaxBhoboL3BI0y+GqwiSASGwTO7BxrPOg
 0wq5IZHnqS8+bprT9xQdDOqf+UOY9U1cxE/+sqsHxblfUEx9gfLy/R+FLmJn+SS9
 Z3yLy4SqVHQMpWBjEAGodohikF60PAuTdymSC11jqFaKCUxWrIZg5xO+0blMrxPF
 7vYIexutCkaBMHBlNaNsHIqB7B/2FGGKoN7QW64hwvwJCGvF7OmJcV+R4bROGvh4
 nFuis9/Nm66Fq7I3aw37ThyZ0aWZdaQ0QJTH9ksxU/ZCz2hhMNYu/rXggrDvkS4U
 nr77ZT5Gd7nj4b110zf8+99uiGiinY6hTfzPAuTCLBhaxwrv4/xDHAhpwdEB5T4j
 8gOkxV8c0XWtL7sKqhGJvs/RRe2za0Y9XH6fyxsYfWcfuLjEvug8ouXMad9gxFWH
 DL3WnKJEMGLScei2wux4kGOwEbkR1bUf2cHJfh3GpCB/y8vgLOc=
 =smxY
 -----END PGP SIGNATURE-----

Merge tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull rst conversion of docs from Mauro Carvalho Chehab:
 "As agreed with Jon, I'm sending this big series directly to you, c/c
  him, as this series required a special care, in order to avoid
  conflicts with other trees"

* tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (77 commits)
  docs: kbuild: fix build with pdf and fix some minor issues
  docs: block: fix pdf output
  docs: arm: fix a breakage with pdf output
  docs: don't use nested tables
  docs: gpio: add sysfs interface to the admin-guide
  docs: locking: add it to the main index
  docs: add some directories to the main documentation index
  docs: add SPDX tags to new index files
  docs: add a memory-devices subdir to driver-api
  docs: phy: place documentation under driver-api
  docs: serial: move it to the driver-api
  docs: driver-api: add remaining converted dirs to it
  docs: driver-api: add xilinx driver API documentation
  docs: driver-api: add a series of orphaned documents
  docs: admin-guide: add a series of orphaned documents
  docs: cgroup-v1: add it to the admin-guide book
  docs: aoe: add it to the driver-api book
  docs: add some documentation dirs to the driver-api book
  docs: driver-model: move it to the driver-api book
  docs: lp855x-driver.rst: add it to the driver-api book
  ...
2019-07-16 12:21:41 -07:00
Akinobu Mita
1624b0b200 block: fix sysfs module parameters directory path in comment
The runtime configurable module parameter files are located under
/sys/module/MODULENAME/parameters, not /sys/module/MODULENAME.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-16 10:16:33 -06:00
Tejun Heo
07b0fdecb2 blkcg: allow blkcg_policy->pd_stat() to print non-debug info too
Currently, ->pd_stat() is called only when moduleparam
blkcg_debug_stats is set which prevents it from printing non-debug
policy-specific statistics.  Let's move debug testing down so that
->pd_stat() can print non-debug stat too.  This patch doesn't cause
any visible behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-16 10:06:39 -06:00
Mauro Carvalho Chehab
4f4cfa6c56 docs: admin-guide: add a series of orphaned documents
There are lots of documents that belong to the admin-guide but
are on random places (most under Documentation root dir).

Move them to the admin guide.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2019-07-15 11:03:02 -03:00
Mauro Carvalho Chehab
da82c92f11 docs: cgroup-v1: add it to the admin-guide book
Those files belong to the admin guide, so add them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-07-15 11:03:02 -03:00
Mauro Carvalho Chehab
898bd37a92 docs: block: convert to ReST
Rename the block documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-07-15 09:20:27 -03:00
Damien Le Moal
26202928fa block: Limit zone array allocation size
Limit the size of the struct blk_zone array used in
blk_revalidate_disk_zones() to avoid memory allocation failures leading
to disk revalidation failure. Also further reduce the likelyhood of
such failures by using kvcalloc() (that is vmalloc()) instead of
allocating contiguous pages with alloc_pages().

Fixes: 515ce60613 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation")
Fixes: e76239a374 ("block: add a report_zones method")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-11 20:04:40 -06:00
Damien Le Moal
bd976e5272 block: Kill gfp_t argument of blkdev_report_zones()
Only GFP_KERNEL and GFP_NOIO are used with blkdev_report_zones(). In
preparation of using vmalloc() for large report buffer and zone array
allocations used by this function, remove its "gfp_t gfp_mask" argument
and rely on the caller context to use memalloc_noio_save/restore() where
necessary (block layer zone revalidation and dm-zoned I/O error path).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-11 20:04:37 -06:00
Damien Le Moal
b4c5875d36 block: Allow mapping of vmalloc-ed buffers
To allow the SCSI subsystem scsi_execute_req() function to issue
requests using large buffers that are better allocated with vmalloc()
rather than kmalloc(), modify bio_map_kern() to allow passing a buffer
allocated with vmalloc().

To do so, detect vmalloc-ed buffers using is_vmalloc_addr(). For
vmalloc-ed buffers, flush the buffer using flush_kernel_vmap_range(),
use vmalloc_to_page() instead of virt_to_page() to obtain the pages of
the buffer, and invalidate the buffer addresses with
invalidate_kernel_vmap_range() on completion of read BIOs. This last
point is executed using the function bio_invalidate_vmalloc_pages()
which is defined only if the architecture defines
ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE, that is, if the architecture
actually needs the invalidation done.

Fixes: 515ce60613 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation")
Fixes: e76239a374 ("block: add a report_zones method")
Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-11 20:04:36 -06:00
Wenwen Wang
e7bf90e5af block/bio-integrity: fix a memory leak bug
In bio_integrity_prep(), a kernel buffer is allocated through kmalloc() to
hold integrity metadata. Later on, the buffer will be attached to the bio
structure through bio_integrity_add_page(), which returns the number of
bytes of integrity metadata attached. Due to unexpected situations,
bio_integrity_add_page() may return 0. As a result, bio_integrity_prep()
needs to be terminated with 'false' returned to indicate this error.
However, the allocated kernel buffer is not freed on this execution path,
leading to a memory leak.

To fix this issue, free the allocated buffer before returning from
bio_integrity_prep().

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-11 20:01:21 -06:00
Damien Le Moal
b49773e7bc block: Disable write plugging for zoned block devices
Simultaneously writing to a sequential zone of a zoned block device
from multiple contexts requires mutual exclusion for BIO issuing to
ensure that writes happen sequentially. However, even for a well
behaved user correctly implementing such synchronization, BIO plugging
may interfere and result in BIOs from the different contextx to be
reordered if plugging is done outside of the mutual exclusion section,
e.g. the plug was started by a function higher in the call chain than
the function issuing BIOs.

         Context A                     Context B

   | blk_start_plug()
   | ...
   | seq_write_zone()
     | mutex_lock(zone)
     | bio-0->bi_iter.bi_sector = zone->wp
     | zone->wp += bio_sectors(bio-0)
     | submit_bio(bio-0)
     | bio-1->bi_iter.bi_sector = zone->wp
     | zone->wp += bio_sectors(bio-1)
     | submit_bio(bio-1)
     | mutex_unlock(zone)
     | return
   | -----------------------> | seq_write_zone()
  				| mutex_lock(zone)
     				| bio-2->bi_iter.bi_sector = zone->wp
     				| zone->wp += bio_sectors(bio-2)
				| submit_bio(bio-2)
				| mutex_unlock(zone)
   | <------------------------- |
   | blk_finish_plug()

In the above example, despite the mutex synchronization ensuring the
correct BIO issuing order 0, 1, 2, context A BIOs 0 and 1 end up being
issued after BIO 2 of context B, when the plug is released with
blk_finish_plug().

While this problem can be addressed using the blk_flush_plug_list()
function (in the above example, the call must be inserted before the
zone mutex lock is released), a simple generic solution in the block
layer avoid this additional code in all zoned block device user code.
The simple generic solution implemented with this patch is to introduce
the internal helper function blk_mq_plug() to access the current
context plug on BIO submission. This helper returns the current plug
only if the target device is not a zoned block device or if the BIO to
be plugged is not a write operation. Otherwise, the caller context plug
is ignored and NULL returned, resulting is all writes to zoned block
device to never be plugged.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-10 14:18:01 -06:00
Konstantin Khlebnikov
3a10f999ff blk-throttle: fix zero wait time for iops throttled group
After commit 991f61fe7e ("Blk-throttle: reduce tail io latency when
iops limit is enforced") wait time could be zero even if group is
throttled and cannot issue requests right now. As a result
throtl_select_dispatch() turns into busy-loop under irq-safe queue
spinlock.

Fix is simple: always round up target time to the next throttle slice.

Fixes: 991f61fe7e ("Blk-throttle: reduce tail io latency when iops limit is enforced")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-10 09:00:57 -06:00
Damien Le Moal
113ab72ed4 block: Fix potential overflow in blk_report_zones()
For large values of the number of zones reported and/or large zone
sizes, the sector increment calculated with

blk_queue_zone_sectors(q) * n

in blk_report_zones() loop can overflow the unsigned int type used for
the calculation as both "n" and blk_queue_zone_sectors() value are
unsigned int. E.g. for a device with 256 MB zones (524288 sectors),
overflow happens with 8192 or more zones reported.

Changing the return type of blk_queue_zone_sectors() to sector_t, fixes
this problem and avoids overflow problem for all other callers of this
helper too. The same change is also applied to the bdev_zone_sectors()
helper.

Fixes: e76239a374 ("block: add a report_zones method")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-10 09:00:57 -06:00
Tejun Heo
d3f77dfdc7 blkcg: implement REQ_CGROUP_PUNT
When a shared kthread needs to issue a bio for a cgroup, doing so
synchronously can lead to priority inversions as the kthread can be
trapped waiting for that cgroup.  This patch implements
REQ_CGROUP_PUNT flag which makes submit_bio() punt the actual issuing
to a dedicated per-blkcg work item to avoid such priority inversions.

This will be used to fix priority inversions in btrfs compression and
should be generally useful as we grow filesystem support for
comprehensive IO control.

Cc: Chris Mason <clm@fb.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-10 09:00:57 -06:00
Tejun Heo
9b0eb69b75 cgroup, blkcg: Prepare some symbols for module and !CONFIG_CGROUP usages
btrfs is going to use css_put() and wbc helpers to improve cgroup
writeback support.  Add dummy css_get() definition and export wbc
helpers to prepare for module and !CONFIG_CGROUP builds.

Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-10 09:00:57 -06:00
Josef Bacik
fd112c7465 blk-cgroup: turn on psi memstall stuff
With the psi stuff in place we can use the memstall flag to indicate
pressure that happens from throttling.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-10 09:00:57 -06:00
Josef Bacik
b554db147f block: init flush rq ref count to 1
We discovered a problem in newer kernels where a disconnect of a NBD
device while the flush request was pending would result in a hang.  This
is because the blk mq timeout handler does

        if (!refcount_inc_not_zero(&rq->ref))
                return true;

to determine if it's ok to run the timeout handler for the request.
Flush_rq's don't have a ref count set, so we'd skip running the timeout
handler for this request and it would just sit there in limbo forever.

Fix this by always setting the refcount of any request going through
blk_init_rq() to 1.  I tested this with a nbd-server that dropped flush
requests to verify that it hung, and then tested with this patch to
verify I got the timeout as expected and the error handling kicked in.
Thanks,

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-10 09:00:57 -06:00
Linus Torvalds
3b99107f0e for-5.3/block-20190708
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl0jrIMQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgptlFD/9CNsBX+Aap2lO6wKNr6QISwNAK76GMzEay
 s4LSY2kGkXvzv8i89mCuY+8UVNI8WH2/22WnU+8CBAJOjWyFQMsIwH/mrq0oZWRD
 J6STJE8rTr6Fc2MvJUWryp/xdBh3+eDIsAdIZVHVAkIzqYPBnpIAwEIeIw8t0xsm
 v9ngpQ3WD6ep8tOj9pnG1DGKFg1CmukZCC/Y4CQV1vZtmm2I935zUwNV/TB+Egfx
 G8JSC0cSV02LMK88HCnA6MnC/XSUC0qgfXbnmP+TpKlgjVX+P/fuB3oIYcZEu2Rk
 3YBpIkhsQytKYbF42KRLsmBH72u6oB9G+tNZTgB1STUDrZqdtD9xwX1rjDlY0ZzP
 EUDnk48jl/cxbs+VZrHoE2TcNonLiymV7Kb92juHXdIYmKFQStprGcQUbMaTkMfB
 6BYrYLifWx0leu1JJ1i7qhNmug94BYCSCxcRmH0p6kPazPcY9LXNmDWMfMuBPZT7
 z79VLZnHF2wNXJyT1cBluwRYYJRT4osWZ3XUaBWFKDgf1qyvXJfrN/4zmgkEIyW7
 ivXC+KLlGkhntDlWo2pLKbbyOIKY1HmU6aROaI11k5Zyh0ixKB7tHKavK39l+NOo
 YB41+4l6VEpQEyxyRk8tO0sbHpKaKB+evVIK3tTwbY+Q0qTExErxjfWUtOgRWhjx
 iXJssPRo4w==
 =VSYT
 -----END PGP SIGNATURE-----

Merge tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
 "This is the main block updates for 5.3. Nothing earth shattering or
  major in here, just fixes, additions, and improvements all over the
  map. This contains:

   - Series of documentation fixes (Bart)

   - Optimization of the blk-mq ctx get/put (Bart)

   - null_blk removal race condition fix (Bob)

   - req/bio_op() cleanups (Chaitanya)

   - Series cleaning up the segment accounting, and request/bio mapping
     (Christoph)

   - Series cleaning up the page getting/putting for bios (Christoph)

   - block cgroup cleanups and moving it to where it is used (Christoph)

   - block cgroup fixes (Tejun)

   - Series of fixes and improvements to bcache, most notably a write
     deadlock fix (Coly)

   - blk-iolatency STS_AGAIN and accounting fixes (Dennis)

   - Series of improvements and fixes to BFQ (Douglas, Paolo)

   - debugfs_create() return value check removal for drbd (Greg)

   - Use struct_size(), where appropriate (Gustavo)

   - Two lighnvm fixes (Heiner, Geert)

   - MD fixes, including a read balance and corruption fix (Guoqing,
     Marcos, Xiao, Yufen)

   - block opal shadow mbr additions (Jonas, Revanth)

   - sbitmap compare-and-exhange improvemnts (Pavel)

   - Fix for potential bio->bi_size overflow (Ming)

   - NVMe pull requests:
       - improved PCIe suspent support (Keith Busch)
       - error injection support for the admin queue (Akinobu Mita)
       - Fibre Channel discovery improvements (James Smart)
       - tracing improvements including nvmetc tracing support (Minwoo Im)
       - misc fixes and cleanups (Anton Eidelman, Minwoo Im, Chaitanya
         Kulkarni)"

   - Various little fixes and improvements to drivers and core"

* tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block: (153 commits)
  blk-iolatency: fix STS_AGAIN handling
  block: nr_phys_segments needs to be zero for REQ_OP_WRITE_ZEROES
  blk-mq: simplify blk_mq_make_request()
  blk-mq: remove blk_mq_put_ctx()
  sbitmap: Replace cmpxchg with xchg
  block: fix .bi_size overflow
  block: sed-opal: check size of shadow mbr
  block: sed-opal: ioctl for writing to shadow mbr
  block: sed-opal: add ioctl for done-mark of shadow mbr
  block: never take page references for ITER_BVEC
  direct-io: use bio_release_pages in dio_bio_complete
  block_dev: use bio_release_pages in bio_unmap_user
  block_dev: use bio_release_pages in blkdev_bio_end_io
  iomap: use bio_release_pages in iomap_dio_bio_end_io
  block: use bio_release_pages in bio_map_user_iov
  block: use bio_release_pages in bio_unmap_user
  block: optionally mark pages dirty in bio_release_pages
  block: move the BIO_NO_PAGE_REF check into bio_release_pages
  block: skd_main.c: Remove call to memset after dma_alloc_coherent
  block: mtip32xx: Remove call to memset after dma_alloc_coherent
  ...
2019-07-09 10:45:06 -07:00
Linus Torvalds
92c1d65221 Merge branch 'for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "Documentation updates and the addition of cgroup_parse_float() which
  will be used by new controllers including blk-iocost"

* 'for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  docs: cgroup-v1: convert docs to ReST and rename to *.rst
  cgroup: Move cgroup_parse_float() implementation out of CONFIG_SYSFS
  cgroup: add cgroup_parse_float()
2019-07-08 21:35:12 -07:00
Greg Kroah-Hartman
7e41c3c9b6 blk-mq: fix up placement of debugfs directory of queue files
When the blk-mq debugfs file creation logic was "cleaned up" it was
cleaned up too much, causing the queue file to not be created in the
correct location.  Turns out the check for the directory being present
is needed as if that has not happened yet, the files should not be
created, and the function will be called later on in the initialization
code so that the files can be created in the correct location.

Fixes: 6cfc0081b0 ("blk-mq: no need to check return value of debugfs_create functions")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: linux-block@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-06 10:07:38 -06:00
Dennis Zhou
c9b3007fec blk-iolatency: fix STS_AGAIN handling
The iolatency controller is based on rq_qos. It increments on
rq_qos_throttle() and decrements on either rq_qos_cleanup() or
rq_qos_done_bio(). a3fb01ba5a fixes the double accounting issue where
blk_mq_make_request() may call both rq_qos_cleanup() and
rq_qos_done_bio() on REQ_NO_WAIT. So checking STS_AGAIN prevents the
double decrement.

The above works upstream as the only way we can get STS_AGAIN is from
blk_mq_get_request() failing. The STS_AGAIN handling isn't a real
problem as bio_endio() skipping only happens on reserved tag allocation
failures which can only be caused by driver bugs and already triggers
WARN.

However, the fix creates a not so great dependency on how STS_AGAIN can
be propagated. Internally, we (Facebook) carry a patch that kills read
ahead if a cgroup is io congested or a fatal signal is pending. This
combined with chained bios progagate their bi_status to the parent is
not already set can can cause the parent bio to not clean up properly
even though it was successful. This consequently leaks the inflight
counter and can hang all IOs under that blkg.

To nip the adverse interaction early, this removes the rq_qos_cleanup()
callback in iolatency in favor of cleaning up always on the
rq_qos_done_bio() path.

Fixes: a3fb01ba5a ("blk-iolatency: only account submitted bios")
Debugged-by: Tejun Heo <tj@kernel.org>
Debugged-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-05 15:14:00 -06:00
Christoph Hellwig
d665e12aa7 block: nr_phys_segments needs to be zero for REQ_OP_WRITE_ZEROES
Fix a regression introduced when removing bi_phys_segments for Write Zeroes
requests, which need to have a segment count of zero, as they don't have a
payload.

Fixes: 14ccb66b3f ("block: remove the bi_phys_segments field in struct bio")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-03 07:20:40 -06:00
Bart Van Assche
970d168de6 blk-mq: simplify blk_mq_make_request()
Move the blk_mq_bio_to_request() call in front of the if-statement.

Cc: Hannes Reinecke <hare@suse.com>
Cc: Omar Sandoval <osandov@fb.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-02 21:03:38 -06:00
Bart Van Assche
c05f42206f blk-mq: remove blk_mq_put_ctx()
No code that occurs between blk_mq_get_ctx() and blk_mq_put_ctx() depends
on preemption being disabled for its correctness. Since removing the CPU
preemption calls does not measurably affect performance, simplify the
blk-mq code by removing the blk_mq_put_ctx() function and also by not
disabling preemption in blk_mq_get_ctx().

Cc: Hannes Reinecke <hare@suse.com>
Cc: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-02 21:03:27 -06:00
Ming Lei
79d08f89bb block: fix .bi_size overflow
'bio->bi_iter.bi_size' is 'unsigned int', which at most hold 4G - 1
bytes.

Before 07173c3ec2 ("block: enable multipage bvecs"), one bio can
include very limited pages, and usually at most 256, so the fs bio
size won't be bigger than 1M bytes most of times.

Since we support multi-page bvec, in theory one fs bio really can
be added > 1M pages, especially in case of hugepage, or big writeback
with too many dirty pages. Then there is chance in which .bi_size
is overflowed.

Fixes this issue by using bio_full() to check if the added segment may
overflow .bi_size.

Cc: Liu Yiding <liuyd.fnst@cn.fujitsu.com>
Cc: kernel test robot <rong.a.chen@intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 07173c3ec2 ("block: enable multipage bvecs")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-01 08:18:54 -06:00
Jens Axboe
5be1f9d82f Linux 5.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Os1seHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGtx4H/j6i482XzcGFKTBm
 A7mBoQpy+kLtoUov4EtBAR62OuwI8rsahW9di37QKndPoQrczWaKBmr3De6LCdPe
 v3pl3O6wBbvH5ru+qBPFX9PdNbDvimEChh7LHxmMxNQq3M+AjZAZVJyfpoiFnx35
 Fbge+LZaH/k8HMwZmkMr5t9Mpkip715qKg2o9Bua6dkH0AqlcpLlC8d9a+HIVw/z
 aAsyGSU8jRwhoAOJsE9bJf0acQ/pZSqmFp0rDKqeFTSDMsbDRKLGq/dgv4nW0RiW
 s7xqsjb/rdcvirRj3rv9+lcTVkOtEqwk0PVdL9WOf7g4iYrb3SOIZh8ZyViaDSeH
 VTS5zps=
 =huBY
 -----END PGP SIGNATURE-----

Merge tag 'v5.2-rc6' into for-5.3/block

Merge 5.2-rc6 into for-5.3/block, so we get the same page merge leak
fix. Otherwise we end up having conflicts with future patches between
for-5.3/block and master that touch this area. In particular, it makes
the bio_full() fix hard to backport to stable.

* tag 'v5.2-rc6': (482 commits)
  Linux 5.2-rc6
  Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
  Bluetooth: Fix regression with minimum encryption key size alignment
  tcp: refine memory limit test in tcp_fragment()
  x86/vdso: Prevent segfaults due to hoisted vclock reads
  SUNRPC: Fix a credential refcount leak
  Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
  net :sunrpc :clnt :Fix xps refcount imbalance on the error path
  NFS4: Only set creation opendata if O_CREAT
  ARM: 8867/1: vdso: pass --be8 to linker if necessary
  KVM: nVMX: reorganize initial steps of vmx_set_nested_state
  KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
  habanalabs: use u64_to_user_ptr() for reading user pointers
  nfsd: replace Jeff by Chuck as nfsd co-maintainer
  inet: clear num_timeout reqsk_alloc()
  PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present
  net: mvpp2: debugfs: Add pmap to fs dump
  ipv6: Default fib6_type to RTN_UNICAST when not set
  net: hns3: Fix inconsistent indenting
  net/af_iucv: always register net_device notifier
  ...
2019-07-01 08:16:08 -06:00
Jonas Rabenstein
ff91064ea3 block: sed-opal: check size of shadow mbr
Check whether the shadow mbr does fit in the provided space on the
target. Also a proper firmware should handle this case and return an
error we may prevent problems or even damage with crappy firmwares.

Signed-off-by: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
Signed-off-by: David Kozub <zub@linux.fjfi.cvut.cz>
Reviewed-by: Scott Bauer <sbauer@plzdonthack.me>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 10:34:08 -06:00
Jonas Rabenstein
a9b25b4cf2 block: sed-opal: ioctl for writing to shadow mbr
Allow modification of the shadow mbr. If the shadow mbr is not marked as
done, this data will be presented read only as the device content. Only
after marking the shadow mbr as done and unlocking a locking range the
actual content is accessible.

Co-authored-by: David Kozub <zub@linux.fjfi.cvut.cz>
Signed-off-by: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
Signed-off-by: David Kozub <zub@linux.fjfi.cvut.cz>
Reviewed-by: Scott Bauer <sbauer@plzdonthack.me>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 10:33:57 -06:00
Jonas Rabenstein
c988844341 block: sed-opal: add ioctl for done-mark of shadow mbr
Enable users to mark the shadow mbr as done without completely
deactivating the shadow mbr feature. This may be useful on reboots,
when the power to the disk is not disconnected in between and the shadow
mbr stores the required boot files. Of course, this saves also the
(few) commands required to enable the feature if it is already enabled
and one only wants to mark the shadow mbr as done.

Co-authored-by: David Kozub <zub@linux.fjfi.cvut.cz>
Signed-off-by: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
Signed-off-by: David Kozub <zub@linux.fjfi.cvut.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed by: Scott Bauer <sbauer@plzdonthack.me>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 10:31:33 -06:00
Christoph Hellwig
b620743077 block: never take page references for ITER_BVEC
If we pass pages through an iov_iter we always already have a reference
in the caller.  Thus remove the ITER_BVEC_FLAG_NO_REF and don't take
reference to pages by default for bvec backed iov_iters.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:32 -06:00
Christoph Hellwig
506e079847 block: use bio_release_pages in bio_map_user_iov
Use bio_release_pages instead of open coding it.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:31 -06:00
Christoph Hellwig
163cc2d3cd block: use bio_release_pages in bio_unmap_user
Use bio_release_pages instead of open coding it.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:31 -06:00
Christoph Hellwig
d241a95f35 block: optionally mark pages dirty in bio_release_pages
A lot of callers of bio_release_pages also want to mark the released
pages as dirty.  Add a mark_dirty parameter to avoid a second
relatively expensive bio_for_each_segment_all loop.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:31 -06:00
Christoph Hellwig
b2d0d99135 block: move the BIO_NO_PAGE_REF check into bio_release_pages
Move the BIO_NO_PAGE_REF check into bio_release_pages instead of
duplicating it in both callers.

Also make the function available outside of bio.c so that we can
reuse it in other direct I/O implementations.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:31 -06:00
Revanth Rajashekar
15ddffcb34 block: sed-opal: "Never True" conditions
'who' an unsigned variable in stucture opal_session_info
can never be lesser than zero. Hence, the condition
"who < OPAL_ADMIN1" can never be true.

Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:40:31 -06:00
Revanth Rajashekar
5e4c7cf60e block: sed-opal: PSID reverttper capability
PSID is a 32 character password printed on the drive label,
to prove its physical access. This PSID reverttper function
is very useful to regain the control over the drive when it
is locked and the user can no longer access it because of some
failures. However, *all the data on the drive is completely
erased*. This method is advisable only when the user is exhausted
of all other recovery methods.

PSID capabilities are described in:
https://trustedcomputinggroup.org/wp-content/uploads/TCG_Storage-Opal_Feature_Set_PSID_v1.00_r1.00.pdf

Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:40:30 -06:00
Douglas Anderson
dbc3117d4c block, bfq: NULL out the bic when it's no longer valid
In reboot tests on several devices we were seeing a "use after free"
when slub_debug or KASAN was enabled.  The kernel complained about:

  Unable to handle kernel paging request at virtual address 6b6b6c2b

...which is a classic sign of use after free under slub_debug.  The
stack crawl in kgdb looked like:

 0  test_bit (addr=<optimized out>, nr=<optimized out>)
 1  bfq_bfqq_busy (bfqq=<optimized out>)
 2  bfq_select_queue (bfqd=<optimized out>)
 3  __bfq_dispatch_request (hctx=<optimized out>)
 4  bfq_dispatch_request (hctx=<optimized out>)
 5  0xc056ef00 in blk_mq_do_dispatch_sched (hctx=0xed249440)
 6  0xc056f728 in blk_mq_sched_dispatch_requests (hctx=0xed249440)
 7  0xc0568d24 in __blk_mq_run_hw_queue (hctx=0xed249440)
 8  0xc0568d94 in blk_mq_run_work_fn (work=<optimized out>)
 9  0xc024c5c4 in process_one_work (worker=0xec6d4640, work=0xed249480)
 10 0xc024cff4 in worker_thread (__worker=0xec6d4640)

Digging in kgdb, it could be found that, though bfqq looked fine,
bfqq->bic had been freed.

Through further digging, I postulated that perhaps it is illegal to
access a "bic" (AKA an "icq") after bfq_exit_icq() had been called
because the "bic" can be freed at some point in time after this call
is made.  I confirmed that there certainly were cases where the exact
crashing code path would access the "bic" after bfq_exit_icq() had
been called.  Sspecifically I set the "bfqq->bic" to (void *)0x7 and
saw that the bic was 0x7 at the time of the crash.

To understand a bit more about why this crash was fairly uncommon (I
saw it only once in a few hundred reboots), you can see that much of
the time bfq_exit_icq_fbqq() fully frees the bfqq and thus it can't
access the ->bic anymore.  The only case it doesn't is if
bfq_put_queue() sees a reference still held.

However, even in the case when bfqq isn't freed, the crash is still
rare.  Why?  I tracked what happened to the "bic" after the exit
routine.  It doesn't get freed right away.  Rather,
put_io_context_active() eventually called put_io_context() which
queued up freeing on a workqueue.  The freeing then actually happened
later than that through call_rcu().  Despite all these delays, some
extra debugging showed that all the hoops could be jumped through in
time and the memory could be freed causing the original crash.  Phew!

To make a long story short, assuming it truly is illegal to access an
icq after the "exit_icq" callback is finished, this patch is needed.

Cc: stable@vger.kernel.org
Reviewed-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-28 07:44:19 -06:00
Damien Le Moal
a5b47a40be block: Remove unused code
bio_flush_dcache_pages() is unused. Remove it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-27 07:34:25 -06:00
Douglas Anderson
2b50f230f7 block, bfq: Init saved_wr_start_at_switch_to_srt in unlikely case
Some debug code suggested by Paolo was tripping when I did reboot
stress tests.  Specifically in bfq_bfqq_resume_state()
"bic->saved_wr_start_at_switch_to_srt" was later than the current
value of "jiffies".  A bit of debugging showed that
"bic->saved_wr_start_at_switch_to_srt" was actually 0 and a bit more
debugging showed that was because we had run through the "unlikely"
case in the bfq_bfqq_save_state() function.

Let's init "saved_wr_start_at_switch_to_srt" in the unlikely case to
something sane.

NOTE: this fixes no known real-world errors.

Reviewed-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-26 16:25:35 -06:00
Paolo Valente
e6feaf215f block, bfq: fix operator in BFQQ_TOTALLY_SEEKY
By mistake, there is a '&' instead of a '==' in the definition of the
macro BFQQ_TOTALLY_SEEKY. This commit replaces the wrong operator with
the correct one.

Fixes: 7074f076ff ("block, bfq: do not tag totally seeky queues as soft rt")
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-25 11:38:08 -06:00
Paolo Valente
3726112ec7 block, bfq: re-schedule empty queues if they deserve I/O plugging
Consider, on one side, a bfq_queue Q that remains empty while in
service, and, on the other side, the pending I/O of bfq_queues that,
according to their timestamps, have to be served after Q.  If an
uncontrolled amount of I/O from the latter bfq_queues were dispatched
while Q is waiting for its new I/O to arrive, then Q's bandwidth
guarantees would be violated. To prevent this, I/O dispatch is plugged
until Q receives new I/O (except for a properly controlled amount of
injected I/O). Unfortunately, preemption breaks I/O-dispatch plugging,
for the following reason.

Preemption is performed in two steps. First, Q is expired and
re-scheduled. Second, the new bfq_queue to serve is chosen. The first
step is needed by the second, as the second can be performed only
after Q's timestamps have been properly updated (done in the
expiration step), and Q has been re-queued for service. This
dependency is a consequence of the way how BFQ's scheduling algorithm
is currently implemented.

But Q is not re-scheduled at all in the first step, because Q is
empty. As a consequence, an uncontrolled amount of I/O may be
dispatched until Q becomes non empty again. This breaks Q's service
guarantees.

This commit addresses this issue by re-scheduling Q even if it is
empty. This in turn breaks the assumption that all scheduled queues
are non empty. Then a few extra checks are now needed.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-25 09:07:35 -06:00
Paolo Valente
96a291c38c block, bfq: preempt lower-weight or lower-priority queues
BFQ enqueues the I/O coming from each process into a separate
bfq_queue, and serves bfq_queues one at a time. Each bfq_queue may be
served for at most timeout_sync milliseconds (default: 125 ms). This
service scheme is prone to the following inaccuracy.

While a bfq_queue Q1 is in service, some empty bfq_queue Q2 may
receive I/O, and, according to BFQ's scheduling policy, may become the
right bfq_queue to serve, in place of the currently in-service
bfq_queue. In this respect, postponing the service of Q2 to after the
service of Q1 finishes may delay the completion of Q2's I/O, compared
with an ideal service in which all non-empty bfq_queues are served in
parallel, and every non-empty bfq_queue is served at a rate
proportional to the bfq_queue's weight. This additional delay is equal
at most to the time Q1 may unjustly remain in service before switching
to Q2.

If Q1 and Q2 have the same weight, then this time is most likely
negligible compared with the completion time to be guaranteed to Q2's
I/O. In addition, first, one of the reasons why BFQ may want to serve
Q1 for a while is that this boosts throughput and, second, serving Q1
longer reduces BFQ's overhead. As a conclusion, it is usually better
not to preempt Q1 if both Q1 and Q2 have the same weight.

In contrast, as Q2's weight or priority becomes higher and higher
compared with that of Q1, the above delay becomes larger and larger,
compared with the I/O completion times that have to be guaranteed to
Q2 according to Q2's weight. So reducing this delay may be more
important than avoiding the costs of preempting Q1.

Accordingly, this commit preempts Q1 if Q2 has a higher weight or a
higher priority than Q1. Preemption causes Q1 to be re-scheduled, and
triggers a new choice of the next bfq_queue to serve. If Q2 really is
the next bfq_queue to serve, then Q2 will be set in service
immediately.

This change reduces the component of the I/O latency caused by the
above delay by about 80%. For example, on an (old) PLEXTOR PX-256M5
SSD, the maximum latency reported by fio drops from 15.1 to 3.2 ms for
a process doing sporadic random reads while another process is doing
continuous sequential reads.

Signed-off-by: Nicola Bottura <bottura.nicola95@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-25 09:07:35 -06:00
Paolo Valente
13a857a4c4 block, bfq: detect wakers and unconditionally inject their I/O
A bfq_queue Q may happen to be synchronized with another
bfq_queue Q2, i.e., the I/O of Q2 may need to be completed for Q to
receive new I/O. We call Q2 "waker queue".

If I/O plugging is being performed for Q, and Q is not receiving any
more I/O because of the above synchronization, then, thanks to BFQ's
injection mechanism, the waker queue is likely to get served before
the I/O-plugging timeout fires.

Unfortunately, this fact may not be sufficient to guarantee a high
throughput during the I/O plugging, because the inject limit for Q may
be too low to guarantee a lot of injected I/O. In addition, the
duration of the plugging, i.e., the time before Q finally receives new
I/O, may not be minimized, because the waker queue may happen to be
served only after other queues.

To address these issues, this commit introduces the explicit detection
of the waker queue, and the unconditional injection of a pending I/O
request of the waker queue on each invocation of
bfq_dispatch_request().

One may be concerned that this systematic injection of I/O from the
waker queue delays the service of Q's I/O. Fortunately, it doesn't. On
the contrary, next Q's I/O is brought forward dramatically, for it is
not blocked for milliseconds.

Reported-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Tested-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-25 09:07:34 -06:00
Paolo Valente
a3f9bce369 block, bfq: bring forward seek&think time update
Until the base value for request service times gets finally computed
for a bfq_queue, the inject limit for that queue does depend on the
think-time state (short|long) of the queue. A timely update of the
think time then guarantees a quicker activation or deactivation of the
injection. Fortunately, the think time of a bfq_queue is updated in
the same code path as the inject limit; but after the inject limit.

This commits moves the update of the think time before the update of
the inject limit. For coherence, it moves the update of the seek time
too.

Reported-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Tested-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-25 09:07:34 -06:00
Paolo Valente
24792ad01c block, bfq: update base request service times when possible
I/O injection gets reduced if it increases the request service times
of the victim queue beyond a certain threshold.  The threshold, in its
turn, is computed as a function of the base service time enjoyed by
the queue when it undergoes no injection.

As a consequence, for injection to work properly, the above base value
has to be accurate. In this respect, such a value may vary over
time. For example, it varies if the size or the spatial locality of
the I/O requests in the queue change. It is then important to update
this value whenever possible. This commit performs this update.

Reported-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Tested-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-25 09:07:34 -06:00
Paolo Valente
db599f9ed9 block, bfq: fix rq_in_driver check in bfq_update_inject_limit
One of the cases where the parameters for injection may be updated is
when there are no more in-flight I/O requests. The number of in-flight
requests is stored in the field bfqd->rq_in_driver of the descriptor
bfqd of the device. So, the controlled condition is
bfqd->rq_in_driver == 0.

Unfortunately, this is wrong because, the instruction that checks this
condition is in the code path that handles the completion of a
request, and, in particular, the instruction is executed before
bfqd->rq_in_driver is decremented in such a code path.

This commit fixes this issue by just replacing 0 with 1 in the
comparison.

Reported-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Tested-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-25 09:07:34 -06:00
Paolo Valente
766d61412e block, bfq: reset inject limit when think-time state changes
Until the base value of the request service times gets finally
computed for a bfq_queue, the inject limit does depend on the
think-time state (short|long). The limit must be 0 or 1 if the think
time is deemed, respectively, as short or long. However, such a check
and possible limit update is performed only periodically, once per
second. So, to make the injection mechanism much more reactive, this
commit performs the update also every time the think-time state
changes.

In addition, in the following special case, this commit lets the
inject limit of a bfq_queue bfqq remain equal to 1 even if bfqq's
think time is short: bfqq's I/O is synchronized with that of some
other queue, i.e., bfqq may receive new I/O only after the I/O of the
other queue is completed. Keeping the inject limit to 1 allows the
blocking I/O to be served while bfqq is in service. And this is very
convenient both for bfqq and for the total throughput, as explained
in detail in the comments in bfq_update_has_short_ttime().

Reported-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Tested-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-25 09:07:34 -06:00
Chaitanya Kulkarni
b0e5168a77 block: update print_req_error()
Improve the print_req_error with additional request fields which are
helpful for debugging. Use newly introduced blk_op_str() to print the
REQ_OP_XXX in the string format.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-20 13:03:51 -06:00
Chaitanya Kulkarni
874c893bf0 block: use blk_op_str() in blk-mq-debugfs.c
Now that we've a helper function blk_op_str() to convert the
REQ_OP_XXX to string XXX, adjust the code to use that. Get rid of
the duplicate array op_name which is now present in the blk-core.c
which we renamed it to "blk_op_name" and open coding in the
blk-mq-debugfs.c.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-20 13:03:51 -06:00