Commit Graph

192 Commits

Author SHA1 Message Date
Rui Chen
5f45a7ef79 ANDROID: fs: add vendor hook to collect IO statistics
Add vendor hook to get metainfo of direct/buffered read and write.
Determine hot files in each performance-sensitive user scenario.

Bug: 380502059
Change-Id: Ie7604852df637d6664afd72e87bd6d4b14bbc2a2
Signed-off-by: Rui Chen <chenrui9@honor.com>
2024-12-02 19:22:28 +00:00
Greg Kroah-Hartman
d483eed85f ANDROID: GKI: set vfs-only exports into their own namespace
We have namespaces, so use them for all vfs-exported namespaces so that
filesystems can use them, but not anything else.

Some in-kernel drivers that do direct filesystem accesses (because they
serve up files) are also allowed access to these symbols to keep 'make
allmodconfig' builds working properly, but it is not needed for Android
kernel images.

Bug: 157965270
Bug: 210074446
Cc: Matthias Maennich <maennich@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iaf6140baf3a18a516ab2d5c3966235c42f3f70de
2022-01-11 09:30:47 +01:00
Greg Kroah-Hartman
9a705f0463 Merge 5.10.30 into android12-5.10
Changes in 5.10.30
	xfrm/compat: Cleanup WARN()s that can be user-triggered
	ALSA: aloop: Fix initialization of controls
	ALSA: hda/realtek: Fix speaker amp setup on Acer Aspire E1
	ALSA: hda/conexant: Apply quirk for another HP ZBook G5 model
	ASoC: intel: atom: Stop advertising non working S24LE support
	nfc: fix refcount leak in llcp_sock_bind()
	nfc: fix refcount leak in llcp_sock_connect()
	nfc: fix memory leak in llcp_sock_connect()
	nfc: Avoid endless loops caused by repeated llcp_sock_connect()
	selinux: make nslot handling in avtab more robust
	selinux: fix cond_list corruption when changing booleans
	selinux: fix race between old and new sidtab
	xen/evtchn: Change irq_info lock to raw_spinlock_t
	net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh
	net: dsa: lantiq_gswip: Let GSWIP automatically set the xMII clock
	net: dsa: lantiq_gswip: Don't use PHY auto polling
	net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits
	drm/i915: Fix invalid access to ACPI _DSM objects
	ACPI: processor: Fix build when CONFIG_ACPI_PROCESSOR=m
	IB/hfi1: Fix probe time panic when AIP is enabled with a buggy BIOS
	LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late
	gcov: re-fix clang-11+ support
	ia64: fix user_stack_pointer() for ptrace()
	nds32: flush_dcache_page: use page_mapping_file to avoid races with swapoff
	ocfs2: fix deadlock between setattr and dio_end_io_write
	fs: direct-io: fix missing sdio->boundary
	ethtool: fix incorrect datatype in set_eee ops
	of: property: fw_devlink: do not link ".*,nr-gpios"
	parisc: parisc-agp requires SBA IOMMU driver
	parisc: avoid a warning on u8 cast for cmpxchg on u8 pointers
	ARM: dts: turris-omnia: configure LED[2]/INTn pin as interrupt pin
	batman-adv: initialize "struct batadv_tvlv_tt_vlan_data"->reserved field
	ice: Continue probe on link/PHY errors
	ice: Increase control queue timeout
	ice: prevent ice_open and ice_stop during reset
	ice: fix memory allocation call
	ice: remove DCBNL_DEVRESET bit from PF state
	ice: Fix for dereference of NULL pointer
	ice: Use port number instead of PF ID for WoL
	ice: Cleanup fltr list in case of allocation issues
	iwlwifi: pcie: properly set LTR workarounds on 22000 devices
	ice: fix memory leak of aRFS after resuming from suspend
	net: hso: fix null-ptr-deref during tty device unregistration
	libbpf: Fix bail out from 'ringbuf_process_ring()' on error
	bpf: Enforce that struct_ops programs be GPL-only
	bpf: link: Refuse non-O_RDWR flags in BPF_OBJ_GET
	ethernet/netronome/nfp: Fix a use after free in nfp_bpf_ctrl_msg_rx
	libbpf: Ensure umem pointer is non-NULL before dereferencing
	libbpf: Restore umem state after socket create failure
	libbpf: Only create rx and tx XDP rings when necessary
	bpf: Refcount task stack in bpf_get_task_stack
	bpf, sockmap: Fix sk->prot unhash op reset
	bpf, sockmap: Fix incorrect fwd_alloc accounting
	net: ensure mac header is set in virtio_net_hdr_to_skb()
	i40e: Fix sparse warning: missing error code 'err'
	i40e: Fix sparse error: 'vsi->netdev' could be null
	i40e: Fix sparse error: uninitialized symbol 'ring'
	i40e: Fix sparse errors in i40e_txrx.c
	vdpa/mlx5: Fix suspend/resume index restoration
	net: sched: sch_teql: fix null-pointer dereference
	net: sched: fix action overwrite reference counting
	nl80211: fix beacon head validation
	nl80211: fix potential leak of ACL params
	cfg80211: check S1G beacon compat element length
	mac80211: fix time-is-after bug in mlme
	mac80211: fix TXQ AC confusion
	net: hsr: Reset MAC header for Tx path
	net-ipv6: bugfix - raw & sctp - switch to ipv6_can_nonlocal_bind()
	net: let skb_orphan_partial wake-up waiters.
	thunderbolt: Fix a leak in tb_retimer_add()
	thunderbolt: Fix off by one in tb_port_find_retimer()
	usbip: add sysfs_lock to synchronize sysfs code paths
	usbip: stub-dev synchronize sysfs code paths
	usbip: vudc synchronize sysfs code paths
	usbip: synchronize event handler with sysfs code paths
	driver core: Fix locking bug in deferred_probe_timeout_work_func()
	scsi: pm80xx: Fix chip initialization failure
	scsi: target: iscsi: Fix zero tag inside a trace event
	percpu: make pcpu_nr_empty_pop_pages per chunk type
	i2c: turn recovery error on init to debug
	KVM: x86/mmu: change TDP MMU yield function returns to match cond_resched
	KVM: x86/mmu: Merge flush and non-flush tdp_mmu_iter_cond_resched
	KVM: x86/mmu: Rename goal_gfn to next_last_level_gfn
	KVM: x86/mmu: Ensure forward progress when yielding in TDP MMU iter
	KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed
	KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap
	KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
	KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages
	KVM: x86/mmu: preserve pending TLB flush across calls to kvm_tdp_mmu_zap_sp
	net: sched: fix err handler in tcf_action_init()
	ice: Refactor DCB related variables out of the ice_port_info struct
	ice: Recognize 860 as iSCSI port in CEE mode
	xfrm: interface: fix ipv4 pmtu check to honor ip header df
	xfrm: Use actual socket sk instead of skb socket for xfrm_output_resume
	remoteproc: qcom: pil_info: avoid 64-bit division
	regulator: bd9571mwv: Fix AVS and DVFS voltage range
	ARM: OMAP4: Fix PMIC voltage domains for bionic
	ARM: OMAP4: PM: update ROM return address for OSWR and OFF
	net: xfrm: Localize sequence counter per network namespace
	esp: delete NETIF_F_SCTP_CRC bit from features for esp offload
	ASoC: SOF: Intel: HDA: fix core status verification
	ASoC: wm8960: Fix wrong bclk and lrclk with pll enabled for some chips
	xfrm: Fix NULL pointer dereference on policy lookup
	virtchnl: Fix layout of RSS structures
	i40e: Added Asym_Pause to supported link modes
	i40e: Fix kernel oops when i40e driver removes VF's
	hostfs: fix memory handling in follow_link()
	amd-xgbe: Update DMA coherency values
	vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
	geneve: do not modify the shared tunnel info when PMTU triggers an ICMP reply
	sch_red: fix off-by-one checks in red_check_params()
	drivers/net/wan/hdlc_fr: Fix a double free in pvc_xmit
	arm64: dts: imx8mm/q: Fix pad control of SD1_DATA0
	xfrm: Provide private skb extensions for segmented and hw offloaded ESP packets
	can: bcm/raw: fix msg_namelen values depending on CAN_REQUIRED_SIZE
	can: isotp: fix msg_namelen values depending on CAN_REQUIRED_SIZE
	mlxsw: spectrum: Fix ECN marking in tunnel decapsulation
	ethernet: myri10ge: Fix a use after free in myri10ge_sw_tso
	gianfar: Handle error code at MAC address change
	net: dsa: Fix type was not set for devlink port
	cxgb4: avoid collecting SGE_QBASE regs during traffic
	net:tipc: Fix a double free in tipc_sk_mcast_rcv
	ARM: dts: imx6: pbab01: Set vmmc supply for both SD interfaces
	net/ncsi: Avoid channel_monitor hrtimer deadlock
	net: qrtr: Fix memory leak on qrtr_tx_wait failure
	nfp: flower: ignore duplicate merge hints from FW
	net: phy: broadcom: Only advertise EEE for supported modes
	I2C: JZ4780: Fix bug for Ingenic X1000.
	ASoC: sunxi: sun4i-codec: fill ASoC card owner
	net/mlx5e: Fix mapping of ct_label zero
	net/mlx5e: Fix ethtool indication of connector type
	net/mlx5: Don't request more than supported EQs
	net/rds: Fix a use after free in rds_message_map_pages
	xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory model
	soc/fsl: qbman: fix conflicting alignment attributes
	i40e: Fix display statistics for veb_tc
	RDMA/rtrs-clt: Close rtrs client conn before destroying rtrs clt session files
	drm/msm: Set drvdata to NULL when msm_drm_init() fails
	net: udp: Add support for getsockopt(..., ..., UDP_GRO, ..., ...);
	mptcp: forbit mcast-related sockopt on MPTCP sockets
	scsi: ufs: core: Fix task management request completion timeout
	scsi: ufs: core: Fix wrong Task Tag used in task management request UPIUs
	net: cls_api: Fix uninitialised struct field bo->unlocked_driver_cb
	net: macb: restore cmp registers on resume path
	clk: fix invalid usage of list cursor in register
	clk: fix invalid usage of list cursor in unregister
	workqueue: Move the position of debug_work_activate() in __queue_work()
	s390/cpcmd: fix inline assembly register clobbering
	perf inject: Fix repipe usage
	net: openvswitch: conntrack: simplify the return expression of ovs_ct_limit_get_default_limit()
	openvswitch: fix send of uninitialized stack memory in ct limit reply
	i2c: designware: Adjust bus_freq_hz when refuse high speed mode set
	iwlwifi: fix 11ax disabled bit in the regulatory capability flags
	can: mcp251x: fix support for half duplex SPI host controllers
	tipc: increment the tmp aead refcnt before attaching it
	net: hns3: clear VF down state bit before request link status
	net/mlx5: Fix placement of log_max_flow_counter
	net/mlx5: Fix PPLM register mapping
	net/mlx5: Fix PBMC register mapping
	RDMA/cxgb4: check for ipv6 address properly while destroying listener
	perf report: Fix wrong LBR block sorting
	RDMA/qedr: Fix kernel panic when trying to access recv_cq
	drm/vc4: crtc: Reduce PV fifo threshold on hvs4
	i40e: Fix parameters in aq_get_phy_register()
	RDMA/addr: Be strict with gid size
	vdpa/mlx5: should exclude header length and fcs from mtu
	vdpa/mlx5: Fix wrong use of bit numbers
	RAS/CEC: Correct ce_add_elem()'s returned values
	clk: socfpga: fix iomem pointer cast on 64-bit
	lockdep: Address clang -Wformat warning printing for %hd
	dt-bindings: net: ethernet-controller: fix typo in NVMEM
	net: sched: bump refcount for new action in ACT replace mode
	gpiolib: Read "gpio-line-names" from a firmware node
	cfg80211: remove WARN_ON() in cfg80211_sme_connect
	net: tun: set tun->dev->addr_len during TUNSETLINK processing
	drivers: net: fix memory leak in atusb_probe
	drivers: net: fix memory leak in peak_usb_create_dev
	net: mac802154: Fix general protection fault
	net: ieee802154: nl-mac: fix check on panid
	net: ieee802154: fix nl802154 del llsec key
	net: ieee802154: fix nl802154 del llsec dev
	net: ieee802154: fix nl802154 add llsec key
	net: ieee802154: fix nl802154 del llsec devkey
	net: ieee802154: forbid monitor for set llsec params
	net: ieee802154: forbid monitor for del llsec seclevel
	net: ieee802154: stop dump llsec params for monitors
	Revert "net: sched: bump refcount for new action in ACT replace mode"
	Linux 5.10.30

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie8754a2e4dfef03bf1f2b878843cde19a4adab21
2021-04-15 14:23:41 +02:00
Jack Qiu
3a675c1b50 fs: direct-io: fix missing sdio->boundary
commit df41872b68601059dd4a84858952dcae58acd331 upstream.

I encountered a hung task issue, but not a performance one.  I run DIO
on a device (need lba continuous, for example open channel ssd), maybe
hungtask in below case:

  DIO:						Checkpoint:
  get addr A(at boundary), merge into BIO,
  no submit because boundary missing
						flush dirty data(get addr A+1), wait IO(A+1)
						writeback timeout, because DIO(A) didn't submit
  get addr A+2 fail, because checkpoint is doing

dio_send_cur_page() may clear sdio->boundary, so prevent it from missing
a boundary.

Link: https://lkml.kernel.org/r/20210322042253.38312-1-jack.qiu@huawei.com
Fixes: b1058b9812 ("direct-io: submit bio after boundary buffer is added to it")
Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14 08:41:58 +02:00
Eric Biggers
e11a4eba4f FROMLIST: direct-io: add support for fscrypt using blk-crypto
Set bio crypt contexts on bios by calling into fscrypt when required,
and explicitly check for DUN continuity when adding pages to the bio.
(While DUN continuity is usually implied by logical block contiguity,
this is not the case when using certain fscrypt IV generation methods
like IV_INO_LBLK_32).

Signed-off-by: Eric Biggers <ebiggers@google.com>
Co-developed-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org>

Bug: 162255927
Link: https://lore.kernel.org/r/20200724184501.1651378-3-satyat@google.com
Change-Id: I57ff74185004371c01ec35d806b0749583375c58
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-26 05:46:40 +00:00
Eric Biggers
d44ddbf417 ANDROID: revert fscrypt direct I/O support
Revert the direct I/O support for encrypted files so that we can bring
in the latest version of the patches from the mailing list.  This is
needed because in v5.5 and later, the ext4 support (via fs/iomap/) is
broken as-is -- not only is the second call to fscrypt_limit_dio_pages()
in the wrong place, but bios can exceed the intended nr_pages limit due
to multipage bvecs.  In order to fix this we need the v6 patches which
make fs/ext4/ handle the limiting instead of fs/iomap/.

On android-mainline, this fixes a failure in vts_kernel_encryption_test
(specifically, FBEPolicyTest#TestAesEmmcOptimizedPolicy) when run on a
device that uses the inlinecrypt mount option on ext4 (e.g. db845c).

Bug: 162255927
Bug: 171462575
Change-Id: I0da753dc9e0e7bc8d84bbcadfdfcdb9328cdb8d8
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
2021-02-26 05:46:07 +00:00
Greg Kroah-Hartman
8bc601f293 Merge 2d0f6b0aab ("Merge tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux") into android-mainline
Steps on the way to 5.10-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1a4319d3921c77e672ffa5a0efec4882736a74f7
2020-10-26 09:13:37 +01:00
Greg Kroah-Hartman
aec7085396 Merge 647412daeb ("Merge tag 'mmc-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc") into android-mainline
Steps on the way to 5.10-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7ef0b81d31025e256bd85102013ed3ad038a54b1
2020-10-23 11:59:44 +02:00
Linus Torvalds
4a165feba2 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl+IUDwACgkQnJ2qBz9k
 QNnjoAf8CVCf/N2kblZywcy/Ks/2WzlZDU4S/dVr93O2VLXEb6kT+onZq4tc/+sY
 QIV8kUMGvt0Ez/KpwY6/wOUVEPZn8bSqM0vUsB+Xi8CwSz/CwcDKTbU/tm1667J0
 eSFX96OL2hT5BockTzuESFohtykiPU3CvW10ae02N6k6XKTS0+frkn3cebJKno2O
 NDt0j2HkVjvpO6DqS7lnceFxo4t6DV6YXJHMtkJqRk8hYItzjyjOB6Df+1WHJJOA
 4jxlobY4qR2MHh1y2ncryPvvQ91sMIxx1beMMXXUIXr+8Zu8aFVWBLgSwyoxBgRP
 G+ctJu3zMCmeeGnkpLfSlJK9ipYUMw==
 =nXw4
 -----END PGP SIGNATURE-----

Merge tag 'dio_for_v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull direct-io fix from Jan Kara:
 "Fix for unaligned direct IO read past EOF in legacy DIO code"

* tag 'dio_for_v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  direct-io: defer alignment check until after the EOF check
  direct-io: don't force writeback for reads beyond EOF
  direct-io: clean up error paths of do_blockdev_direct_IO
2020-10-15 15:03:10 -07:00
Gabriel Krisman Bertazi
41b21af388 direct-io: defer alignment check until after the EOF check
Prior to commit 9fe55eea7e ("Fix race when checking i_size on direct
i/o read"), an unaligned direct read past end of file would trigger EOF,
since generic_file_aio_read detected this read-at-EOF condition and
skipped the direct IO read entirely, returning 0. After that change, the
read now reaches dio_generic, which detects the misalignment and returns
EINVAL.

This consolidates the generic direct-io to follow the same behavior of
filesystems.  Apparently, this fix will only affect ocfs2 since other
filesystems do this verification before calling do_blockdev_direct_IO,
with the exception of f2fs, which has the same bug, but is fixed in the
next patch.

it can be verified by a read loop on a file that does a partial read
before EOF (On file that doesn't end at an aligned address).  The
following code fails on an unaligned file on filesystems without
prior validation without this patch, but not on btrfs, ext4, and xfs.

  while (done < total) {
    ssize_t delta = pread(fd, buf + done, total - done, off + done);
    if (!delta)
      break;
    ...
  }

Fix this regression by moving the misalignment check to after the EOF
check added by commit 74cedf9b6c ("direct-io: Fix negative return from
dio read beyond eof").

Based on a patch by Jamie Liu.

Link: https://lore.kernel.org/r/20201008062620.2928326-4-krisman@collabora.com
Reported-by: Jamie Liu <jamieliu@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-10-08 18:26:46 +02:00
Gabriel Krisman Bertazi
0a9164cb7f direct-io: don't force writeback for reads beyond EOF
If a DIO read starts past EOF, the kernel won't attempt it, so we don't
need to flush dirty pages before failing the syscall.

Link: https://lore.kernel.org/r/20201008062620.2928326-3-krisman@collabora.com
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-10-08 18:26:33 +02:00
Gabriel Krisman Bertazi
46d716025a direct-io: clean up error paths of do_blockdev_direct_IO
In preparation to resort DIO checks, reduce code duplication of error
handling in do_blockdev_direct_IO.

Link: https://lore.kernel.org/r/20201008062620.2928326-2-krisman@collabora.com
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-10-08 18:26:01 +02:00
Goldwyn Rodrigues
c33fe275b5 fs: remove no longer used dio_end_io()
Since we removed the last user of dio_end_io() when btrfs got converted
to iomap infrastructure ("btrfs: switch to iomap for direct IO"), remove
the helper function dio_end_io().

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:17:59 +02:00
Eric Biggers
98777a7eb7 Merge commit 382625d0d4 ("Merge tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block") into android-mainline
Conflicts:
	drivers/md/dm-bow.c
	drivers/md/dm-default-key.c
	drivers/md/dm.c
	fs/crypto/inline_crypt.c

Replace bdev->bd_queue with bdev_get_queue(bdev).

Bug: 129280212
Bug: 160883801
Bug: 160885805
Bug: 162257830
Change-Id: I9b0b295472080dfc0990dcb769205e68d706ce0e
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-08-06 10:07:17 -07:00
Christoph Hellwig
e556f6ba10 block: remove the bd_queue field from struct block_device
Just use bd_disk->queue instead.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-01 08:08:20 -06:00
Greg Kroah-Hartman
a33191d441 Linux 5.8-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl7mfkAeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGTG4H/1iH1Psd32XKoO63
 XuOashbqnsYLCKckbFg0RgHIDTp6d0wypF5m3dSYqpifWUaEf8SkTGnm0geV9zc1
 axDUdaqr+OvhDPZiSLt4cFu2M5yvGb4/WR76qjKzxWd+LNUGOVpx1GvFXl/5wPdp
 /lYR3mqZqNffwK2QuZ1m8X1gy5fzr4esoyZDK71dP88uBlFmbQLJsYiCRfQ7GDsm
 r574gz7SKjPoPFPj7qJ3CLsnOruZiL36uxmYZWC2hnkNaFzLfMFiE7OqrEZK/YV6
 Gc8ZZGtN+Otr+UbojagjTjRM6uy3qWBYEMX4GcRt9M2jS8LGKsYcxV75h0U5qlyz
 m8QAA0o=
 =jy/I
 -----END PGP SIGNATURE-----

Merge 5.8-rc1 into android-mainline

Linux 5.8-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I00f2168bc9b6fd8e48c7c0776088d2c6cb8e1629
2020-06-25 14:25:32 +02:00
Greg Kroah-Hartman
035f08016d Merge 039aeb9deb ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm") into android-mainline
Baby steps on the way to 5.8-rc1.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5962e12546d3d215c73c3d74b00ad6263d96f64e
2020-06-20 09:49:29 +02:00
Eric Biggers
665ca1bf8e Merge commit 750a02ab8d ("Merge tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block") into android-mainline
Conflicts:
	block/blk-core.c
	block/blk-crypto-fallback.c
	block/blk-crypto.c
	block/keyslot-manager.c
	drivers/md/dm.c
	include/linux/blk-crypto.h
	include/linux/blk_types.h
	include/linux/keyslot-manager.h

Change-Id: Ie757c41fa41e6a9aacdf123d82d4f681623a02a8
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-06-17 17:18:48 -07:00
Linus Torvalds
9d645db853 for-5.8-part2-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl7lZwgACgkQxWXV+ddt
 WDuj6g/9E2JtqeO8zRMLb+Do/n5YX0dFHt+dM1AGY+nw8hb3U9Vlgc8KJa7UpZFX
 opl1i9QL+cJLoZMZL5xZhDouMQlum5cGVV3hLwqEPYetRF/ytw/kunWAg5o8OW1R
 sJxGcjyiiKpZLVx6nMjGnYjsrbOJv0HlaWfY3NCon4oQ8yQTzTPMPBevPWRM7Iqw
 Ssi8pA8zXCc2QoLgyk6Pe/IGeox8+z9RA2akHkJIdMWiPHm43RDF4Yx3Yl9NHHZA
 M+pLVKjZoejqwVaai8osBqWVw4Ypax1+CJit6iHGwJDkQyFPcMXMsOc5ZYBnT5or
 k/ceVMCs+ejvCK1+L30u7FQRiDqf5Fwhf/SGfq7+y83KbEjMfWOya3Lyk47fbDD4
 776rSaS6ejqVklWppbaPhntSrBtPR1NaDOfi55bc9TOe+yW7Du+AsQMlEE0bTJaW
 eHl+A4AP/nDlo8Etn1jTWd023bzzO+iySMn3YZfK0vw3vkj3JfrCGXx6DEYipOou
 uEUj0jDo/rdiB5S3GdUCujjaPgm/f0wkPudTRB9lpxJas2qFU+qo2TLJhEleELwj
 m4laz7W7S+nUFP0LRl8O82AzBfjm+oHjWTpfdloT6JW9Da8/iuZ/x9VBWQ8mFJwX
 U0cR3zVqUuWcK78fZa/FFgGPBxlwUv2j+OhRGsS0/orDRlrwcXo=
 =5S0s
 -----END PGP SIGNATURE-----

Merge tag 'for-5.8-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "This reverts the direct io port to iomap infrastructure of btrfs
  merged in the first pull request. We found problems in invalidate page
  that don't seem to be fixable as regressions or without changing iomap
  code that would not affect other filesystems.

  There are four reverts in total, but three of them are followup
  cleanups needed to revert a43a67a2d7 cleanly. The result is the
  buffer head based implementation of direct io.

  Reverts are not great, but under current circumstances I don't see
  better options"

* tag 'for-5.8-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Revert "btrfs: switch to iomap_dio_rw() for dio"
  Revert "fs: remove dio_end_io()"
  Revert "btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK"
  Revert "btrfs: split btrfs_direct_IO to read and write part"
2020-06-14 09:47:25 -07:00
David Sterba
f1084bc60a Revert "fs: remove dio_end_io()"
This reverts commit b75b7ca7c2.

The patch restores a helper that was not necessary after direct IO port
to iomap infrastructure, which gets reverted.

Signed-off-by: David Sterba <dsterba@suse.com>
2020-06-09 19:23:18 +02:00
Linus Torvalds
f3cdc8ae11 for-5.8-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl7U50AACgkQxWXV+ddt
 WDtK1g//RXeNsTguYQr1N9R5eUPThjLEI0+4J0l4SYfCPU8Ou3C7nqpOEJJQgm8F
 ezZE+16cWi9U5uGueOc+w0rfyz4AuIXKgzoz+c0/GG2+yV5jp6DsAMbWqojAb96L
 V/N3HxEzR66jqwgVUBE/x5okb2SyY7//B1l/O0amc66XDO7KTMImpIwThere6zWZ
 o2SNpYpHAPQeUYJQx8h+FAW3w1CxrCZmnifazU9Jqe9J7QeQLg7rbUlJDV38jySm
 ZOA8ohKN9U1gPZy+dTU3kdyyuBIq1etkIaSPJANyTo5TczPKiC0IMg75cXtS4ae/
 NSxhccMpSIjVMcIHARzSFGYKNP3sGNRsmaTUg/2Cx/9GoHOhYMiCAVc8qtBBpwJO
 UI0siexrCe64RuTBMRRc128GdFv7IjmSImcdi8xaR62bCcUiNdEa3zvjRe/9tOEH
 ET7Z85oBnKpSzpC3MdhSUU4dtHY5XLawP8z3oUU1VSzSWM2DVjlHf79/VzbOfp18
 miCVpt94lCn/gUX7el6qcnbuvMAjDyeC6HmfD+TwzQgGwyV6TLgKN9lRXeH/Oy6/
 VgjGQSavGHMll3zIGURmrBCXKudjJg0J+IP4wN1TimmSEMfwKH+7tnekQd8y5qlF
 eXEIqlWNykKeDzEnmV9QJy+/cV83hVWM/mUslcTx39tLN/3B/Us=
 =qTt8
 -----END PGP SIGNATURE-----

Merge tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "Highlights:

   - speedup dead root detection during orphan cleanup, eg. when there
     are many deleted subvolumes waiting to be cleaned, the trees are
     now looked up in radix tree instead of a O(N^2) search

   - snapshot creation with inherited qgroup will mark the qgroup
     inconsistent, requires a rescan

   - send will emit file capabilities after chown, this produces a
     stream that does not need postprocessing to set the capabilities
     again

   - direct io ported to iomap infrastructure, cleaned up and simplified
     code, notably removing last use of struct buffer_head in btrfs code

  Core changes:

   - factor out backreference iteration, to be used by ordinary
     backreferences and relocation code

   - improved global block reserve utilization
      * better logic to serialize requests
      * increased maximum available for unlink
      * improved handling on large pages (64K)

   - direct io cleanups and fixes
      * simplify layering, where cloned bios were unnecessarily created
        for some cases
      * error handling fixes (submit, endio)
      * remove repair worker thread, used to avoid deadlocks during
        repair

   - refactored block group reading code, preparatory work for new type
     of block group storage that should improve mount time on large
     filesystems

  Cleanups:

   - cleaned up (and slightly sped up) set/get helpers for metadata data
     structure members

   - root bit REF_COWS got renamed to SHAREABLE to reflect the that the
     blocks of the tree get shared either among subvolumes or with the
     relocation trees

  Fixes:

   - when subvolume deletion fails due to ENOSPC, the filesystem is not
     turned read-only

   - device scan deals with devices from other filesystems that changed
     ownership due to overwrite (mkfs)

   - fix a race between scrub and block group removal/allocation

   - fix long standing bug of a runaway balance operation, printing the
     same line to the syslog, caused by a stale status bit on a reloc
     tree that prevented progress

   - fix corrupt log due to concurrent fsync of inodes with shared
     extents

   - fix space underflow for NODATACOW and buffered writes when it for
     some reason needs to fallback to COW mode"

* tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (133 commits)
  btrfs: fix space_info bytes_may_use underflow during space cache writeout
  btrfs: fix space_info bytes_may_use underflow after nocow buffered write
  btrfs: fix wrong file range cleanup after an error filling dealloc range
  btrfs: remove redundant local variable in read_block_for_search
  btrfs: open code key_search
  btrfs: split btrfs_direct_IO to read and write part
  btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK
  fs: remove dio_end_io()
  btrfs: switch to iomap_dio_rw() for dio
  iomap: remove lockdep_assert_held()
  iomap: add a filesystem hook for direct I/O bio submission
  fs: export generic_file_buffered_read()
  btrfs: turn space cache writeout failure messages into debug messages
  btrfs: include error on messages about failure to write space/inode caches
  btrfs: remove useless 'fail_unlock' label from btrfs_csum_file_blocks()
  btrfs: do not ignore error from btrfs_next_leaf() when inserting checksums
  btrfs: make checksum item extension more efficient
  btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents
  btrfs: unexport btrfs_compress_set_level()
  btrfs: simplify iget helpers
  ...
2020-06-02 19:59:25 -07:00
Goldwyn Rodrigues
b75b7ca7c2 fs: remove dio_end_io()
Since we removed the last user of dio_end_io(), remove the helper
function dio_end_io().

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-28 14:01:51 +02:00
Eric Biggers
8d6c90c9d6 ANDROID: fscrypt: handle direct I/O with IV_INO_LBLK_32
With the existing fscrypt IV generation methods, each file's data blocks
have contiguous DUNs.  Therefore the direct I/O code "just worked"
because it only submits logically contiguous bios.  But with
IV_INO_LBLK_32, the direct I/O code breaks because the DUN can wrap from
0xffffffff to 0.  We can't submit bios across such boundaries.

This is especially difficult to handle when block_size != PAGE_SIZE,
since in that case the DUN can wrap in the middle of a page.  Punt on
this case for now and just handle block_size == PAGE_SIZE.

Add and use a new function fscrypt_dio_supported() to check whether a
direct I/O request is unsupported due to encryption constraints.

Then, update fs/direct-io.c (used by f2fs, and by ext4 in kernel v5.4
and earlier) and fs/iomap/direct-io.c (used by ext4 in kernel v5.5 and
later) to avoid submitting I/O across a DUN discontinuity.

(This is needed in ACK now because ACK already supports direct I/O with
inline crypto.  I'll be sending this upstream along with the encrypted
direct I/O support itself once its prerequisites are closer to landing.)

Test: For now, just manually tested direct I/O on ext4 and f2fs in the
      DUN discontinuity case.
Bug: 144046242
Change-Id: I0c0b0b20a73ade35c3660cc6f9c09d49d3853ba5
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-05-21 10:09:43 -07:00
Ming Lei
e6249cdd46 block: add blk_io_schedule() for avoiding task hung in sync dio
Sync dio could be big, or may take long time in discard or in case of
IO failure.

We have prevented task hung in submit_bio_wait() and blk_execute_rq(),
so apply the same trick for prevent task hung from happening in sync dio.

Add helper of blk_io_schedule() and use io_schedule_timeout() to prevent
task hung warning.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Cc: Salman Qazi <sqazi@google.com>
Cc: Jesse Barnes <jsbarnes@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-12 20:32:42 -06:00
Eric Biggers
141f59b911 ANDROID: ext4, f2fs: enable direct I/O with inline encryption
ext4 and f2fs have traditionally not supported direct I/O on encrypted
files, since it's difficult to implement with the traditional
filesystem-layer encryption.  But when inline encryption is used
instead, it's straightforward to support direct I/O, as long as the I/O
is fully filesystem-block-aligned.  Add support for it by:

- Making the two generic direct I/O implementations in the kernel,
  __blockdev_direct_IO() and iomap_dio_rw(), set the encryption context
  on bios for inline-encrypted files.  __blockdev_direct_IO() is used by
  f2fs, and was used by ext4 in kernel v5.4 and earlier.  iomap_dio_rw()
  is used by ext4 in kernel v5.5 and later.

- Making ext4 and f2fs allow direct I/O to encrypted files (rather the
  current behavior of falling back to buffered I/O) when the file is
  using inline encryption and the I/O is fully filesystem-block-aligned.

Bug: 137270441
Change-Id: I4c8f7497eb8f829d03611d24281113d68c21d4d1
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-24 10:53:45 -08:00
Eric Biggers
b16155a0b0 fs/direct-io.c: include fs/internal.h for missing prototype
Include fs/internal.h to address the following 'sparse' warning:

    fs/direct-io.c:591:5: warning: symbol 'sb_init_dio_done_wq' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20191209234544.128302-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Konstantin Khlebnikov
a92853b674 fs/direct-io.c: keep dio_warn_stale_pagecache() when CONFIG_BLOCK=n
This helper prints warning if direct I/O write failed to invalidate cache,
and set EIO at inode to warn usersapce about possible data corruption.

See also commit 5a9d929d6e ("iomap: report collisions between directio
and buffered writes to userspace").

Direct I/O is supported by non-disk filesystems, for example NFS.  Thus
generic code needs this even in kernel without CONFIG_BLOCK.

Link: http://lkml.kernel.org/r/157270038074.4812.7980855544557488880.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 06:29:18 -08:00
Randy Dunlap
c70d868f27 fs/direct-io.c: fix kernel-doc warning
Fix kernel-doc warning in fs/direct-io.c:

  fs/direct-io.c:258: warning: Excess function parameter 'offset' description in 'dio_complete'

Also, don't mark this function as having kernel-doc notation since it is
not exported.

Link: http://lkml.kernel.org/r/97908511-4328-4a56-17fe-f43a1d7aa470@infradead.org
Fixes: 6d544bb4d9 ("dio: centralize completion in dio_complete()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Zach Brown <zab@zabbo.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:01 -07:00
Christoph Hellwig
d7c8aa85ed direct-io: use bio_release_pages in dio_bio_complete
Use bio_release_pages instead of duplicating it.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:31 -06:00
Thomas Gleixner
457c899653 treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
   initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:45 +02:00
Christoph Hellwig
2b070cfe58 block: remove the i argument to bio_for_each_segment_all
We only have two callers that need the integer loop iterator, and they
can easily maintain it themselves.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: David Sterba <dsterba@suse.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: Coly Li <colyli@suse.de>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30 09:26:13 -06:00
Ming Lei
6dc4f100c1 block: allow bio_for_each_segment_all() to iterate over multi-page bvec
This patch introduces one extra iterator variable to bio_for_each_segment_all(),
then we can allow bio_for_each_segment_all() to iterate over multi-page bvec.

Given it is just one mechannical & simple change on all bio_for_each_segment_all()
users, this patch does tree-wide change in one single patch, so that we can
avoid to use a temporary helper for this conversion.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:11 -07:00
Ernesto A. Fernández
8b9433eb4d direct-io: allow direct writes to empty inodes
On a DIO_SKIP_HOLES filesystem, the ->get_block() method is currently
not allowed to create blocks for an empty inode.  This confusion comes
from trying to bit shift a negative number, so check the size of the
inode first.

The problem is most visible for hfsplus, because the fallback to
buffered I/O doesn't happen and the write fails with EIO.  This is in
part the fault of the module, because it gives a wrong return value on
->get_block(); that will be fixed in a separate patch.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-22 08:26:44 -07:00
Jens Axboe
89d04ec349 Linux 4.20-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlwEZdIeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGAlQH/19oax2Za3IPqF4X
 DM3lal5M6zlUVkoYstqzpbR3MqUwgEnMfvoeMDC6mI9N4/+r2LkV7cRR8HzqQCCS
 jDfD69IzRGb52VSeJmbOrkxBWsR1Nn0t4Z3rEeLPxwaOoNpRc8H973MbAQ2FKMpY
 S4Y3jIK1dNiRRxdh52NupVkQF+djAUwkBuVk/rrvRJmTDij4la03cuCDAO+Di9lt
 GHlVvygKw2SJhDR+z3ArwZNmE0ceCcE6+W7zPHzj2KeWuKrZg22kfUD454f2YEIw
 FG0hu9qecgtpYCkLSm2vr4jQzmpsDoyq3ZfwhjGrP4qtvPC3Db3vL3dbQnkzUcJu
 JtwhVCE=
 =O1q1
 -----END PGP SIGNATURE-----

Merge tag 'v4.20-rc5' into for-4.21/block

Pull in v4.20-rc5, solving a conflict we'll otherwise get in aio.c and
also getting the merge fix that went into mainline that users are
hitting testing for-4.21/block and/or for-next.

* tag 'v4.20-rc5': (664 commits)
  Linux 4.20-rc5
  PCI: Fix incorrect value returned from pcie_get_speed_cap()
  MAINTAINERS: Update linux-mips mailing list address
  ocfs2: fix potential use after free
  mm/khugepaged: fix the xas_create_range() error path
  mm/khugepaged: collapse_shmem() do not crash on Compound
  mm/khugepaged: collapse_shmem() without freezing new_page
  mm/khugepaged: minor reorderings in collapse_shmem()
  mm/khugepaged: collapse_shmem() remember to clear holes
  mm/khugepaged: fix crashes due to misaccounted holes
  mm/khugepaged: collapse_shmem() stop if punched or truncated
  mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
  mm/huge_memory: splitting set mapping+index before unfreeze
  mm/huge_memory: rename freeze_page() to unmap_page()
  initramfs: clean old path before creating a hardlink
  kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
  psi: make disabling/enabling easier for vendor kernels
  proc: fixup map_files test on arm
  debugobjects: avoid recursive calls with kmemleak
  userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
  ...
2018-12-04 09:38:05 -07:00
Maximilian Heyne
41e817bca3 fs: fix lost error code in dio_complete
commit e259221763 ("fs: simplify the
generic_write_sync prototype") reworked callers of generic_write_sync(),
and ended up dropping the error return for the directio path. Prior to
that commit, in dio_complete(), an error would be bubbled up the stack,
but after that commit, errors passed on to dio_complete were eaten up.

This was reported on the list earlier, and a fix was proposed in
https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/, but
never followed up with.  We recently hit this bug in our testing where
fencing io errors, which were previously erroring out with EIO, were
being returned as success operations after this commit.

The fix proposed on the list earlier was a little short -- it would have
still called generic_write_sync() in case `ret` already contained an
error. This fix ensures generic_write_sync() is only called when there's
no pending error in the write. Additionally, transferred is replaced
with ret to bring this code in line with other callers.

Fixes: e259221763 ("fs: simplify the generic_write_sync prototype")
Reported-by: Ravi Nankani <rnankani@amazon.com>
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
CC: Torsten Mehlan <tomeh@amazon.de>
CC: Uwe Dannowski <uwed@amazon.de>
CC: Amit Shah <aams@amazon.de>
CC: David Woodhouse <dwmw@amazon.co.uk>
CC: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-30 08:35:14 -07:00
Jens Axboe
0a1b8b87d0 block: make blk_poll() take a parameter on whether to spin or not
blk_poll() has always kept spinning until it found an IO. This is
fine for SYNC polling, since we need to find one request we have
pending, but in preparation for ASYNC polling it can be beneficial
to just check if we have any entries available or not.

Existing callers are converted to pass in 'spin == true', to retain
the old behavior.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-26 08:25:53 -07:00
Jens Axboe
d1e36282b0 block: add REQ_HIPRI and inherit it from IOCB_HIPRI
We use IOCB_HIPRI to poll for IO in the caller instead of scheduling.
This information is not available for (or after) IO submission. The
driver may make different queue choices based on the type of IO, so
make the fact that we will poll for this IO known to the lower layers
as well.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07 13:45:00 -07:00
David Howells
00e2370744 iov_iter: Use accessor function
Use accessor functions to access an iterator's type and direction.  This
allows for the possibility of using some other method of determining the
type of iterator than if-chains with bitwise-AND conditions.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:40:44 +01:00
Christoph Hellwig
0eb0b63c1d block: consistently use GFP_NOIO instead of __GFP_NORECLAIM
Same numerical value (for now at least), but a much better documentation
of intent.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14 08:55:18 -06:00
Linus Torvalds
3b54765cca Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc things

 - ocfs2 updates

 - the v9fs maintainers have been missing for a long time. I've taken
   over v9fs patch slinging.

 - most of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (116 commits)
  mm,oom_reaper: check for MMF_OOM_SKIP before complaining
  mm/ksm: fix interaction with THP
  mm/memblock.c: cast constant ULLONG_MAX to phys_addr_t
  headers: untangle kmemleak.h from mm.h
  include/linux/mmdebug.h: make VM_WARN* non-rvals
  mm/page_isolation.c: make start_isolate_page_range() fail if already isolated
  mm: change return type to vm_fault_t
  mm, oom: remove 3% bonus for CAP_SYS_ADMIN processes
  mm, page_alloc: wakeup kcompactd even if kswapd cannot free more memory
  kernel/fork.c: detect early free of a live mm
  mm: make counting of list_lru_one::nr_items lockless
  mm/swap_state.c: make bool enable_vma_readahead and swap_vma_readahead() static
  block_invalidatepage(): only release page if the full page was invalidated
  mm: kernel-doc: add missing parameter descriptions
  mm/swap.c: remove @cold parameter description for release_pages()
  mm/nommu: remove description of alloc_vm_area
  zram: drop max_zpage_size and use zs_huge_class_size()
  zsmalloc: introduce zs_huge_class_size()
  mm: fix races between swapoff and flush dcache
  fs/direct-io.c: minor cleanups in do_blockdev_direct_IO
  ...
2018-04-06 14:19:26 -07:00
Nikolay Borisov
1c0ff0f1bd fs/direct-io.c: minor cleanups in do_blockdev_direct_IO
We already get the block counts and calculate the end block at the
beginning of the function.  Let's use the local variables for
consistency and readability.  No functional changes

[akpm@linux-foundation.org: constify the locals to prevent future slipups]
Link: http://lkml.kernel.org/r/1519638870-17756-1-git-send-email-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:26 -07:00
Nikolay Borisov
ce3077ee80 direct-io: Remove unused DIO_SKIP_DIO_COUNT logic
This flag was added by fe0f07d08e ("direct-io: only inc/deci
inode->i_dio_count for file systems") as means to optimise the atomic
modificaiton of the variable for blockdevices. However with the advent
of 542ff7bf18 ("block: new direct I/O implementation") it became
unused. So let's remove it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-12 10:21:24 -06:00
Nikolay Borisov
c8f4c36f81 direct-io: Remove unused DIO_ASYNC_EXTEND flag
This flag was added by 6039257378 ("direct-io: add flag to allow aio
writes beyond i_size") to support XFS. However, with the rework of
XFS' DIO's path to use iomap in acdda3aae1 ("xfs: use iomap_dio_rw")
it became redundant. So let's remove it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-12 10:21:22 -06:00
Jan Kara
d9c10e5b88 direct-io: Fix sleep in atomic due to sync AIO
Commit e864f39569 "fs: add RWF_DSYNC aand RWF_SYNC" added additional
way for direct IO to become synchronous and thus trigger fsync from the
IO completion handler. Then commit 9830f4be15 "fs: Use RWF_* flags for
AIO operations" allowed these flags to be set for AIO as well. However
that commit forgot to update the condition checking whether the IO
completion handling should be defered to a workqueue and thus AIO DIO
with RWF_[D]SYNC set will call fsync() from IRQ context resulting in
sleep in atomic.

Fix the problem by checking directly iocb flags (the same way as it is
done in dio_complete()) instead of checking all conditions that could
lead to IO being synchronous.

CC: Christoph Hellwig <hch@lst.de>
CC: Goldwyn Rodrigues <rgoldwyn@suse.com>
CC: stable@vger.kernel.org
Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 9830f4be15
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-26 09:05:35 -07:00
Darrick J. Wong
5a9d929d6e iomap: report collisions between directio and buffered writes to userspace
If two programs simultaneously try to write to the same part of a file
via direct IO and buffered IO, there's a chance that the post-diowrite
pagecache invalidation will fail on the dirty page.  When this happens,
the dio write succeeded, which means that the page cache is no longer
coherent with the disk!

Programs are not supposed to mix IO types and this is a clear case of
data corruption, so store an EIO which will be reflected to userspace
during the next fsync.  Replace the WARN_ON with a ratelimited pr_crit
so that the developers have /some/ kind of breadcrumb to track down the
offending program(s) and file(s) involved.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
2018-01-08 10:41:39 -08:00
Linus Torvalds
e2c5923c34 Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block
Pull core block layer updates from Jens Axboe:
 "This is the main pull request for block storage for 4.15-rc1.

  Nothing out of the ordinary in here, and no API changes or anything
  like that. Just various new features for drivers, core changes, etc.
  In particular, this pull request contains:

   - A patch series from Bart, closing the whole on blk/scsi-mq queue
     quescing.

   - A series from Christoph, building towards hidden gendisks (for
     multipath) and ability to move bio chains around.

   - NVMe
        - Support for native multipath for NVMe (Christoph).
        - Userspace notifications for AENs (Keith).
        - Command side-effects support (Keith).
        - SGL support (Chaitanya Kulkarni)
        - FC fixes and improvements (James Smart)
        - Lots of fixes and tweaks (Various)

   - bcache
        - New maintainer (Michael Lyle)
        - Writeback control improvements (Michael)
        - Various fixes (Coly, Elena, Eric, Liang, et al)

   - lightnvm updates, mostly centered around the pblk interface
     (Javier, Hans, and Rakesh).

   - Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)

   - Writeback series that fix the much discussed hundreds of millions
     of sync-all units. This goes all the way, as discussed previously
     (me).

   - Fix for missing wakeup on writeback timer adjustments (Yafang
     Shao).

   - Fix laptop mode on blk-mq (me).

   - {mq,name} tupple lookup for IO schedulers, allowing us to have
     alias names. This means you can use 'deadline' on both !mq and on
     mq (where it's called mq-deadline). (me).

   - blktrace race fix, oopsing on sg load (me).

   - blk-mq optimizations (me).

   - Obscure waitqueue race fix for kyber (Omar).

   - NBD fixes (Josef).

   - Disable writeback throttling by default on bfq, like we do on cfq
     (Luca Miccio).

   - Series from Ming that enable us to treat flush requests on blk-mq
     like any other request. This is a really nice cleanup.

   - Series from Ming that improves merging on blk-mq with schedulers,
     getting us closer to flipping the switch on scsi-mq again.

   - BFQ updates (Paolo).

   - blk-mq atomic flags memory ordering fixes (Peter Z).

   - Loop cgroup support (Shaohua).

   - Lots of minor fixes from lots of different folks, both for core and
     driver code"

* 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
  nvme: fix visibility of "uuid" ns attribute
  blk-mq: fixup some comment typos and lengths
  ide: ide-atapi: fix compile error with defining macro DEBUG
  blk-mq: improve tag waiting setup for non-shared tags
  brd: remove unused brd_mutex
  blk-mq: only run the hardware queue if IO is pending
  block: avoid null pointer dereference on null disk
  fs: guard_bio_eod() needs to consider partitions
  xtensa/simdisk: fix compile error
  nvme: expose subsys attribute to sysfs
  nvme: create 'slaves' and 'holders' entries for hidden controllers
  block: create 'slaves' and 'holders' entries for hidden gendisks
  nvme: also expose the namespace identification sysfs files for mpath nodes
  nvme: implement multipath access to nvme subsystems
  nvme: track shared namespaces
  nvme: introduce a nvme_ns_ids structure
  nvme: track subsystems
  block, nvme: Introduce blk_mq_req_flags_t
  block, scsi: Make SCSI quiesce and resume work reliably
  block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
  ...
2017-11-14 15:32:19 -08:00
Christoph Hellwig
ea435e1b93 block: add a poll_fn callback to struct request_queue
That we we can also poll non blk-mq queues.  Mostly needed for
the NVMe multipath code, but could also be useful elsewhere.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-03 10:31:48 -06:00
Mark Rutland
6aa7de0591 locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-25 11:01:08 +02:00
Linus Torvalds
73d3393ada Changes since last update:
- fix some more CONFIG_XFS_RT related build problems
 - fix data loss when writeback at eof races eofblocks gc and loses
 - invalidate page cache after fs finishes a dio write
 - remove dirty page state when invalidating pages so releasepage does
   the right thing when handed a dirty page
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJZ5jqbAAoJEPh/dxk0SrTrtfMP/jcQ6lTDcpnQ7XEP2fg2dXjx
 2+z8uI7Mjr5wo2qfIWHc8nZHZ+8KRak4U28rTlrXkeVbJ79x3Z+SzeipP76dGHXB
 u9MD7uacTD6BDT7R8/bux7g7KrPATVJYJiT3PRHZ5ysUT6i9KnREdbaKpgOwhMcI
 Ivd9ROZHx62CmZhsbfLzD+Ccy9/mGBR5OmT8nQlsuD8cEcFU5u1afaJ2/YlCjNLN
 c16Q8dhGXed7tjduiYCzsxDiewJMzSfcGdyk6yCwXdR3zcI3RdhXUN5FRH0R9GB2
 xxG1n5Q4qgtgODGgcPUl9WG8mfhVvEcuZGioxChQrxCEcaHt1Waop0fOixLy9J3Q
 lUn4qjA5S+VBqa6XsKCSCkiZdDtncSedvMRQYef09q8DGAouwAtN/Z3BVM24oyWU
 k5888Gt4EHZK6V3lz3qPMmGFxfuPL6GeyEvIYUezpVIYsmp0sLQTeNFUW+XC7fb/
 tOBNom4ARHFmSb5da7uwJvesNZBVFSpFQtxkcx1OL0rhTqlKIfPP61dLznKhqUTL
 2NhaFjnznYenSEK2CsP+V3CtQrCxywdqDNnOEgTgKJbWPpsYMX63z/Cmtm0A7Qdz
 BAbGc+OSBLqelwsWNnNzTWPHk33SKxtIxGTe8gKbKbrzbR7mxyJxHKEwpZvWIqh+
 8eTdgJb1wgJyqtBsTSHN
 =UY00
 -----END PGP SIGNATURE-----

Merge tag 'xfs-4.14-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:

 - fix some more CONFIG_XFS_RT related build problems

 - fix data loss when writeback at eof races eofblocks gc and loses

 - invalidate page cache after fs finishes a dio write

 - remove dirty page state when invalidating pages so releasepage does
   the right thing when handed a dirty page

* tag 'xfs-4.14-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: move two more RT specific functions into CONFIG_XFS_RT
  xfs: trim writepage mapping to within eof
  fs: invalidate page cache after end_io() in dio completion
  xfs: cancel dirty pages on invalidation
2017-10-18 14:51:50 -04:00
Linus Torvalds
020b302376 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Three small fixes:

   - A fix for skd, it was using kfree() to free a structure allocate
     with kmem_cache_alloc().

   - Stable fix for nbd, fixing a regression using the normal ioctl
     based tools.

   - Fix for a previous fix in this series, that fixed up
     inconsistencies between buffered and direct IO"

* 'for-linus' of git://git.kernel.dk/linux-block:
  fs: Avoid invalidation in interrupt context in dio_complete()
  nbd: don't set the device size until we're connected
  skd: Use kmem_cache_free
2017-10-18 14:43:40 -04:00