android_kernel_xiaomi_sm8450/fs
Kirill Smelkov ad2ba64dd4 fuse: allow filesystems to have precise control over data cache
On networked filesystems file data can be changed externally.  FUSE
provides notification messages for filesystem to inform kernel that
metadata or data region of a file needs to be invalidated in local page
cache. That provides the basis for filesystem implementations to invalidate
kernel cache explicitly based on observed filesystem-specific events.

FUSE has also "automatic" invalidation mode(*) when the kernel
automatically invalidates data cache of a file if it sees mtime change.  It
also automatically invalidates whole data cache of a file if it sees file
size being changed.

The automatic mode has corresponding capability - FUSE_AUTO_INVAL_DATA.
However, due to probably historical reason, that capability controls only
whether mtime change should be resulting in automatic invalidation or
not. A change in file size always results in invalidating whole data cache
of a file irregardless of whether FUSE_AUTO_INVAL_DATA was negotiated(+).

The filesystem I write[1] represents data arrays stored in networked
database as local files suitable for mmap. It is read-only filesystem -
changes to data are committed externally via database interfaces and the
filesystem only glues data into contiguous file streams suitable for mmap
and traditional array processing. The files are big - starting from
hundreds gigabytes and more. The files change regularly, and frequently by
data being appended to their end. The size of files thus changes
frequently.

If a file was accessed locally and some part of its data got into page
cache, we want that data to stay cached unless there is memory pressure, or
unless corresponding part of the file was actually changed. However current
FUSE behaviour - when it sees file size change - is to invalidate the whole
file. The data cache of the file is thus completely lost even on small size
change, and despite that the filesystem server is careful to accurately
translate database changes into FUSE invalidation messages to kernel.

Let's fix it: if a filesystem, through new FUSE_EXPLICIT_INVAL_DATA
capability, indicates to kernel that it is fully responsible for data cache
invalidation, then the kernel won't invalidate files data cache on size
change and only truncate that cache to new size in case the size decreased.

(*) see 72d0d248ca "fuse: add FUSE_AUTO_INVAL_DATA init flag",
eed2179efe "fuse: invalidate inode mapping if mtime changes"

(+) in writeback mode the kernel does not invalidate data cache on file
size change, but neither it allows the filesystem to set the size due to
external event (see 8373200b12 "fuse: Trust kernel i_size only")

[1] https://lab.nexedi.com/kirr/wendelin.core/blob/a50f1d9f/wcfs/wcfs.go#L20

Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-04-24 17:05:06 +02:00
..
9p Pull request for inlusion in 5.1 2019-03-17 09:10:56 -07:00
adfs adfs: use timespec64 for time conversion 2018-08-22 10:52:51 -07:00
affs
afs afs: Fix in-progess ops to ignore server-level callback invalidation 2019-04-13 08:37:37 +01:00
autofs autofs: clear O_NONBLOCK on the pipe 2019-03-07 18:32:01 -08:00
befs fix a series of Documentation/ broken file name references 2018-06-15 18:10:01 -03:00
bfs bfs: extra sanity checking and static inode bitmap 2019-01-04 13:13:47 -08:00
btrfs for-5.1-rc4-tag 2019-04-11 14:19:02 -07:00
cachefiles fscache, cachefiles: remove redundant variable 'cache' 2018-11-30 16:00:58 +00:00
ceph ceph: fix use-after-free on symlink traversal 2019-03-27 19:00:37 +01:00
cifs CIFS: keep FileInfo handle live during oplock break 2019-04-16 09:38:38 -05:00
coda
configfs configfs: fix registered group removal 2018-07-17 06:14:07 -07:00
cramfs Make the Cramfs code more robust against filesystem corruptions, 2018-10-30 12:46:25 -07:00
crypto fscrypt updates for v5.1 2019-03-09 10:54:24 -08:00
debugfs debugfs: fix use-after-free on symlink traversal 2019-04-01 00:31:02 -04:00
devpts fs/devpts: always delete dcache dentry-s in dput() 2019-01-24 13:38:30 -05:00
dlm socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixes 2019-02-03 11:17:31 -08:00
ecryptfs crypto: clarify name of WEAK_KEY request flag 2019-01-25 18:41:52 +08:00
efivarfs efivars: Call guid_parse() against guid_t type of variable 2018-07-22 14:13:44 +02:00
efs
exportfs exportfs: do not read dentry after free 2018-11-23 09:08:17 -05:00
ext2 \n 2019-03-07 09:01:33 -08:00
ext4 Miscellaneous ext4 bug fixes for 5.1. 2019-03-24 13:41:37 -07:00
f2fs f2fs-for-5.1-rc1 2019-03-15 13:42:53 -07:00
fat fat: enable .splice_write to support splice on O_DIRECT file 2019-03-07 18:32:01 -08:00
freevxfs
fscache fscache: fix race between enablement and dropping of object 2018-11-30 15:57:31 +00:00
fuse fuse: allow filesystems to have precise control over data cache 2019-04-24 17:05:06 +02:00
gfs2 We've only got three patches ready for this merge window: 2019-03-09 11:52:11 -08:00
hfs hfs: do not free node before using 2018-11-30 14:56:14 -08:00
hfsplus hfsplus: return file attributes on statx 2019-01-04 13:13:47 -08:00
hostfs vfs: discard ATTR_ATTR_FLAG 2018-08-17 16:20:28 -07:00
hpfs hpfs: fix spelling mistake "partion" -> "partition" 2019-03-12 09:58:03 -07:00
hugetlbfs hugetlbfs: fix memory leak for resv_map 2019-04-05 16:02:31 -10:00
isofs Update email address 2018-09-29 22:47:48 -04:00
jbd2 jbd2: jbd2_get_transaction does not need to return a value 2019-03-01 00:36:57 -05:00
jffs2 jffs2: fix use-after-free on symlink traversal 2019-04-01 00:31:02 -04:00
jfs jfs: remove redundant dquot_initialize() in jfs_evict_inode() 2018-09-20 09:28:49 -05:00
kernfs Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-12 14:08:19 -07:00
lockd NFS: fix mount/umount race in nlmclnt. 2019-03-18 22:39:34 -04:00
minix
nfs NFSv4.1 fix incorrect return value in copy_file_range 2019-04-11 15:23:48 -04:00
nfs_common
nfsd Miscellaneous NFS server fixes. Probably the most visible bug is one 2019-03-12 15:06:54 -07:00
nilfs2 XArray: Change xa_insert to return -EBUSY 2019-02-06 13:12:15 -05:00
nls
notify fanotify: Allow copying of file handle to userspace 2019-03-19 09:29:07 +01:00
ntfs mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
ocfs2 ocfs2: fix inode bh swapping mixup in ocfs2_reflink_inodes_lock 2019-03-29 10:01:37 -07:00
omfs
openpromfs fs/openpromfs: Use of_node_name_eq for node name comparisons 2018-11-18 13:35:19 -08:00
orangefs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-12 13:27:20 -07:00
overlayfs ovl: Do not lose security.capability xattr over metadata file copy-up 2019-02-13 11:14:46 +01:00
proc coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping 2019-04-19 09:46:05 -07:00
pstore pstore/ram: Avoid needless alloc during header write 2019-02-12 13:45:53 -08:00
qnx4
qnx6
quota quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls. 2018-12-18 18:29:15 +01:00
ramfs
reiserfs reiserfs: remove workaround code for GCC 3.x 2018-10-31 08:54:14 -07:00
romfs
squashfs Squashfs: Compute expected length from inode size rather than block length 2018-08-02 09:34:02 -07:00
sysfs Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-16 10:31:02 -07:00
sysv sysv: return 'err' instead of 0 in __sysv_write_inode 2018-11-10 08:02:40 -05:00
tracefs tracefs: Annotate tracefs_ops with __ro_after_init 2018-07-31 11:32:44 -04:00
ubifs ubifs: fix use-after-free on symlink traversal 2019-04-01 00:31:02 -04:00
udf udf: Propagate errors from udf_truncate_extents() 2019-03-18 16:30:02 +01:00
ufs fs/ufs: use ktime_get_real_seconds for sb and cg timestamps 2018-08-17 16:20:27 -07:00
xfs xfs: serialize unaligned dio writes against all other dio writes 2019-03-26 08:37:55 -07:00
aio.c aio: use kmem_cache_free() instead of kfree() 2019-04-04 20:13:59 -04:00
anon_inodes.c anon_inode_getfile(): switch to alloc_file_pseudo() 2018-07-12 10:04:27 -04:00
attr.c fs: Fix attr.c kernel-doc 2018-07-03 16:44:45 -04:00
bad_inode.c get rid of 'opened' argument of ->atomic_open() - part 3 2018-07-12 10:04:20 -04:00
binfmt_aout.c a.out: remove core dumping support 2019-03-05 10:00:35 -08:00
binfmt_elf_fdpic.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
binfmt_elf.c fs/binfmt_elf.c: spread const a little 2019-03-07 18:32:01 -08:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c turn filp_clone_open() into inline wrapper for dentry_open() 2018-07-10 23:29:03 -04:00
binfmt_script.c exec: load_script: Do not exec truncated interpreter path 2019-02-18 16:49:36 -08:00
block_dev.c block: fix the return errno for direct IO 2019-04-11 21:22:21 -06:00
buffer.c fs: fix guard_bio_eod to check for real EOD errors 2019-02-28 13:59:41 -07:00
char_dev.c
compat_binfmt_elf.c y2038: globally rename compat_time to old_time32 2018-08-27 14:48:48 +02:00
compat_ioctl.c media updates for v4.20-rc1 2018-10-29 14:29:58 -07:00
compat.c
coredump.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
d_path.c
dax.c fs/dax: Deposit pagetable even when installing zero page 2019-03-13 13:58:46 -07:00
dcache.c fs/dcache: Track & report number of negative dentries 2019-01-30 11:02:11 -08:00
dcookies.c
direct-io.c block: allow bio_for_each_segment_all() to iterate over multi-page bvec 2019-02-15 08:40:11 -07:00
drop_caches.c fs/drop_caches.c: avoid softlockups in drop_pagecache_sb() 2019-02-01 15:46:24 -08:00
eventfd.c Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
eventpoll.c epoll: use rwlock in order to reduce ep_poll_callback() contention 2019-03-07 18:32:01 -08:00
exec.c exec: increase BINPRM_BUF_SIZE to 256 2019-03-07 18:32:01 -08:00
fcntl.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
fhandle.c
file_table.c fs: add fget_many() and fput_many() 2019-02-28 08:24:23 -07:00
file.c io_uring-2019-03-06 2019-03-08 14:48:40 -08:00
filesystems.c vfs: Implement a filesystem superblock creation/configuration context 2019-02-28 03:29:26 -05:00
fs_context.c vfs: Implement logging through fs_context 2019-02-28 03:29:37 -05:00
fs_parser.c fs: fs_parser: fix printk format warning 2019-03-29 10:01:38 -07:00
fs_pin.c
fs_struct.c
fs_types.c fs: common implementation of file type 2019-01-21 17:48:13 +01:00
fs-writeback.c writeback: synchronize sync(2) against cgroup writeback membership switches 2019-01-22 14:39:38 -07:00
inode.c fs/inode.c: inode_set_flags(): replace opencoded set_mask_bits() 2019-03-05 21:07:13 -08:00
internal.h vfs: Add configuration parser helpers 2019-02-28 03:28:53 -05:00
io_uring.c io_uring: fix CQ overflow condition 2019-04-17 11:41:49 -06:00
ioctl.c Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
iomap.c block: add BIO_NO_PAGE_REF flag 2019-03-18 10:44:48 -06:00
Kconfig Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-12 14:08:19 -07:00
Kconfig.binfmt kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
libfs.c
locks.c locks: wake any locks blocked on request before deadlock check 2019-03-25 08:36:24 -04:00
Makefile Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-12 14:08:19 -07:00
mbcache.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
mount.h saner handling of temporary namespaces 2019-01-30 17:44:07 -05:00
mpage.c block: allow bio_for_each_segment_all() to iterate over multi-page bvec 2019-02-15 08:40:11 -07:00
namei.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-12 14:08:19 -07:00
namespace.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-12 14:08:19 -07:00
no-block.c
nsfs.c
open.c fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock 2019-04-06 07:01:55 -10:00
pipe.c Merge branch 'page-refs' (page ref overflow) 2019-04-14 15:09:40 -07:00
pnode.c separate copying and locking mount tree on cross-userns copies 2019-01-30 17:14:50 -05:00
pnode.h separate copying and locking mount tree on cross-userns copies 2019-01-30 17:14:50 -05:00
posix_acl.c
proc_namespace.c
read_write.c fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock 2019-04-06 07:01:55 -10:00
readdir.c Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
select.c y2038: syscalls: rename y2038 compat syscalls 2019-02-07 00:13:27 +01:00
seq_file.c fs/seq_file.c: simplify seq_file iteration code and interface 2018-08-17 16:20:28 -07:00
signalfd.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
splice.c Merge branch 'page-refs' (page ref overflow) 2019-04-14 15:09:40 -07:00
stack.c
stat.c fs: move generic stat response attr handling to vfs_getattr_nosec 2019-02-01 01:55:45 -05:00
statfs.c vfs: add vfs_get_fsid() helper 2019-02-07 16:38:35 +01:00
super.c vfs: Add some logging to the core users of the fs_context log 2019-02-28 03:29:38 -05:00
sync.c
timerfd.c y2038: syscalls: rename y2038 compat syscalls 2019-02-07 00:13:27 +01:00
userfaultfd.c coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping 2019-04-19 09:46:05 -07:00
utimes.c y2038: syscalls: rename y2038 compat syscalls 2019-02-07 00:13:27 +01:00
xattr.c sysfs: Do not return POSIX ACL xattrs via listxattr 2018-09-18 07:30:48 -04:00