android_kernel_samsung_sm8650/fs
Dave Chinner b1c5ebb213 xfs: allocate log vector buffers outside CIL context lock
One of the problems we currently have with delayed logging is that
under serious memory pressure we can deadlock memory reclaim. THis
occurs when memory reclaim (such as run by kswapd) is reclaiming XFS
inodes and issues a log force to unpin inodes that are dirty in the
CIL.

The CIL is pushed, but this will only occur once it gets the CIL
context lock to ensure that all committing transactions are complete
and no new transactions start being committed to the CIL while the
push switches to a new context.

The deadlock occurs when the CIL context lock is held by a
committing process that is doing memory allocation for log vector
buffers, and that allocation is then blocked on memory reclaim
making progress. Memory reclaim, however, is blocked waiting for
a log force to make progress, and so we effectively deadlock at this
point.

To solve this problem, we have to move the CIL log vector buffer
allocation outside of the context lock so that memory reclaim can
always make progress when it needs to force the log. The problem
with doing this is that a CIL push can take place while we are
determining if we need to allocate a new log vector buffer for
an item and hence the current log vector may go away without
warning. That means we canot rely on the existing log vector being
present when we finally grab the context lock and so we must have a
replacement buffer ready to go at all times.

To ensure this, introduce a "shadow log vector" buffer that is
always guaranteed to be present when we gain the CIL context lock
and format the item. This shadow buffer may or may not be used
during the formatting, but if the log item does not have an existing
log vector buffer or that buffer is too small for the new
modifications, we swap it for the new shadow buffer and format
the modifications into that new log vector buffer.

The result of this is that for any object we modify more than once
in a given CIL checkpoint, we double the memory required
to track dirty regions in the log. For single modifications then
we consume the shadow log vectorwe allocate on commit, and that gets
consumed by the checkpoint. However, if we make multiple
modifications, then the second transaction commit will allocate a
shadow log vector and hence we will end up with double the memory
usage as only one of the log vectors is consumed by the CIL
checkpoint. The remaining shadow vector will be freed when th elog
item is freed.

This can probably be optimised in future - access to the shadow log
vector is serialised by the object lock (as opposited to the active
log vector, which is controlled by the CIL context lock) and so we
can probably free shadow log vector from some objects when the log
item is marked clean on removal from the AIL.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-22 09:52:35 +10:00
..
9p switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
adfs fs/adfs/adfs.h: tidy up comments 2016-01-20 17:09:18 -08:00
affs affs: fix remount failure when there are no options changed 2016-05-28 16:50:24 -07:00
afs remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
autofs4 dcache_{readdir,dir_lseek}() users: switch to ->iterate_shared 2016-05-02 19:49:32 -04:00
befs fs/befs/io.c:befs_bread(): remove unneeded initialization to NULL 2016-05-23 17:04:14 -07:00
bfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
cachefiles mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
cifs switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
coda introduce a parallel variant of ->iterate() 2016-05-02 19:49:29 -04:00
configfs configfs_readdir(): make safe under shared lock 2016-05-09 11:41:13 -04:00
cramfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
crypto fscrypto/f2fs: allow fs-specific key prefix for fs encryption 2016-05-07 10:32:33 -07:00
debugfs Merge 4.6-rc4 into driver-core-next 2016-04-19 04:28:28 +09:00
devpts devpts: more pty driver interface cleanups 2016-04-26 15:47:32 -07:00
dlm mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
ecryptfs switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00
efivarfs fs/efivarfs/inode.c: use generic UUID library 2016-05-20 17:58:30 -07:00
efs fs/efs/super.c: fix return value 2016-05-20 17:58:30 -07:00
exofs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2016-05-17 17:05:30 -07:00
exportfs introduce a parallel variant of ->iterate() 2016-05-02 19:49:29 -04:00
ext2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
ext4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
f2fs switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
fat Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
freevxfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
fscache mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
fuse switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00
gfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
hfs switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00
hfsplus switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
hostfs hostfs: switch to ->iterate_shared() 2016-05-12 19:49:30 -04:00
hpfs hpfs: implement the show_options method 2016-05-28 16:50:24 -07:00
hugetlbfs mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage 2016-04-04 10:41:08 -07:00
isofs Merge branch 'ovl-fixes' into for-linus 2016-05-11 00:00:29 -04:00
jbd2 Fix a number of bugs, most notably a potential stale data exposure 2016-05-24 12:55:26 -07:00
jffs2 switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
jfs switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
kernfs switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00
lockd lockd: constify nlmsvc_binding structure 2016-01-07 10:10:50 -05:00
logfs logfs: no need to lock directory in lseek 2016-05-09 11:42:19 -04:00
minix simple local filesystems: switch to ->iterate_shared() 2016-05-02 19:49:32 -04:00
ncpfs mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
nfs nfs: fix anonymous member initializer build failure with older compilers 2016-05-27 17:20:27 -07:00
nfs_common
nfsd A very quiet cycle for nfsd, mainly just an RDMA update from Chuck Lever. 2016-05-24 14:39:20 -07:00
nilfs2 nilfs2: fix block comments 2016-05-23 17:04:14 -07:00
nls
notify fsnotify: avoid spurious EMFILE errors from inotify_init() 2016-05-19 19:12:14 -07:00
ntfs fs: simplify the generic_write_sync prototype 2016-05-01 19:58:39 -04:00
ocfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
omfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
openpromfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
orangefs switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
overlayfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
proc mm, proc: make clear_refs killable 2016-05-23 17:04:14 -07:00
pstore mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
qnx4 more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
qnx6 more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
quota fs/quota: use nla_put_u64_64bit() 2016-04-26 12:00:48 -04:00
ramfs tmpfs/ramfs: fix VM_MAYSHARE mappings for NOMMU 2016-05-20 17:58:30 -07:00
reiserfs switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
romfs romfs, squashfs: switch to ->iterate_shared() 2016-05-09 11:41:15 -04:00
squashfs romfs, squashfs: switch to ->iterate_shared() 2016-05-09 11:41:15 -04:00
sysfs
sysv simple local filesystems: switch to ->iterate_shared() 2016-05-02 19:49:32 -04:00
tracefs wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
ubifs This pull request contains mostly cleanups and minor 2016-05-27 18:49:29 -07:00
udf Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
ufs simple local filesystems: switch to ->iterate_shared() 2016-05-02 19:49:32 -04:00
xfs xfs: allocate log vector buffers outside CIL context lock 2016-07-22 09:52:35 +10:00
aio.c aio: make aio_setup_ring killable 2016-05-23 17:04:14 -07:00
anon_inodes.c
attr.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
bad_inode.c switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00
binfmt_aout.c fs: fix binfmt_aout.c build error 2016-05-28 16:34:59 -07:00
binfmt_elf_fdpic.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-18 11:51:59 -07:00
binfmt_elf.c mm: remove more IS_ERR_VALUE abuses 2016-05-27 15:57:31 -07:00
binfmt_em86.c
binfmt_flat.c remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
binfmt_misc.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
binfmt_script.c
block_dev.c DAX error handling for 4.7 2016-05-26 19:34:26 -07:00
buffer.c mm, page_alloc: avoid looking up the first zone in a zonelist twice 2016-05-19 19:12:14 -07:00
char_dev.c chrdev: emit a warning when we go below dynamic major range 2016-03-29 10:11:44 -07:00
compat_binfmt_elf.c
compat_ioctl.c Merge 4.5-rc4 into char-misc-next 2016-02-14 14:25:59 -08:00
compat.c Fix a number of bugs, most notably a potential stale data exposure 2016-05-24 12:55:26 -07:00
coredump.c coredump: make coredump_wait wait for mmap_sem for write killable 2016-05-23 17:04:14 -07:00
dax.c Filesystem DAX locking for 4.7 2016-05-26 20:00:28 -07:00
dcache.c Merge branch 'hash' of git://ftp.sciencehorizons.net/linux 2016-05-28 16:15:25 -07:00
dcookies.c
direct-io.c direct-io: fix direct write stale data exposure from concurrent buffered read 2016-05-27 14:49:37 -07:00
drop_caches.c
eventfd.c eventfd: document lockless access in eventfd_poll 2016-03-22 15:36:02 -07:00
eventpoll.c fs: poll/select/recvmmsg: use timespec64 for timeout events 2016-05-19 19:12:14 -07:00
exec.c exec: make exec path waiting for mmap_sem killable 2016-05-23 17:04:14 -07:00
fcntl.c fcntl: allow to set O_DIRECT flag on pipe 2016-01-09 02:55:37 -05:00
fhandle.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-03-22 15:36:02 -07:00
file_table.c
file.c give readdir(2)/getdents(2)/etc. uniform exclusion with lseek() 2016-05-02 19:49:28 -04:00
filesystems.c find_filesystem(): simplify comparison 2016-01-19 12:02:23 -05:00
fs_pin.c
fs_struct.c
fs-writeback.c mm,writeback: don't use memory reserves for wb_start_writeback 2016-05-20 17:58:30 -07:00
inode.c parallel lookups: actual switch to rwsem 2016-05-02 19:49:28 -04:00
internal.h Merge branch 'for-linus' into work.misc 2016-01-08 21:20:11 -05:00
ioctl.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
Kconfig dax: Make huge page handling depend of CONFIG_BROKEN 2016-05-19 15:13:17 -06:00
Kconfig.binfmt ELF/MIPS build fix 2016-05-23 17:04:14 -07:00
libfs.c switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00
locks.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
Makefile Merge tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux 2016-03-26 12:59:04 -07:00
mbcache.c mbcache: add reusable flag to cache entries 2016-02-22 22:44:04 -05:00
mount.h
mpage.c mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage 2016-04-04 10:41:08 -07:00
namei.c hash_string: Fix zero-length case for !DCACHE_WORD_ACCESS 2016-05-29 07:33:47 -07:00
namespace.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
no-block.c
nsfs.c
open.c Merge branch 'work.const-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 14:41:03 -07:00
pipe.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
pnode.c propogate_mnt: Handle the first propogated copy being a slave 2016-05-05 09:54:45 -05:00
pnode.h
posix_acl.c switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
proc_namespace.c vfs: show_vfsstat: do not ignore errors from show_devname method 2016-03-16 13:09:08 -04:00
read_write.c Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-18 11:46:23 -07:00
readdir.c restore killability of old mutex_lock_killable(&inode->i_mutex) users 2016-05-26 00:13:25 -04:00
select.c fs: poll/select/recvmmsg: use timespec64 for timeout events 2016-05-19 19:12:14 -07:00
seq_file.c Make file credentials available to the seqfile interfaces 2016-04-14 12:56:09 -07:00
signalfd.c
splice.c Merge branch 'ovl-fixes' into for-linus 2016-05-11 00:00:29 -04:00
stack.c
stat.c fs/stat.c: drop the last new_valid_dev check 2016-01-16 11:17:23 -08:00
statfs.c
super.c Merge branch 'master' into for-next 2016-04-18 11:18:55 +02:00
sync.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
timerfd.c timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper 2016-01-17 11:13:55 +01:00
userfaultfd.c userfaultfd: don't pin the user memory in userfaultfd_file_create() 2016-05-20 17:58:30 -07:00
utimes.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
xattr.c switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00