android_kernel_samsung_sm8650/fs/xfs
Dave Chinner 0d227466be xfs: intent item whiteouts
When we log modifications based on intents, we add both intent
and intent done items to the modification being made. These get
written to the log to ensure that the operation is re-run if the
intent done is not found in the log.

However, for operations that complete wholly within a single
checkpoint, the change in the checkpoint is atomic and will never
need replay. In this case, we don't need to actually write the
intent and intent done items to the journal because log recovery
will never need to manually restart this modification.

Log recovery currently handles intent/intent done matching by
inserting the intent into the AIL, then removing it when a matching
intent done item is found. Hence for all the intent-based operations
that complete within a checkpoint, we spend all that time parsing
the intent/intent done items just to cancel them and do nothing with
them.
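As a rough illustration of that cancellation work, here is a minimal C sketch over a hypothetical, simplified record stream. The `log_rec` type, `count_intents_to_replay()` and the fixed-size pending array are all assumptions made for this sketch; the real recovery code tracks intents in the AIL and parses the actual on-disk log format. An intent and its intent done in the same log cancel out and leave nothing to replay, yet both still have to be parsed and tracked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified journal records; not the XFS log format. */
enum rec_type { REC_INTENT, REC_INTENT_DONE };

struct log_rec {
	enum rec_type	type;
	unsigned int	id;	/* pairs an intent with its intent done */
};

#define MAX_PENDING	64

/*
 * Walk the records in log order. Each intent is added to a pending set
 * (standing in for the AIL) and removed again when its matching intent
 * done is found. Anything still pending at the end is an unfinished
 * operation that recovery must restart; return how many there are.
 */
static size_t count_intents_to_replay(const struct log_rec *recs, size_t n)
{
	unsigned int	pending[MAX_PENDING];
	size_t		npending = 0;

	for (size_t i = 0; i < n; i++) {
		if (recs[i].type == REC_INTENT) {
			assert(npending < MAX_PENDING);
			pending[npending++] = recs[i].id;
			continue;
		}
		/* Intent done: cancel the matching intent, if we saw it. */
		for (size_t j = 0; j < npending; j++) {
			if (pending[j] == recs[i].id) {
				pending[j] = pending[--npending];
				break;
			}
		}
	}
	return npending;
}
```

For a stream holding only an intent and its matching intent done, the function returns 0. Both records were parsed and tracked only so they could cancel each other.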

It follows that the only time we actually need intents in the
log is when the modification crosses checkpoint boundaries in the
log and so may only be partially complete in the journal. Hence if
we commit an intent done item to the CIL and the intent item is in
the same checkpoint, we don't actually have to write either of them
to the journal because log recovery will always cancel the intents.

We've never really worried about the overhead of logging intents
unnecessarily like this because the intents we log are generally
very much smaller than the change being made. e.g. freeing an extent
involves modifying at least two freespace btree blocks and the AGF,
so the EFI/EFD overhead is only a small increase in space and
processing time compared to the overall cost of freeing an extent.

However, delayed attributes change this cost equation dramatically,
especially for inline attributes. In the case of adding an inline
attribute, we only log the inode core and attribute fork at present.
With delayed attributes, we now log the attr intent which includes
the name and value, the inode core and attr fork, and finally the
attr intent done item. We increase the number of items we log from 1
to 3, and the number of log vectors (regions) goes up from 3 to 7.
Hence we triple the number of objects that the CIL has to process,
and more than double the number of log vectors that need to be
written to the journal.

At scale, this means delayed attributes cause a non-pipelined CIL to
become CPU bound processing all the extra items, resulting in a > 40%
performance degradation on 16-way file+xattr create workloads.
Pipelining the CIL (as per 5.15) reduces the performance degradation
to 20%, but now the limitation is the rate at which log items can
be written to the iclogs and the iclogs dispatched for IO and
completed.

Even log IO completion is slowed down by these intents, because it
now has to process 3x the number of items in the checkpoint.
Processing completed intents is especially inefficient here, because
we first insert the intent into the AIL, then remove it from the AIL
when the intent done is processed. IOWs, we are also doing expensive
operations in log IO completion we could completely avoid if we
didn't log completed intent/intent done pairs.

Enter log item whiteouts.

When an intent done is committed, we can check to see if the
associated intent is in the same checkpoint as we are currently
committing the intent done to. If so, we can mark the intent log
item with a whiteout and immediately free the intent done item
rather than committing it to the CIL. We can basically skip the
entire formatting and CIL insertion steps for the intent done item.
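Here is a minimal C sketch of that commit-time check. It assumes a hypothetical `struct log_item` that records the checkpoint sequence the item was committed to, plus a whiteout flag and a pointer from the intent done to its intent. These field names and `commit_intent_done()` are illustrative only. They are not the kernel's actual structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified log item; not the kernel's log item type. */
struct log_item {
	uint64_t	seq;		/* checkpoint the item was committed to */
	bool		whiteout;	/* intent cancelled within its checkpoint */
	struct log_item	*intent;	/* for an intent done: its intent, else NULL */
};

/*
 * Called when committing an intent done item to the checkpoint with
 * sequence cur_seq. If the matching intent was committed to the same
 * checkpoint, mark the intent with a whiteout and tell the caller to
 * free the done item rather than format it and insert it into the CIL.
 * Returns true if the done item should be committed as usual.
 */
static bool commit_intent_done(struct log_item *done, uint64_t cur_seq)
{
	struct log_item *intent = done->intent;

	/* An intent is always committed before its intent done. */
	assert(intent == NULL || intent->seq <= cur_seq);

	if (intent && intent->seq == cur_seq) {
		intent->whiteout = true;
		return false;	/* caller frees 'done'; nothing is logged */
	}
	done->seq = cur_seq;
	return true;
}
```

A false return means the caller drops the intent done item on the spot. The intent stays on the CIL, and it is only flagged.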

However, we cannot remove the intent item from the CIL at this point
because the unlocked per-cpu CIL item lists do not permit removal
without holding the CIL context lock exclusively. Transaction commit
only holds the context lock shared, hence the best we can do is mark
the intent item with a whiteout so that the CIL push can release it
rather than writing it to the log.
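The push side can be sketched the same way. This simplified model uses a hypothetical `struct cil_item` and `cil_push()`. It assumes, as the paragraph above implies, that the push is the point where whiteout items may be dropped from the checkpoint. The model does no formatting and only records what happens to each item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CIL entry for this sketch only. */
struct cil_item {
	bool	whiteout;	/* set at commit time when the intent was cancelled */
	bool	released;	/* item dropped without being written */
	bool	written;	/* item formatted into the checkpoint */
};

/*
 * Model of a checkpoint push: whiteout items are released instead of
 * being written, and every other item is written as normal. Returns
 * the number of items actually written to the journal.
 */
static size_t cil_push(struct cil_item *items, size_t n)
{
	size_t nwritten = 0;

	for (size_t i = 0; i < n; i++) {
		/* Each item is handled exactly once per push. */
		assert(!items[i].released && !items[i].written);

		if (items[i].whiteout) {
			items[i].released = true;
			continue;
		}
		items[i].written = true;
		nwritten++;
	}
	return nwritten;
}
```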

This means we never write the intent to the log if the intent done
has also been committed to the same checkpoint, but we'll always
write the intent if the intent done has not been committed or has
been committed to a different checkpoint. This will result in
correct log recovery behaviour in all cases, without the overhead of
logging unnecessary intents.
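The correctness argument can be checked exhaustively on a toy model. The sketch below is hypothetical, with these conventions:

- Checkpoint sequence numbers start at 1.
- A done sequence of 0 means "not committed yet".
- `last_seq` is the newest checkpoint that reached disk before a crash.

The sketch compares what recovery would replay under the whiteout rule against what actually needs replaying. It relies on each checkpoint being atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* With whiteouts, the intent is written unless its done shares its checkpoint. */
static bool intent_written(uint64_t intent_seq, uint64_t done_seq)
{
	return done_seq != intent_seq;
}

/* The done item is written only if it was committed and not freed at commit. */
static bool done_written(uint64_t intent_seq, uint64_t done_seq)
{
	return done_seq != 0 && done_seq != intent_seq;
}

/* Recovery replays an intent that it finds without a matching done. */
static bool recovery_replays(uint64_t intent_seq, uint64_t done_seq,
			     uint64_t last_seq)
{
	/* Model preconditions: intents start at 1, dones never precede them. */
	assert(intent_seq >= 1 && (done_seq == 0 || done_seq >= intent_seq));

	bool saw_intent = intent_written(intent_seq, done_seq) &&
			  intent_seq <= last_seq;
	bool saw_done = done_written(intent_seq, done_seq) &&
			done_seq <= last_seq;

	return saw_intent && !saw_done;
}

/* Replay is needed only if the operation started on disk but did not finish. */
static bool replay_needed(uint64_t intent_seq, uint64_t done_seq,
			  uint64_t last_seq)
{
	bool started = intent_seq <= last_seq;
	bool finished = done_seq != 0 && done_seq <= last_seq;

	return started && !finished;
}
```

Across all small combinations of intent checkpoint, done checkpoint and crash point, the two predicates agree. That is the "correct log recovery behaviour in all cases" claim in miniature.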

This intent whiteout concept is generic - we can apply it to all
intent/intent done pairs that have a direct 1:1 relationship. The
way deferred ops iterate and relog intents means that all intents
currently have a 1:1 relationship with their intent done item, and hence
we can apply this cancellation to all existing intent/intent done
implementations.

For delayed attributes with a 16-way 64kB xattr create workload,
whiteouts reduce the amount of journalled metadata from ~2.5GB/s
down to ~600MB/s and improve the creation rate from 9000/s to
14000/s.
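As a quick check of the relative size of those gains, using the approximate figures quoted above (decimal units, so 2.5GB/s = 2500MB/s):

```latex
\frac{600\ \text{MB/s}}{2500\ \text{MB/s}} = 0.24 \;\Rightarrow\; \approx 76\%\ \text{less journalled metadata},
\qquad
\frac{14000\ \text{/s}}{9000\ \text{/s}} \approx 1.56 \;\Rightarrow\; \approx 56\%\ \text{higher create rate}.
```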

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 11:50:29 +10:00
libxfs xfs: tag transactions that contain intent done items 2022-05-04 11:46:21 +10:00
scrub Merge tag 'large-extent-counters-v9' of https://github.com/chandanr/linux into xfs-5.19-for-next 2022-04-21 16:46:17 +10:00
Kconfig xfs: fix Kconfig asking about XFS_SUPPORT_V4 when XFS_FS=n 2020-10-16 15:34:28 -07:00
kmem.c mm: introduce memalloc_retry_wait() 2022-01-15 16:30:29 +02:00
kmem.h xfs: remove kmem_zone typedef 2021-10-22 16:00:31 -07:00
Makefile xfs: refactor log recovery item sorting into a generic dispatch structure 2020-05-08 08:49:58 -07:00
mrlock.h xfs: convert to SPDX license tags 2018-06-06 14:17:53 -07:00
xfs_acl.c overlayfs update for 5.15 2021-09-02 09:21:27 -07:00
xfs_acl.h vfs: add rcu argument to ->get_acl() callback 2021-08-18 22:08:24 +02:00
xfs_aops.c fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio 2022-03-16 13:37:05 -04:00
xfs_aops.h xfs: add a xfs_inode_buftarg helper 2019-10-28 08:37:54 -07:00
xfs_attr_inactive.c xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_attr_list.c xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown 2021-08-19 10:07:13 -07:00
xfs_bio_io.c Bug fixes for 5.18: 2022-04-01 19:30:44 -07:00
xfs_bmap_item.c xfs: whiteouts release intents that are not in the AIL 2022-05-04 11:46:47 +10:00
xfs_bmap_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_bmap_util.c xfs: Conditionally upgrade existing inodes to use large extent counters 2022-04-13 07:02:44 +00:00
xfs_bmap_util.h xfs: kill the XFS_IOC_{ALLOC,FREE}SP* ioctls 2022-01-17 09:16:41 -08:00
xfs_buf_item_recover.c xfs: check sb_meta_uuid for dabuf buffer recovery 2021-12-21 09:49:41 -08:00
xfs_buf_item.c xfs: log items should have a xlog pointer, not a mount 2022-03-20 08:59:49 -07:00
xfs_buf_item.h xfs: convert buffer log item flags to unsigned. 2022-04-21 10:46:40 +10:00
xfs_buf.c xfs: convert buffer flags to unsigned. 2022-04-21 08:44:59 +10:00
xfs_buf.h xfs: convert buffer flags to unsigned. 2022-04-21 08:44:59 +10:00
xfs_dir2_readdir.c xfs: take the ILOCK when readdir inspects directory mapping data 2022-01-11 15:11:04 -08:00
xfs_discard.c xfs: convert mount flags to features 2021-08-19 10:07:12 -07:00
xfs_discard.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xfs_dquot_item_recover.c xfs: replace xfs_sb_version checks with feature flag checks 2021-08-19 10:07:12 -07:00
xfs_dquot_item.c xfs: remove support for disabling quota accounting on a mounted file system 2021-08-06 11:05:36 -07:00
xfs_dquot_item.h xfs: remove support for disabling quota accounting on a mounted file system 2021-08-06 11:05:36 -07:00
xfs_dquot.c xfs: Conditionally upgrade existing inodes to use large extent counters 2022-04-13 07:02:44 +00:00
xfs_dquot.h xfs: queue inactivation immediately when quota is nearing enforcement 2021-08-09 10:52:18 -07:00
xfs_error.c xfs: sysfs: use default_groups in kobj_type 2022-01-06 10:43:30 -08:00
xfs_error.h xfs: convert ptag flags to unsigned. 2022-04-21 10:47:25 +10:00
xfs_export.c xfs: convert remaining mount flags to state flags 2021-08-19 10:07:13 -07:00
xfs_export.h xfs: convert to SPDX license tags 2018-06-06 14:17:53 -07:00
xfs_extent_busy.c xfs: pass perags through to the busy extent code 2021-06-02 10:48:24 +10:00
xfs_extent_busy.h xfs: pass perags through to the busy extent code 2021-06-02 10:48:24 +10:00
xfs_extfree_item.c xfs: whiteouts release intents that are not in the AIL 2022-05-04 11:46:47 +10:00
xfs_extfree_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_file.c Merge branch 'guilt/xfs-unsigned-flags-5.18' into xfs-5.19-for-next 2022-04-21 16:45:03 +10:00
xfs_filestream.c xfs: convert remaining mount flags to state flags 2021-08-19 10:07:13 -07:00
xfs_filestream.h xfs: convert mount flags to features 2021-08-19 10:07:12 -07:00
xfs_fsmap.c xfs: pass explicit mount pointer to rtalloc query functions 2022-04-12 06:49:41 +10:00
xfs_fsmap.h xfs: fix deadlock and streamline xfs_getfsmap performance 2020-10-07 08:40:29 -07:00
xfs_fsops.c Merge branch 'guilt/xfs-unsigned-flags-5.18' into xfs-5.19-for-next 2022-04-21 16:45:03 +10:00
xfs_fsops.h xfs: get rid of xfs_growfs_{data,log}_t 2021-02-03 09:18:50 -08:00
xfs_globals.c xfs: consolidate the eofblocks and cowblocks workers 2021-02-03 09:18:49 -08:00
xfs_health.c xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown 2021-08-19 10:07:13 -07:00
xfs_icache.c xfs: use a separate frextents counter for rt extent reservations 2022-04-12 06:49:42 +10:00
xfs_icache.h xfs: throttle inode inactivation queuing on memory reclaim 2021-08-09 11:13:17 -07:00
xfs_icreate_item.c xfs: fix potential log item leak 2022-05-04 11:45:11 +10:00
xfs_icreate_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_inode_item_recover.c xfs: hide log iovec alignment constraints 2022-05-04 11:45:50 +10:00
xfs_inode_item.c xfs: hide log iovec alignment constraints 2022-05-04 11:45:50 +10:00
xfs_inode_item.h xfs: aborting inodes on shutdown may need buffer lock 2022-03-29 18:21:59 -07:00
xfs_inode.c Merge tag 'large-extent-counters-v9' of https://github.com/chandanr/linux into xfs-5.19-for-next 2022-04-21 16:46:17 +10:00
xfs_inode.h Merge tag 'large-extent-counters-v9' of https://github.com/chandanr/linux into xfs-5.19-for-next 2022-04-21 16:46:17 +10:00
xfs_ioctl32.c x86: Remove toolchain check for X32 ABI capability 2022-03-15 10:32:48 +01:00
xfs_ioctl32.h xfs: remove unused xfs_ioctl32.h declarations 2022-01-18 10:18:36 -08:00
xfs_ioctl.c xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters 2022-04-13 07:02:45 +00:00
xfs_ioctl.h xfs: kill the XFS_IOC_{ALLOC,FREE}SP* ioctls 2022-01-17 09:16:41 -08:00
xfs_iomap.c xfs: Conditionally upgrade existing inodes to use large extent counters 2022-04-13 07:02:44 +00:00
xfs_iomap.h iomap: add a IOMAP_DAX flag 2021-12-04 08:58:53 -08:00
xfs_iops.c xfs: refactor user/group quota chown in xfs_setattr_nonsize 2022-03-14 10:23:17 -07:00
xfs_iops.h xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_itable.c xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters 2022-04-13 07:02:45 +00:00
xfs_itable.h xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters 2022-04-13 07:02:45 +00:00
xfs_iwalk.c xfs: avoid buffer deadlocks when walking fs inodes 2021-08-09 11:13:16 -07:00
xfs_iwalk.h xfs: Decouple XFS_IBULK flags from XFS_IWALK flags 2022-04-13 07:02:44 +00:00
xfs_linux.h xfs: drop async cache flushes from CIL commits. 2022-03-29 18:22:02 -07:00
xfs_log_cil.c xfs: intent item whiteouts 2022-05-04 11:50:29 +10:00
xfs_log_priv.h Merge branch 'guilt/xlog-write-rework' into xfs-5.19-for-next 2022-04-21 16:45:52 +10:00
xfs_log_recover.c xfs: log shutdown triggers should only shut down the log 2022-03-29 18:22:01 -07:00
xfs_log.c Merge branch 'guilt/xlog-write-rework' into xfs-5.19-for-next 2022-04-21 16:45:52 +10:00
xfs_log.h xfs: hide log iovec alignment constraints 2022-05-04 11:45:50 +10:00
xfs_message.c Merge branch 'guilt/xfs-unsigned-flags-5.18' into xfs-5.19-for-next 2022-04-21 16:45:03 +10:00
xfs_message.h Merge branch 'guilt/xfs-unsigned-flags-5.18' into xfs-5.19-for-next 2022-04-21 16:45:03 +10:00
xfs_mount.c xfs: use a separate frextents counter for rt extent reservations 2022-04-12 06:49:42 +10:00
xfs_mount.h Merge tag 'large-extent-counters-v9' of https://github.com/chandanr/linux into xfs-5.19-for-next 2022-04-21 16:46:17 +10:00
xfs_mru_cache.c xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_mru_cache.h xfs: convert to SPDX license tags 2018-06-06 14:17:53 -07:00
xfs_ondisk.h xfs: rename struct xfs_legacy_ictimestamp 2021-04-22 18:29:25 -07:00
xfs_pnfs.c xfs: use setattr_copy to set vfs inode attributes 2022-03-14 10:23:16 -07:00
xfs_pnfs.h xfs: prepare xfs_break_layouts() for another layout type 2018-05-22 07:19:08 -07:00
xfs_pwork.c xfs: increase the default parallelism levels of pwork clients 2021-02-03 09:18:49 -08:00
xfs_pwork.h xfs: increase the default parallelism levels of pwork clients 2021-02-03 09:18:49 -08:00
xfs_qm_bhv.c xfs: replace xfs_sb_version checks with feature flag checks 2021-08-19 10:07:12 -07:00
xfs_qm_syscalls.c xfs: fix quotaoff mutex usage now that we don't support disabling it 2021-12-21 09:49:41 -08:00
xfs_qm.c xfs: xfs_is_shutdown vs xlog_is_shutdown cage fight 2022-03-20 08:59:50 -07:00
xfs_qm.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_quota.h xfs: queue inactivation immediately when quota is nearing enforcement 2021-08-09 10:52:18 -07:00
xfs_quotaops.c xfs: remove the active vs running quota differentiation 2021-08-06 11:05:37 -07:00
xfs_refcount_item.c xfs: whiteouts release intents that are not in the AIL 2022-05-04 11:46:47 +10:00
xfs_refcount_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_reflink.c xfs: Conditionally upgrade existing inodes to use large extent counters 2022-04-13 07:02:44 +00:00
xfs_reflink.h xfs: convert xfs_sb_version_has checks to use mount features 2021-08-19 10:07:14 -07:00
xfs_rmap_item.c xfs: whiteouts release intents that are not in the AIL 2022-05-04 11:46:47 +10:00
xfs_rmap_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_rtalloc.c Merge tag 'large-extent-counters-v9' of https://github.com/chandanr/linux into xfs-5.19-for-next 2022-04-21 16:46:17 +10:00
xfs_rtalloc.h xfs: recalculate free rt extents after log recovery 2022-04-12 06:49:42 +10:00
xfs_stats.c xfs: periodically relog deferred intent items 2020-10-07 08:40:28 -07:00
xfs_stats.h xfs: periodically relog deferred intent items 2020-10-07 08:40:28 -07:00
xfs_super.c Merge tag 'large-extent-counters-v9' of https://github.com/chandanr/linux into xfs-5.19-for-next 2022-04-21 16:46:17 +10:00
xfs_super.h xfs: remove xfs_blkdev_issue_flush 2021-06-21 10:05:46 -07:00
xfs_symlink.c xfs: Directory's data fork extent counter can never overflow 2022-04-13 07:02:07 +00:00
xfs_symlink.h xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_sysctl.c xfs: restore speculative_cow_prealloc_lifetime sysctl 2021-02-24 10:16:08 -08:00
xfs_sysctl.h xfs: consolidate the eofblocks and cowblocks workers 2021-02-03 09:18:49 -08:00
xfs_sysfs.c xfs: sysfs: use default_groups in kobj_type 2022-01-06 10:43:30 -08:00
xfs_sysfs.h xfs: Fix UBSAN null-ptr-deref in xfs_sysfs_init 2020-08-07 11:50:17 -07:00
xfs_trace.c xfs: add trace point for fs shutdown 2021-08-18 18:46:00 -07:00
xfs_trace.h xfs: intent item whiteouts 2022-05-04 11:50:29 +10:00
xfs_trans_ail.c xfs: log shutdown triggers should only shut down the log 2022-03-29 18:22:01 -07:00
xfs_trans_buf.c xfs: introduce xfs_buf_daddr() 2021-08-19 10:07:14 -07:00
xfs_trans_dquot.c xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_trans_priv.h xfs: AIL should be log centric 2022-03-20 08:59:49 -07:00
xfs_trans.c Merge branch 'guilt/xlog-write-rework' into xfs-5.19-for-next 2022-04-21 16:45:52 +10:00
xfs_trans.h xfs: intent item whiteouts 2022-05-04 11:50:29 +10:00
xfs_xattr.c xfs: prevent metadata files from being inactivated 2021-03-25 16:47:50 -07:00
xfs.h xfs: remove b_last_holder & associated macros 2018-08-12 08:37:31 -07:00