android_kernel_asus_sm8350/fs/btrfs
Filipe Manana bd3599a0e1 Btrfs: fix file data corruption after cloning a range and fsync
When we clone a range into a file we can end up dropping existing
extent maps (or trimming them) and replacing them with new ones if the
range to be cloned overlaps with a range in the destination inode.
When that happens we add the new extent maps to the list of modified
extents in the inode's extent map tree, so that a "fast" fsync (the flag
BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps
and log corresponding extent items. However, at the end of range cloning
operation we do truncate all the pages in the affected range (in order to
ensure future reads will not get stale data). Sometimes this truncation
will release the corresponding extent maps besides the pages from the page
cache. If this happens, then a "fast" fsync operation will miss logging
some extent items, because it relies exclusively on the extent maps being
present in the inode's extent tree, leading to data loss/corruption if
the fsync ends up using the same transaction used by the clone operation
(that transaction was not committed in the meanwhile). An extent map is
released through the callback btrfs_invalidatepage(), which gets called by
truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The
later ends up calling try_release_extent_mapping() which will release the
extent map if some conditions are met, like the file size being greater
than 16Mb, gfp flags allow blocking and the range not being locked (which
is the case during the clone operation) nor being the extent map flagged
as pinned (also the case for cloning).

The following example, turned into a test for fstests, reproduces the
issue:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo
  $ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar

  $ xfs_io -c "fsync" /mnt/bar
  # reflink destination offset corresponds to the size of file bar,
  # 2728Kb minus 4Kb.
  $ xfs_io -c ""reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar
  $ xfs_io -c "fsync" /mnt/bar

  $ md5sum /mnt/bar
  95a95813a8c2abc9aa75a6c2914a077e  /mnt/bar

  <power fail>

  $ mount /dev/sdb /mnt
  $ md5sum /mnt/bar
  207fd8d0b161be8a84b945f0df8d5f8d  /mnt/bar
  # digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the
  # power failure

In the above example, the destination offset of the clone operation
corresponds to the size of the "bar" file minus 4Kb. So during the clone
operation, the extent map covering the range from 2572Kb to 2728Kb gets
trimmed so that it ends at offset 2724Kb, and a new extent map covering
the range from 2724Kb to 11724Kb is created. So at the end of the clone
operation when we ask to truncate the pages in the range from 2724Kb to
2724Kb + 15908Kb, the page invalidation callback ends up removing the new
extent map (through try_release_extent_mapping()) when the page at offset
2724Kb is passed to that callback.

Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent
map is removed at try_release_extent_mapping(), forcing the next fsync to
search for modified extents in the fs/subvolume tree instead of relying on
the presence of extent maps in memory. This way we can continue doing a
"fast" fsync if the destination range of a clone operation does not
overlap with an existing range or if any of the criteria necessary to
remove an extent map at try_release_extent_mapping() is not met (file
size not bigger then 16Mb or gfp flags do not allow blocking).

CC: stable@vger.kernel.org # 3.16+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-07-19 15:36:31 +02:00
..
tests btrfs: tests: drop newline from test_msg strings 2018-05-29 18:12:52 +02:00
acl.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
async-thread.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
async-thread.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
backref.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
backref.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
btrfs_inode.h Btrfs: renumber BTRFS_INODE_ runtime flags and switch to enums 2018-05-28 18:23:59 +02:00
check-integrity.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
check-integrity.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
compression.c btrfs: replace waitqueue_actvie with cond_wake_up 2018-05-28 18:23:09 +02:00
compression.h btrfs: compression: Add linux/sizes.h for compression.h 2018-05-29 18:13:00 +02:00
ctree.c Btrfs: remove unused check of skip_locking 2018-05-30 16:46:52 +02:00
ctree.h btrfs: change return type of btrfs_page_mkwrite to vm_fault_t 2018-06-07 17:27:45 +02:00
dedupe.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
delayed-inode.c btrfs: replace waitqueue_actvie with cond_wake_up 2018-05-28 18:23:09 +02:00
delayed-inode.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
delayed-ref.c btrfs: split delayed ref head initialization and addition 2018-05-28 18:07:32 +02:00
delayed-ref.h btrfs: Drop fs_info parameter from btrfs_merge_delayed_refs 2018-05-28 18:07:20 +02:00
dev-replace.c btrfs: drop uuid_mutex in btrfs_dev_replace_finishing 2018-05-28 18:23:16 +02:00
dev-replace.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
dir-item.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
disk-io.c Btrfs: add parent_transid parameter to veirfy_level_key 2018-05-30 16:46:43 +02:00
disk-io.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
export.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
export.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
extent_io.c Btrfs: fix file data corruption after cloning a range and fsync 2018-07-19 15:36:31 +02:00
extent_io.h btrfs: Remove tree argument from extent_writepages 2018-05-28 18:07:20 +02:00
extent_map.c btrfs: use fs_info for btrfs_handle_em_exist tracepoint 2018-05-28 18:07:17 +02:00
extent_map.h btrfs: use fs_info for btrfs_handle_em_exist tracepoint 2018-05-28 18:07:17 +02:00
extent-tree.c btrfs: return ENOMEM if path allocation fails in btrfs_cross_ref_exist 2018-05-30 17:33:58 +02:00
file-item.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
file.c btrfs: Fix wrong btrfs_delalloc_release_extents parameter 2018-04-18 16:46:57 +02:00
free-space-cache.c Btrfs: stop creating orphan items for truncate 2018-05-28 18:23:46 +02:00
free-space-cache.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
free-space-tree.c btrfs: Remove fs_info argument from populate_free_space_tree 2018-05-28 18:07:36 +02:00
free-space-tree.h btrfs: Remove fs_info argument from add_to_free_space_tree 2018-05-28 18:07:36 +02:00
inode-item.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
inode-map.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
inode-map.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
inode.c Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion 2018-06-28 11:30:50 +02:00
ioctl.c btrfs: fix use-after-free of cmp workspace pages 2018-07-13 17:31:35 +02:00
Kconfig btrfs: add SPDX header to Kconfig 2018-04-12 16:29:55 +02:00
locking.c btrfs: replace waitqueue_actvie with cond_wake_up 2018-05-28 18:23:09 +02:00
locking.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
lzo.c btrfs: lzo: Harden inline lzo compressed extent decompression 2018-05-30 16:46:43 +02:00
Makefile btrfs: Remove custom crc32c init code 2018-03-26 15:09:39 +02:00
math.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
ordered-data.c btrfs: replace waitqueue_actvie with cond_wake_up 2018-05-28 18:23:09 +02:00
ordered-data.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
orphan.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
print-tree.c btrfs: print-tree: Add eb locking status output for debug build 2018-05-28 18:07:26 +02:00
print-tree.h btrfs: print-tree: debugging output enhancement 2018-04-20 19:18:16 +02:00
props.c btrfs: property: Set incompat flag if lzo/zstd compression is set 2018-05-17 14:18:25 +02:00
props.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
qgroup.c Btrfs: fix mount failure when qgroup rescan is in progress 2018-06-28 11:30:57 +02:00
qgroup.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
raid56.c btrfs: raid56: Remove VLA usage 2018-05-30 17:15:43 +02:00
raid56.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
rcu-string.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
reada.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
ref-verify.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
ref-verify.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
relocation.c btrfs: fix describe_relocation when printing unknown flags 2018-05-28 18:24:11 +02:00
root-tree.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
scrub.c btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block() 2018-07-17 13:56:30 +02:00
send.c btrfs: incremental send, improve rmdir performance for large directory 2018-05-28 18:07:32 +02:00
send.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
struct-funcs.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
super.c Btrfs: allow empty subvol= again 2018-05-28 18:24:13 +02:00
sysfs.c btrfs: sysfs: Add entry which shows if rmdir can work on subvolumes 2018-05-28 18:23:41 +02:00
sysfs.h btrfs: sysfs: Use enum/define value for feature array definitions 2018-05-28 18:23:39 +02:00
transaction.c btrfs: Remove fs_info argument from btrfs_uuid_tree_add 2018-05-30 16:46:52 +02:00
transaction.h btrfs: drop useless member qgroup_reserved of btrfs_pending_snapshot 2018-05-30 16:46:54 +02:00
tree-checker.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
tree-checker.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
tree-defrag.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
tree-log.c btrfs: replace waitqueue_actvie with cond_wake_up 2018-05-28 18:23:09 +02:00
tree-log.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
ulist.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
ulist.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
uuid-tree.c btrfs: Remove fs_info argument from btrfs_uuid_tree_rem 2018-05-30 16:46:53 +02:00
volumes.c btrfs: restore uuid_mutex in btrfs_open_devices 2018-07-13 14:55:46 +02:00
volumes.h btrfs: remove redundant btrfs_balance_control::fs_info 2018-05-28 18:07:30 +02:00
xattr.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
xattr.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
zlib.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
zstd.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00