Merge 96485e4462 ("Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4") into android-mainline

Steps on the way to 5.10-rc1

Resolves conflicts in:
	fs/ext4/dir.c
	fs/ext4/ext4.h
	fs/ext4/namei.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I397787046920175eb183fa5342d2923f819bb2f4

This commit is contained in commit 862e667eb7
@@ -28,6 +28,17 @@ metadata are written to disk through the journal. This is slower but
 safest. If ``data=writeback``, dirty data blocks are not flushed to the
 disk before the metadata are written to disk through the journal.
 
+In case of ``data=ordered`` mode, Ext4 also supports fast commits which
+help reduce commit latency significantly. The default ``data=ordered``
+mode works by logging metadata blocks to the journal. In fast commit
+mode, Ext4 only stores the minimal delta needed to recreate the
+affected metadata in fast commit space that is shared with JBD2.
+Once the fast commit area fills up, or if a fast commit is not possible,
+or if the JBD2 commit timer goes off, Ext4 performs a traditional full commit.
+A full commit invalidates all the fast commits that happened before
+it and thus makes the fast commit area empty for further fast
+commits. This feature needs to be enabled at mkfs time.
+
 The journal inode is typically inode 8. The first 68 bytes of the
 journal inode are replicated in the ext4 superblock. The journal itself
 is a normal (but hidden) file within the filesystem. The file usually
@@ -609,3 +620,58 @@ bytes long (but uses a full block):
      - h_commit_nsec
      - Nanoseconds component of the above timestamp.
 
+Fast commits
+~~~~~~~~~~~~
+
+The fast commit area is organized as a log of tag-length-value (TLV) entries.
+Each TLV begins with a ``struct ext4_fc_tl``, which stores the tag and the
+length of the entire field. It is followed by a variable-length, tag-specific
+value. Here is the list of supported tags and their meanings:
+
+.. list-table::
+   :widths: 8 20 20 32
+   :header-rows: 1
+
+   * - Tag
+     - Meaning
+     - Value struct
+     - Description
+   * - EXT4_FC_TAG_HEAD
+     - Fast commit area header
+     - ``struct ext4_fc_head``
+     - Stores the TID of the transaction after which these fast commits should
+       be applied.
+   * - EXT4_FC_TAG_ADD_RANGE
+     - Add extent to inode
+     - ``struct ext4_fc_add_range``
+     - Stores the inode number and the extent to be added to this inode
+   * - EXT4_FC_TAG_DEL_RANGE
+     - Remove logical offsets from an inode
+     - ``struct ext4_fc_del_range``
+     - Stores the inode number and the logical offset range that needs to be
+       removed
+   * - EXT4_FC_TAG_CREAT
+     - Create directory entry for a newly created file
+     - ``struct ext4_fc_dentry_info``
+     - Stores the parent inode number, inode number and directory entry of the
+       newly created file
+   * - EXT4_FC_TAG_LINK
+     - Link a directory entry to an inode
+     - ``struct ext4_fc_dentry_info``
+     - Stores the parent inode number, inode number and directory entry
+   * - EXT4_FC_TAG_UNLINK
+     - Unlink a directory entry of an inode
+     - ``struct ext4_fc_dentry_info``
+     - Stores the parent inode number, inode number and directory entry
+
+   * - EXT4_FC_TAG_PAD
+     - Padding (unused area)
+     - None
+     - Unused bytes in the fast commit area.
+
+   * - EXT4_FC_TAG_TAIL
+     - Mark the end of a fast commit
+     - ``struct ext4_fc_tail``
+     - Stores the TID of the commit and the CRC of the fast commit of which
+       this tag represents the end
@@ -132,6 +132,39 @@ The opportunities for abuse and DOS attacks with this should be obvious,
 if you allow unprivileged userspace to trigger codepaths containing
 these calls.
 
+Fast commits
+~~~~~~~~~~~~
+
+JBD2 also allows you to perform file-system-specific delta commits known as
+fast commits. In order to use fast commits, you first need to call
+:c:func:`jbd2_fc_init` and specify how many blocks at the end of the journal
+area should be reserved for fast commits. Along with that, you will also need
+to set the following callbacks that perform the corresponding work:
+
+`journal->j_fc_cleanup_cb`: Cleanup function called after every full commit and
+fast commit.
+
+`journal->j_fc_replay_cb`: Replay function called for replay of fast commit
+blocks.
+
+The file system is free to perform fast commits as and when it wants, as long
+as it gets permission from JBD2 to do so by calling the function
+:c:func:`jbd2_fc_begin_commit()`. Once a fast commit is done, the client
+file system should tell JBD2 about it by calling
+:c:func:`jbd2_fc_end_commit()`. If the file system wants JBD2 to perform a
+full commit immediately after stopping the fast commit, it can do so by
+calling :c:func:`jbd2_fc_end_commit_fallback()`. This is useful if the fast
+commit operation fails for some reason and the only way to guarantee
+consistency is for JBD2 to perform the full traditional commit.
+
+JBD2 also provides helper functions to manage fast commit buffers. A file
+system can use :c:func:`jbd2_fc_get_buf()` and :c:func:`jbd2_fc_wait_bufs()`
+to allocate and wait on IO completion of fast commit buffers.
+
+Currently, only Ext4 implements fast commits. For details of its
+implementation of fast commits, please refer to the top level comments in
+fs/ext4/fast_commit.c.
+
 Summary
 ~~~~~~~
@@ -10,7 +10,7 @@ ext4-y	:= balloc.o bitmap.o block_validity.o dir.o ext4_jbd2.o extents.o \
 		indirect.o inline.o inode.o ioctl.o mballoc.o migrate.o \
 		mmp.o move_extent.o namei.o page-io.o readpage.o resize.o \
 		super.o symlink.o sysfs.o xattr.o xattr_hurd.o xattr_trusted.o \
-		xattr_user.o
+		xattr_user.o fast_commit.o
 
 ext4-$(CONFIG_EXT4_FS_POSIX_ACL)	+= acl.o
 ext4-$(CONFIG_EXT4_FS_SECURITY)		+= xattr_security.o
@@ -242,6 +242,7 @@ ext4_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 	handle = ext4_journal_start(inode, EXT4_HT_XATTR, credits);
 	if (IS_ERR(handle))
 		return PTR_ERR(handle);
+	ext4_fc_start_update(inode);
 
 	if ((type == ACL_TYPE_ACCESS) && acl) {
 		error = posix_acl_update_mode(inode, &mode, &acl);
@@ -259,6 +260,7 @@ ext4_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 	}
 out_stop:
 	ext4_journal_stop(handle);
+	ext4_fc_stop_update(inode);
 	if (error == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
 		goto retry;
 	return error;
@@ -368,7 +368,12 @@ static int ext4_validate_block_bitmap(struct super_block *sb,
 					    struct buffer_head *bh)
 {
 	ext4_fsblk_t	blk;
-	struct ext4_group_info *grp = ext4_get_group_info(sb, block_group);
+	struct ext4_group_info *grp;
+
+	if (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)
+		return 0;
+
+	grp = ext4_get_group_info(sb, block_group);
 
 	if (buffer_verified(bh))
 		return 0;
@@ -495,10 +500,9 @@ ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group,
 		 */
 		set_buffer_new(bh);
 		trace_ext4_read_block_bitmap_load(sb, block_group, ignore_locked);
-		bh->b_end_io = ext4_end_bitmap_read;
-		get_bh(bh);
-		submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO |
-			  (ignore_locked ? REQ_RAHEAD : 0), bh);
+		ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO |
+				    (ignore_locked ? REQ_RAHEAD : 0),
+				    ext4_end_bitmap_read);
 		return bh;
 verify:
 	err = ext4_validate_block_bitmap(sb, desc, block_group, bh);
@@ -131,7 +131,7 @@ static void debug_print_tree(struct ext4_sb_info *sbi)
 
 	printk(KERN_INFO "System zones: ");
 	rcu_read_lock();
-	system_blks = rcu_dereference(sbi->system_blks);
+	system_blks = rcu_dereference(sbi->s_system_blks);
 	node = rb_first(&system_blks->root);
 	while (node) {
 		entry = rb_entry(node, struct ext4_system_zone, node);
@@ -261,7 +261,7 @@ int ext4_setup_system_zone(struct super_block *sb)
 	 * with ext4_data_block_valid() accessing the rbtree at the same
 	 * time.
 	 */
-	rcu_assign_pointer(sbi->system_blks, system_blks);
+	rcu_assign_pointer(sbi->s_system_blks, system_blks);
 
 	if (test_opt(sb, DEBUG))
 		debug_print_tree(sbi);
@@ -286,9 +286,9 @@ void ext4_release_system_zone(struct super_block *sb)
 {
 	struct ext4_system_blocks *system_blks;
 
-	system_blks = rcu_dereference_protected(EXT4_SB(sb)->system_blks,
+	system_blks = rcu_dereference_protected(EXT4_SB(sb)->s_system_blks,
 						lockdep_is_held(&sb->s_umount));
-	rcu_assign_pointer(EXT4_SB(sb)->system_blks, NULL);
+	rcu_assign_pointer(EXT4_SB(sb)->s_system_blks, NULL);
 
 	if (system_blks)
 		call_rcu(&system_blks->rcu, ext4_destroy_system_zone);
@@ -319,7 +319,7 @@ int ext4_inode_block_valid(struct inode *inode, ext4_fsblk_t start_blk,
 	 * mount option.
 	 */
 	rcu_read_lock();
-	system_blks = rcu_dereference(sbi->system_blks);
+	system_blks = rcu_dereference(sbi->s_system_blks);
 	if (system_blks == NULL)
 		goto out_rcu;
fs/ext4/ext4.h
@@ -27,7 +27,6 @@
 #include <linux/seqlock.h>
 #include <linux/mutex.h>
 #include <linux/timer.h>
-#include <linux/version.h>
 #include <linux/wait.h>
 #include <linux/sched/signal.h>
 #include <linux/blockgroup_lock.h>
@@ -492,7 +491,7 @@ struct flex_groups {
 
 /* Flags which are mutually exclusive to DAX */
 #define EXT4_DAX_MUT_EXCL (EXT4_VERITY_FL | EXT4_ENCRYPT_FL |\
-			   EXT4_JOURNAL_DATA_FL)
+			   EXT4_JOURNAL_DATA_FL | EXT4_INLINE_DATA_FL)
 
 /* Mask out flags that are inappropriate for the given type of inode. */
 static inline __u32 ext4_mask_flags(umode_t mode, __u32 flags)
@@ -964,6 +963,7 @@ do {								\
 #endif /* defined(__KERNEL__) || defined(__linux__) */
 
 #include "extents_status.h"
+#include "fast_commit.h"
 
 /*
  * Lock subclasses for i_data_sem in the ext4_inode_info structure.
@@ -1021,6 +1021,31 @@ struct ext4_inode_info {
 
 	struct list_head i_orphan;	/* unlinked but open inodes */
 
+	/* Fast commit related info */
+
+	struct list_head i_fc_list;	/*
+					 * inodes that need fast commit
+					 * protected by sbi->s_fc_lock.
+					 */
+
+	/* Fast commit subtid when this inode was committed */
+	unsigned int i_fc_committed_subtid;
+
+	/* Start of lblk range that needs to be committed in this fast commit */
+	ext4_lblk_t i_fc_lblk_start;
+
+	/* End of lblk range that needs to be committed in this fast commit */
+	ext4_lblk_t i_fc_lblk_len;
+
+	/* Number of ongoing updates on this inode */
+	atomic_t i_fc_updates;
+
+	/* Fast commit wait queue for this inode */
+	wait_queue_head_t i_fc_wait;
+
+	/* Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len */
+	struct mutex i_fc_lock;
+
 	/*
 	 * i_disksize keeps track of what the inode size is ON DISK, not
 	 * in memory.  During truncate, i_size is set to the new size by
@@ -1141,6 +1166,11 @@ struct ext4_inode_info {
 #define	EXT4_VALID_FS			0x0001	/* Unmounted cleanly */
 #define	EXT4_ERROR_FS			0x0002	/* Errors detected */
 #define	EXT4_ORPHAN_FS			0x0004	/* Orphans being recovered */
+#define EXT4_FC_INELIGIBLE		0x0008	/* Fast commit ineligible */
+#define EXT4_FC_COMMITTING		0x0010	/* File system undergoing a fast
+						 * commit.
+						 */
+#define EXT4_FC_REPLAY			0x0020	/* Fast commit replay ongoing */
 
 /*
  * Misc. filesystem flags
@@ -1214,6 +1244,8 @@ struct ext4_inode_info {
 #define EXT4_MOUNT2_EXPLICIT_JOURNAL_CHECKSUM	0x00000008 /* User explicitly
 						specified journal checksum */
 
+#define EXT4_MOUNT2_JOURNAL_FAST_COMMIT	0x00000010 /* Journal fast commit */
+
 #define clear_opt(sb, opt)		EXT4_SB(sb)->s_mount_opt &= \
 						~EXT4_MOUNT_##opt
 #define set_opt(sb, opt)		EXT4_SB(sb)->s_mount_opt |= \
@@ -1469,14 +1501,14 @@ struct ext4_sb_info {
 	unsigned long s_commit_interval;
 	u32 s_max_batch_time;
 	u32 s_min_batch_time;
-	struct block_device *journal_bdev;
+	struct block_device *s_journal_bdev;
 #ifdef CONFIG_QUOTA
 	/* Names of quota files with journalled quota */
 	char __rcu *s_qf_names[EXT4_MAXQUOTAS];
 	int s_jquota_fmt;			/* Format of quota to use */
 #endif
 	unsigned int s_want_extra_isize; /* New inodes should reserve # bytes */
-	struct ext4_system_blocks __rcu *system_blks;
+	struct ext4_system_blocks __rcu *s_system_blks;
 
 #ifdef EXTENTS_STATS
 	/* ext4 extents stats */
@@ -1599,6 +1631,34 @@ struct ext4_sb_info {
 	/* Record the errseq of the backing block device */
 	errseq_t s_bdev_wb_err;
 	spinlock_t s_bdev_wb_lock;
+
+	/* Ext4 fast commit stuff */
+	atomic_t s_fc_subtid;
+	atomic_t s_fc_ineligible_updates;
+	/*
+	 * After commit starts, the main queue gets locked, and the further
+	 * updates get added in the staging queue.
+	 */
+#define FC_Q_MAIN	0
+#define FC_Q_STAGING	1
+	struct list_head s_fc_q[2];	/* Inodes staged for fast commit
+					 * that have data changes in them.
+					 */
+	struct list_head s_fc_dentry_q[2];	/* directory entry updates */
+	unsigned int s_fc_bytes;
+	/*
+	 * Main fast commit lock. This lock protects accesses to the
+	 * following fields:
+	 * ei->i_fc_list, s_fc_dentry_q, s_fc_q, s_fc_bytes, s_fc_bh.
+	 */
+	spinlock_t s_fc_lock;
+	struct buffer_head *s_fc_bh;
+	struct ext4_fc_stats s_fc_stats;
+	u64 s_fc_avg_commit_time;
+#ifdef CONFIG_EXT4_DEBUG
+	int s_fc_debug_max_replay;
+#endif
+	struct ext4_fc_replay_state s_fc_replay_state;
 };
 
 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@@ -1709,6 +1769,7 @@ enum {
 	EXT4_STATE_EXT_PRECACHED,	/* extents have been precached */
 	EXT4_STATE_LUSTRE_EA_INODE,	/* Lustre-style ea_inode */
 	EXT4_STATE_VERITY_IN_PROGRESS,	/* building fs-verity Merkle tree */
+	EXT4_STATE_FC_COMMITTING,	/* Fast commit ongoing */
 };
 
 #define EXT4_INODE_BIT_FNS(name, field, offset)			\
@@ -1802,6 +1863,7 @@ static inline bool ext4_verity_in_progress(struct inode *inode)
 #define EXT4_FEATURE_COMPAT_RESIZE_INODE	0x0010
 #define EXT4_FEATURE_COMPAT_DIR_INDEX		0x0020
 #define EXT4_FEATURE_COMPAT_SPARSE_SUPER2	0x0200
+#define EXT4_FEATURE_COMPAT_FAST_COMMIT		0x0400
 #define EXT4_FEATURE_COMPAT_STABLE_INODES	0x0800
 
 #define EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER	0x0001
@@ -1904,6 +1966,7 @@ EXT4_FEATURE_COMPAT_FUNCS(xattr,		EXT_ATTR)
 EXT4_FEATURE_COMPAT_FUNCS(resize_inode,		RESIZE_INODE)
 EXT4_FEATURE_COMPAT_FUNCS(dir_index,		DIR_INDEX)
 EXT4_FEATURE_COMPAT_FUNCS(sparse_super2,	SPARSE_SUPER2)
+EXT4_FEATURE_COMPAT_FUNCS(fast_commit,		FAST_COMMIT)
 EXT4_FEATURE_COMPAT_FUNCS(stable_inodes,	STABLE_INODES)
 
 EXT4_FEATURE_RO_COMPAT_FUNCS(sparse_super,	SPARSE_SUPER)
@@ -2683,6 +2746,7 @@ extern int ext4fs_dirhash(const struct inode *dir, const char *name, int len,
 			  struct dx_hash_info *hinfo);
 
 /* ialloc.c */
+extern int ext4_mark_inode_used(struct super_block *sb, int ino);
 extern struct inode *__ext4_new_inode(handle_t *, struct inode *, umode_t,
 				      const struct qstr *qstr, __u32 goal,
 				      uid_t *owner, __u32 i_flags,
@@ -2708,6 +2772,27 @@ extern int ext4_init_inode_table(struct super_block *sb,
 				 ext4_group_t group, int barrier);
 extern void ext4_end_bitmap_read(struct buffer_head *bh, int uptodate);
 
+/* fast_commit.c */
+int ext4_fc_info_show(struct seq_file *seq, void *v);
+void ext4_fc_init(struct super_block *sb, journal_t *journal);
+void ext4_fc_init_inode(struct inode *inode);
+void ext4_fc_track_range(struct inode *inode, ext4_lblk_t start,
+			 ext4_lblk_t end);
+void ext4_fc_track_unlink(struct inode *inode, struct dentry *dentry);
+void ext4_fc_track_link(struct inode *inode, struct dentry *dentry);
+void ext4_fc_track_create(struct inode *inode, struct dentry *dentry);
+void ext4_fc_track_inode(struct inode *inode);
+void ext4_fc_mark_ineligible(struct super_block *sb, int reason);
+void ext4_fc_start_ineligible(struct super_block *sb, int reason);
+void ext4_fc_stop_ineligible(struct super_block *sb);
+void ext4_fc_start_update(struct inode *inode);
+void ext4_fc_stop_update(struct inode *inode);
+void ext4_fc_del(struct inode *inode);
+bool ext4_fc_replay_check_excluded(struct super_block *sb, ext4_fsblk_t block);
+void ext4_fc_replay_cleanup(struct super_block *sb);
+int ext4_fc_commit(journal_t *journal, tid_t commit_tid);
+int __init ext4_fc_init_dentry_cache(void);
+
 /* mballoc.c */
 extern const struct seq_operations ext4_mb_seq_groups_ops;
 extern long ext4_mb_stats;
@@ -2737,8 +2822,12 @@ extern int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 				ext4_fsblk_t block, unsigned long count);
 extern int ext4_trim_fs(struct super_block *, struct fstrim_range *);
 extern void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid);
+extern void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
+			    int len, int state);
 
 /* inode.c */
+void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw,
+			 struct ext4_inode_info *ei);
+int ext4_inode_is_fast_symlink(struct inode *inode);
 struct buffer_head *ext4_getblk(handle_t *, struct inode *, ext4_lblk_t, int);
 struct buffer_head *ext4_bread(handle_t *, struct inode *, ext4_lblk_t, int);
@@ -2785,6 +2874,8 @@ extern int ext4_sync_inode(handle_t *, struct inode *);
 extern void ext4_dirty_inode(struct inode *, int);
 extern int ext4_change_inode_journal_flag(struct inode *, int);
 extern int ext4_get_inode_loc(struct inode *, struct ext4_iloc *);
+extern int ext4_get_fc_inode_loc(struct super_block *sb, unsigned long ino,
+				 struct ext4_iloc *iloc);
 extern int ext4_inode_attach_jinode(struct inode *inode);
 extern int ext4_can_truncate(struct inode *inode);
 extern int ext4_truncate(struct inode *);
@@ -2818,12 +2909,15 @@ extern int ext4_ind_remove_space(handle_t *handle, struct inode *inode,
 /* ioctl.c */
 extern long ext4_ioctl(struct file *, unsigned int, unsigned long);
 extern long ext4_compat_ioctl(struct file *, unsigned int, unsigned long);
+extern void ext4_reset_inode_seed(struct inode *inode);
 
 /* migrate.c */
 extern int ext4_ext_migrate(struct inode *);
 extern int ext4_ind_migrate(struct inode *inode);
 
 /* namei.c */
+extern int ext4_init_new_dir(handle_t *handle, struct inode *dir,
+			     struct inode *inode);
 extern int ext4_dirblock_csum_verify(struct inode *inode,
 				     struct buffer_head *bh);
 extern int ext4_orphan_add(handle_t *, struct inode *);
@@ -2858,6 +2952,14 @@ extern int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count);
 /* super.c */
 extern struct buffer_head *ext4_sb_bread(struct super_block *sb,
 					 sector_t block, int op_flags);
+extern struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb,
+						   sector_t block);
+extern void ext4_read_bh_nowait(struct buffer_head *bh, int op_flags,
+				bh_end_io_t *end_io);
+extern int ext4_read_bh(struct buffer_head *bh, int op_flags,
+			bh_end_io_t *end_io);
+extern int ext4_read_bh_lock(struct buffer_head *bh, int op_flags, bool wait);
+extern void ext4_sb_breadahead_unmovable(struct super_block *sb, sector_t block);
 extern int ext4_seq_options_show(struct seq_file *seq, void *offset);
 extern int ext4_calculate_overhead(struct super_block *sb);
 extern void ext4_superblock_csum_set(struct super_block *sb);
@@ -3046,22 +3148,24 @@ static inline int ext4_has_group_desc_csum(struct super_block *sb)
 	return ext4_has_feature_gdt_csum(sb) || ext4_has_metadata_csum(sb);
 }
 
+#define ext4_read_incompat_64bit_val(es, name)				  \
+	(((es)->s_feature_incompat & cpu_to_le32(EXT4_FEATURE_INCOMPAT_64BIT) \
+		? (ext4_fsblk_t)le32_to_cpu(es->name##_hi) << 32 : 0) |	  \
+		le32_to_cpu(es->name##_lo))
+
 static inline ext4_fsblk_t ext4_blocks_count(struct ext4_super_block *es)
 {
-	return ((ext4_fsblk_t)le32_to_cpu(es->s_blocks_count_hi) << 32) |
-		le32_to_cpu(es->s_blocks_count_lo);
+	return ext4_read_incompat_64bit_val(es, s_blocks_count);
 }
 
 static inline ext4_fsblk_t ext4_r_blocks_count(struct ext4_super_block *es)
 {
-	return ((ext4_fsblk_t)le32_to_cpu(es->s_r_blocks_count_hi) << 32) |
-		le32_to_cpu(es->s_r_blocks_count_lo);
+	return ext4_read_incompat_64bit_val(es, s_r_blocks_count);
 }
 
 static inline ext4_fsblk_t ext4_free_blocks_count(struct ext4_super_block *es)
 {
-	return ((ext4_fsblk_t)le32_to_cpu(es->s_free_blocks_count_hi) << 32) |
-		le32_to_cpu(es->s_free_blocks_count_lo);
+	return ext4_read_incompat_64bit_val(es, s_free_blocks_count);
}
 
 static inline void ext4_blocks_count_set(struct ext4_super_block *es,
@@ -3187,6 +3291,9 @@ int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
 
 struct ext4_group_info {
 	unsigned long   bb_state;
+#ifdef AGGRESSIVE_CHECK
+	unsigned long bb_check_counter;
+#endif
 	struct rb_root  bb_free_root;
 	ext4_grpblk_t	bb_first_free;	/* first free block */
 	ext4_grpblk_t	bb_free;	/* total free blocks */
@@ -3388,6 +3495,10 @@ extern void ext4_initialize_dirent_tail(struct buffer_head *bh,
 					unsigned int blocksize);
 extern int ext4_handle_dirty_dirblock(handle_t *handle, struct inode *inode,
 				      struct buffer_head *bh);
+extern int __ext4_unlink(struct inode *dir, const struct qstr *d_name,
+			 struct inode *inode);
+extern int __ext4_link(struct inode *dir, struct inode *inode,
+		       struct dentry *dentry);
 
 #define S_SHIFT 12
 static const unsigned char ext4_type_by_mode[(S_IFMT >> S_SHIFT) + 1] = {
@@ -3488,6 +3599,11 @@ extern int ext4_clu_mapped(struct inode *inode, ext4_lblk_t lclu);
 extern int ext4_datasem_ensure_credits(handle_t *handle, struct inode *inode,
 				       int check_cred, int restart_cred,
 				       int revoke_cred);
+extern void ext4_ext_replay_shrink_inode(struct inode *inode, ext4_lblk_t end);
+extern int ext4_ext_replay_set_iblocks(struct inode *inode);
+extern int ext4_ext_replay_update_ex(struct inode *inode, ext4_lblk_t start,
+				     int len, int unwritten, ext4_fsblk_t pblk);
+extern int ext4_ext_clear_bb(struct inode *inode);
 
 /* move_extent.c */
@@ -100,7 +100,7 @@ handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
 		return ERR_PTR(err);
 
 	journal = EXT4_SB(sb)->s_journal;
-	if (!journal)
+	if (!journal || (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY))
 		return ext4_get_nojournal();
 	return jbd2__journal_start(journal, blocks, rsv_blocks, revoke_creds,
 				   GFP_NOFS, type, line);
@@ -501,7 +501,7 @@ __read_extent_tree_block(const char *function, unsigned int line,
 
 	if (!bh_uptodate_or_lock(bh)) {
 		trace_ext4_ext_load_extent(inode, pblk, _RET_IP_);
-		err = bh_submit_read(bh);
+		err = ext4_read_bh(bh, 0, NULL);
 		if (err < 0)
 			goto errout;
 	}
@@ -3723,6 +3723,7 @@ static int ext4_convert_unwritten_extents_endio(handle_t *handle,
 	err = ext4_ext_dirty(handle, inode, path + path->p_depth);
 out:
 	ext4_ext_show_leaf(inode, path);
+	ext4_fc_track_range(inode, ee_block, ee_block + ee_len - 1);
 	return err;
 }
 
@@ -3794,6 +3795,7 @@ convert_initialized_extent(handle_t *handle, struct inode *inode,
 	if (*allocated > map->m_len)
 		*allocated = map->m_len;
 	map->m_len = *allocated;
+	ext4_fc_track_range(inode, ee_block, ee_block + ee_len - 1);
 	return 0;
 }
 
@@ -4023,7 +4025,7 @@ static int get_implied_cluster_alloc(struct super_block *sb,
  * down_read(&EXT4_I(inode)->i_data_sem) if not allocating file system block
  * (ie, create is zero). Otherwise down_write(&EXT4_I(inode)->i_data_sem)
  *
- * return > 0, number of of blocks already mapped/allocated
+ * return > 0, number of blocks already mapped/allocated
  *          if create == 0 and these are pre-allocated blocks
  *          	buffer head is unmapped
  * otherwise blocks are mapped
@@ -4327,7 +4329,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 	map->m_len = ar.len;
 	allocated = map->m_len;
 	ext4_ext_show_leaf(inode, path);
-
+	ext4_fc_track_range(inode, map->m_lblk, map->m_lblk + map->m_len - 1);
 out:
 	ext4_ext_drop_refs(path);
 	kfree(path);
@@ -4600,7 +4602,8 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 	ret = ext4_mark_inode_dirty(handle, inode);
 	if (unlikely(ret))
 		goto out_handle;
-
+	ext4_fc_track_range(inode, offset >> inode->i_sb->s_blocksize_bits,
+			(offset + len - 1) >> inode->i_sb->s_blocksize_bits);
 	/* Zero out partial block at the edges of the range */
 	ret = ext4_zero_partial_blocks(handle, inode, offset, len);
 	if (ret >= 0)
@@ -4648,23 +4651,34 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 		     FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
 		     FALLOC_FL_INSERT_RANGE))
 		return -EOPNOTSUPP;
+	ext4_fc_track_range(inode, offset >> blkbits,
+			(offset + len - 1) >> blkbits);
 
-	if (mode & FALLOC_FL_PUNCH_HOLE)
-		return ext4_punch_hole(inode, offset, len);
+	ext4_fc_start_update(inode);
+
+	if (mode & FALLOC_FL_PUNCH_HOLE) {
+		ret = ext4_punch_hole(inode, offset, len);
+		goto exit;
+	}
 
 	ret = ext4_convert_inline_data(inode);
 	if (ret)
-		return ret;
+		goto exit;
 
-	if (mode & FALLOC_FL_COLLAPSE_RANGE)
-		return ext4_collapse_range(inode, offset, len);
+	if (mode & FALLOC_FL_COLLAPSE_RANGE) {
+		ret = ext4_collapse_range(inode, offset, len);
+		goto exit;
+	}
 
-	if (mode & FALLOC_FL_INSERT_RANGE)
-		return ext4_insert_range(inode, offset, len);
+	if (mode & FALLOC_FL_INSERT_RANGE) {
+		ret = ext4_insert_range(inode, offset, len);
+		goto exit;
+	}
 
-	if (mode & FALLOC_FL_ZERO_RANGE)
-		return ext4_zero_range(file, offset, len, mode);
+	if (mode & FALLOC_FL_ZERO_RANGE) {
+		ret = ext4_zero_range(file, offset, len, mode);
+		goto exit;
+	}
 	trace_ext4_fallocate_enter(inode, offset, len, mode);
 	lblk = offset >> blkbits;
@@ -4698,12 +4712,14 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 		goto out;
 
 	if (file->f_flags & O_SYNC && EXT4_SB(inode->i_sb)->s_journal) {
-		ret = jbd2_complete_transaction(EXT4_SB(inode->i_sb)->s_journal,
-						EXT4_I(inode)->i_sync_tid);
+		ret = ext4_fc_commit(EXT4_SB(inode->i_sb)->s_journal,
+				EXT4_I(inode)->i_sync_tid);
 	}
 out:
 	inode_unlock(inode);
 	trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
+exit:
+	ext4_fc_stop_update(inode);
 	return ret;
 }
 
@@ -4769,7 +4785,7 @@ int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
 
 int ext4_convert_unwritten_io_end_vec(handle_t *handle, ext4_io_end_t *io_end)
 {
-	int ret, err = 0;
+	int ret = 0, err = 0;
 	struct ext4_io_end_vec *io_end_vec;
 
 	/*
@@ -5291,6 +5307,7 @@ static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len)
 		ret = PTR_ERR(handle);
 		goto out_mmap;
 	}
+	ext4_fc_start_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE);
 
 	down_write(&EXT4_I(inode)->i_data_sem);
 	ext4_discard_preallocations(inode, 0);
@@ -5329,6 +5346,7 @@ static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len)
 
 out_stop:
 	ext4_journal_stop(handle);
+	ext4_fc_stop_ineligible(sb);
 out_mmap:
 	up_write(&EXT4_I(inode)->i_mmap_sem);
 out_mutex:
@@ -5429,6 +5447,7 @@ static int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
 		ret = PTR_ERR(handle);
 		goto out_mmap;
 	}
+	ext4_fc_start_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE);
 
 	/* Expand file to avoid data loss if there is error while shifting */
 	inode->i_size += len;
@@ -5503,6 +5522,7 @@ static int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
 
 out_stop:
 	ext4_journal_stop(handle);
+	ext4_fc_stop_ineligible(sb);
 out_mmap:
 	up_write(&EXT4_I(inode)->i_mmap_sem);
 out_mutex:
@@ -5784,3 +5804,264 @@ int ext4_clu_mapped(struct inode *inode, ext4_lblk_t lclu)
 
 	return err ? err : mapped;
 }
+
+/*
+ * Updates physical block address and unwritten status of extent
+ * starting at lblk start and of len. If such an extent doesn't exist,
+ * this function splits the extent tree appropriately to create an
+ * extent like this. This function is called in the fast commit
+ * replay path. Returns 0 on success and error on failure.
+ */
+int ext4_ext_replay_update_ex(struct inode *inode, ext4_lblk_t start,
+			      int len, int unwritten, ext4_fsblk_t pblk)
+{
+	struct ext4_ext_path *path = NULL, *ppath;
+	struct ext4_extent *ex;
+	int ret;
+
+	path = ext4_find_extent(inode, start, NULL, 0);
+	if (!path)
+		return -EINVAL;
+	ex = path[path->p_depth].p_ext;
+	if (!ex) {
+		ret = -EFSCORRUPTED;
+		goto out;
+	}
+
+	if (le32_to_cpu(ex->ee_block) != start ||
+		ext4_ext_get_actual_len(ex) != len) {
+		/* We need to split this extent to match our extent first */
+		ppath = path;
+		down_write(&EXT4_I(inode)->i_data_sem);
+		ret = ext4_force_split_extent_at(NULL, inode, &ppath, start, 1);
+		up_write(&EXT4_I(inode)->i_data_sem);
+		if (ret)
+			goto out;
+		kfree(path);
+		path = ext4_find_extent(inode, start, NULL, 0);
+		if (IS_ERR(path))
+			return -1;
+		ppath = path;
+		ex = path[path->p_depth].p_ext;
+		WARN_ON(le32_to_cpu(ex->ee_block) != start);
+		if (ext4_ext_get_actual_len(ex) != len) {
+			down_write(&EXT4_I(inode)->i_data_sem);
+			ret = ext4_force_split_extent_at(NULL, inode, &ppath,
+							 start + len, 1);
+			up_write(&EXT4_I(inode)->i_data_sem);
+			if (ret)
+				goto out;
+			kfree(path);
+			path = ext4_find_extent(inode, start, NULL, 0);
+			if (IS_ERR(path))
+				return -EINVAL;
+			ex = path[path->p_depth].p_ext;
+		}
+	}
+	if (unwritten)
+		ext4_ext_mark_unwritten(ex);
+	else
+		ext4_ext_mark_initialized(ex);
+	ext4_ext_store_pblock(ex, pblk);
+	down_write(&EXT4_I(inode)->i_data_sem);
+	ret = ext4_ext_dirty(NULL, inode, &path[path->p_depth]);
+	up_write(&EXT4_I(inode)->i_data_sem);
+out:
+	ext4_ext_drop_refs(path);
+	kfree(path);
+	ext4_mark_inode_dirty(NULL, inode);
+	return ret;
+}
+
+/* Try to shrink the extent tree */
+void ext4_ext_replay_shrink_inode(struct inode *inode, ext4_lblk_t end)
+{
+	struct ext4_ext_path *path = NULL;
+	struct ext4_extent *ex;
+	ext4_lblk_t old_cur, cur = 0;
+
+	while (cur < end) {
+		path = ext4_find_extent(inode, cur, NULL, 0);
+		if (IS_ERR(path))
+			return;
+		ex = path[path->p_depth].p_ext;
+		if (!ex) {
+			ext4_ext_drop_refs(path);
+			kfree(path);
+			ext4_mark_inode_dirty(NULL, inode);
+			return;
+		}
+		old_cur = cur;
+		cur = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex);
+		if (cur <= old_cur)
+			cur = old_cur + 1;
+		ext4_ext_try_to_merge(NULL, inode, path, ex);
+		down_write(&EXT4_I(inode)->i_data_sem);
+		ext4_ext_dirty(NULL, inode, &path[path->p_depth]);
+		up_write(&EXT4_I(inode)->i_data_sem);
+		ext4_mark_inode_dirty(NULL, inode);
+		ext4_ext_drop_refs(path);
+		kfree(path);
+	}
+}
+
+/* Check if *cur is a hole and if it is, skip it */
+static void skip_hole(struct inode *inode, ext4_lblk_t *cur)
+{
+	int ret;
+	struct ext4_map_blocks map;
+
+	map.m_lblk = *cur;
+	map.m_len = ((inode->i_size) >> inode->i_sb->s_blocksize_bits) - *cur;
|
||||
|
||||
ret = ext4_map_blocks(NULL, inode, &map, 0);
|
||||
if (ret != 0)
|
||||
return;
|
||||
*cur = *cur + map.m_len;
|
||||
}
|
||||
|
||||
/* Count number of blocks used by this inode and update i_blocks */
|
||||
int ext4_ext_replay_set_iblocks(struct inode *inode)
|
||||
{
|
||||
struct ext4_ext_path *path = NULL, *path2 = NULL;
|
||||
struct ext4_extent *ex;
|
||||
ext4_lblk_t cur = 0, end;
|
||||
int numblks = 0, i, ret = 0;
|
||||
ext4_fsblk_t cmp1, cmp2;
|
||||
struct ext4_map_blocks map;
|
||||
|
||||
/* Determin the size of the file first */
|
||||
path = ext4_find_extent(inode, EXT_MAX_BLOCKS - 1, NULL,
|
||||
EXT4_EX_NOCACHE);
|
||||
if (IS_ERR(path))
|
||||
return PTR_ERR(path);
|
||||
ex = path[path->p_depth].p_ext;
|
||||
if (!ex) {
|
||||
ext4_ext_drop_refs(path);
|
||||
kfree(path);
|
||||
goto out;
|
||||
}
|
||||
end = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex);
|
||||
ext4_ext_drop_refs(path);
|
||||
kfree(path);
|
||||
|
||||
/* Count the number of data blocks */
|
||||
cur = 0;
|
||||
while (cur < end) {
|
||||
map.m_lblk = cur;
|
||||
map.m_len = end - cur;
|
||||
ret = ext4_map_blocks(NULL, inode, &map, 0);
|
||||
if (ret < 0)
|
||||
break;
|
||||
if (ret > 0)
|
||||
numblks += ret;
|
||||
cur = cur + map.m_len;
|
||||
}
|
||||
|
||||
/*
|
||||
* Count the number of extent tree blocks. We do it by looking up
|
||||
* two successive extents and determining the difference between
|
||||
* their paths. When path is different for 2 successive extents
|
||||
* we compare the blocks in the path at each level and increment
|
||||
* iblocks by total number of differences found.
|
||||
*/
|
||||
cur = 0;
|
||||
skip_hole(inode, &cur);
|
||||
path = ext4_find_extent(inode, cur, NULL, 0);
|
||||
if (IS_ERR(path))
|
||||
goto out;
|
||||
numblks += path->p_depth;
|
||||
ext4_ext_drop_refs(path);
|
||||
kfree(path);
|
||||
while (cur < end) {
|
||||
path = ext4_find_extent(inode, cur, NULL, 0);
|
||||
if (IS_ERR(path))
|
||||
break;
|
||||
ex = path[path->p_depth].p_ext;
|
||||
if (!ex) {
|
||||
ext4_ext_drop_refs(path);
|
||||
kfree(path);
|
||||
return 0;
|
||||
}
|
||||
cur = max(cur + 1, le32_to_cpu(ex->ee_block) +
|
||||
ext4_ext_get_actual_len(ex));
|
||||
skip_hole(inode, &cur);
|
||||
|
||||
path2 = ext4_find_extent(inode, cur, NULL, 0);
|
||||
if (IS_ERR(path2)) {
|
||||
ext4_ext_drop_refs(path);
|
||||
kfree(path);
|
||||
break;
|
||||
}
|
||||
ex = path2[path2->p_depth].p_ext;
|
||||
for (i = 0; i <= max(path->p_depth, path2->p_depth); i++) {
|
||||
cmp1 = cmp2 = 0;
|
||||
if (i <= path->p_depth)
|
||||
cmp1 = path[i].p_bh ?
|
||||
path[i].p_bh->b_blocknr : 0;
|
||||
if (i <= path2->p_depth)
|
||||
cmp2 = path2[i].p_bh ?
|
||||
path2[i].p_bh->b_blocknr : 0;
|
||||
if (cmp1 != cmp2 && cmp2 != 0)
|
||||
numblks++;
|
||||
}
|
||||
ext4_ext_drop_refs(path);
|
||||
ext4_ext_drop_refs(path2);
|
||||
kfree(path);
|
||||
kfree(path2);
|
||||
}
|
||||
|
||||
out:
|
||||
inode->i_blocks = numblks << (inode->i_sb->s_blocksize_bits - 9);
|
||||
ext4_mark_inode_dirty(NULL, inode);
|
||||
return 0;
|
||||
}

int ext4_ext_clear_bb(struct inode *inode)
{
	struct ext4_ext_path *path = NULL;
	struct ext4_extent *ex;
	ext4_lblk_t cur = 0, end;
	int j, ret = 0;
	struct ext4_map_blocks map;

	/* Determine the size of the file first */
	path = ext4_find_extent(inode, EXT_MAX_BLOCKS - 1, NULL,
				EXT4_EX_NOCACHE);
	if (IS_ERR(path))
		return PTR_ERR(path);
	ex = path[path->p_depth].p_ext;
	if (!ex) {
		ext4_ext_drop_refs(path);
		kfree(path);
		return 0;
	}
	end = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex);
	ext4_ext_drop_refs(path);
	kfree(path);

	cur = 0;
	while (cur < end) {
		map.m_lblk = cur;
		map.m_len = end - cur;
		ret = ext4_map_blocks(NULL, inode, &map, 0);
		if (ret < 0)
			break;
		if (ret > 0) {
			path = ext4_find_extent(inode, map.m_lblk, NULL, 0);
			if (!IS_ERR_OR_NULL(path)) {
				for (j = 0; j < path->p_depth; j++) {

					ext4_mb_mark_bb(inode->i_sb,
							path[j].p_block, 1, 0);
				}
				ext4_ext_drop_refs(path);
				kfree(path);
			}
			ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0);
		}
		cur = cur + map.m_len;
	}

	return 0;
}

@@ -311,6 +311,9 @@ void ext4_es_find_extent_range(struct inode *inode,
			       ext4_lblk_t lblk, ext4_lblk_t end,
			       struct extent_status *es)
{
	if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
		return;

	trace_ext4_es_find_extent_range_enter(inode, lblk);

	read_lock(&EXT4_I(inode)->i_es_lock);
@@ -361,6 +364,9 @@ bool ext4_es_scan_range(struct inode *inode,
{
	bool ret;

	if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
		return false;

	read_lock(&EXT4_I(inode)->i_es_lock);
	ret = __es_scan_range(inode, matching_fn, lblk, end);
	read_unlock(&EXT4_I(inode)->i_es_lock);
@@ -404,6 +410,9 @@ bool ext4_es_scan_clu(struct inode *inode,
{
	bool ret;

	if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
		return false;

	read_lock(&EXT4_I(inode)->i_es_lock);
	ret = __es_scan_clu(inode, matching_fn, lblk);
	read_unlock(&EXT4_I(inode)->i_es_lock);
@@ -812,6 +821,9 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
	int err = 0;
	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);

	if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
		return 0;

	es_debug("add [%u/%u) %llu %x to extent status tree of inode %lu\n",
		 lblk, len, pblk, status, inode->i_ino);

@@ -873,6 +885,9 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
	struct extent_status newes;
	ext4_lblk_t end = lblk + len - 1;

	if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
		return;

	newes.es_lblk = lblk;
	newes.es_len = len;
	ext4_es_store_pblock_status(&newes, pblk, status);
@@ -908,6 +923,9 @@ int ext4_es_lookup_extent(struct inode *inode, ext4_lblk_t lblk,
	struct rb_node *node;
	int found = 0;

	if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
		return 0;

	trace_ext4_es_lookup_extent_enter(inode, lblk);
	es_debug("lookup extent in block %u\n", lblk);

@@ -1419,6 +1437,9 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
	int err = 0;
	int reserved = 0;

	if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
		return 0;

	trace_ext4_es_remove_extent(inode, lblk, len);
	es_debug("remove [%u/%u) from extent status tree of inode %lu\n",
		 lblk, len, inode->i_ino);
@@ -1969,6 +1990,9 @@ int ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
	struct extent_status newes;
	int err = 0;

	if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
		return 0;

	es_debug("add [%u/1) delayed to extent status tree of inode %lu\n",
		 lblk, inode->i_ino);

 2139  fs/ext4/fast_commit.c  (new file; diff suppressed because it is too large)
  159  fs/ext4/fast_commit.h  (new file)
@ -0,0 +1,159 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
|
||||
#ifndef __FAST_COMMIT_H__
|
||||
#define __FAST_COMMIT_H__
|
||||
|
||||
/* Number of blocks in journal area to allocate for fast commits */
|
||||
#define EXT4_NUM_FC_BLKS 256
|
||||
|
||||
/* Fast commit tags */
|
||||
#define EXT4_FC_TAG_ADD_RANGE 0x0001
|
||||
#define EXT4_FC_TAG_DEL_RANGE 0x0002
|
||||
#define EXT4_FC_TAG_CREAT 0x0003
|
||||
#define EXT4_FC_TAG_LINK 0x0004
|
||||
#define EXT4_FC_TAG_UNLINK 0x0005
|
||||
#define EXT4_FC_TAG_INODE 0x0006
|
||||
#define EXT4_FC_TAG_PAD 0x0007
|
||||
#define EXT4_FC_TAG_TAIL 0x0008
|
||||
#define EXT4_FC_TAG_HEAD 0x0009
|
||||
|
||||
#define EXT4_FC_SUPPORTED_FEATURES 0x0
|
||||
|
||||
/* On disk fast commit tlv value structures */
|
||||
|
||||
/* Fast commit on disk tag length structure */
|
||||
struct ext4_fc_tl {
|
||||
__le16 fc_tag;
|
||||
__le16 fc_len;
|
||||
};
|
||||
|
||||
/* Value structure for tag EXT4_FC_TAG_HEAD. */
|
||||
struct ext4_fc_head {
|
||||
__le32 fc_features;
|
||||
__le32 fc_tid;
|
||||
};
|
||||
|
||||
/* Value structure for EXT4_FC_TAG_ADD_RANGE. */
|
||||
struct ext4_fc_add_range {
|
||||
__le32 fc_ino;
|
||||
__u8 fc_ex[12];
|
||||
};
|
||||
|
||||
/* Value structure for tag EXT4_FC_TAG_DEL_RANGE. */
|
||||
struct ext4_fc_del_range {
|
||||
__le32 fc_ino;
|
||||
__le32 fc_lblk;
|
||||
__le32 fc_len;
|
||||
};
|
||||
|
||||
/*
|
||||
* This is the value structure for tags EXT4_FC_TAG_CREAT, EXT4_FC_TAG_LINK
|
||||
* and EXT4_FC_TAG_UNLINK.
|
||||
*/
|
||||
struct ext4_fc_dentry_info {
|
||||
__le32 fc_parent_ino;
|
||||
__le32 fc_ino;
|
||||
u8 fc_dname[0];
|
||||
};
|
||||
|
||||
/* Value structure for EXT4_FC_TAG_INODE and EXT4_FC_TAG_INODE_PARTIAL. */
|
||||
struct ext4_fc_inode {
|
||||
__le32 fc_ino;
|
||||
__u8 fc_raw_inode[0];
|
||||
};
|
||||
|
||||
/* Value structure for tag EXT4_FC_TAG_TAIL. */
|
||||
struct ext4_fc_tail {
|
||||
__le32 fc_tid;
|
||||
__le32 fc_crc;
|
||||
};
|
||||
|
||||
/*
|
||||
* In memory list of dentry updates that are performed on the file
|
||||
* system used by fast commit code.
|
||||
*/
|
||||
struct ext4_fc_dentry_update {
|
||||
int fcd_op; /* Type of update create / unlink / link */
|
||||
int fcd_parent; /* Parent inode number */
|
||||
int fcd_ino; /* Inode number */
|
||||
struct qstr fcd_name; /* Dirent name */
|
||||
unsigned char fcd_iname[DNAME_INLINE_LEN]; /* Dirent name string */
|
||||
struct list_head fcd_list;
|
||||
};
|
||||
|
||||
/*
|
||||
* Fast commit reason codes
|
||||
*/
|
||||
enum {
|
||||
/*
|
||||
* Commit status codes:
|
||||
*/
|
||||
EXT4_FC_REASON_OK = 0,
|
||||
EXT4_FC_REASON_INELIGIBLE,
|
||||
EXT4_FC_REASON_ALREADY_COMMITTED,
|
||||
EXT4_FC_REASON_FC_START_FAILED,
|
||||
EXT4_FC_REASON_FC_FAILED,
|
||||
|
||||
/*
|
||||
* Fast commit ineligiblity reasons:
|
||||
*/
|
||||
EXT4_FC_REASON_XATTR = 0,
|
||||
EXT4_FC_REASON_CROSS_RENAME,
|
||||
EXT4_FC_REASON_JOURNAL_FLAG_CHANGE,
|
||||
EXT4_FC_REASON_MEM,
|
||||
EXT4_FC_REASON_SWAP_BOOT,
|
||||
EXT4_FC_REASON_RESIZE,
|
||||
EXT4_FC_REASON_RENAME_DIR,
|
||||
EXT4_FC_REASON_FALLOC_RANGE,
|
||||
EXT4_FC_COMMIT_FAILED,
|
||||
EXT4_FC_REASON_MAX
|
||||
};
|
||||
|
||||
struct ext4_fc_stats {
|
||||
unsigned int fc_ineligible_reason_count[EXT4_FC_REASON_MAX];
|
||||
unsigned long fc_num_commits;
|
||||
unsigned long fc_ineligible_commits;
|
||||
unsigned long fc_numblks;
|
||||
};
|
||||
|
||||
#define EXT4_FC_REPLAY_REALLOC_INCREMENT 4
|
||||
|
||||
/*
|
||||
* Physical block regions added to different inodes due to fast commit
|
||||
* recovery. These are set during the SCAN phase. During the replay phase,
|
||||
* our allocator excludes these from its allocation. This ensures that
|
||||
* we don't accidentally allocating a block that is going to be used by
|
||||
* another inode.
|
||||
*/
|
||||
struct ext4_fc_alloc_region {
|
||||
ext4_lblk_t lblk;
|
||||
ext4_fsblk_t pblk;
|
||||
int ino, len;
|
||||
};
|
||||
|
||||
/*
|
||||
* Fast commit replay state.
|
||||
*/
|
||||
struct ext4_fc_replay_state {
|
||||
int fc_replay_num_tags;
|
||||
int fc_replay_expected_off;
|
||||
int fc_current_pass;
|
||||
int fc_cur_tag;
|
||||
int fc_crc;
|
||||
struct ext4_fc_alloc_region *fc_regions;
|
||||
int fc_regions_size, fc_regions_used, fc_regions_valid;
|
||||
int *fc_modified_inodes;
|
||||
int fc_modified_inodes_used, fc_modified_inodes_size;
|
||||
};
|
||||
|
||||
#define region_last(__region) (((__region)->lblk) + ((__region)->len) - 1)
|
||||
|
||||
#define fc_for_each_tl(__start, __end, __tl) \
|
||||
for (tl = (struct ext4_fc_tl *)start; \
|
||||
(u8 *)tl < (u8 *)end; \
|
||||
tl = (struct ext4_fc_tl *)((u8 *)tl + \
|
||||
sizeof(struct ext4_fc_tl) + \
|
||||
+ le16_to_cpu(tl->fc_len)))
|
||||
|
||||
|
||||
#endif /* __FAST_COMMIT_H__ */
|
@@ -262,6 +262,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
	if (iocb->ki_flags & IOCB_NOWAIT)
		return -EOPNOTSUPP;

	ext4_fc_start_update(inode);
	inode_lock(inode);
	ret = ext4_write_checks(iocb, from);
	if (ret <= 0)
@@ -273,6 +274,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,

out:
	inode_unlock(inode);
	ext4_fc_stop_update(inode);
	if (likely(ret > 0)) {
		iocb->ki_pos += ret;
		ret = generic_write_sync(iocb, ret);
@@ -536,7 +538,9 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
		goto out;
	}

	ext4_fc_start_update(inode);
	ret = ext4_orphan_add(handle, inode);
	ext4_fc_stop_update(inode);
	if (ret) {
		ext4_journal_stop(handle);
		goto out;
@@ -658,8 +662,8 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
#endif
	if (iocb->ki_flags & IOCB_DIRECT)
		return ext4_dio_write_iter(iocb, from);

	return ext4_buffered_write_iter(iocb, from);
	else
		return ext4_buffered_write_iter(iocb, from);
}

#ifdef CONFIG_FS_DAX
@@ -759,6 +763,7 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
	if (!daxdev_mapping_supported(vma, dax_dev))
		return -EOPNOTSUPP;

	ext4_fc_start_update(inode);
	file_accessed(file);
	if (IS_DAX(file_inode(file))) {
		vma->vm_ops = &ext4_dax_vm_ops;
@@ -766,6 +771,7 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
	} else {
		vma->vm_ops = &ext4_file_vm_ops;
	}
	ext4_fc_stop_update(inode);
	return 0;
}

@@ -846,7 +852,7 @@ static int ext4_file_open(struct inode *inode, struct file *filp)
		return ret;
	}

	filp->f_mode |= FMODE_NOWAIT;
	filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
	return dquot_file_open(inode, filp);
}

@@ -108,6 +108,9 @@ static int ext4_getfsmap_helper(struct super_block *sb,

	/* Are we just counting mappings? */
	if (info->gfi_head->fmh_count == 0) {
		if (info->gfi_head->fmh_entries == UINT_MAX)
			return EXT4_QUERY_RANGE_ABORT;

		if (rec_fsblk > info->gfi_next_fsblk)
			info->gfi_head->fmh_entries++;

@@ -571,8 +574,8 @@ static bool ext4_getfsmap_is_valid_device(struct super_block *sb,
	if (fm->fmr_device == 0 || fm->fmr_device == UINT_MAX ||
	    fm->fmr_device == new_encode_dev(sb->s_bdev->bd_dev))
		return true;
	if (EXT4_SB(sb)->journal_bdev &&
	    fm->fmr_device == new_encode_dev(EXT4_SB(sb)->journal_bdev->bd_dev))
	if (EXT4_SB(sb)->s_journal_bdev &&
	    fm->fmr_device == new_encode_dev(EXT4_SB(sb)->s_journal_bdev->bd_dev))
		return true;
	return false;
}
@@ -642,9 +645,9 @@ int ext4_getfsmap(struct super_block *sb, struct ext4_fsmap_head *head,
	memset(handlers, 0, sizeof(handlers));
	handlers[0].gfd_dev = new_encode_dev(sb->s_bdev->bd_dev);
	handlers[0].gfd_fn = ext4_getfsmap_datadev;
	if (EXT4_SB(sb)->journal_bdev) {
	if (EXT4_SB(sb)->s_journal_bdev) {
		handlers[1].gfd_dev = new_encode_dev(
			EXT4_SB(sb)->journal_bdev->bd_dev);
			EXT4_SB(sb)->s_journal_bdev->bd_dev);
		handlers[1].gfd_fn = ext4_getfsmap_logdev;
	}

@@ -112,7 +112,7 @@ static int ext4_fsync_journal(struct inode *inode, bool datasync,
	    !jbd2_trans_will_send_data_barrier(journal, commit_tid))
		*needs_barrier = true;

	return jbd2_complete_transaction(journal, commit_tid);
	return ext4_fc_commit(journal, commit_tid);
}

/*
@@ -150,7 +150,7 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)

	ret = file_write_and_wait_range(file, start, end);
	if (ret)
		return ret;
		goto out;

	/*
	 * data=writeback,ordered:

  171  fs/ext4/ialloc.c
@@ -82,7 +82,12 @@ static int ext4_validate_inode_bitmap(struct super_block *sb,
				      struct buffer_head *bh)
{
	ext4_fsblk_t	blk;
	struct ext4_group_info *grp = ext4_get_group_info(sb, block_group);
	struct ext4_group_info *grp;

	if (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)
		return 0;

	grp = ext4_get_group_info(sb, block_group);

	if (buffer_verified(bh))
		return 0;
@@ -189,10 +194,7 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
	 * submit the buffer_head for reading
	 */
	trace_ext4_load_inode_bitmap(sb, block_group);
	bh->b_end_io = ext4_end_bitmap_read;
	get_bh(bh);
	submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO, bh);
	wait_on_buffer(bh);
	ext4_read_bh(bh, REQ_META | REQ_PRIO, ext4_end_bitmap_read);
	ext4_simulate_fail_bh(sb, bh, EXT4_SIM_IBITMAP_EIO);
	if (!buffer_uptodate(bh)) {
		put_bh(bh);
@@ -284,15 +286,17 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)
	bit = (ino - 1) % EXT4_INODES_PER_GROUP(sb);
	bitmap_bh = ext4_read_inode_bitmap(sb, block_group);
	/* Don't bother if the inode bitmap is corrupt. */
	grp = ext4_get_group_info(sb, block_group);
	if (IS_ERR(bitmap_bh)) {
		fatal = PTR_ERR(bitmap_bh);
		bitmap_bh = NULL;
		goto error_return;
	}
	if (unlikely(EXT4_MB_GRP_IBITMAP_CORRUPT(grp))) {
		fatal = -EFSCORRUPTED;
		goto error_return;
	if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
		grp = ext4_get_group_info(sb, block_group);
		if (unlikely(EXT4_MB_GRP_IBITMAP_CORRUPT(grp))) {
			fatal = -EFSCORRUPTED;
			goto error_return;
		}
	}

	BUFFER_TRACE(bitmap_bh, "get_write_access");
@@ -745,6 +749,122 @@ static int find_inode_bit(struct super_block *sb, ext4_group_t group,
	return 1;
}

int ext4_mark_inode_used(struct super_block *sb, int ino)
{
	unsigned long max_ino = le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count);
	struct buffer_head *inode_bitmap_bh = NULL, *group_desc_bh = NULL;
	struct ext4_group_desc *gdp;
	ext4_group_t group;
	int bit;
	int err = -EFSCORRUPTED;

	if (ino < EXT4_FIRST_INO(sb) || ino > max_ino)
		goto out;

	group = (ino - 1) / EXT4_INODES_PER_GROUP(sb);
	bit = (ino - 1) % EXT4_INODES_PER_GROUP(sb);
	inode_bitmap_bh = ext4_read_inode_bitmap(sb, group);
	if (IS_ERR(inode_bitmap_bh))
		return PTR_ERR(inode_bitmap_bh);

	if (ext4_test_bit(bit, inode_bitmap_bh->b_data)) {
		err = 0;
		goto out;
	}

	gdp = ext4_get_group_desc(sb, group, &group_desc_bh);
	if (!gdp || !group_desc_bh) {
		err = -EINVAL;
		goto out;
	}

	ext4_set_bit(bit, inode_bitmap_bh->b_data);

	BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
	err = ext4_handle_dirty_metadata(NULL, NULL, inode_bitmap_bh);
	if (err) {
		ext4_std_error(sb, err);
		goto out;
	}
	err = sync_dirty_buffer(inode_bitmap_bh);
	if (err) {
		ext4_std_error(sb, err);
		goto out;
	}

	/* We may have to initialize the block bitmap if it isn't already */
	if (ext4_has_group_desc_csum(sb) &&
	    gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
		struct buffer_head *block_bitmap_bh;

		block_bitmap_bh = ext4_read_block_bitmap(sb, group);
		if (IS_ERR(block_bitmap_bh)) {
			err = PTR_ERR(block_bitmap_bh);
			goto out;
		}

		BUFFER_TRACE(block_bitmap_bh, "dirty block bitmap");
		err = ext4_handle_dirty_metadata(NULL, NULL, block_bitmap_bh);
		sync_dirty_buffer(block_bitmap_bh);

		/* recheck and clear flag under lock if we still need to */
		ext4_lock_group(sb, group);
		if (ext4_has_group_desc_csum(sb) &&
		    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
			ext4_free_group_clusters_set(sb, gdp,
				ext4_free_clusters_after_init(sb, group, gdp));
			ext4_block_bitmap_csum_set(sb, group, gdp,
						   block_bitmap_bh);
			ext4_group_desc_csum_set(sb, group, gdp);
		}
		ext4_unlock_group(sb, group);
		brelse(block_bitmap_bh);

		if (err) {
			ext4_std_error(sb, err);
			goto out;
		}
	}

	/* Update the relevant bg descriptor fields */
	if (ext4_has_group_desc_csum(sb)) {
		int free;

		ext4_lock_group(sb, group); /* while we modify the bg desc */
		free = EXT4_INODES_PER_GROUP(sb) -
			ext4_itable_unused_count(sb, gdp);
		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
			free = 0;
		}

		/*
		 * Check the relative inode number against the last used
		 * relative inode number in this group. If it is greater
		 * we need to update the bg_itable_unused count.
		 */
		if (bit >= free)
			ext4_itable_unused_set(sb, gdp,
					       (EXT4_INODES_PER_GROUP(sb) - bit - 1));
	} else {
		ext4_lock_group(sb, group);
	}

	ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1);
	if (ext4_has_group_desc_csum(sb)) {
		ext4_inode_bitmap_csum_set(sb, group, gdp, inode_bitmap_bh,
					   EXT4_INODES_PER_GROUP(sb) / 8);
		ext4_group_desc_csum_set(sb, group, gdp);
	}

	ext4_unlock_group(sb, group);
	err = ext4_handle_dirty_metadata(NULL, NULL, group_desc_bh);
	sync_dirty_buffer(group_desc_bh);
out:
	return err;
}

static int ext4_xattr_credits_for_new_inode(struct inode *dir, mode_t mode,
					    bool encrypt)
{
@@ -821,7 +941,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
	struct inode *ret;
	ext4_group_t i;
	ext4_group_t flex_group;
	struct ext4_group_info *grp;
	struct ext4_group_info *grp = NULL;
	bool encrypt = false;

	/* Cannot create files in a deleted directory */
@@ -921,15 +1041,21 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
		if (ext4_free_inodes_count(sb, gdp) == 0)
			goto next_group;

		grp = ext4_get_group_info(sb, group);
		/* Skip groups with already-known suspicious inode tables */
		if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp))
			goto next_group;
		if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
			grp = ext4_get_group_info(sb, group);
			/*
			 * Skip groups with already-known suspicious inode
			 * tables
			 */
			if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp))
				goto next_group;
		}

		brelse(inode_bitmap_bh);
		inode_bitmap_bh = ext4_read_inode_bitmap(sb, group);
		/* Skip groups with suspicious inode tables */
		if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp) ||
		if (((!(sbi->s_mount_state & EXT4_FC_REPLAY))
		     && EXT4_MB_GRP_IBITMAP_CORRUPT(grp)) ||
		    IS_ERR(inode_bitmap_bh)) {
			inode_bitmap_bh = NULL;
			goto next_group;
@@ -948,7 +1074,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
			goto next_group;
		}

		if (!handle) {
		if ((!(sbi->s_mount_state & EXT4_FC_REPLAY)) && !handle) {
			BUG_ON(nblocks <= 0);
			handle = __ext4_journal_start_sb(dir->i_sb, line_no,
							 handle_type, nblocks, 0,
@@ -1052,9 +1178,15 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
	/* Update the relevant bg descriptor fields */
	if (ext4_has_group_desc_csum(sb)) {
		int free;
		struct ext4_group_info *grp = ext4_get_group_info(sb, group);
		struct ext4_group_info *grp = NULL;

		down_read(&grp->alloc_sem); /* protect vs itable lazyinit */
		if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
			grp = ext4_get_group_info(sb, group);
			down_read(&grp->alloc_sem); /*
						     * protect vs itable
						     * lazyinit
						     */
		}
		ext4_lock_group(sb, group); /* while we modify the bg desc */
		free = EXT4_INODES_PER_GROUP(sb) -
			ext4_itable_unused_count(sb, gdp);
@@ -1070,7 +1202,8 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
		if (ino > free)
			ext4_itable_unused_set(sb, gdp,
					       (EXT4_INODES_PER_GROUP(sb) - ino));
		up_read(&grp->alloc_sem);
		if (!(sbi->s_mount_state & EXT4_FC_REPLAY))
			up_read(&grp->alloc_sem);
	} else {
		ext4_lock_group(sb, group);
	}

@@ -163,7 +163,7 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth,
	}

	if (!bh_uptodate_or_lock(bh)) {
		if (bh_submit_read(bh) < 0) {
		if (ext4_read_bh(bh, 0, NULL) < 0) {
			put_bh(bh);
			goto failure;
		}
@@ -593,7 +593,8 @@ int ext4_ind_map_blocks(handle_t *handle, struct inode *inode,
	if (ext4_has_feature_bigalloc(inode->i_sb)) {
		EXT4_ERROR_INODE(inode, "Can't allocate blocks for "
				 "non-extent mapped inodes with bigalloc");
		return -EFSCORRUPTED;
		err = -EFSCORRUPTED;
		goto out;
	}

	/* Set up for the direct block allocation */
@@ -1012,14 +1013,14 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,
			}

			/* Go read the buffer for the next level down */
			bh = sb_bread(inode->i_sb, nr);
			bh = ext4_sb_bread(inode->i_sb, nr, 0);

			/*
			 * A read failure? Report error and clear slot
			 * (should be rare).
			 */
			if (!bh) {
				ext4_error_inode_block(inode, nr, EIO,
			if (IS_ERR(bh)) {
				ext4_error_inode_block(inode, nr, -PTR_ERR(bh),
						       "Read failure");
				continue;
			}
@@ -1033,7 +1034,7 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,
			brelse(bh);

			/*
			 * Everything below this this pointer has been
			 * Everything below this pointer has been
			 * released. Now let this top-of-subtree go.
			 *
			 * We want the freeing of this indirect block to be
@@ -355,7 +355,7 @@ static int ext4_update_inline_data(handle_t *handle, struct inode *inode,
	if (error)
		goto out;

	/* Update the xttr entry. */
	/* Update the xattr entry. */
	i.value = value;
	i.value_len = len;

  292  fs/ext4/inode.c
@@ -102,8 +102,8 @@ static int ext4_inode_csum_verify(struct inode *inode, struct ext4_inode *raw,
	return provided == calculated;
}

static void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw,
				struct ext4_inode_info *ei)
void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw,
			 struct ext4_inode_info *ei)
{
	__u32 csum;

@@ -515,7 +515,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
		return -EFSCORRUPTED;

	/* Lookup extent status tree firstly */
	if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
	if (!(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) &&
	    ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
		if (ext4_es_is_written(&es) || ext4_es_is_unwritten(&es)) {
			map->m_pblk = ext4_es_pblock(&es) +
				map->m_lblk - es.es_lblk;
@@ -730,6 +731,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
			if (ret)
				return ret;
		}
		ext4_fc_track_range(inode, map->m_lblk,
				    map->m_lblk + map->m_len - 1);
	}

	if (retval < 0)
@@ -826,7 +829,8 @@ struct buffer_head *ext4_getblk(handle_t *handle, struct inode *inode,
	int create = map_flags & EXT4_GET_BLOCKS_CREATE;
	int err;

	J_ASSERT(handle != NULL || create == 0);
	J_ASSERT((EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
		 || handle != NULL || create == 0);

	map.m_lblk = block;
	map.m_len = 1;
@@ -842,7 +846,8 @@ struct buffer_head *ext4_getblk(handle_t *handle, struct inode *inode,
		return ERR_PTR(-ENOMEM);
	if (map.m_flags & EXT4_MAP_NEW) {
		J_ASSERT(create != 0);
		J_ASSERT(handle != NULL);
		J_ASSERT((EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
			 || (handle != NULL));

		/*
		 * Now that we do not always journal data, we should
@@ -879,18 +884,20 @@ struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode,
			       ext4_lblk_t block, int map_flags)
{
	struct buffer_head *bh;
	int ret;

	bh = ext4_getblk(handle, inode, block, map_flags);
	if (IS_ERR(bh))
		return bh;
	if (!bh || ext4_buffer_uptodate(bh))
		return bh;
	ll_rw_block(REQ_OP_READ, REQ_META | REQ_PRIO, 1, &bh);
	wait_on_buffer(bh);
	if (buffer_uptodate(bh))
		return bh;
	put_bh(bh);
	return ERR_PTR(-EIO);

	ret = ext4_read_bh_lock(bh, REQ_META | REQ_PRIO, true);
	if (ret) {
		put_bh(bh);
		return ERR_PTR(ret);
	}
	return bh;
}

/* Read a contiguous batch of blocks. */
@@ -911,8 +918,7 @@ int ext4_bread_batch(struct inode *inode, ext4_lblk_t block, int bh_count,
	for (i = 0; i < bh_count; i++)
		/* Note that NULL bhs[i] is valid because of holes. */
		if (bhs[i] && !ext4_buffer_uptodate(bhs[i]))
			ll_rw_block(REQ_OP_READ, REQ_META | REQ_PRIO, 1,
				    &bhs[i]);
			ext4_read_bh_lock(bhs[i], REQ_META | REQ_PRIO, false);

	if (!wait)
		return 0;
@@ -1082,7 +1088,7 @@ static int ext4_block_write_begin(struct page *page, loff_t pos, unsigned len,
		if (!buffer_uptodate(bh) && !buffer_delay(bh) &&
		    !buffer_unwritten(bh) &&
		    (block_start < from || block_end > to)) {
			ll_rw_block(REQ_OP_READ, 0, 1, &bh);
			ext4_read_bh_lock(bh, 0, false);
			wait[nr_wait++] = bh;
		}
	}
@@ -1923,6 +1929,9 @@ static int __ext4_journalled_writepage(struct page *page,
		err = ext4_walk_page_buffers(handle, page_bufs, 0, len, NULL,
					     write_end_fn);
	}
	if (ret == 0)
		ret = err;
	err = ext4_jbd2_inode_add_write(handle, inode, 0, len);
	if (ret == 0)
		ret = err;
	EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
@@ -2267,7 +2276,7 @@ static int mpage_process_page(struct mpage_da_data *mpd, struct page *page,
			err = PTR_ERR(io_end_vec);
			goto out;
		}
		io_end_vec->offset = mpd->map.m_lblk << blkbits;
		io_end_vec->offset = (loff_t)mpd->map.m_lblk << blkbits;
	}
	*map_bh = true;
	goto out;
@ -2798,7 +2807,7 @@ static int ext4_writepages(struct address_space *mapping,
|
||||
* ext4_journal_stop() can wait for transaction commit
|
||||
* to finish which may depend on writeback of pages to
|
||||
* complete or on page lock to be released. In that
|
||||
* case, we have to wait until after after we have
|
||||
* case, we have to wait until after we have
|
||||
* submitted all the IO, released page locks we hold,
|
||||
* and dropped io_end reference (for extent conversion
|
||||
* to be able to complete) before stopping the handle.
|
||||
@ -3320,9 +3329,14 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
|
||||
{
|
||||
journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
|
||||
|
||||
if (journal)
|
||||
return !jbd2_transaction_committed(journal,
|
||||
EXT4_I(inode)->i_datasync_tid);
|
||||
if (journal) {
|
||||
if (jbd2_transaction_committed(journal,
|
||||
EXT4_I(inode)->i_datasync_tid))
|
||||
return true;
|
||||
return atomic_read(&EXT4_SB(inode->i_sb)->s_fc_subtid) >=
|
||||
EXT4_I(inode)->i_fc_committed_subtid;
|
||||
}
|
||||
|
||||
/* Any metadata buffers to write? */
|
||||
if (!list_empty(&inode->i_mapping->private_list))
|
||||
return true;
|
||||
@ -3460,14 +3474,26 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
|
||||
map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits,
|
||||
EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1;
|
||||
|
||||
if (flags & IOMAP_WRITE)
|
||||
if (flags & IOMAP_WRITE) {
|
||||
/*
|
||||
* We check here if the blocks are already allocated, then we
|
||||
* don't need to start a journal txn and we can directly return
|
||||
* the mapping information. This could boost performance
|
||||
* especially in multi-threaded overwrite requests.
|
||||
*/
|
||||
if (offset + length <= i_size_read(inode)) {
|
||||
ret = ext4_map_blocks(NULL, inode, &map, 0);
|
||||
if (ret > 0 && (map.m_flags & EXT4_MAP_MAPPED))
|
||||
goto out;
|
||||
}
|
||||
ret = ext4_iomap_alloc(inode, &map, flags);
|
||||
else
|
||||
} else {
|
||||
ret = ext4_map_blocks(NULL, inode, &map, 0);
|
||||
}
|
||||
|
||||
if (ret < 0)
|
||||
return ret;
|
||||
|
||||
out:
|
||||
ext4_set_iomap(inode, iomap, &map, offset, length);
|
||||
|
||||
return 0;
|
||||
@ -3625,6 +3651,13 @@ static int ext4_set_page_dirty(struct page *page)
|
||||
return __set_page_dirty_buffers(page);
|
||||
}
|
||||
|
||||
static int ext4_iomap_swap_activate(struct swap_info_struct *sis,
|
||||
struct file *file, sector_t *span)
|
||||
{
|
||||
return iomap_swapfile_activate(sis, file, span,
|
||||
&ext4_iomap_report_ops);
|
||||
}
|
||||
|
||||
static const struct address_space_operations ext4_aops = {
|
||||
.readpage = ext4_readpage,
|
||||
.readahead = ext4_readahead,
|
||||
@ -3640,6 +3673,7 @@ static const struct address_space_operations ext4_aops = {
|
||||
.migratepage = buffer_migrate_page,
|
||||
.is_partially_uptodate = block_is_partially_uptodate,
|
||||
.error_remove_page = generic_error_remove_page,
|
||||
.swap_activate = ext4_iomap_swap_activate,
|
||||
};
|
||||
|
||||
static const struct address_space_operations ext4_journalled_aops = {
|
||||
@ -3656,6 +3690,7 @@ static const struct address_space_operations ext4_journalled_aops = {
|
||||
.direct_IO = noop_direct_IO,
|
||||
.is_partially_uptodate = block_is_partially_uptodate,
|
||||
.error_remove_page = generic_error_remove_page,
|
||||
.swap_activate = ext4_iomap_swap_activate,
|
||||
};
|
||||
|
||||
static const struct address_space_operations ext4_da_aops = {
|
||||
@ -3673,6 +3708,7 @@ static const struct address_space_operations ext4_da_aops = {
|
||||
.migratepage = buffer_migrate_page,
|
||||
.is_partially_uptodate = block_is_partially_uptodate,
|
||||
.error_remove_page = generic_error_remove_page,
|
||||
.swap_activate = ext4_iomap_swap_activate,
|
||||
};
|
||||
|
||||
static const struct address_space_operations ext4_dax_aops = {
|
||||
@ -3681,6 +3717,7 @@ static const struct address_space_operations ext4_dax_aops = {
|
||||
.set_page_dirty = noop_set_page_dirty,
|
||||
.bmap = ext4_bmap,
|
||||
.invalidatepage = noop_invalidatepage,
|
||||
.swap_activate = ext4_iomap_swap_activate,
|
||||
};
|
||||
|
||||
void ext4_set_aops(struct inode *inode)
|
||||
@ -3754,11 +3791,8 @@ static int __ext4_block_zero_page_range(handle_t *handle,
|
||||
set_buffer_uptodate(bh);
|
||||
|
||||
if (!buffer_uptodate(bh)) {
|
||||
err = -EIO;
|
||||
ll_rw_block(REQ_OP_READ, 0, 1, &bh);
|
||||
wait_on_buffer(bh);
|
||||
/* Uhhuh. Read error. Complain and punt. */
|
||||
if (!buffer_uptodate(bh))
|
||||
err = ext4_read_bh_lock(bh, 0, true);
|
||||
if (err)
|
||||
goto unlock;
|
||||
if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
|
||||
/* We expect the key to be set. */
|
||||
@ -4097,6 +4131,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
|
||||
|
||||
up_write(&EXT4_I(inode)->i_data_sem);
|
||||
}
|
||||
ext4_fc_track_range(inode, first_block, stop_block);
|
||||
if (IS_SYNC(inode))
|
||||
ext4_handle_sync(handle);
|
||||
|
||||
@ -4276,22 +4311,22 @@ int ext4_truncate(struct inode *inode)
|
||||
* data in memory that is needed to recreate the on-disk version of this
|
||||
* inode.
|
||||
*/
|
||||
static int __ext4_get_inode_loc(struct inode *inode,
|
||||
struct ext4_iloc *iloc, int in_mem)
|
||||
static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
|
||||
struct ext4_iloc *iloc, int in_mem,
|
||||
ext4_fsblk_t *ret_block)
|
||||
{
|
||||
struct ext4_group_desc *gdp;
|
||||
struct buffer_head *bh;
|
||||
struct super_block *sb = inode->i_sb;
|
||||
ext4_fsblk_t block;
|
||||
struct blk_plug plug;
|
||||
int inodes_per_block, inode_offset;
|
||||
|
||||
iloc->bh = NULL;
|
||||
if (inode->i_ino < EXT4_ROOT_INO ||
|
||||
inode->i_ino > le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count))
|
||||
if (ino < EXT4_ROOT_INO ||
|
||||
ino > le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count))
|
||||
return -EFSCORRUPTED;
|
||||
|
||||
iloc->block_group = (inode->i_ino - 1) / EXT4_INODES_PER_GROUP(sb);
|
||||
iloc->block_group = (ino - 1) / EXT4_INODES_PER_GROUP(sb);
|
||||
gdp = ext4_get_group_desc(sb, iloc->block_group, NULL);
|
||||
if (!gdp)
|
||||
return -EIO;
|
||||
@ -4300,7 +4335,7 @@ static int __ext4_get_inode_loc(struct inode *inode,
|
||||
* Figure out the offset within the block group inode table
|
||||
*/
|
||||
inodes_per_block = EXT4_SB(sb)->s_inodes_per_block;
|
||||
inode_offset = ((inode->i_ino - 1) %
|
||||
inode_offset = ((ino - 1) %
|
||||
EXT4_INODES_PER_GROUP(sb));
|
||||
block = ext4_inode_table(sb, gdp) + (inode_offset / inodes_per_block);
|
||||
iloc->offset = (inode_offset % inodes_per_block) * EXT4_INODE_SIZE(sb);
|
||||
@ -4313,16 +4348,7 @@ static int __ext4_get_inode_loc(struct inode *inode,
|
||||
if (!buffer_uptodate(bh)) {
|
||||
lock_buffer(bh);
|
||||
|
||||
/*
|
||||
* If the buffer has the write error flag, we have failed
|
||||
* to write out another inode in the same block. In this
|
||||
* case, we don't have to read the block because we may
|
||||
* read the old inode data successfully.
|
||||
*/
|
||||
if (buffer_write_io_error(bh) && !buffer_uptodate(bh))
|
||||
set_buffer_uptodate(bh);
|
||||
|
||||
if (buffer_uptodate(bh)) {
|
||||
if (ext4_buffer_uptodate(bh)) {
|
||||
/* someone brought it uptodate while we waited */
|
||||
unlock_buffer(bh);
|
||||
goto has_buffer;
|
||||
@ -4393,7 +4419,7 @@ static int __ext4_get_inode_loc(struct inode *inode,
|
||||
if (end > table)
|
||||
end = table;
|
||||
while (b <= end)
|
||||
sb_breadahead_unmovable(sb, b++);
|
||||
ext4_sb_breadahead_unmovable(sb, b++);
|
||||
}
|
||||
|
||||
/*
|
||||
@ -4401,16 +4427,14 @@ static int __ext4_get_inode_loc(struct inode *inode,
|
||||
* has in-inode xattrs, or we don't have this inode in memory.
|
||||
* Read the block from disk.
|
||||
*/
|
||||
trace_ext4_load_inode(inode);
|
||||
get_bh(bh);
|
||||
bh->b_end_io = end_buffer_read_sync;
|
||||
submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO, bh);
|
||||
trace_ext4_load_inode(sb, ino);
|
||||
ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO, NULL);
|
||||
blk_finish_plug(&plug);
|
||||
wait_on_buffer(bh);
|
||||
if (!buffer_uptodate(bh)) {
|
||||
simulate_eio:
|
||||
ext4_error_inode_block(inode, block, EIO,
|
||||
"unable to read itable block");
|
||||
if (ret_block)
|
||||
*ret_block = block;
|
||||
brelse(bh);
|
||||
return -EIO;
|
||||
}
|
||||
@ -4420,11 +4444,43 @@ static int __ext4_get_inode_loc(struct inode *inode,
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int __ext4_get_inode_loc_noinmem(struct inode *inode,
|
||||
struct ext4_iloc *iloc)
|
||||
{
|
||||
ext4_fsblk_t err_blk;
|
||||
int ret;
|
||||
|
||||
ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, iloc, 0,
|
||||
&err_blk);
|
||||
|
||||
if (ret == -EIO)
|
||||
ext4_error_inode_block(inode, err_blk, EIO,
|
||||
"unable to read itable block");
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)
|
||||
{
|
||||
ext4_fsblk_t err_blk;
|
||||
int ret;
|
||||
|
||||
/* We have all inode data except xattrs in memory here. */
|
||||
return __ext4_get_inode_loc(inode, iloc,
|
||||
!ext4_test_inode_state(inode, EXT4_STATE_XATTR));
|
||||
ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, iloc,
|
||||
!ext4_test_inode_state(inode, EXT4_STATE_XATTR), &err_blk);
|
||||
|
||||
if (ret == -EIO)
|
||||
ext4_error_inode_block(inode, err_blk, EIO,
|
||||
"unable to read itable block");
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
|
||||
int ext4_get_fc_inode_loc(struct super_block *sb, unsigned long ino,
|
||||
struct ext4_iloc *iloc)
|
||||
{
|
||||
return __ext4_get_inode_loc(sb, ino, iloc, 0, NULL);
|
||||
}
|
||||
|
||||
static bool ext4_should_enable_dax(struct inode *inode)
|
||||
@ -4590,7 +4646,7 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
|
||||
ei = EXT4_I(inode);
|
||||
iloc.bh = NULL;
|
||||
|
||||
ret = __ext4_get_inode_loc(inode, &iloc, 0);
|
||||
ret = __ext4_get_inode_loc_noinmem(inode, &iloc);
|
||||
if (ret < 0)
|
||||
goto bad_inode;
|
||||
raw_inode = ext4_raw_inode(&iloc);
|
||||
@ -4636,10 +4692,11 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
|
||||
sizeof(gen));
|
||||
}
|
||||
|
||||
if (!ext4_inode_csum_verify(inode, raw_inode, ei) ||
|
||||
ext4_simulate_fail(sb, EXT4_SIM_INODE_CRC)) {
|
||||
ext4_error_inode_err(inode, function, line, 0, EFSBADCRC,
|
||||
"iget: checksum invalid");
|
||||
if ((!ext4_inode_csum_verify(inode, raw_inode, ei) ||
|
||||
ext4_simulate_fail(sb, EXT4_SIM_INODE_CRC)) &&
|
||||
(!(EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY))) {
|
||||
ext4_error_inode_err(inode, function, line, 0,
|
||||
EFSBADCRC, "iget: checksum invalid");
|
||||
ret = -EFSBADCRC;
|
||||
goto bad_inode;
|
||||
}
|
||||
@ -4727,6 +4784,7 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
|
||||
for (block = 0; block < EXT4_N_BLOCKS; block++)
|
||||
ei->i_data[block] = raw_inode->i_block[block];
|
||||
INIT_LIST_HEAD(&ei->i_orphan);
|
||||
ext4_fc_init_inode(&ei->vfs_inode);
|
||||
|
||||
/*
|
||||
* Set transaction id's of transactions that have to be committed
|
||||
@ -4792,9 +4850,10 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
|
||||
goto bad_inode;
|
||||
} else if (!ext4_has_inline_data(inode)) {
|
||||
/* validate the block references in the inode */
|
||||
if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
|
||||
(S_ISLNK(inode->i_mode) &&
|
||||
!ext4_inode_is_fast_symlink(inode))) {
|
||||
if (!(EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY) &&
|
||||
(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
|
||||
(S_ISLNK(inode->i_mode) &&
|
||||
!ext4_inode_is_fast_symlink(inode)))) {
|
||||
if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
|
||||
ret = ext4_ext_check_inode(inode);
|
||||
else
|
||||
@ -4995,6 +5054,12 @@ static int ext4_do_update_inode(handle_t *handle,
|
||||
if (ext4_test_inode_state(inode, EXT4_STATE_NEW))
|
||||
memset(raw_inode, 0, EXT4_SB(inode->i_sb)->s_inode_size);
|
||||
|
||||
err = ext4_inode_blocks_set(handle, raw_inode, ei);
|
||||
if (err) {
|
||||
spin_unlock(&ei->i_raw_lock);
|
||||
goto out_brelse;
|
||||
}
|
||||
|
||||
raw_inode->i_mode = cpu_to_le16(inode->i_mode);
|
||||
i_uid = i_uid_read(inode);
|
||||
i_gid = i_gid_read(inode);
|
||||
@ -5028,11 +5093,6 @@ static int ext4_do_update_inode(handle_t *handle,
|
||||
EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
|
||||
EXT4_EINODE_SET_XTIME(i_crtime, ei, raw_inode);
|
||||
|
||||
err = ext4_inode_blocks_set(handle, raw_inode, ei);
|
||||
if (err) {
|
||||
spin_unlock(&ei->i_raw_lock);
|
||||
goto out_brelse;
|
||||
}
|
||||
raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
|
||||
raw_inode->i_flags = cpu_to_le32(ei->i_flags & 0xFFFFFFFF);
|
||||
if (likely(!test_opt2(inode->i_sb, HURD_COMPAT)))
|
||||
@ -5173,12 +5233,12 @@ int ext4_write_inode(struct inode *inode, struct writeback_control *wbc)
|
||||
if (wbc->sync_mode != WB_SYNC_ALL || wbc->for_sync)
|
||||
return 0;
|
||||
|
||||
err = jbd2_complete_transaction(EXT4_SB(inode->i_sb)->s_journal,
|
||||
err = ext4_fc_commit(EXT4_SB(inode->i_sb)->s_journal,
|
||||
EXT4_I(inode)->i_sync_tid);
|
||||
} else {
|
||||
struct ext4_iloc iloc;
|
||||
|
||||
err = __ext4_get_inode_loc(inode, &iloc, 0);
|
||||
err = __ext4_get_inode_loc_noinmem(inode, &iloc);
|
||||
if (err)
|
||||
return err;
|
||||
/*
|
||||
@ -5302,6 +5362,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
|
||||
if (error)
|
||||
return error;
|
||||
}
|
||||
ext4_fc_start_update(inode);
|
||||
if ((ia_valid & ATTR_UID && !uid_eq(attr->ia_uid, inode->i_uid)) ||
|
||||
(ia_valid & ATTR_GID && !gid_eq(attr->ia_gid, inode->i_gid))) {
|
||||
handle_t *handle;
|
||||
@ -5325,6 +5386,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
|
||||
|
||||
if (error) {
|
||||
ext4_journal_stop(handle);
|
||||
ext4_fc_stop_update(inode);
|
||||
return error;
|
||||
}
|
||||
/* Update corresponding info in inode so that everything is in
|
||||
@ -5347,11 +5409,15 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
|
||||
if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
|
||||
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
|
||||
|
||||
if (attr->ia_size > sbi->s_bitmap_maxbytes)
|
||||
if (attr->ia_size > sbi->s_bitmap_maxbytes) {
|
||||
ext4_fc_stop_update(inode);
|
||||
return -EFBIG;
|
||||
}
|
||||
}
|
||||
if (!S_ISREG(inode->i_mode))
|
||||
if (!S_ISREG(inode->i_mode)) {
|
||||
ext4_fc_stop_update(inode);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
|
||||
inode_inc_iversion(inode);
|
||||
@ -5375,7 +5441,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
|
||||
rc = ext4_break_layouts(inode);
|
||||
if (rc) {
|
||||
up_write(&EXT4_I(inode)->i_mmap_sem);
|
||||
return rc;
|
||||
goto err_out;
|
||||
}
|
||||
|
||||
if (attr->ia_size != inode->i_size) {
|
||||
@ -5396,6 +5462,21 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
|
||||
inode->i_mtime = current_time(inode);
|
||||
inode->i_ctime = inode->i_mtime;
|
||||
}
|
||||
|
||||
if (shrink)
|
||||
ext4_fc_track_range(inode,
|
||||
(attr->ia_size > 0 ? attr->ia_size - 1 : 0) >>
|
||||
inode->i_sb->s_blocksize_bits,
|
||||
(oldsize > 0 ? oldsize - 1 : 0) >>
|
||||
inode->i_sb->s_blocksize_bits);
|
||||
else
|
||||
ext4_fc_track_range(
|
||||
inode,
|
||||
(oldsize > 0 ? oldsize - 1 : oldsize) >>
|
||||
inode->i_sb->s_blocksize_bits,
|
||||
(attr->ia_size > 0 ? attr->ia_size - 1 : 0) >>
|
||||
inode->i_sb->s_blocksize_bits);
|
||||
|
||||
down_write(&EXT4_I(inode)->i_data_sem);
|
||||
EXT4_I(inode)->i_disksize = attr->ia_size;
|
||||
rc = ext4_mark_inode_dirty(handle, inode);
|
||||
@ -5454,9 +5535,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
|
||||
rc = posix_acl_chmod(inode, inode->i_mode);
|
||||
|
||||
err_out:
|
||||
ext4_std_error(inode->i_sb, error);
|
||||
if (error)
|
||||
ext4_std_error(inode->i_sb, error);
|
||||
if (!error)
|
||||
error = rc;
|
||||
ext4_fc_stop_update(inode);
|
||||
return error;
|
||||
}
|
||||
|
||||
@ -5638,6 +5721,8 @@ int ext4_mark_iloc_dirty(handle_t *handle,
|
||||
put_bh(iloc->bh);
|
||||
return -EIO;
|
||||
}
|
||||
ext4_fc_track_inode(inode);
|
||||
|
||||
if (IS_I_VERSION(inode))
|
||||
inode_inc_iversion(inode);
|
||||
|
||||
@ -5961,6 +6046,8 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
|
||||
if (IS_ERR(handle))
|
||||
return PTR_ERR(handle);
|
||||
|
||||
ext4_fc_mark_ineligible(inode->i_sb,
|
||||
EXT4_FC_REASON_JOURNAL_FLAG_CHANGE);
|
||||
err = ext4_mark_inode_dirty(handle, inode);
|
||||
ext4_handle_sync(handle);
|
||||
ext4_journal_stop(handle);
|
||||
@ -6001,9 +6088,17 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
|
||||
if (err)
|
||||
goto out_ret;
|
||||
|
||||
/*
|
||||
* On data journalling we skip straight to the transaction handle:
|
||||
* there's no delalloc; page truncated will be checked later; the
|
||||
* early return w/ all buffers mapped (calculates size/len) can't
|
||||
* be used; and there's no dioread_nolock, so only ext4_get_block.
|
||||
*/
|
||||
if (ext4_should_journal_data(inode))
|
||||
goto retry_alloc;
|
||||
|
||||
/* Delalloc case is easy... */
|
||||
if (test_opt(inode->i_sb, DELALLOC) &&
|
||||
!ext4_should_journal_data(inode) &&
|
||||
!ext4_nonda_switch(inode->i_sb)) {
|
||||
do {
|
||||
err = block_page_mkwrite(vma, vmf,
|
||||
@ -6029,6 +6124,9 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
|
||||
/*
|
||||
* Return if we have all the buffers mapped. This avoids the need to do
|
||||
* journal_start/journal_stop which can block and take a long time
|
||||
*
|
||||
* This cannot be done for data journalling, as we have to add the
|
||||
* inode to the transaction's list to writeprotect pages on commit.
|
||||
*/
|
||||
if (page_has_buffers(page)) {
|
||||
if (!ext4_walk_page_buffers(NULL, page_buffers(page),
|
||||
@ -6053,16 +6151,42 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
|
||||
ret = VM_FAULT_SIGBUS;
|
||||
goto out;
|
||||
}
|
||||
err = block_page_mkwrite(vma, vmf, get_block);
|
||||
if (!err && ext4_should_journal_data(inode)) {
|
||||
if (ext4_walk_page_buffers(handle, page_buffers(page), 0,
|
||||
PAGE_SIZE, NULL, do_journal_get_write_access)) {
|
||||
unlock_page(page);
|
||||
ret = VM_FAULT_SIGBUS;
|
||||
ext4_journal_stop(handle);
|
||||
goto out;
|
||||
/*
|
||||
* Data journalling can't use block_page_mkwrite() because it
|
||||
* will set_buffer_dirty() before do_journal_get_write_access()
|
||||
* thus might hit warning messages for dirty metadata buffers.
|
||||
*/
|
||||
if (!ext4_should_journal_data(inode)) {
|
||||
err = block_page_mkwrite(vma, vmf, get_block);
|
||||
} else {
|
||||
lock_page(page);
|
||||
size = i_size_read(inode);
|
||||
/* Page got truncated from under us? */
|
||||
if (page->mapping != mapping || page_offset(page) > size) {
|
||||
ret = VM_FAULT_NOPAGE;
|
||||
goto out_error;
|
||||
}
|
||||
|
||||
if (page->index == size >> PAGE_SHIFT)
|
||||
len = size & ~PAGE_MASK;
|
||||
else
|
||||
len = PAGE_SIZE;
|
||||
|
||||
err = __block_write_begin(page, 0, len, ext4_get_block);
|
||||
if (!err) {
|
||||
ret = VM_FAULT_SIGBUS;
|
||||
if (ext4_walk_page_buffers(handle, page_buffers(page),
|
||||
0, len, NULL, do_journal_get_write_access))
|
||||
goto out_error;
|
||||
if (ext4_walk_page_buffers(handle, page_buffers(page),
|
||||
0, len, NULL, write_end_fn))
|
||||
goto out_error;
|
||||
if (ext4_jbd2_inode_add_write(handle, inode, 0, len))
|
||||
goto out_error;
|
||||
ext4_set_inode_state(inode, EXT4_STATE_JDATA);
|
||||
} else {
|
||||
unlock_page(page);
|
||||
}
|
||||
ext4_set_inode_state(inode, EXT4_STATE_JDATA);
|
||||
}
|
||||
ext4_journal_stop(handle);
|
||||
if (err == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
|
||||
@ -6073,6 +6197,10 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
|
||||
up_read(&EXT4_I(inode)->i_mmap_sem);
|
||||
sb_end_pagefault(inode->i_sb);
|
||||
return ret;
|
||||
out_error:
|
||||
unlock_page(page);
|
||||
ext4_journal_stop(handle);
|
||||
goto out;
|
||||
}
|
||||
|
||||
vm_fault_t ext4_filemap_fault(struct vm_fault *vmf)
|
||||
|
@ -86,7 +86,7 @@ static void swap_inode_data(struct inode *inode1, struct inode *inode2)
i_size_write(inode2, isize);
}

static void reset_inode_seed(struct inode *inode)
void ext4_reset_inode_seed(struct inode *inode)
{
struct ext4_inode_info *ei = EXT4_I(inode);
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@ -165,6 +165,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
err = -EINVAL;
goto err_out;
}
ext4_fc_start_ineligible(sb, EXT4_FC_REASON_SWAP_BOOT);

/* Protect extent tree against block allocations via delalloc */
ext4_double_down_write_data_sem(inode, inode_bl);
@ -199,8 +200,8 @@ static long swap_inode_boot_loader(struct super_block *sb,

inode->i_generation = prandom_u32();
inode_bl->i_generation = prandom_u32();
reset_inode_seed(inode);
reset_inode_seed(inode_bl);
ext4_reset_inode_seed(inode);
ext4_reset_inode_seed(inode_bl);

ext4_discard_preallocations(inode, 0);

@ -247,6 +248,7 @@ static long swap_inode_boot_loader(struct super_block *sb,

err_out1:
ext4_journal_stop(handle);
ext4_fc_stop_ineligible(sb);
ext4_double_up_write_data_sem(inode, inode_bl);

err_out:
@ -807,7 +809,7 @@ static int ext4_ioctl_get_es_cache(struct file *filp, unsigned long arg)
return error;
}

long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
struct inode *inode = file_inode(filp);
struct super_block *sb = inode->i_sb;
@ -1074,6 +1076,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)

err = ext4_resize_fs(sb, n_blocks_count);
if (EXT4_SB(sb)->s_journal) {
ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_RESIZE);
jbd2_journal_lock_updates(EXT4_SB(sb)->s_journal);
err2 = jbd2_journal_flush(EXT4_SB(sb)->s_journal);
jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
@ -1308,6 +1311,17 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
}
}

long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
long ret;

ext4_fc_start_update(file_inode(filp));
ret = __ext4_ioctl(filp, cmd, arg);
ext4_fc_stop_update(file_inode(filp));

return ret;
}

#ifdef CONFIG_COMPAT
long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{

@ -124,7 +124,7 @@
* /sys/fs/ext4/<partition>/mb_group_prealloc. The value is represented in
* terms of number of blocks. If we have mounted the file system with -O
* stripe=<value> option the group prealloc request is normalized to the
* the smallest multiple of the stripe value (sbi->s_stripe) which is
* smallest multiple of the stripe value (sbi->s_stripe) which is
* greater than the default mb_group_prealloc.
*
* The regular allocator (using the buddy cache) supports a few tunables.
@ -619,11 +619,8 @@ static int __mb_check_buddy(struct ext4_buddy *e4b, char *file,
void *buddy;
void *buddy2;

{
static int mb_check_counter;
if (mb_check_counter++ % 100 != 0)
return 0;
}
if (e4b->bd_info->bb_check_counter++ % 10)
return 0;

while (order > 1) {
buddy = mb_find_buddy(e4b, order, &max);
@ -1394,9 +1391,6 @@ void ext4_set_bits(void *bm, int cur, int len)
}
}

/*
* _________________________________________________________________ */

static inline int mb_buddy_adjust_border(int* bit, void* bitmap, int side)
{
if (mb_test_bit(*bit + side, bitmap)) {
@ -1508,14 +1502,16 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,

blocknr = ext4_group_first_block_no(sb, e4b->bd_group);
blocknr += EXT4_C2B(sbi, block);
ext4_grp_locked_error(sb, e4b->bd_group,
inode ? inode->i_ino : 0,
blocknr,
"freeing already freed block "
"(bit %u); block bitmap corrupt.",
block);
ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
ext4_grp_locked_error(sb, e4b->bd_group,
inode ? inode->i_ino : 0,
blocknr,
"freeing already freed block (bit %u); block bitmap corrupt.",
block);
ext4_mark_group_bitmap_corrupted(
sb, e4b->bd_group,
EXT4_GROUP_INFO_BBITMAP_CORRUPT);
}
mb_regenerate_buddy(e4b);
goto done;
}
@ -2019,7 +2015,7 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac,
/*
* IF we have corrupt bitmap, we won't find any
* free blocks even though group info says we
* we have free blocks
* have free blocks
*/
ext4_grp_locked_error(sb, e4b->bd_group, 0, 0,
"%d free clusters as per "
@ -3302,6 +3298,84 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
return err;
}

/*
* Idempotent helper for Ext4 fast commit replay path to set the state of
* blocks in bitmaps and update counters.
*/
void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
int len, int state)
{
struct buffer_head *bitmap_bh = NULL;
struct ext4_group_desc *gdp;
struct buffer_head *gdp_bh;
struct ext4_sb_info *sbi = EXT4_SB(sb);
ext4_group_t group;
ext4_grpblk_t blkoff;
int i, clen, err;
int already;

clen = EXT4_B2C(sbi, len);

ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
bitmap_bh = ext4_read_block_bitmap(sb, group);
if (IS_ERR(bitmap_bh)) {
err = PTR_ERR(bitmap_bh);
bitmap_bh = NULL;
goto out_err;
}

err = -EIO;
gdp = ext4_get_group_desc(sb, group, &gdp_bh);
if (!gdp)
goto out_err;

ext4_lock_group(sb, group);
already = 0;
for (i = 0; i < clen; i++)
if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) == !state)
already++;

if (state)
ext4_set_bits(bitmap_bh->b_data, blkoff, clen);
else
mb_test_and_clear_bits(bitmap_bh->b_data, blkoff, clen);
if (ext4_has_group_desc_csum(sb) &&
(gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
ext4_free_group_clusters_set(sb, gdp,
ext4_free_clusters_after_init(sb,
group, gdp));
}
if (state)
clen = ext4_free_group_clusters(sb, gdp) - clen + already;
else
clen = ext4_free_group_clusters(sb, gdp) + clen - already;

ext4_free_group_clusters_set(sb, gdp, clen);
ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh);
ext4_group_desc_csum_set(sb, group, gdp);

ext4_unlock_group(sb, group);

if (sbi->s_log_groups_per_flex) {
ext4_group_t flex_group = ext4_flex_group(sbi, group);

atomic64_sub(len,
&sbi_array_rcu_deref(sbi, s_flex_groups,
flex_group)->free_clusters);
}

err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
if (err)
goto out_err;
sync_dirty_buffer(bitmap_bh);
err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
sync_dirty_buffer(gdp_bh);

out_err:
brelse(bitmap_bh);
}

/*
* here we normalize request for locality group
* Group request are normalized to s_mb_group_prealloc, which goes to
@ -4160,7 +4234,7 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
struct ext4_buddy e4b;
int err;
int busy = 0;
int free = 0;
int free, free_total = 0;

mb_debug(sb, "discard preallocation for group %u\n", group);
if (list_empty(&grp->bb_prealloc_list))
@ -4188,8 +4262,8 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,

INIT_LIST_HEAD(&list);
repeat:
free = 0;
ext4_lock_group(sb, group);
this_cpu_inc(discard_pa_seq);
list_for_each_entry_safe(pa, tmp,
&grp->bb_prealloc_list, pa_group_list) {
spin_lock(&pa->pa_lock);
@ -4206,6 +4280,9 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
/* seems this one can be freed ... */
ext4_mb_mark_pa_deleted(sb, pa);

if (!free)
this_cpu_inc(discard_pa_seq);

/* we can trust pa_free ... */
free += pa->pa_free;

@ -4215,22 +4292,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
list_add(&pa->u.pa_tmp_list, &list);
}

/* if we still need more blocks and some PAs were used, try again */
if (free < needed && busy) {
busy = 0;
ext4_unlock_group(sb, group);
cond_resched();
goto repeat;
}

/* found anything to free? */
if (list_empty(&list)) {
BUG_ON(free != 0);
mb_debug(sb, "Someone else may have freed PA for this group %u\n",
group);
goto out;
}

/* now free all selected PAs */
list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {

@ -4248,14 +4309,22 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
}

out:
free_total += free;

/* if we still need more blocks and some PAs were used, try again */
if (free_total < needed && busy) {
ext4_unlock_group(sb, group);
cond_resched();
busy = 0;
goto repeat;
}
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
put_bh(bitmap_bh);
out_dbg:
mb_debug(sb, "discarded (%d) blocks preallocated for group %u bb_free (%d)\n",
free, group, grp->bb_free);
return free;
free_total, group, grp->bb_free);
return free_total;
}

/*
@ -4283,6 +4352,9 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed)
return;
}

if (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)
return;

mb_debug(sb, "discard preallocation for inode %lu\n",
inode->i_ino);
trace_ext4_discard_preallocations(inode,
@ -4830,6 +4902,9 @@ static bool ext4_mb_discard_preallocations_should_retry(struct super_block *sb,
return ret;
}

static ext4_fsblk_t ext4_mb_new_blocks_simple(handle_t *handle,
struct ext4_allocation_request *ar, int *errp);

/*
* Main entry point into mballoc to allocate blocks
* it tries to use preallocation first, then falls back
@ -4851,6 +4926,8 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
sbi = EXT4_SB(sb);

trace_ext4_request_blocks(ar);
if (sbi->s_mount_state & EXT4_FC_REPLAY)
return ext4_mb_new_blocks_simple(handle, ar, errp);

/* Allow to use superuser reservation for quota file */
if (ext4_is_quota_file(ar->inode))
@ -5078,6 +5155,102 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
return 0;
}

/*
* Simple allocator for Ext4 fast commit replay path. It searches for blocks
* linearly starting at the goal block and also excludes the blocks which
* are going to be in use after fast commit replay.
*/
static ext4_fsblk_t ext4_mb_new_blocks_simple(handle_t *handle,
struct ext4_allocation_request *ar, int *errp)
{
struct buffer_head *bitmap_bh;
struct super_block *sb = ar->inode->i_sb;
ext4_group_t group;
|
||||
ext4_grpblk_t blkoff;
|
||||
int i;
|
||||
ext4_fsblk_t goal, block;
|
||||
struct ext4_super_block *es = EXT4_SB(sb)->s_es;
|
||||
|
||||
goal = ar->goal;
|
||||
if (goal < le32_to_cpu(es->s_first_data_block) ||
|
||||
goal >= ext4_blocks_count(es))
|
||||
goal = le32_to_cpu(es->s_first_data_block);
|
||||
|
||||
ar->len = 0;
|
||||
ext4_get_group_no_and_offset(sb, goal, &group, &blkoff);
|
||||
for (; group < ext4_get_groups_count(sb); group++) {
|
||||
bitmap_bh = ext4_read_block_bitmap(sb, group);
|
||||
if (IS_ERR(bitmap_bh)) {
|
||||
*errp = PTR_ERR(bitmap_bh);
|
||||
pr_warn("Failed to read block bitmap\n");
|
||||
return 0;
|
||||
}
|
||||
|
||||
ext4_get_group_no_and_offset(sb,
|
||||
max(ext4_group_first_block_no(sb, group), goal),
|
||||
NULL, &blkoff);
|
||||
i = mb_find_next_zero_bit(bitmap_bh->b_data, sb->s_blocksize,
|
||||
blkoff);
|
||||
brelse(bitmap_bh);
|
||||
if (i >= sb->s_blocksize)
|
||||
continue;
|
||||
if (ext4_fc_replay_check_excluded(sb,
|
||||
ext4_group_first_block_no(sb, group) + i))
|
||||
continue;
|
||||
break;
|
||||
}
|
||||
|
||||
if (group >= ext4_get_groups_count(sb) && i >= sb->s_blocksize)
|
||||
return 0;
|
||||
|
||||
block = ext4_group_first_block_no(sb, group) + i;
|
||||
ext4_mb_mark_bb(sb, block, 1, 1);
|
||||
ar->len = 1;
|
||||
|
||||
return block;
|
||||
}
|
||||
|
||||
static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
|
||||
unsigned long count)
|
||||
{
|
||||
struct buffer_head *bitmap_bh;
|
||||
struct super_block *sb = inode->i_sb;
|
||||
struct ext4_group_desc *gdp;
|
||||
struct buffer_head *gdp_bh;
|
||||
ext4_group_t group;
|
||||
ext4_grpblk_t blkoff;
|
||||
int already_freed = 0, err, i;
|
||||
|
||||
ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
|
||||
bitmap_bh = ext4_read_block_bitmap(sb, group);
|
||||
if (IS_ERR(bitmap_bh)) {
|
||||
err = PTR_ERR(bitmap_bh);
|
||||
pr_warn("Failed to read block bitmap\n");
|
||||
return;
|
||||
}
|
||||
gdp = ext4_get_group_desc(sb, group, &gdp_bh);
|
||||
if (!gdp)
|
||||
return;
|
||||
|
||||
for (i = 0; i < count; i++) {
|
||||
if (!mb_test_bit(blkoff + i, bitmap_bh->b_data))
|
||||
already_freed++;
|
||||
}
|
||||
mb_clear_bits(bitmap_bh->b_data, blkoff, count);
|
||||
err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
|
||||
if (err)
|
||||
return;
|
||||
ext4_free_group_clusters_set(
|
||||
sb, gdp, ext4_free_group_clusters(sb, gdp) +
|
||||
count - already_freed);
|
||||
ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh);
|
||||
ext4_group_desc_csum_set(sb, group, gdp);
|
||||
ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
|
||||
sync_dirty_buffer(bitmap_bh);
|
||||
sync_dirty_buffer(gdp_bh);
|
||||
brelse(bitmap_bh);
|
||||
}
|
||||
|
||||
/**
|
||||
* ext4_free_blocks() -- Free given blocks and update quota
|
||||
* @handle: handle for this transaction
|
||||
@ -5104,6 +5277,13 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
|
||||
int err = 0;
|
||||
int ret;
|
||||
|
||||
sbi = EXT4_SB(sb);
|
||||
|
||||
if (sbi->s_mount_state & EXT4_FC_REPLAY) {
|
||||
ext4_free_blocks_simple(inode, block, count);
|
||||
return;
|
||||
}
|
||||
|
||||
might_sleep();
|
||||
if (bh) {
|
||||
if (block)
|
||||
@ -5112,7 +5292,6 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
|
||||
block = bh->b_blocknr;
|
||||
}
|
||||
|
||||
sbi = EXT4_SB(sb);
|
||||
if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
|
||||
!ext4_inode_block_valid(inode, block, count)) {
|
||||
ext4_error(sb, "Freeing blocks not in datazone - "
|
||||
|
fs/ext4/mmp.c

@@ -85,15 +85,11 @@ static int read_mmp_block(struct super_block *sb, struct buffer_head **bh,

		}
	}

-	get_bh(*bh);
	lock_buffer(*bh);
-	(*bh)->b_end_io = end_buffer_read_sync;
-	submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO, *bh);
-	wait_on_buffer(*bh);
-	if (!buffer_uptodate(*bh)) {
-		ret = -EIO;
+	ret = ext4_read_bh(*bh, REQ_META | REQ_PRIO, NULL);
+	if (ret)
		goto warn_exit;
-	}

	mmp = (struct mmp_struct *)((*bh)->b_data);
	if (le32_to_cpu(mmp->mmp_magic) != EXT4_MMP_MAGIC) {
		ret = -EFSCORRUPTED;
fs/ext4/move_extent.c

@@ -215,7 +215,7 @@ mext_page_mkuptodate(struct page *page, unsigned from, unsigned to)

	for (i = 0; i < nr; i++) {
		bh = arr[i];
		if (!bh_uptodate_or_lock(bh)) {
-			err = bh_submit_read(bh);
+			err = ext4_read_bh(bh, 0, NULL);
			if (err)
				return err;
		}
fs/ext4/namei.c

@@ -2658,7 +2658,7 @@ static int ext4_delete_entry(handle_t *handle,

 * for checking S_ISDIR(inode) (since the INODE_INDEX feature will not be set
 * on regular files) and to avoid creating huge/slow non-HTREE directories.
 */
-static void ext4_inc_count(handle_t *handle, struct inode *inode)
+static void ext4_inc_count(struct inode *inode)
{
	inc_nlink(inode);
	if (is_dx(inode) &&

@@ -2670,7 +2670,7 @@ static void ext4_inc_count(handle_t *handle, struct inode *inode)

 * If a directory had nlink == 1, then we should let it be 1. This indicates
 * directory has >EXT4_LINK_MAX subdirs.
 */
-static void ext4_dec_count(handle_t *handle, struct inode *inode)
+static void ext4_dec_count(struct inode *inode)
{
	if (!S_ISDIR(inode->i_mode) || inode->i_nlink > 2)
		drop_nlink(inode);

@@ -2715,7 +2715,7 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
			bool excl)
{
	handle_t *handle;
-	struct inode *inode;
+	struct inode *inode, *inode_save;
	int err, credits, retries = 0;

	err = dquot_initialize(dir);

@@ -2733,7 +2733,11 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,

		inode->i_op = &ext4_file_inode_operations;
		inode->i_fop = &ext4_file_operations;
		ext4_set_aops(inode);
+		inode_save = inode;
+		ihold(inode_save);
		err = ext4_add_nondir(handle, dentry, &inode);
+		ext4_fc_track_create(inode_save, dentry);
+		iput(inode_save);
	}
	if (handle)
		ext4_journal_stop(handle);

@@ -2748,7 +2752,7 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
		      umode_t mode, dev_t rdev)
{
	handle_t *handle;
-	struct inode *inode;
+	struct inode *inode, *inode_save;
	int err, credits, retries = 0;

	err = dquot_initialize(dir);

@@ -2765,7 +2769,12 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,

	if (!IS_ERR(inode)) {
		init_special_inode(inode, inode->i_mode, rdev);
		inode->i_op = &ext4_special_inode_operations;
+		inode_save = inode;
+		ihold(inode_save);
		err = ext4_add_nondir(handle, dentry, &inode);
+		if (!err)
+			ext4_fc_track_create(inode_save, dentry);
+		iput(inode_save);
	}
	if (handle)
		ext4_journal_stop(handle);

@@ -2845,7 +2854,7 @@ struct ext4_dir_entry_2 *ext4_init_dot_dotdot(struct inode *inode,

	return ext4_next_entry(de, blocksize);
}

-static int ext4_init_new_dir(handle_t *handle, struct inode *dir,
+int ext4_init_new_dir(handle_t *handle, struct inode *dir,
			     struct inode *inode)
{
	struct buffer_head *dir_block = NULL;

@@ -2930,7 +2939,9 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)

		iput(inode);
		goto out_retry;
	}
-	ext4_inc_count(handle, dir);
+	ext4_fc_track_create(inode, dentry);
+	ext4_inc_count(dir);
+
	ext4_update_dx_flag(dir);
	err = ext4_mark_inode_dirty(handle, dir);
	if (err)

@@ -3271,8 +3282,9 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)

	retval = ext4_mark_inode_dirty(handle, inode);
	if (retval)
		goto end_rmdir;
-	ext4_dec_count(handle, dir);
+	ext4_dec_count(dir);
	ext4_update_dx_flag(dir);
+	ext4_fc_track_unlink(inode, dentry);
	retval = ext4_mark_inode_dirty(handle, dir);

#ifdef CONFIG_UNICODE

@@ -3293,43 +3305,33 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)

	return retval;
}

-static int ext4_unlink(struct inode *dir, struct dentry *dentry)
+int __ext4_unlink(struct inode *dir, const struct qstr *d_name,
+		  struct inode *inode)
{
-	int retval;
-	struct inode *inode;
+	int retval = -ENOENT;
	struct buffer_head *bh;
	struct ext4_dir_entry_2 *de;
	handle_t *handle = NULL;
+	int skip_remove_dentry = 0;
	ext4_lblk_t lblk;

-	if (unlikely(ext4_forced_shutdown(EXT4_SB(dir->i_sb))))
-		return -EIO;
+	bh = ext4_find_entry(dir, d_name, &de, NULL, &lblk);
+	if (IS_ERR(bh))
+		return PTR_ERR(bh);

-	trace_ext4_unlink_enter(dir, dentry);
-	/* Initialize quotas before so that eventual writes go
-	 * in separate transaction */
-	retval = dquot_initialize(dir);
-	if (retval)
-		goto out_trace;
-	retval = dquot_initialize(d_inode(dentry));
-	if (retval)
-		goto out_trace;
-
-	bh = ext4_find_entry(dir, &dentry->d_name, &de, NULL, &lblk);
-	if (IS_ERR(bh)) {
-		retval = PTR_ERR(bh);
-		goto out_trace;
-	}
-	if (!bh) {
-		retval = -ENOENT;
-		goto out_trace;
-	}
-
-	inode = d_inode(dentry);
+	if (!bh)
+		return -ENOENT;

	if (le32_to_cpu(de->inode) != inode->i_ino) {
-		retval = -EFSCORRUPTED;
-		goto out_bh;
+		/*
+		 * It's okay if we find dont find dentry which matches
+		 * the inode. That's because it might have gotten
+		 * renamed to a different inode number
+		 */
+		if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
+			skip_remove_dentry = 1;
+		else
+			goto out_bh;
	}

	handle = ext4_journal_start(dir, EXT4_HT_DIR,

@@ -3342,17 +3344,21 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)

	if (IS_DIRSYNC(dir))
		ext4_handle_sync(handle);

-	retval = ext4_delete_entry(handle, dir, de, lblk, bh);
-	if (retval)
-		goto out_handle;
-	dir->i_ctime = dir->i_mtime = current_time(dir);
-	ext4_update_dx_flag(dir);
-	retval = ext4_mark_inode_dirty(handle, dir);
-	if (retval)
-		goto out_handle;
+	if (!skip_remove_dentry) {
+		retval = ext4_delete_entry(handle, dir, de, lblk, bh);
+		if (retval)
+			goto out_handle;
+		dir->i_ctime = dir->i_mtime = current_time(dir);
+		ext4_update_dx_flag(dir);
+		retval = ext4_mark_inode_dirty(handle, dir);
+		if (retval)
+			goto out_handle;
+	} else {
+		retval = 0;
+	}
	if (inode->i_nlink == 0)
		ext4_warning_inode(inode, "Deleting file '%.*s' with no links",
-				   dentry->d_name.len, dentry->d_name.name);
+				   d_name->len, d_name->name);
	else
		drop_nlink(inode);
	if (!inode->i_nlink)

@@ -3360,6 +3366,35 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)

	inode->i_ctime = current_time(inode);
	retval = ext4_mark_inode_dirty(handle, inode);

+out_handle:
+	ext4_journal_stop(handle);
+out_bh:
+	brelse(bh);
+	return retval;
+}
+
+static int ext4_unlink(struct inode *dir, struct dentry *dentry)
+{
+	int retval;
+
+	if (unlikely(ext4_forced_shutdown(EXT4_SB(dir->i_sb))))
+		return -EIO;
+
+	trace_ext4_unlink_enter(dir, dentry);
+	/*
+	 * Initialize quotas before so that eventual writes go
+	 * in separate transaction
+	 */
+	retval = dquot_initialize(dir);
+	if (retval)
+		goto out_trace;
+	retval = dquot_initialize(d_inode(dentry));
+	if (retval)
+		goto out_trace;
+
+	retval = __ext4_unlink(dir, &dentry->d_name, d_inode(dentry));
+	if (!retval)
+		ext4_fc_track_unlink(d_inode(dentry), dentry);
#ifdef CONFIG_UNICODE
	/* VFS negative dentries are incompatible with Encoding and
	 * Case-insensitiveness. Eventually we'll want avoid

@@ -3371,10 +3406,6 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)

		d_invalidate(dentry);
#endif

-out_handle:
-	ext4_journal_stop(handle);
-out_bh:
-	brelse(bh);
out_trace:
	trace_ext4_unlink_exit(dentry, retval);
	return retval;

@@ -3455,7 +3486,8 @@ static int ext4_symlink(struct inode *dir,

		 */
		drop_nlink(inode);
		err = ext4_orphan_add(handle, inode);
-		ext4_journal_stop(handle);
+		if (handle)
+			ext4_journal_stop(handle);
		handle = NULL;
		if (err)
			goto err_drop_inode;

@@ -3509,29 +3541,10 @@ static int ext4_symlink(struct inode *dir,

	return err;
}

-static int ext4_link(struct dentry *old_dentry,
-		     struct inode *dir, struct dentry *dentry)
+int __ext4_link(struct inode *dir, struct inode *inode, struct dentry *dentry)
{
	handle_t *handle;
-	struct inode *inode = d_inode(old_dentry);
	int err, retries = 0;

-	if (inode->i_nlink >= EXT4_LINK_MAX)
-		return -EMLINK;
-
-	err = fscrypt_prepare_link(old_dentry, dir, dentry);
-	if (err)
-		return err;
-
-	if ((ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) &&
-	    (!projid_eq(EXT4_I(dir)->i_projid,
-			EXT4_I(old_dentry->d_inode)->i_projid)))
-		return -EXDEV;
-
-	err = dquot_initialize(dir);
-	if (err)
-		return err;
-
retry:
	handle = ext4_journal_start(dir, EXT4_HT_DIR,
		(EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +

@@ -3543,11 +3556,12 @@ static int ext4_link(struct dentry *old_dentry,

		ext4_handle_sync(handle);

	inode->i_ctime = current_time(inode);
-	ext4_inc_count(handle, inode);
+	ext4_inc_count(inode);
	ihold(inode);

	err = ext4_add_entry(handle, dentry, inode);
	if (!err) {
+		ext4_fc_track_link(inode, dentry);
		err = ext4_mark_inode_dirty(handle, inode);
		/* this can happen only for tmpfile being
		 * linked the first time

@@ -3565,6 +3579,29 @@ static int ext4_link(struct dentry *old_dentry,

	return err;
}

+static int ext4_link(struct dentry *old_dentry,
+		     struct inode *dir, struct dentry *dentry)
+{
+	struct inode *inode = d_inode(old_dentry);
+	int err;
+
+	if (inode->i_nlink >= EXT4_LINK_MAX)
+		return -EMLINK;
+
+	err = fscrypt_prepare_link(old_dentry, dir, dentry);
+	if (err)
+		return err;
+
+	if ((ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) &&
+	    (!projid_eq(EXT4_I(dir)->i_projid,
+			EXT4_I(old_dentry->d_inode)->i_projid)))
+		return -EXDEV;
+
+	err = dquot_initialize(dir);
+	if (err)
+		return err;
+	return __ext4_link(dir, inode, dentry);
+}
+
/*
 * Try to find buffer head where contains the parent block.

@@ -3743,9 +3780,9 @@ static void ext4_update_dir_count(handle_t *handle, struct ext4_renament *ent)
{
	if (ent->dir_nlink_delta) {
		if (ent->dir_nlink_delta == -1)
-			ext4_dec_count(handle, ent->dir);
+			ext4_dec_count(ent->dir);
		else
-			ext4_inc_count(handle, ent->dir);
+			ext4_inc_count(ent->dir);
		ext4_mark_inode_dirty(handle, ent->dir);
	}
}

@@ -3958,7 +3995,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,

	}

	if (new.inode) {
-		ext4_dec_count(handle, new.inode);
+		ext4_dec_count(new.inode);
		new.inode->i_ctime = current_time(new.inode);
	}
	old.dir->i_ctime = old.dir->i_mtime = current_time(old.dir);

@@ -3968,14 +4005,14 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,

		if (retval)
			goto end_rename;

-		ext4_dec_count(handle, old.dir);
+		ext4_dec_count(old.dir);
		if (new.inode) {
			/* checked ext4_empty_dir above, can't have another
			 * parent, ext4_dec_count() won't work for many-linked
			 * dirs */
			clear_nlink(new.inode);
		} else {
-			ext4_inc_count(handle, new.dir);
+			ext4_inc_count(new.dir);
			ext4_update_dx_flag(new.dir);
			retval = ext4_mark_inode_dirty(handle, new.dir);
			if (unlikely(retval))

@@ -3985,6 +4022,22 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,

	retval = ext4_mark_inode_dirty(handle, old.dir);
	if (unlikely(retval))
		goto end_rename;
+
+	if (S_ISDIR(old.inode->i_mode)) {
+		/*
+		 * We disable fast commits here that's because the
+		 * replay code is not yet capable of changing dot dot
+		 * dirents in directories.
+		 */
+		ext4_fc_mark_ineligible(old.inode->i_sb,
+					EXT4_FC_REASON_RENAME_DIR);
+	} else {
+		if (new.inode)
+			ext4_fc_track_unlink(new.inode, new.dentry);
+		ext4_fc_track_link(old.inode, new.dentry);
+		ext4_fc_track_unlink(old.inode, old.dentry);
+	}
+
	if (new.inode) {
		retval = ext4_mark_inode_dirty(handle, new.inode);
		if (unlikely(retval))

@@ -4128,7 +4181,8 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,

	retval = ext4_mark_inode_dirty(handle, new.inode);
	if (unlikely(retval))
		goto end_rename;
+	ext4_fc_mark_ineligible(new.inode->i_sb,
+				EXT4_FC_REASON_CROSS_RENAME);
	if (old.dir_bh) {
		retval = ext4_rename_dir_finish(handle, &old, new.dir->i_ino);
		if (retval)
fs/ext4/resize.c

@@ -843,8 +843,10 @@ static int add_new_gdb(handle_t *handle, struct inode *inode,

	BUFFER_TRACE(dind, "get_write_access");
	err = ext4_journal_get_write_access(handle, dind);
-	if (unlikely(err))
+	if (unlikely(err)) {
+		ext4_std_error(sb, err);
		goto errout;
+	}

	/* ext4_reserve_inode_write() gets a reference on the iloc */
	err = ext4_reserve_inode_write(handle, inode, &iloc);

@@ -1243,7 +1245,7 @@ static struct buffer_head *ext4_get_bitmap(struct super_block *sb, __u64 block)

	if (unlikely(!bh))
		return NULL;
	if (!bh_uptodate_or_lock(bh)) {
-		if (bh_submit_read(bh) < 0) {
+		if (ext4_read_bh(bh, 0, NULL) < 0) {
			brelse(bh);
			return NULL;
		}

@@ -1806,8 +1808,8 @@ int ext4_group_extend(struct super_block *sb, struct ext4_super_block *es,

		o_blocks_count + add, add);

	/* See if the device is actually as big as what was requested */
-	bh = sb_bread(sb, o_blocks_count + add - 1);
-	if (!bh) {
+	bh = ext4_sb_bread(sb, o_blocks_count + add - 1, 0);
+	if (IS_ERR(bh)) {
		ext4_warning(sb, "can't read last block, resize aborted");
		return -ENOSPC;
	}

@@ -1932,8 +1934,8 @@ int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count)

	int meta_bg;

	/* See if the device is actually as big as what was requested */
-	bh = sb_bread(sb, n_blocks_count - 1);
-	if (!bh) {
+	bh = ext4_sb_bread(sb, n_blocks_count - 1, 0);
+	if (IS_ERR(bh)) {
		ext4_warning(sb, "can't read last block, resize aborted");
		return -ENOSPC;
	}
352
fs/ext4/super.c
352
fs/ext4/super.c
@ -141,27 +141,115 @@ MODULE_ALIAS_FS("ext3");
|
||||
MODULE_ALIAS("ext3");
|
||||
#define IS_EXT3_SB(sb) ((sb)->s_bdev->bd_holder == &ext3_fs_type)
|
||||
|
||||
|
||||
static inline void __ext4_read_bh(struct buffer_head *bh, int op_flags,
|
||||
bh_end_io_t *end_io)
|
||||
{
|
||||
/*
|
||||
* buffer's verified bit is no longer valid after reading from
|
||||
* disk again due to write out error, clear it to make sure we
|
||||
* recheck the buffer contents.
|
||||
*/
|
||||
clear_buffer_verified(bh);
|
||||
|
||||
bh->b_end_io = end_io ? end_io : end_buffer_read_sync;
|
||||
get_bh(bh);
|
||||
submit_bh(REQ_OP_READ, op_flags, bh);
|
||||
}
|
||||
|
||||
void ext4_read_bh_nowait(struct buffer_head *bh, int op_flags,
|
||||
bh_end_io_t *end_io)
|
||||
{
|
||||
BUG_ON(!buffer_locked(bh));
|
||||
|
||||
if (ext4_buffer_uptodate(bh)) {
|
||||
unlock_buffer(bh);
|
||||
return;
|
||||
}
|
||||
__ext4_read_bh(bh, op_flags, end_io);
|
||||
}
|
||||
|
||||
int ext4_read_bh(struct buffer_head *bh, int op_flags, bh_end_io_t *end_io)
|
||||
{
|
||||
BUG_ON(!buffer_locked(bh));
|
||||
|
||||
if (ext4_buffer_uptodate(bh)) {
|
||||
unlock_buffer(bh);
|
||||
return 0;
|
||||
}
|
||||
|
||||
__ext4_read_bh(bh, op_flags, end_io);
|
||||
|
||||
wait_on_buffer(bh);
|
||||
if (buffer_uptodate(bh))
|
||||
return 0;
|
||||
return -EIO;
|
||||
}
|
||||
|
||||
int ext4_read_bh_lock(struct buffer_head *bh, int op_flags, bool wait)
|
||||
{
|
||||
if (trylock_buffer(bh)) {
|
||||
if (wait)
|
||||
return ext4_read_bh(bh, op_flags, NULL);
|
||||
ext4_read_bh_nowait(bh, op_flags, NULL);
|
||||
return 0;
|
||||
}
|
||||
if (wait) {
|
||||
wait_on_buffer(bh);
|
||||
if (buffer_uptodate(bh))
|
||||
return 0;
|
||||
return -EIO;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* This works like sb_bread() except it uses ERR_PTR for error
|
||||
* This works like __bread_gfp() except it uses ERR_PTR for error
|
||||
* returns. Currently with sb_bread it's impossible to distinguish
|
||||
* between ENOMEM and EIO situations (since both result in a NULL
|
||||
* return.
|
||||
*/
|
||||
struct buffer_head *
|
||||
ext4_sb_bread(struct super_block *sb, sector_t block, int op_flags)
|
||||
static struct buffer_head *__ext4_sb_bread_gfp(struct super_block *sb,
|
||||
sector_t block, int op_flags,
|
||||
gfp_t gfp)
|
||||
{
|
||||
struct buffer_head *bh = sb_getblk(sb, block);
|
||||
struct buffer_head *bh;
|
||||
int ret;
|
||||
|
||||
bh = sb_getblk_gfp(sb, block, gfp);
|
||||
if (bh == NULL)
|
||||
return ERR_PTR(-ENOMEM);
|
||||
if (ext4_buffer_uptodate(bh))
|
||||
return bh;
|
||||
ll_rw_block(REQ_OP_READ, REQ_META | op_flags, 1, &bh);
|
||||
wait_on_buffer(bh);
|
||||
if (buffer_uptodate(bh))
|
||||
return bh;
|
||||
put_bh(bh);
|
||||
return ERR_PTR(-EIO);
|
||||
|
||||
ret = ext4_read_bh_lock(bh, REQ_META | op_flags, true);
|
||||
if (ret) {
|
||||
put_bh(bh);
|
||||
return ERR_PTR(ret);
|
||||
}
|
||||
return bh;
|
||||
}
|
||||
|
||||
struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block,
|
||||
int op_flags)
|
||||
{
|
||||
return __ext4_sb_bread_gfp(sb, block, op_flags, __GFP_MOVABLE);
|
||||
}
|
||||
|
||||
struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb,
|
||||
sector_t block)
|
||||
{
|
||||
return __ext4_sb_bread_gfp(sb, block, 0, 0);
|
||||
}
|
||||
|
||||
void ext4_sb_breadahead_unmovable(struct super_block *sb, sector_t block)
|
||||
{
|
||||
struct buffer_head *bh = sb_getblk_gfp(sb, block, 0);
|
||||
|
||||
if (likely(bh)) {
|
||||
ext4_read_bh_lock(bh, REQ_RAHEAD, false);
|
||||
brelse(bh);
|
||||
}
|
||||
}
|
||||
|
||||
static int ext4_verify_csum_type(struct super_block *sb,
|
||||
@ -201,7 +289,18 @@ void ext4_superblock_csum_set(struct super_block *sb)
|
||||
if (!ext4_has_metadata_csum(sb))
|
||||
return;
|
||||
|
||||
/*
|
||||
* Locking the superblock prevents the scenario
|
||||
* where:
|
||||
* 1) a first thread pauses during checksum calculation.
|
||||
* 2) a second thread updates the superblock, recalculates
|
||||
* the checksum, and updates s_checksum
|
||||
* 3) the first thread resumes and finishes its checksum calculation
|
||||
* and updates s_checksum with a potentially stale or torn value.
|
||||
*/
|
||||
lock_buffer(EXT4_SB(sb)->s_sbh);
|
||||
es->s_checksum = ext4_superblock_csum(sb, es);
|
||||
unlock_buffer(EXT4_SB(sb)->s_sbh);
|
||||
}
|
||||
|
||||
ext4_fsblk_t ext4_block_bitmap(struct super_block *sb,
|
||||
@ -472,6 +571,89 @@ static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn)
|
||||
spin_unlock(&sbi->s_md_lock);
|
||||
}
|
||||
|
||||
/*
|
||||
* This writepage callback for write_cache_pages()
|
||||
* takes care of a few cases after page cleaning.
|
||||
*
|
||||
* write_cache_pages() already checks for dirty pages
|
||||
* and calls clear_page_dirty_for_io(), which we want,
|
||||
* to write protect the pages.
|
||||
*
|
||||
* However, we may have to redirty a page (see below.)
|
||||
*/
|
||||
static int ext4_journalled_writepage_callback(struct page *page,
|
||||
struct writeback_control *wbc,
|
||||
void *data)
|
||||
{
|
||||
transaction_t *transaction = (transaction_t *) data;
|
||||
struct buffer_head *bh, *head;
|
||||
struct journal_head *jh;
|
||||
|
||||
bh = head = page_buffers(page);
|
||||
do {
|
||||
/*
|
||||
* We have to redirty a page in these cases:
|
||||
* 1) If buffer is dirty, it means the page was dirty because it
|
||||
* contains a buffer that needs checkpointing. So the dirty bit
|
||||
* needs to be preserved so that checkpointing writes the buffer
|
||||
* properly.
|
||||
* 2) If buffer is not part of the committing transaction
|
||||
* (we may have just accidentally come across this buffer because
|
||||
* inode range tracking is not exact) or if the currently running
|
||||
* transaction already contains this buffer as well, dirty bit
|
||||
* needs to be preserved so that the buffer gets writeprotected
|
||||
* properly on running transaction's commit.
|
||||
*/
|
||||
jh = bh2jh(bh);
|
||||
if (buffer_dirty(bh) ||
|
||||
(jh && (jh->b_transaction != transaction ||
|
||||
jh->b_next_transaction))) {
|
||||
redirty_page_for_writepage(wbc, page);
|
||||
goto out;
|
||||
}
|
||||
} while ((bh = bh->b_this_page) != head);
|
||||
|
||||
out:
|
||||
return AOP_WRITEPAGE_ACTIVATE;
|
||||
}
|
||||
|
||||
static int ext4_journalled_submit_inode_data_buffers(struct jbd2_inode *jinode)
|
||||
{
|
||||
struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
|
||||
struct writeback_control wbc = {
|
||||
.sync_mode = WB_SYNC_ALL,
|
||||
.nr_to_write = LONG_MAX,
|
||||
.range_start = jinode->i_dirty_start,
|
||||
.range_end = jinode->i_dirty_end,
|
||||
};
|
||||
|
||||
return write_cache_pages(mapping, &wbc,
|
||||
ext4_journalled_writepage_callback,
|
||||
jinode->i_transaction);
|
||||
}
|
||||
|
||||
static int ext4_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
|
||||
{
|
||||
int ret;
|
||||
|
||||
if (ext4_should_journal_data(jinode->i_vfs_inode))
|
||||
ret = ext4_journalled_submit_inode_data_buffers(jinode);
|
||||
else
|
||||
ret = jbd2_journal_submit_inode_data_buffers(jinode);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int ext4_journal_finish_inode_data_buffers(struct jbd2_inode *jinode)
|
||||
{
|
||||
int ret = 0;
|
||||
|
||||
if (!ext4_should_journal_data(jinode->i_vfs_inode))
|
||||
ret = jbd2_journal_finish_inode_data_buffers(jinode);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
static bool system_going_down(void)
|
||||
{
|
||||
return system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF
|
||||
@ -939,10 +1121,10 @@ static void ext4_blkdev_put(struct block_device *bdev)
|
||||
static void ext4_blkdev_remove(struct ext4_sb_info *sbi)
|
||||
{
|
||||
struct block_device *bdev;
|
||||
bdev = sbi->journal_bdev;
|
||||
bdev = sbi->s_journal_bdev;
|
||||
if (bdev) {
|
||||
ext4_blkdev_put(bdev);
|
||||
sbi->journal_bdev = NULL;
|
||||
sbi->s_journal_bdev = NULL;
|
||||
}
|
||||
}
|
||||
|
||||
@ -1073,14 +1255,14 @@ static void ext4_put_super(struct super_block *sb)
|
||||
|
||||
sync_blockdev(sb->s_bdev);
|
||||
invalidate_bdev(sb->s_bdev);
|
||||
if (sbi->journal_bdev && sbi->journal_bdev != sb->s_bdev) {
|
||||
if (sbi->s_journal_bdev && sbi->s_journal_bdev != sb->s_bdev) {
|
||||
/*
|
||||
* Invalidate the journal device's buffers. We don't want them
|
||||
* floating about in memory - the physical journal device may
|
||||
* hotswapped, and it breaks the `ro-after' testing code.
|
||||
*/
|
||||
sync_blockdev(sbi->journal_bdev);
|
||||
invalidate_bdev(sbi->journal_bdev);
|
||||
sync_blockdev(sbi->s_journal_bdev);
|
||||
invalidate_bdev(sbi->s_journal_bdev);
|
||||
ext4_blkdev_remove(sbi);
|
||||
}
|
||||
|
||||
@ -1149,6 +1331,8 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
|
||||
ei->i_datasync_tid = 0;
|
||||
atomic_set(&ei->i_unwritten, 0);
|
||||
INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work);
|
||||
ext4_fc_init_inode(&ei->vfs_inode);
|
||||
mutex_init(&ei->i_fc_lock);
|
||||
return &ei->vfs_inode;
|
||||
}
|
||||
|
||||
@ -1166,6 +1350,10 @@ static int ext4_drop_inode(struct inode *inode)
|
||||
static void ext4_free_in_core_inode(struct inode *inode)
|
||||
{
|
||||
fscrypt_free_inode(inode);
|
||||
if (!list_empty(&(EXT4_I(inode)->i_fc_list))) {
|
||||
pr_warn("%s: inode %ld still in fc list",
|
||||
__func__, inode->i_ino);
|
||||
}
|
||||
kmem_cache_free(ext4_inode_cachep, EXT4_I(inode));
|
||||
}
|
||||
|
||||
@ -1191,6 +1379,7 @@ static void init_once(void *foo)
|
||||
init_rwsem(&ei->i_data_sem);
|
||||
init_rwsem(&ei->i_mmap_sem);
|
||||
inode_init_once(&ei->vfs_inode);
|
||||
ext4_fc_init_inode(&ei->vfs_inode);
|
||||
}
|
||||
|
||||
static int __init init_inodecache(void)
|
||||
@ -1219,6 +1408,7 @@ static void destroy_inodecache(void)
|
||||
|
||||
void ext4_clear_inode(struct inode *inode)
|
||||
{
|
||||
ext4_fc_del(inode);
|
||||
invalidate_inode_buffers(inode);
|
||||
clear_inode(inode);
|
||||
ext4_discard_preallocations(inode, 0);
|
||||
@@ -1526,7 +1716,11 @@ enum {
 	Opt_dioread_nolock, Opt_dioread_lock,
 	Opt_discard, Opt_nodiscard, Opt_init_itable, Opt_noinit_itable,
 	Opt_max_dir_size_kb, Opt_nojournal_checksum, Opt_nombcache,
-	Opt_prefetch_block_bitmaps,
+	Opt_prefetch_block_bitmaps, Opt_no_fc,
+#ifdef CONFIG_EXT4_DEBUG
+	Opt_fc_debug_max_replay,
+#endif
+	Opt_fc_debug_force
 };

 static const match_table_t tokens = {

@@ -1613,6 +1807,11 @@ static const match_table_t tokens = {
 	{Opt_init_itable, "init_itable=%u"},
 	{Opt_init_itable, "init_itable"},
 	{Opt_noinit_itable, "noinit_itable"},
+	{Opt_no_fc, "no_fc"},
+	{Opt_fc_debug_force, "fc_debug_force"},
+#ifdef CONFIG_EXT4_DEBUG
+	{Opt_fc_debug_max_replay, "fc_debug_max_replay=%u"},
+#endif
 	{Opt_max_dir_size_kb, "max_dir_size_kb=%u"},
 	{Opt_test_dummy_encryption, "test_dummy_encryption=%s"},
 	{Opt_test_dummy_encryption, "test_dummy_encryption"},

@@ -1739,6 +1938,7 @@ static int clear_qf_name(struct super_block *sb, int qtype)
 #define MOPT_EXT4_ONLY	(MOPT_NO_EXT2 | MOPT_NO_EXT3)
 #define MOPT_STRING	0x0400
 #define MOPT_SKIP	0x0800
+#define MOPT_2		0x1000

 static const struct mount_opts {
 	int	token;

@@ -1839,6 +2039,13 @@ static const struct mount_opts {
 	{Opt_nombcache, EXT4_MOUNT_NO_MBCACHE, MOPT_SET},
 	{Opt_prefetch_block_bitmaps, EXT4_MOUNT_PREFETCH_BLOCK_BITMAPS,
	 MOPT_SET},
+	{Opt_no_fc, EXT4_MOUNT2_JOURNAL_FAST_COMMIT,
+	 MOPT_CLEAR | MOPT_2 | MOPT_EXT4_ONLY},
+	{Opt_fc_debug_force, EXT4_MOUNT2_JOURNAL_FAST_COMMIT,
+	 MOPT_SET | MOPT_2 | MOPT_EXT4_ONLY},
+#ifdef CONFIG_EXT4_DEBUG
+	{Opt_fc_debug_max_replay, 0, MOPT_GTE0},
+#endif
 	{Opt_err, 0, 0}
 };

@@ -2048,6 +2255,10 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 		sbi->s_li_wait_mult = arg;
 	} else if (token == Opt_max_dir_size_kb) {
 		sbi->s_max_dir_size_kb = arg;
+#ifdef CONFIG_EXT4_DEBUG
+	} else if (token == Opt_fc_debug_max_replay) {
+		sbi->s_fc_debug_max_replay = arg;
+#endif
 	} else if (token == Opt_stripe) {
 		sbi->s_stripe = arg;
 	} else if (token == Opt_resuid) {
@@ -2216,10 +2427,17 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 			WARN_ON(1);
 			return -1;
 		}
-		if (arg != 0)
-			sbi->s_mount_opt |= m->mount_opt;
-		else
-			sbi->s_mount_opt &= ~m->mount_opt;
+		if (m->flags & MOPT_2) {
+			if (arg != 0)
+				sbi->s_mount_opt2 |= m->mount_opt;
+			else
+				sbi->s_mount_opt2 &= ~m->mount_opt;
+		} else {
+			if (arg != 0)
+				sbi->s_mount_opt |= m->mount_opt;
+			else
+				sbi->s_mount_opt &= ~m->mount_opt;
+		}
 	}
 	return 1;
 }

@@ -2436,6 +2654,9 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
 		SEQ_OPTS_PUTS("dax=inode");
 	}

+	if (test_opt2(sb, JOURNAL_FAST_COMMIT))
+		SEQ_OPTS_PUTS("fast_commit");
+
 	ext4_show_quota_options(seq, sb);
 	return 0;
 }

@@ -3754,7 +3975,7 @@ int ext4_calculate_overhead(struct super_block *sb)
 	 * Add the internal journal blocks whether the journal has been
 	 * loaded or not
 	 */
-	if (sbi->s_journal && !sbi->journal_bdev)
+	if (sbi->s_journal && !sbi->s_journal_bdev)
 		overhead += EXT4_NUM_B2C(sbi, sbi->s_journal->j_maxlen);
 	else if (ext4_has_feature_journal(sb) && !sbi->s_journal && j_inum) {
 		/* j_inum for internal journal is non-zero */

@@ -3868,8 +4089,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		logical_sb_block = sb_block;
 	}

-	if (!(bh = sb_bread_unmovable(sb, logical_sb_block))) {
+	bh = ext4_sb_bread_unmovable(sb, logical_sb_block);
+	if (IS_ERR(bh)) {
 		ext4_msg(sb, KERN_ERR, "unable to read superblock");
+		ret = PTR_ERR(bh);
+		bh = NULL;
 		goto out_fail;
 	}
 	/*

@@ -3936,6 +4160,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 #ifdef CONFIG_EXT4_FS_POSIX_ACL
 	set_opt(sb, POSIX_ACL);
 #endif
+	if (ext4_has_feature_fast_commit(sb))
+		set_opt2(sb, JOURNAL_FAST_COMMIT);
 	/* don't forget to enable journal_csum when metadata_csum is enabled. */
 	if (ext4_has_metadata_csum(sb))
 		set_opt(sb, JOURNAL_CHECKSUM);
@@ -4259,10 +4485,12 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		brelse(bh);
 		logical_sb_block = sb_block * EXT4_MIN_BLOCK_SIZE;
 		offset = do_div(logical_sb_block, blocksize);
-		bh = sb_bread_unmovable(sb, logical_sb_block);
-		if (!bh) {
+		bh = ext4_sb_bread_unmovable(sb, logical_sb_block);
+		if (IS_ERR(bh)) {
 			ext4_msg(sb, KERN_ERR,
 				 "Can't read superblock on 2nd try");
+			ret = PTR_ERR(bh);
+			bh = NULL;
 			goto failed_mount;
 		}
 		es = (struct ext4_super_block *)(bh->b_data + offset);

@@ -4474,18 +4702,20 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	/* Pre-read the descriptors into the buffer cache */
 	for (i = 0; i < db_count; i++) {
 		block = descriptor_loc(sb, logical_sb_block, i);
-		sb_breadahead_unmovable(sb, block);
+		ext4_sb_breadahead_unmovable(sb, block);
 	}

 	for (i = 0; i < db_count; i++) {
 		struct buffer_head *bh;

 		block = descriptor_loc(sb, logical_sb_block, i);
-		bh = sb_bread_unmovable(sb, block);
-		if (!bh) {
+		bh = ext4_sb_bread_unmovable(sb, block);
+		if (IS_ERR(bh)) {
 			ext4_msg(sb, KERN_ERR,
 			       "can't read group descriptor %d", i);
 			db_count = i;
+			ret = PTR_ERR(bh);
+			bh = NULL;
 			goto failed_mount2;
 		}
 		rcu_read_lock();

@@ -4533,6 +4763,26 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */
 	mutex_init(&sbi->s_orphan_lock);

+	/* Initialize fast commit stuff */
+	atomic_set(&sbi->s_fc_subtid, 0);
+	atomic_set(&sbi->s_fc_ineligible_updates, 0);
+	INIT_LIST_HEAD(&sbi->s_fc_q[FC_Q_MAIN]);
+	INIT_LIST_HEAD(&sbi->s_fc_q[FC_Q_STAGING]);
+	INIT_LIST_HEAD(&sbi->s_fc_dentry_q[FC_Q_MAIN]);
+	INIT_LIST_HEAD(&sbi->s_fc_dentry_q[FC_Q_STAGING]);
+	sbi->s_fc_bytes = 0;
+	sbi->s_mount_state &= ~EXT4_FC_INELIGIBLE;
+	sbi->s_mount_state &= ~EXT4_FC_COMMITTING;
+	spin_lock_init(&sbi->s_fc_lock);
+	memset(&sbi->s_fc_stats, 0, sizeof(sbi->s_fc_stats));
+	sbi->s_fc_replay_state.fc_regions = NULL;
+	sbi->s_fc_replay_state.fc_regions_size = 0;
+	sbi->s_fc_replay_state.fc_regions_used = 0;
+	sbi->s_fc_replay_state.fc_regions_valid = 0;
+	sbi->s_fc_replay_state.fc_modified_inodes = NULL;
+	sbi->s_fc_replay_state.fc_modified_inodes_size = 0;
+	sbi->s_fc_replay_state.fc_modified_inodes_used = 0;
+
 	sb->s_root = NULL;

 	needs_recovery = (es->s_last_orphan != 0 ||
@@ -4582,6 +4832,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		sbi->s_def_mount_opt &= ~EXT4_MOUNT_JOURNAL_CHECKSUM;
 		clear_opt(sb, JOURNAL_CHECKSUM);
 		clear_opt(sb, DATA_FLAGS);
+		clear_opt2(sb, JOURNAL_FAST_COMMIT);
 		sbi->s_journal = NULL;
 		needs_recovery = 0;
 		goto no_journal;

@@ -4640,6 +4891,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	set_task_ioprio(sbi->s_journal->j_task, journal_ioprio);

 	sbi->s_journal->j_commit_callback = ext4_journal_commit_callback;
+	sbi->s_journal->j_submit_inode_data_buffers =
+		ext4_journal_submit_inode_data_buffers;
+	sbi->s_journal->j_finish_inode_data_buffers =
+		ext4_journal_finish_inode_data_buffers;

 no_journal:
 	if (!test_opt(sb, NO_MBCACHE)) {

@@ -4737,6 +4992,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 			goto failed_mount4a;
 		}
 	}
+	ext4_fc_replay_cleanup(sb);

 	ext4_ext_init(sb);
 	err = ext4_mb_init(sb);

@@ -4803,9 +5059,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	 * used to detect the metadata async write error.
 	 */
 	spin_lock_init(&sbi->s_bdev_wb_lock);
-	if (!sb_rdonly(sb))
-		errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
-					 &sbi->s_bdev_wb_err);
+	errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
+				 &sbi->s_bdev_wb_err);
 	sb->s_bdev->bd_super = sb;
 	EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS;
 	ext4_orphan_cleanup(sb, es);

@@ -4861,6 +5116,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)

 failed_mount8:
 	ext4_unregister_sysfs(sb);
+	kobject_put(&sbi->s_kobj);
 failed_mount7:
 	ext4_unregister_li_request(sb);
 failed_mount6:

@@ -4949,6 +5205,7 @@ static void ext4_init_journal_params(struct super_block *sb, journal_t *journal)
 	journal->j_commit_interval = sbi->s_commit_interval;
 	journal->j_min_batch_time = sbi->s_min_batch_time;
 	journal->j_max_batch_time = sbi->s_max_batch_time;
+	ext4_fc_init(sb, journal);

 	write_lock(&journal->j_state_lock);
 	if (test_opt(sb, BARRIER))

@@ -5091,9 +5348,7 @@ static journal_t *ext4_get_dev_journal(struct super_block *sb,
 		goto out_bdev;
 	}
 	journal->j_private = sb;
-	ll_rw_block(REQ_OP_READ, REQ_META | REQ_PRIO, 1, &journal->j_sb_buffer);
-	wait_on_buffer(journal->j_sb_buffer);
-	if (!buffer_uptodate(journal->j_sb_buffer)) {
+	if (ext4_read_bh_lock(journal->j_sb_buffer, REQ_META | REQ_PRIO, true)) {
 		ext4_msg(sb, KERN_ERR, "I/O error on journal device");
 		goto out_journal;
 	}

@@ -5103,7 +5358,7 @@ static journal_t *ext4_get_dev_journal(struct super_block *sb,
 			be32_to_cpu(journal->j_superblock->s_nr_users));
 		goto out_journal;
 	}
-	EXT4_SB(sb)->journal_bdev = bdev;
+	EXT4_SB(sb)->s_journal_bdev = bdev;
 	ext4_init_journal_params(sb, journal);
 	return journal;
@@ -5696,14 +5951,6 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 				goto restore_opts;
 			}

-			/*
-			 * Update the original bdev mapping's wb_err value
-			 * which could be used to detect the metadata async
-			 * write error.
-			 */
-			errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
-						 &sbi->s_bdev_wb_err);
-
 			/*
 			 * Mounting a RDONLY partition read-write, so reread
 			 * and store the current valid flag. (It may have

@@ -5749,7 +5996,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 	 * Releasing of existing data is done when we are sure remount will
 	 * succeed.
 	 */
-	if (test_opt(sb, BLOCK_VALIDITY) && !sbi->system_blks) {
+	if (test_opt(sb, BLOCK_VALIDITY) && !sbi->s_system_blks) {
 		err = ext4_setup_system_zone(sb);
 		if (err)
 			goto restore_opts;

@@ -5775,7 +6022,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 		}
 	}
 #endif
-	if (!test_opt(sb, BLOCK_VALIDITY) && sbi->system_blks)
+	if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
 		ext4_release_system_zone(sb);

 	/*

@@ -5798,7 +6045,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 	sbi->s_commit_interval = old_opts.s_commit_interval;
 	sbi->s_min_batch_time = old_opts.s_min_batch_time;
 	sbi->s_max_batch_time = old_opts.s_max_batch_time;
-	if (!test_opt(sb, BLOCK_VALIDITY) && sbi->system_blks)
+	if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
 		ext4_release_system_zone(sb);
 #ifdef CONFIG_QUOTA
 	sbi->s_jquota_fmt = old_opts.s_jquota_fmt;

@@ -6031,6 +6278,11 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id,
 	/* Quotafile not on the same filesystem? */
 	if (path->dentry->d_sb != sb)
 		return -EXDEV;
+
+	/* Quota already enabled for this file? */
+	if (IS_NOQUOTA(d_inode(path->dentry)))
+		return -EBUSY;
+
 	/* Journaling quota? */
 	if (EXT4_SB(sb)->s_qf_names[type]) {
 		/* Quotafile not in fs root? */

@@ -6298,6 +6550,10 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
 	brelse(bh);
 out:
 	if (inode->i_size < off + len) {
+		ext4_fc_track_range(inode,
+			(inode->i_size > 0 ? inode->i_size - 1 : 0)
+				>> inode->i_sb->s_blocksize_bits,
+			(off + len) >> inode->i_sb->s_blocksize_bits);
 		i_size_write(inode, off + len);
 		EXT4_I(inode)->i_disksize = inode->i_size;
 		err2 = ext4_mark_inode_dirty(handle, inode);

@@ -6426,6 +6682,11 @@ static int __init ext4_init_fs(void)
 	err = init_inodecache();
 	if (err)
 		goto out1;

+	err = ext4_fc_init_dentry_cache();
+	if (err)
+		goto out05;
+
 	register_as_ext3();
 	register_as_ext2();
 	err = register_filesystem(&ext4_fs_type);

@@ -6436,6 +6697,7 @@ static int __init ext4_init_fs(void)
 out:
 	unregister_as_ext2();
 	unregister_as_ext3();
+out05:
 	destroy_inodecache();
 out1:
 	ext4_exit_mballoc();
fs/ext4/sysfs.c
@@ -521,6 +521,8 @@ int ext4_register_sysfs(struct super_block *sb)
 		proc_create_single_data("es_shrinker_info", S_IRUGO,
 				sbi->s_proc, ext4_seq_es_shrinker_info_show,
 				sb);
+		proc_create_single_data("fc_info", 0444, sbi->s_proc,
+				ext4_fc_info_show, sb);
 		proc_create_seq_data("mb_groups", S_IRUGO, sbi->s_proc,
 				&ext4_mb_seq_groups_ops, sb);
 	}

fs/ext4/xattr.c
@@ -2419,6 +2419,7 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
 		if (IS_SYNC(inode))
 			ext4_handle_sync(handle);
 	}
+	ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_XATTR);

 cleanup:
 	brelse(is.iloc.bh);

@@ -2496,6 +2497,7 @@ ext4_xattr_set(struct inode *inode, int name_index, const char *name,
 		if (error == 0)
 			error = error2;
 	}
+	ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_XATTR);

 	return error;
 }

@@ -2928,6 +2930,7 @@ int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode,
 				 error);
 			goto cleanup;
 		}
+		ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_XATTR);
 	}
 	error = 0;
 cleanup:
fs/jbd2/commit.c
@@ -187,21 +187,49 @@ static int journal_wait_on_commit_record(journal_t *journal,
  * use writepages() because with delayed allocation we may be doing
  * block allocation in writepages().
  */
-static int journal_submit_inode_data_buffers(struct address_space *mapping,
-		loff_t dirty_start, loff_t dirty_end)
+int jbd2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
 {
-	int ret;
+	struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
 	struct writeback_control wbc = {
 		.sync_mode =  WB_SYNC_ALL,
 		.nr_to_write = mapping->nrpages * 2,
-		.range_start = dirty_start,
-		.range_end = dirty_end,
+		.range_start = jinode->i_dirty_start,
+		.range_end = jinode->i_dirty_end,
 	};

-	ret = generic_writepages(mapping, &wbc);
-	return ret;
+	/*
+	 * submit the inode data buffers. We use writepage
+	 * instead of writepages. Because writepages can do
+	 * block allocation with delalloc. We need to write
+	 * only allocated blocks here.
+	 */
+	return generic_writepages(mapping, &wbc);
 }

+/* Send all the data buffers related to an inode */
+int jbd2_submit_inode_data(struct jbd2_inode *jinode)
+{
+
+	if (!jinode || !(jinode->i_flags & JI_WRITE_DATA))
+		return 0;
+
+	trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
+	return jbd2_journal_submit_inode_data_buffers(jinode);
+
+}
+EXPORT_SYMBOL(jbd2_submit_inode_data);
+
+int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode)
+{
+	if (!jinode || !(jinode->i_flags & JI_WAIT_DATA) ||
+		!jinode->i_vfs_inode || !jinode->i_vfs_inode->i_mapping)
+		return 0;
+	return filemap_fdatawait_range_keep_errors(
+		jinode->i_vfs_inode->i_mapping, jinode->i_dirty_start,
+		jinode->i_dirty_end);
+}
+EXPORT_SYMBOL(jbd2_wait_inode_data);
+
 /*
  * Submit all the data buffers of inode associated with the transaction to
  * disk.
@@ -215,29 +243,20 @@ static int journal_submit_data_buffers(journal_t *journal,
 {
 	struct jbd2_inode *jinode;
 	int err, ret = 0;
-	struct address_space *mapping;

 	spin_lock(&journal->j_list_lock);
 	list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
-		loff_t dirty_start = jinode->i_dirty_start;
-		loff_t dirty_end = jinode->i_dirty_end;
-
 		if (!(jinode->i_flags & JI_WRITE_DATA))
 			continue;
-		mapping = jinode->i_vfs_inode->i_mapping;
 		jinode->i_flags |= JI_COMMIT_RUNNING;
 		spin_unlock(&journal->j_list_lock);
-		/*
-		 * submit the inode data buffers. We use writepage
-		 * instead of writepages. Because writepages can do
-		 * block allocation with delalloc. We need to write
-		 * only allocated blocks here.
-		 */
+		/* submit the inode data buffers. */
 		trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
-		err = journal_submit_inode_data_buffers(mapping, dirty_start,
-				dirty_end);
-		if (!ret)
-			ret = err;
+		if (journal->j_submit_inode_data_buffers) {
+			err = journal->j_submit_inode_data_buffers(jinode);
+			if (!ret)
+				ret = err;
+		}
 		spin_lock(&journal->j_list_lock);
 		J_ASSERT(jinode->i_transaction == commit_transaction);
 		jinode->i_flags &= ~JI_COMMIT_RUNNING;

@@ -248,6 +267,15 @@ static int journal_submit_data_buffers(journal_t *journal,
 	return ret;
 }

+int jbd2_journal_finish_inode_data_buffers(struct jbd2_inode *jinode)
+{
+	struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
+
+	return filemap_fdatawait_range_keep_errors(mapping,
+						   jinode->i_dirty_start,
+						   jinode->i_dirty_end);
+}
+
 /*
  * Wait for data submitted for writeout, refile inodes to proper
  * transaction if needed.

@@ -262,18 +290,16 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
 	/* For locking, see the comment in journal_submit_data_buffers() */
 	spin_lock(&journal->j_list_lock);
 	list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
-		loff_t dirty_start = jinode->i_dirty_start;
-		loff_t dirty_end = jinode->i_dirty_end;
-
 		if (!(jinode->i_flags & JI_WAIT_DATA))
 			continue;
 		jinode->i_flags |= JI_COMMIT_RUNNING;
 		spin_unlock(&journal->j_list_lock);
-		err = filemap_fdatawait_range_keep_errors(
-				jinode->i_vfs_inode->i_mapping, dirty_start,
-				dirty_end);
-		if (!ret)
-			ret = err;
+		/* wait for the inode data buffers writeout. */
+		if (journal->j_finish_inode_data_buffers) {
+			err = journal->j_finish_inode_data_buffers(jinode);
+			if (!ret)
+				ret = err;
+		}
 		spin_lock(&journal->j_list_lock);
 		jinode->i_flags &= ~JI_COMMIT_RUNNING;
 		smp_mb();
@@ -413,6 +439,20 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	J_ASSERT(journal->j_running_transaction != NULL);
 	J_ASSERT(journal->j_committing_transaction == NULL);

+	write_lock(&journal->j_state_lock);
+	journal->j_flags |= JBD2_FULL_COMMIT_ONGOING;
+	while (journal->j_flags & JBD2_FAST_COMMIT_ONGOING) {
+		DEFINE_WAIT(wait);
+
+		prepare_to_wait(&journal->j_fc_wait, &wait,
+				TASK_UNINTERRUPTIBLE);
+		write_unlock(&journal->j_state_lock);
+		schedule();
+		write_lock(&journal->j_state_lock);
+		finish_wait(&journal->j_fc_wait, &wait);
+	}
+	write_unlock(&journal->j_state_lock);
+
 	commit_transaction = journal->j_running_transaction;

 	trace_jbd2_start_commit(journal, commit_transaction);

@@ -420,6 +460,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 		commit_transaction->t_tid);

 	write_lock(&journal->j_state_lock);
+	journal->j_fc_off = 0;
 	J_ASSERT(commit_transaction->t_state == T_RUNNING);
 	commit_transaction->t_state = T_LOCKED;

@@ -1119,12 +1160,16 @@ void jbd2_journal_commit_transaction(journal_t *journal)

 	if (journal->j_commit_callback)
 		journal->j_commit_callback(journal, commit_transaction);
+	if (journal->j_fc_cleanup_callback)
+		journal->j_fc_cleanup_callback(journal, 1);

 	trace_jbd2_end_commit(journal, commit_transaction);
 	jbd_debug(1, "JBD2: commit %d complete, head %d\n",
 		  journal->j_commit_sequence, journal->j_tail_sequence);

 	write_lock(&journal->j_state_lock);
+	journal->j_flags &= ~JBD2_FULL_COMMIT_ONGOING;
+	journal->j_flags &= ~JBD2_FAST_COMMIT_ONGOING;
 	spin_lock(&journal->j_list_lock);
 	commit_transaction->t_state = T_FINISHED;
 	/* Check if the transaction can be dropped now that we are finished */

@@ -1136,6 +1181,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	spin_unlock(&journal->j_list_lock);
 	write_unlock(&journal->j_state_lock);
 	wake_up(&journal->j_wait_done_commit);
+	wake_up(&journal->j_fc_wait);

 	/*
 	 * Calculate overall stats
fs/jbd2/journal.c
@@ -91,6 +91,8 @@ EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers);
 EXPORT_SYMBOL(jbd2_journal_force_commit);
 EXPORT_SYMBOL(jbd2_journal_inode_ranged_write);
 EXPORT_SYMBOL(jbd2_journal_inode_ranged_wait);
+EXPORT_SYMBOL(jbd2_journal_submit_inode_data_buffers);
+EXPORT_SYMBOL(jbd2_journal_finish_inode_data_buffers);
 EXPORT_SYMBOL(jbd2_journal_init_jbd_inode);
 EXPORT_SYMBOL(jbd2_journal_release_jbd_inode);
 EXPORT_SYMBOL(jbd2_journal_begin_ordered_truncate);

@@ -157,7 +159,9 @@ static void commit_timeout(struct timer_list *t)
  *
  * 1) COMMIT:  Every so often we need to commit the current state of the
  *    filesystem to disk.  The journal thread is responsible for writing
- *    all of the metadata buffers to disk.
+ *    all of the metadata buffers to disk. If a fast commit is ongoing
+ *    journal thread waits until it's done and then continues from
+ *    there on.
  *
  * 2) CHECKPOINT: We cannot reuse a used section of the log file until all
  *    of the data in that part of the log has been rewritten elsewhere on

@@ -714,6 +718,75 @@ int jbd2_log_wait_commit(journal_t *journal, tid_t tid)
 	return err;
 }

+/*
+ * Start a fast commit. If there's an ongoing fast or full commit wait for
+ * it to complete. Returns 0 if a new fast commit was started. Returns -EALREADY
+ * if a fast commit is not needed, either because there's an already a commit
+ * going on or this tid has already been committed. Returns -EINVAL if no jbd2
+ * commit has yet been performed.
+ */
+int jbd2_fc_begin_commit(journal_t *journal, tid_t tid)
+{
+	/*
+	 * Fast commits only allowed if at least one full commit has
+	 * been processed.
+	 */
+	if (!journal->j_stats.ts_tid)
+		return -EINVAL;
+
+	if (tid <= journal->j_commit_sequence)
+		return -EALREADY;
+
+	write_lock(&journal->j_state_lock);
+	if (journal->j_flags & JBD2_FULL_COMMIT_ONGOING ||
+	    (journal->j_flags & JBD2_FAST_COMMIT_ONGOING)) {
+		DEFINE_WAIT(wait);
+
+		prepare_to_wait(&journal->j_fc_wait, &wait,
+				TASK_UNINTERRUPTIBLE);
+		write_unlock(&journal->j_state_lock);
+		schedule();
+		finish_wait(&journal->j_fc_wait, &wait);
+		return -EALREADY;
+	}
+	journal->j_flags |= JBD2_FAST_COMMIT_ONGOING;
+	write_unlock(&journal->j_state_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL(jbd2_fc_begin_commit);
+
+/*
+ * Stop a fast commit. If fallback is set, this function starts commit of
+ * TID tid before any other fast commit can start.
+ */
+static int __jbd2_fc_end_commit(journal_t *journal, tid_t tid, bool fallback)
+{
+	if (journal->j_fc_cleanup_callback)
+		journal->j_fc_cleanup_callback(journal, 0);
+	write_lock(&journal->j_state_lock);
+	journal->j_flags &= ~JBD2_FAST_COMMIT_ONGOING;
+	if (fallback)
+		journal->j_flags |= JBD2_FULL_COMMIT_ONGOING;
+	write_unlock(&journal->j_state_lock);
+	wake_up(&journal->j_fc_wait);
+	if (fallback)
+		return jbd2_complete_transaction(journal, tid);
+	return 0;
+}
+
+int jbd2_fc_end_commit(journal_t *journal)
+{
+	return __jbd2_fc_end_commit(journal, 0, 0);
+}
+EXPORT_SYMBOL(jbd2_fc_end_commit);
+
+int jbd2_fc_end_commit_fallback(journal_t *journal, tid_t tid)
+{
+	return __jbd2_fc_end_commit(journal, tid, 1);
+}
+EXPORT_SYMBOL(jbd2_fc_end_commit_fallback);
+
 /* Return 1 when transaction with given tid has already committed. */
 int jbd2_transaction_committed(journal_t *journal, tid_t tid)
 {
@@ -782,6 +855,110 @@ int jbd2_journal_next_log_block(journal_t *journal, unsigned long long *retp)
 	return jbd2_journal_bmap(journal, blocknr, retp);
 }

+/* Map one fast commit buffer for use by the file system */
+int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out)
+{
+	unsigned long long pblock;
+	unsigned long blocknr;
+	int ret = 0;
+	struct buffer_head *bh;
+	int fc_off;
+
+	*bh_out = NULL;
+	write_lock(&journal->j_state_lock);
+
+	if (journal->j_fc_off + journal->j_fc_first < journal->j_fc_last) {
+		fc_off = journal->j_fc_off;
+		blocknr = journal->j_fc_first + fc_off;
+		journal->j_fc_off++;
+	} else {
+		ret = -EINVAL;
+	}
+	write_unlock(&journal->j_state_lock);
+
+	if (ret)
+		return ret;
+
+	ret = jbd2_journal_bmap(journal, blocknr, &pblock);
+	if (ret)
+		return ret;
+
+	bh = __getblk(journal->j_dev, pblock, journal->j_blocksize);
+	if (!bh)
+		return -ENOMEM;
+
+	lock_buffer(bh);
+
+	clear_buffer_uptodate(bh);
+	set_buffer_dirty(bh);
+	unlock_buffer(bh);
+	journal->j_fc_wbuf[fc_off] = bh;
+
+	*bh_out = bh;
+
+	return 0;
+}
+EXPORT_SYMBOL(jbd2_fc_get_buf);
+
+/*
+ * Wait on fast commit buffers that were allocated by jbd2_fc_get_buf
+ * for completion.
+ */
+int jbd2_fc_wait_bufs(journal_t *journal, int num_blks)
+{
+	struct buffer_head *bh;
+	int i, j_fc_off;
+
+	read_lock(&journal->j_state_lock);
+	j_fc_off = journal->j_fc_off;
+	read_unlock(&journal->j_state_lock);
+
+	/*
+	 * Wait in reverse order to minimize chances of us being woken up before
+	 * all IOs have completed
+	 */
+	for (i = j_fc_off - 1; i >= j_fc_off - num_blks; i--) {
+		bh = journal->j_fc_wbuf[i];
+		wait_on_buffer(bh);
+		put_bh(bh);
+		journal->j_fc_wbuf[i] = NULL;
+		if (unlikely(!buffer_uptodate(bh)))
+			return -EIO;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(jbd2_fc_wait_bufs);
+
+/*
+ * Wait on fast commit buffers that were allocated by jbd2_fc_get_buf
+ * for completion.
+ */
+int jbd2_fc_release_bufs(journal_t *journal)
+{
+	struct buffer_head *bh;
+	int i, j_fc_off;
+
+	read_lock(&journal->j_state_lock);
+	j_fc_off = journal->j_fc_off;
+	read_unlock(&journal->j_state_lock);
+
+	/*
+	 * Wait in reverse order to minimize chances of us being woken up before
+	 * all IOs have completed
+	 */
+	for (i = j_fc_off - 1; i >= 0; i--) {
+		bh = journal->j_fc_wbuf[i];
+		if (!bh)
+			break;
+		put_bh(bh);
+		journal->j_fc_wbuf[i] = NULL;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(jbd2_fc_release_bufs);
+
 /*
  * Conversion of logical to physical block numbers for the journal
  *
@ -1140,6 +1317,7 @@ static journal_t *journal_init_common(struct block_device *bdev,
|
||||
init_waitqueue_head(&journal->j_wait_commit);
|
||||
init_waitqueue_head(&journal->j_wait_updates);
|
||||
init_waitqueue_head(&journal->j_wait_reserved);
|
||||
init_waitqueue_head(&journal->j_fc_wait);
|
||||
mutex_init(&journal->j_abort_mutex);
|
||||
mutex_init(&journal->j_barrier);
|
||||
mutex_init(&journal->j_checkpoint_mutex);
|
||||
@ -1179,6 +1357,14 @@ static journal_t *journal_init_common(struct block_device *bdev,
|
||||
if (!journal->j_wbuf)
|
||||
goto err_cleanup;
|
||||
|
||||
if (journal->j_fc_wbufsize > 0) {
|
||||
journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
|
||||
sizeof(struct buffer_head *),
|
||||
GFP_KERNEL);
|
||||
if (!journal->j_fc_wbuf)
|
||||
goto err_cleanup;
|
||||
}
|
||||
|
||||
bh = getblk_unmovable(journal->j_dev, start, journal->j_blocksize);
|
||||
if (!bh) {
|
||||
pr_err("%s: Cannot get buffer for journal superblock\n",
|
||||
@ -1192,11 +1378,23 @@ static journal_t *journal_init_common(struct block_device *bdev,
|
||||
|
||||
err_cleanup:
|
||||
kfree(journal->j_wbuf);
|
||||
kfree(journal->j_fc_wbuf);
|
||||
jbd2_journal_destroy_revoke(journal);
|
||||
kfree(journal);
|
||||
return NULL;
|
||||
}
|
||||
|
||||
int jbd2_fc_init(journal_t *journal, int num_fc_blks)
|
||||
{
|
||||
journal->j_fc_wbufsize = num_fc_blks;
|
||||
journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
|
||||
sizeof(struct buffer_head *), GFP_KERNEL);
|
||||
if (!journal->j_fc_wbuf)
|
||||
return -ENOMEM;
|
||||
return 0;
|
||||
}
|
||||
EXPORT_SYMBOL(jbd2_fc_init);
|
||||
|
||||
/* jbd2_journal_init_dev and jbd2_journal_init_inode:
|
||||
*
|
||||
* Create a journal structure assigned some fixed set of disk blocks to
|
||||
@ -1314,11 +1512,20 @@ static int journal_reset(journal_t *journal)
|
||||
}
|
||||
|
||||
journal->j_first = first;
|
||||
journal->j_last = last;
|
||||
|
||||
journal->j_head = first;
|
||||
journal->j_tail = first;
|
||||
journal->j_free = last - first;
|
||||
if (jbd2_has_feature_fast_commit(journal) &&
|
||||
journal->j_fc_wbufsize > 0) {
|
||||
journal->j_fc_last = last;
|
||||
journal->j_last = last - journal->j_fc_wbufsize;
|
||||
journal->j_fc_first = journal->j_last + 1;
|
||||
journal->j_fc_off = 0;
|
||||
} else {
|
||||
journal->j_last = last;
|
||||
}
|
||||
|
||||
journal->j_head = journal->j_first;
|
||||
journal->j_tail = journal->j_first;
|
||||
journal->j_free = journal->j_last - journal->j_first;
|
||||
|
||||
journal->j_tail_sequence = journal->j_transaction_sequence;
|
||||
journal->j_commit_sequence = journal->j_transaction_sequence - 1;
@@ -1464,6 +1671,7 @@ int jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
static void jbd2_mark_journal_empty(journal_t *journal, int write_op)
{
	journal_superblock_t *sb = journal->j_superblock;
	bool had_fast_commit = false;

	BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex));
	lock_buffer(journal->j_sb_buffer);
@@ -1477,9 +1685,20 @@ static void jbd2_mark_journal_empty(journal_t *journal, int write_op)

	sb->s_sequence = cpu_to_be32(journal->j_tail_sequence);
	sb->s_start = cpu_to_be32(0);
	if (jbd2_has_feature_fast_commit(journal)) {
		/*
		 * When journal is clean, no need to commit fast commit flag and
		 * make file system incompatible with older kernels.
		 */
		jbd2_clear_feature_fast_commit(journal);
		had_fast_commit = true;
	}

	jbd2_write_superblock(journal, write_op);

	if (had_fast_commit)
		jbd2_set_feature_fast_commit(journal);

	/* Log is no longer empty */
	write_lock(&journal->j_state_lock);
	journal->j_flags |= JBD2_FLUSHED;
@@ -1663,9 +1882,18 @@ static int load_superblock(journal_t *journal)
	journal->j_tail_sequence = be32_to_cpu(sb->s_sequence);
	journal->j_tail = be32_to_cpu(sb->s_start);
	journal->j_first = be32_to_cpu(sb->s_first);
	journal->j_last = be32_to_cpu(sb->s_maxlen);
	journal->j_errno = be32_to_cpu(sb->s_errno);

	if (jbd2_has_feature_fast_commit(journal) &&
	    journal->j_fc_wbufsize > 0) {
		journal->j_fc_last = be32_to_cpu(sb->s_maxlen);
		journal->j_last = journal->j_fc_last - journal->j_fc_wbufsize;
		journal->j_fc_first = journal->j_last + 1;
		journal->j_fc_off = 0;
	} else {
		journal->j_last = be32_to_cpu(sb->s_maxlen);
	}

	return 0;
}

@@ -1726,6 +1954,9 @@ int jbd2_journal_load(journal_t *journal)
	 */
	journal->j_flags &= ~JBD2_ABORT;

	if (journal->j_fc_wbufsize > 0)
		jbd2_journal_set_features(journal, 0, 0,
					  JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
	/* OK, we've finished with the dynamic journal bits:
	 * reinitialise the dynamic contents of the superblock in memory
	 * and reset them on disk. */
@@ -1809,6 +2040,8 @@ int jbd2_journal_destroy(journal_t *journal)
	jbd2_journal_destroy_revoke(journal);
	if (journal->j_chksum_driver)
		crypto_free_shash(journal->j_chksum_driver);
	if (journal->j_fc_wbufsize > 0)
		kfree(journal->j_fc_wbuf);
	kfree(journal->j_wbuf);
	kfree(journal);

--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -35,7 +35,6 @@ struct recovery_info
	int		nr_revoke_hits;
};

enum passtype {PASS_SCAN, PASS_REVOKE, PASS_REPLAY};
static int do_one_pass(journal_t *journal,
				struct recovery_info *info, enum passtype pass);
static int scan_revoke_records(journal_t *, struct buffer_head *,
@@ -225,10 +224,51 @@ static int count_tags(journal_t *journal, struct buffer_head *bh)
/* Make sure we wrap around the log correctly! */
#define wrap(journal, var)					\
do {								\
	if (var >= (journal)->j_last)				\
		var -= ((journal)->j_last - (journal)->j_first);	\
	unsigned long _wrap_last =				\
		jbd2_has_feature_fast_commit(journal) ?		\
			(journal)->j_fc_last : (journal)->j_last;	\
								\
	if (var >= _wrap_last)					\
		var -= (_wrap_last - (journal)->j_first);	\
} while (0)

static int fc_do_one_pass(journal_t *journal,
		struct recovery_info *info, enum passtype pass)
{
	unsigned int expected_commit_id = info->end_transaction;
	unsigned long next_fc_block;
	struct buffer_head *bh;
	int err = 0;

	next_fc_block = journal->j_fc_first;
	if (!journal->j_fc_replay_callback)
		return 0;

	while (next_fc_block <= journal->j_fc_last) {
		jbd_debug(3, "Fast commit replay: next block %lu\n",
			  next_fc_block);
		err = jread(&bh, journal, next_fc_block);
		if (err) {
			jbd_debug(3, "Fast commit replay: read error\n");
			break;
		}

		jbd_debug(3, "Processing fast commit blk at off %lu\n",
			  next_fc_block - journal->j_fc_first);
		err = journal->j_fc_replay_callback(journal, bh, pass,
					next_fc_block - journal->j_fc_first,
					expected_commit_id);
		next_fc_block++;
		if (err < 0 || err == JBD2_FC_REPLAY_STOP)
			break;
		err = 0;
	}

	if (err)
		jbd_debug(3, "Fast commit replay failed, err = %d\n", err);

	return err;
}

/**
 * jbd2_journal_recover - recovers an on-disk journal
 * @journal: the journal to recover
@@ -428,6 +468,8 @@ static int do_one_pass(journal_t *journal,
	__u32			crc32_sum = ~0; /* Transactional Checksums */
	int			descr_csum_size = 0;
	int			block_error = 0;
	bool			need_check_commit_time = false;
	__u64			last_trans_commit_time = 0, commit_time;

	/*
	 * First thing is to establish what we expect to find in the log
@@ -470,7 +512,9 @@ static int do_one_pass(journal_t *journal,
			break;

		jbd_debug(2, "Scanning for sequence ID %u at %lu/%lu\n",
			  next_commit_ID, next_log_block, journal->j_last);
			  next_commit_ID, next_log_block,
			  jbd2_has_feature_fast_commit(journal) ?
			  journal->j_fc_last : journal->j_last);

		/* Skip over each chunk of the transaction looking for either
		 * the next descriptor block or the final commit
@@ -520,12 +564,21 @@ static int do_one_pass(journal_t *journal,
			if (descr_csum_size > 0 &&
			    !jbd2_descriptor_block_csum_verify(journal,
							       bh->b_data)) {
				printk(KERN_ERR "JBD2: Invalid checksum "
				       "recovering block %lu in log\n",
				       next_log_block);
				err = -EFSBADCRC;
				brelse(bh);
				goto failed;
				/*
				 * PASS_SCAN can see stale blocks due to lazy
				 * journal init. Don't error out on those yet.
				 */
				if (pass != PASS_SCAN) {
					pr_err("JBD2: Invalid checksum recovering block %lu in log\n",
					       next_log_block);
					err = -EFSBADCRC;
					brelse(bh);
					goto failed;
				}
				need_check_commit_time = true;
				jbd_debug(1,
					  "invalid descriptor block found in %lu\n",
					  next_log_block);
			}

			/* If it is a valid descriptor block, replay it
@@ -535,6 +588,7 @@ static int do_one_pass(journal_t *journal,
			if (pass != PASS_REPLAY) {
				if (pass == PASS_SCAN &&
				    jbd2_has_feature_checksum(journal) &&
				    !need_check_commit_time &&
				    !info->end_transaction) {
					if (calc_chksums(journal, bh,
							&next_log_block,
@@ -683,11 +737,41 @@ static int do_one_pass(journal_t *journal,
			 * mentioned conditions. Hence assume
			 * "Interrupted Commit".)
			 */
			commit_time = be64_to_cpu(
				((struct commit_header *)bh->b_data)->h_commit_sec);
			/*
			 * If need_check_commit_time is set, it means we are in
			 * PASS_SCAN and csum verify failed before. If
			 * commit_time is increasing, it's the same journal,
			 * otherwise it is stale journal block, just end this
			 * recovery.
			 */
			if (need_check_commit_time) {
				if (commit_time >= last_trans_commit_time) {
					pr_err("JBD2: Invalid checksum found in transaction %u\n",
					       next_commit_ID);
					err = -EFSBADCRC;
					brelse(bh);
					goto failed;
				}
			ignore_crc_mismatch:
				/*
				 * It likely does not belong to same journal,
				 * just end this recovery with success.
				 */
				jbd_debug(1, "JBD2: Invalid checksum ignored in transaction %u, likely stale data\n",
					  next_commit_ID);
				err = 0;
				brelse(bh);
				goto done;
			}

			/* Found an expected commit block: if checksums
			 * are present verify them in PASS_SCAN; else not
			/*
			 * Found an expected commit block: if checksums
			 * are present, verify them in PASS_SCAN; else not
			 * much to do other than move on to the next sequence
			 * number. */
			 * number.
			 */
			if (pass == PASS_SCAN &&
			    jbd2_has_feature_checksum(journal)) {
				struct commit_header *cbh =
@@ -719,6 +803,8 @@ static int do_one_pass(journal_t *journal,
				    !jbd2_commit_block_csum_verify(journal,
								   bh->b_data)) {
				chksum_error:
					if (commit_time < last_trans_commit_time)
						goto ignore_crc_mismatch;
					info->end_transaction = next_commit_ID;

					if (!jbd2_has_feature_async_commit(journal)) {
@@ -728,11 +814,24 @@ static int do_one_pass(journal_t *journal,
					break;
				}
			}
			if (pass == PASS_SCAN)
				last_trans_commit_time = commit_time;
			brelse(bh);
			next_commit_ID++;
			continue;

		case JBD2_REVOKE_BLOCK:
			/*
			 * Check revoke block crc in pass_scan, if csum verify
			 * failed, check commit block time later.
			 */
			if (pass == PASS_SCAN &&
			    !jbd2_descriptor_block_csum_verify(journal,
							       bh->b_data)) {
				jbd_debug(1, "JBD2: invalid revoke block found in %lu\n",
					  next_log_block);
				need_check_commit_time = true;
			}
			/* If we aren't in the REVOKE pass, then we can
			 * just skip over this block. */
			if (pass != PASS_REVOKE) {
@@ -777,6 +876,13 @@ static int do_one_pass(journal_t *journal,
			success = -EIO;
		}
	}

	if (jbd2_has_feature_fast_commit(journal) && pass != PASS_REVOKE) {
		err = fc_do_one_pass(journal, info, pass);
		if (err)
			success = err;
	}

	if (block_error && success == 0)
		success = -EIO;
	return success;
@@ -800,9 +906,6 @@ static int scan_revoke_records(journal_t *journal, struct buffer_head *bh,
	offset = sizeof(jbd2_journal_revoke_header_t);
	rcount = be32_to_cpu(header->r_count);

	if (!jbd2_descriptor_block_csum_verify(journal, header))
		return -EFSBADCRC;

	if (jbd2_journal_has_csum_v2or3(journal))
		csum_size = sizeof(struct jbd2_journal_block_tail);
	if (rcount > journal->j_blocksize - csum_size)
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -883,6 +883,10 @@ int ocfs2_journal_init(struct ocfs2_journal *journal, int *dirty)
			OCFS2_JOURNAL_DIRTY_FL);

	journal->j_journal = j_journal;
	journal->j_journal->j_submit_inode_data_buffers =
		jbd2_journal_submit_inode_data_buffers;
	journal->j_journal->j_finish_inode_data_buffers =
		jbd2_journal_finish_inode_data_buffers;
	journal->j_inode = inode;
	journal->j_bh = bh;
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -289,6 +289,7 @@ typedef struct journal_superblock_s
#define JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT	0x00000004
#define JBD2_FEATURE_INCOMPAT_CSUM_V2		0x00000008
#define JBD2_FEATURE_INCOMPAT_CSUM_V3		0x00000010
#define JBD2_FEATURE_INCOMPAT_FAST_COMMIT	0x00000020

/* See "journal feature predicate functions" below */

@@ -299,7 +300,8 @@ typedef struct journal_superblock_s
					 JBD2_FEATURE_INCOMPAT_64BIT | \
					 JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT | \
					 JBD2_FEATURE_INCOMPAT_CSUM_V2 | \
					 JBD2_FEATURE_INCOMPAT_CSUM_V3)
					 JBD2_FEATURE_INCOMPAT_CSUM_V3 | \
					 JBD2_FEATURE_INCOMPAT_FAST_COMMIT)

#ifdef __KERNEL__

@@ -452,8 +454,8 @@ struct jbd2_inode {
struct jbd2_revoke_table_s;

/**
 * struct handle_s - The handle_s type is the concrete type associated with
 *     handle_t.
 * struct jbd2_journal_handle - The jbd2_journal_handle type is the concrete
 *     type associated with handle_t.
 * @h_transaction: Which compound transaction is this update a part of?
 * @h_journal: Which journal handle belongs to - used iff h_reserved set.
 * @h_rsv_handle: Handle reserved for finishing the logical operation.
@@ -629,7 +631,9 @@ struct transaction_s
	struct journal_head	*t_shadow_list;

	/*
	 * List of inodes whose data we've modified in data=ordered mode.
	 * List of inodes associated with the transaction; e.g., ext4 uses
	 * this to track inodes in data=ordered and data=journal mode that
	 * need special handling on transaction commit; also used by ocfs2.
	 * [j_list_lock]
	 */
	struct list_head	t_inode_list;
@@ -747,6 +751,11 @@ jbd2_time_diff(unsigned long start, unsigned long end)

#define JBD2_NR_BATCH	64

enum passtype {PASS_SCAN, PASS_REVOKE, PASS_REPLAY};

#define JBD2_FC_REPLAY_STOP	0
#define JBD2_FC_REPLAY_CONTINUE	1

/**
 * struct journal_s - The journal_s type is the concrete type associated with
 * journal_t.
@@ -857,6 +866,13 @@ struct journal_s
	 */
	wait_queue_head_t	j_wait_reserved;

	/**
	 * @j_fc_wait:
	 *
	 * Wait queue to wait for completion of async fast commits.
	 */
	wait_queue_head_t	j_fc_wait;

	/**
	 * @j_checkpoint_mutex:
	 *
@@ -914,6 +930,30 @@ struct journal_s
	 */
	unsigned long		j_last;

	/**
	 * @j_fc_first:
	 *
	 * The block number of the first fast commit block in the journal
	 * [j_state_lock].
	 */
	unsigned long		j_fc_first;

	/**
	 * @j_fc_off:
	 *
	 * Number of fast commit blocks currently allocated.
	 * [j_state_lock].
	 */
	unsigned long		j_fc_off;

	/**
	 * @j_fc_last:
	 *
	 * The block number one beyond the last fast commit block in the journal
	 * [j_state_lock].
	 */
	unsigned long		j_fc_last;

	/**
	 * @j_dev: Device where we store the journal.
	 */
@@ -1064,6 +1104,12 @@ struct journal_s
	 */
	struct buffer_head	**j_wbuf;

	/**
	 * @j_fc_wbuf: Array of fast commit bhs for
	 * jbd2_journal_commit_transaction.
	 */
	struct buffer_head	**j_fc_wbuf;

	/**
	 * @j_wbufsize:
	 *
@@ -1071,6 +1117,13 @@ struct journal_s
	 */
	int			j_wbufsize;

	/**
	 * @j_fc_wbufsize:
	 *
	 * Size of @j_fc_wbuf array.
	 */
	int			j_fc_wbufsize;

	/**
	 * @j_last_sync_writer:
	 *
@@ -1111,6 +1164,27 @@ struct journal_s
	void			(*j_commit_callback)(journal_t *,
						     transaction_t *);

	/**
	 * @j_submit_inode_data_buffers:
	 *
	 * This function is called for all inodes associated with the
	 * committing transaction marked with JI_WRITE_DATA flag
	 * before we start to write out the transaction to the journal.
	 */
	int			(*j_submit_inode_data_buffers)
					(struct jbd2_inode *);

	/**
	 * @j_finish_inode_data_buffers:
	 *
	 * This function is called for all inodes associated with the
	 * committing transaction marked with JI_WAIT_DATA flag
	 * after we have written the transaction to the journal
	 * but before we write out the commit block.
	 */
	int			(*j_finish_inode_data_buffers)
					(struct jbd2_inode *);

	/*
	 * Journal statistics
	 */
@@ -1170,6 +1244,30 @@ struct journal_s
	 */
	struct lockdep_map	j_trans_commit_map;
#endif

	/**
	 * @j_fc_cleanup_callback:
	 *
	 * Clean-up after fast commit or full commit. JBD2 calls this function
	 * after every commit operation.
	 */
	void (*j_fc_cleanup_callback)(struct journal_s *journal, int);

	/*
	 * @j_fc_replay_callback:
	 *
	 * File-system specific function that performs replay of a fast
	 * commit. JBD2 calls this function for each fast commit block found in
	 * the journal. This function should return JBD2_FC_REPLAY_CONTINUE
	 * to indicate that the block was processed correctly and more fast
	 * commit replay should continue. Return value of JBD2_FC_REPLAY_STOP
	 * indicates the end of replay (no more blocks remaining). A negative
	 * return value indicates error.
	 */
	int (*j_fc_replay_callback)(struct journal_s *journal,
				    struct buffer_head *bh,
				    enum passtype pass, int off,
				    tid_t expected_commit_id);
};

#define jbd2_might_wait_for_commit(j) \
@@ -1240,6 +1338,7 @@ JBD2_FEATURE_INCOMPAT_FUNCS(64bit, 64BIT)
JBD2_FEATURE_INCOMPAT_FUNCS(async_commit, ASYNC_COMMIT)
JBD2_FEATURE_INCOMPAT_FUNCS(csum2, CSUM_V2)
JBD2_FEATURE_INCOMPAT_FUNCS(csum3, CSUM_V3)
JBD2_FEATURE_INCOMPAT_FUNCS(fast_commit, FAST_COMMIT)

/*
 * Journal flag definitions
@@ -1253,6 +1352,8 @@ JBD2_FEATURE_INCOMPAT_FUNCS(csum3, CSUM_V3)
#define JBD2_ABORT_ON_SYNCDATA_ERR	0x040	/* Abort the journal on file
						 * data write error in ordered
						 * mode */
#define JBD2_FAST_COMMIT_ONGOING	0x100	/* Fast commit is ongoing */
#define JBD2_FULL_COMMIT_ONGOING	0x200	/* Full commit is ongoing */

/*
 * Function declarations for the journaling transaction and buffer
@@ -1421,6 +1522,10 @@ extern int jbd2_journal_inode_ranged_write(handle_t *handle,
extern int jbd2_journal_inode_ranged_wait(handle_t *handle,
		struct jbd2_inode *inode, loff_t start_byte,
		loff_t length);
extern int jbd2_journal_submit_inode_data_buffers(
		struct jbd2_inode *jinode);
extern int jbd2_journal_finish_inode_data_buffers(
		struct jbd2_inode *jinode);
extern int jbd2_journal_begin_ordered_truncate(journal_t *journal,
		struct jbd2_inode *inode, loff_t new_size);
extern void jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode);
@@ -1505,6 +1610,17 @@ void __jbd2_log_wait_for_space(journal_t *journal);
extern void __jbd2_journal_drop_transaction(journal_t *, transaction_t *);
extern int jbd2_cleanup_journal_tail(journal_t *);

/* Fast commit related APIs */
int jbd2_fc_init(journal_t *journal, int num_fc_blks);
int jbd2_fc_begin_commit(journal_t *journal, tid_t tid);
int jbd2_fc_end_commit(journal_t *journal);
int jbd2_fc_end_commit_fallback(journal_t *journal, tid_t tid);
int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out);
int jbd2_submit_inode_data(struct jbd2_inode *jinode);
int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode);
int jbd2_fc_wait_bufs(journal_t *journal, int num_blks);
int jbd2_fc_release_bufs(journal_t *journal);

/*
 * is_journal_abort
 *
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -95,6 +95,16 @@ TRACE_DEFINE_ENUM(ES_REFERENCED_B);
		{ FALLOC_FL_COLLAPSE_RANGE,	"COLLAPSE_RANGE"}, \
		{ FALLOC_FL_ZERO_RANGE,		"ZERO_RANGE"})

#define show_fc_reason(reason)						\
	__print_symbolic(reason,					\
		{ EXT4_FC_REASON_XATTR,		"XATTR"},		\
		{ EXT4_FC_REASON_CROSS_RENAME,	"CROSS_RENAME"},	\
		{ EXT4_FC_REASON_JOURNAL_FLAG_CHANGE, "JOURNAL_FLAG_CHANGE"}, \
		{ EXT4_FC_REASON_MEM,		"NO_MEM"},		\
		{ EXT4_FC_REASON_SWAP_BOOT,	"SWAP_BOOT"},		\
		{ EXT4_FC_REASON_RESIZE,	"RESIZE"},		\
		{ EXT4_FC_REASON_RENAME_DIR,	"RENAME_DIR"},		\
		{ EXT4_FC_REASON_FALLOC_RANGE,	"FALLOC_RANGE"})

TRACE_EVENT(ext4_other_inode_update_time,
	TP_PROTO(struct inode *inode, ino_t orig_ino),
@@ -1766,9 +1776,9 @@ TRACE_EVENT(ext4_ext_load_extent,
);

TRACE_EVENT(ext4_load_inode,
	TP_PROTO(struct inode *inode),
	TP_PROTO(struct super_block *sb, unsigned long ino),

	TP_ARGS(inode),
	TP_ARGS(sb, ino),

	TP_STRUCT__entry(
		__field(	dev_t,	dev			)
@@ -1776,8 +1786,8 @@ TRACE_EVENT(ext4_load_inode,
	),

	TP_fast_assign(
		__entry->dev = inode->i_sb->s_dev;
		__entry->ino = inode->i_ino;
		__entry->dev = sb->s_dev;
		__entry->ino = ino;
	),

	TP_printk("dev %d,%d ino %ld",
@@ -2791,6 +2801,216 @@ TRACE_EVENT(ext4_lazy_itable_init,
		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->group)
);

TRACE_EVENT(ext4_fc_replay_scan,
	TP_PROTO(struct super_block *sb, int error, int off),

	TP_ARGS(sb, error, off),

	TP_STRUCT__entry(
		__field(dev_t, dev)
		__field(int, error)
		__field(int, off)
	),

	TP_fast_assign(
		__entry->dev = sb->s_dev;
		__entry->error = error;
		__entry->off = off;
	),

	TP_printk("FC scan pass on dev %d,%d: error %d, off %d",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  __entry->error, __entry->off)
);

TRACE_EVENT(ext4_fc_replay,
	TP_PROTO(struct super_block *sb, int tag, int ino, int priv1, int priv2),

	TP_ARGS(sb, tag, ino, priv1, priv2),

	TP_STRUCT__entry(
		__field(dev_t, dev)
		__field(int, tag)
		__field(int, ino)
		__field(int, priv1)
		__field(int, priv2)
	),

	TP_fast_assign(
		__entry->dev = sb->s_dev;
		__entry->tag = tag;
		__entry->ino = ino;
		__entry->priv1 = priv1;
		__entry->priv2 = priv2;
	),

	TP_printk("FC Replay %d,%d: tag %d, ino %d, data1 %d, data2 %d",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  __entry->tag, __entry->ino, __entry->priv1, __entry->priv2)
);

TRACE_EVENT(ext4_fc_commit_start,
	TP_PROTO(struct super_block *sb),

	TP_ARGS(sb),

	TP_STRUCT__entry(
		__field(dev_t, dev)
	),

	TP_fast_assign(
		__entry->dev = sb->s_dev;
	),

	TP_printk("fast_commit started on dev %d,%d",
		  MAJOR(__entry->dev), MINOR(__entry->dev))
);

TRACE_EVENT(ext4_fc_commit_stop,
	TP_PROTO(struct super_block *sb, int nblks, int reason),

	TP_ARGS(sb, nblks, reason),

	TP_STRUCT__entry(
		__field(dev_t, dev)
		__field(int, nblks)
		__field(int, reason)
		__field(int, num_fc)
		__field(int, num_fc_ineligible)
		__field(int, nblks_agg)
	),

	TP_fast_assign(
		__entry->dev = sb->s_dev;
		__entry->nblks = nblks;
		__entry->reason = reason;
		__entry->num_fc = EXT4_SB(sb)->s_fc_stats.fc_num_commits;
		__entry->num_fc_ineligible =
			EXT4_SB(sb)->s_fc_stats.fc_ineligible_commits;
		__entry->nblks_agg = EXT4_SB(sb)->s_fc_stats.fc_numblks;
	),

	TP_printk("fc on [%d,%d] nblks %d, reason %d, fc = %d, ineligible = %d, agg_nblks %d",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  __entry->nblks, __entry->reason, __entry->num_fc,
		  __entry->num_fc_ineligible, __entry->nblks_agg)
);

#define FC_REASON_NAME_STAT(reason)					\
	show_fc_reason(reason),						\
	__entry->sbi->s_fc_stats.fc_ineligible_reason_count[reason]

TRACE_EVENT(ext4_fc_stats,
	TP_PROTO(struct super_block *sb),

	TP_ARGS(sb),

	TP_STRUCT__entry(
		__field(dev_t, dev)
		__field(struct ext4_sb_info *, sbi)
		__field(int, count)
	),

	TP_fast_assign(
		__entry->dev = sb->s_dev;
		__entry->sbi = EXT4_SB(sb);
	),

	TP_printk("dev %d:%d fc ineligible reasons:\n"
		  "%s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s,%d; "
		  "num_commits:%ld, ineligible: %ld, numblks: %ld",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  FC_REASON_NAME_STAT(EXT4_FC_REASON_XATTR),
		  FC_REASON_NAME_STAT(EXT4_FC_REASON_CROSS_RENAME),
		  FC_REASON_NAME_STAT(EXT4_FC_REASON_JOURNAL_FLAG_CHANGE),
		  FC_REASON_NAME_STAT(EXT4_FC_REASON_MEM),
		  FC_REASON_NAME_STAT(EXT4_FC_REASON_SWAP_BOOT),
		  FC_REASON_NAME_STAT(EXT4_FC_REASON_RESIZE),
		  FC_REASON_NAME_STAT(EXT4_FC_REASON_RENAME_DIR),
		  FC_REASON_NAME_STAT(EXT4_FC_REASON_FALLOC_RANGE),
		  __entry->sbi->s_fc_stats.fc_num_commits,
		  __entry->sbi->s_fc_stats.fc_ineligible_commits,
		  __entry->sbi->s_fc_stats.fc_numblks)
);

#define DEFINE_TRACE_DENTRY_EVENT(__type)				\
TRACE_EVENT(ext4_fc_track_##__type,					\
	TP_PROTO(struct inode *inode, struct dentry *dentry, int ret),	\
									\
	TP_ARGS(inode, dentry, ret),					\
									\
	TP_STRUCT__entry(						\
		__field(dev_t, dev)					\
		__field(int, ino)					\
		__field(int, error)					\
	),								\
									\
	TP_fast_assign(							\
		__entry->dev = inode->i_sb->s_dev;			\
		__entry->ino = inode->i_ino;				\
		__entry->error = ret;					\
	),								\
									\
	TP_printk("dev %d:%d, inode %d, error %d, fc_%s",		\
		  MAJOR(__entry->dev), MINOR(__entry->dev),		\
		  __entry->ino, __entry->error,				\
		  #__type)						\
)

DEFINE_TRACE_DENTRY_EVENT(create);
DEFINE_TRACE_DENTRY_EVENT(link);
DEFINE_TRACE_DENTRY_EVENT(unlink);

TRACE_EVENT(ext4_fc_track_inode,
	TP_PROTO(struct inode *inode, int ret),

	TP_ARGS(inode, ret),

	TP_STRUCT__entry(
		__field(dev_t, dev)
		__field(int, ino)
		__field(int, error)
	),

	TP_fast_assign(
		__entry->dev = inode->i_sb->s_dev;
		__entry->ino = inode->i_ino;
		__entry->error = ret;
	),

	TP_printk("dev %d:%d, inode %d, error %d",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  __entry->ino, __entry->error)
);

TRACE_EVENT(ext4_fc_track_range,
	TP_PROTO(struct inode *inode, long start, long end, int ret),

	TP_ARGS(inode, start, end, ret),

	TP_STRUCT__entry(
		__field(dev_t, dev)
		__field(int, ino)
		__field(long, start)
		__field(long, end)
		__field(int, error)
	),

	TP_fast_assign(
		__entry->dev = inode->i_sb->s_dev;
		__entry->ino = inode->i_ino;
		__entry->start = start;
		__entry->end = end;
		__entry->error = ret;
	),

	TP_printk("dev %d:%d, inode %d, error %d, start %ld, end %ld",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  __entry->ino, __entry->error, __entry->start,
		  __entry->end)
);

#endif /* _TRACE_EXT4_H */

/* This part must be outside protection */