- Clear out i_mapping error state when we're reinitializing inodes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAlsPY0wACgkQ+H93GTRK
tOuuDQ/+IvBngJL9I5py7GF0EXlMuge0nAulEj4d1ZT4tNCPp0Ouu4Jy+za+RapQ
w604fvI6VPPbtidjLpUkR+ZzVeIAanaUkHY+MXl7DEYnyKC+VO7rPZUQiXe4kCeE
ExNpL063vj4FND3xO/tXz2cs6Wjk8RuCLPWprLVKPpZ79w+BQwYFlpKGschMhR7w
EQM+7TIJHff1C2nwETbX5ZcM6yxo6PVUwxEsF7+pubVulMoJZ57m5OnS7RXZY7L7
33S3du85A/Unby+mlYQTsmWf+1FOfIIf6+r1i13gRorGSZongPSenQdO6h4uKzXc
3OHXTl783ip2cFhhbbTnDlmly66Q1wcDwUDd88YvP94Wv9K+lWASKJGqDwpT/T3/
gkmg9pTXezPytTZb+F86nFN91b4NWSdskwN4/Du2ydnSEQVmzwdYLyc1oQn9IWal
HITBlVApLF33rHgmPJXRT64uKPqsPttu3DR5337waTPKf8po+Xk7CaATIIHx8gTD
Jj8UfH7b9u7tjk5yXnx7qVCquwsG1E8N3Xi5eqn2dsTVSqia3vjyBoI7esPX5DBO
ZvbBuU5MMmGr0p7DCEcFbe/otToqdoc0quebuUodKbhUS70+RGDoqwfR+R7Gbprq
M6+Tfm7S6DIKOsfgde5HBEEAjQtsrNMNdBsBemtL1v3fzI6SyJQ=
=UtGb
-----END PGP SIGNATURE-----
Merge tag 'xfs-4.17-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fix from Darrick Wong:
"Clear out i_mapping error state when we're reinitializing inodes.
This last minute fix prevents writeback error state from persisting
past the end of the in-core inode lifecycle and causing EIO errors to
be reported to userspace when no error has occurred.
This fix for the behavioral regression has been soaking in for-next
for a while, but various fs developers persuaded me to try to get it
upstream for 4.17 because the patch that broke things was introduced
in 4.17-rc4"
* tag 'xfs-4.17-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fs: clear writeback errors in inode_init_always
The function return values are confusing with the way the function is
named. We expect a true or false return value but it actually returns
0/-errno. This makes the code very confusing. Changing the return values
to return a bool where if DAX is supported then return true and no DAX
support returns false.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Change bdev_dax_supported so it takes a bdev parameter. This enables
multi-device filesystems like xfs to check that a dax device can work for
the particular filesystem. Once that's in place, actually fix all the
parts of XFS where we need to be able to distinguish between datadev and
rtdev.
This patch fixes the problem where we screw up the dax support checking
in xfs if the datadev and rtdev have different dax capabilities.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[rez: Re-added __bdev_dax_supported() for !CONFIG_FS_DAX cases]
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
syzbot is reporting use-after-free at fuse_kill_sb_blk() [1].
Since sb->s_fs_info field is not cleared after fc was released by
fuse_conn_put() when initialization failed, fuse_kill_sb_blk() finds
already released fc and tries to hold the lock. Fix this by clearing
sb->s_fs_info field after calling fuse_conn_put().
[1] https://syzkaller.appspot.com/bug?id=a07a680ed0a9290585ca424546860464dd9658db
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+ec3986119086fe4eec97@syzkaller.appspotmail.com>
Fixes: 3b463ae0c626 ("fuse: invalidation reverse calls")
Cc: John Muir <john@jmuir.com>
Cc: Csaba Henk <csaba@gluster.com>
Cc: Anand Avati <avati@redhat.com>
Cc: <stable@vger.kernel.org> # v2.6.31
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
syzbot is reporting NULL pointer dereference at fuse_ctl_remove_conn() [1].
Since fc->ctl_ndents is incremented by fuse_ctl_add_conn() when new_inode()
failed, fuse_ctl_remove_conn() reaches an inode-less dentry and tries to
clear d_inode(dentry)->i_private field.
Fix by only adding the dentry to the array after being fully set up.
When tearing down the control directory, do d_invalidate() on it to get rid
of any mounts that might have been added.
[1] https://syzkaller.appspot.com/bug?id=f396d863067238959c91c0b7cfc10b163638cac6
Reported-by: syzbot <syzbot+32c236387d66c4516827@syzkaller.appspotmail.com>
Fixes: bafa96541b25 ("[PATCH] fuse: add control filesystem")
Cc: <stable@vger.kernel.org> # v2.6.18
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
If a connection gets aborted while congested, FUSE can leave
nr_wb_congested[] stuck until reboot causing wait_iff_congested() to
wait spuriously which can lead to severe performance degradation.
The leak is caused by gating congestion state clearing with
fc->connected test in request_end(). This was added way back in 2009
by 26c3679101db ("fuse: destroy bdi on umount"). While the commit
description doesn't explain why the test was added, it most likely was
to avoid dereferencing bdi after it got destroyed.
Since then, bdi lifetime rules have changed many times and now we're
always guaranteed to have access to the bdi while the superblock is
alive (fc->sb).
Drop fc->connected conditional to avoid leaking congestion states.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Joshua Miller <joshmiller@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@vger.kernel.org # v2.6.29+
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Now that the fuse and the vfs work is complete. Allow the fuse filesystem
to be mounted by the root user in a user namespace.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Ensure the translation happens by failing to read or write
posix acls when the filesystem has not indicated it supports
posix acls.
This ensures that modern cached posix acl support is available
and used when dealing with posix acls. This is important
because only that path has the code to convernt the uids and
gids in posix acls into the user namespace of a fuse filesystem.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Add unprivileged version of ino_lookup ioctl BTRFS_IOC_INO_LOOKUP_USER
to allow normal users to call "btrfs subvolume list/show" etc. in
combination with BTRFS_IOC_GET_SUBVOL_INFO/BTRFS_IOC_GET_SUBVOL_ROOTREF.
This can be used like BTRFS_IOC_INO_LOOKUP but the argument is
different. This is because it always searches the fs/file tree
correspoinding to the fd with which this ioctl is called and also
returns the name of bottom subvolume.
The main differences from original ino_lookup ioctl are:
1. Read + Exec permission will be checked using inode_permission()
during path construction. -EACCES will be returned in case
of failure.
2. Path construction will be stopped at the inode number which
corresponds to the fd with which this ioctl is called. If
constructed path does not exist under fd's inode, -EACCES
will be returned.
3. The name of bottom subvolume is also searched and filled.
Note that the maximum length of path is shorter 256 (BTRFS_VOL_NAME_MAX+1)
bytes than ino_lookup ioctl because of space of subvolume's name.
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
[ style fixes ]
Signed-off-by: David Sterba <dsterba@suse.com>
Add unprivileged ioctl BTRFS_IOC_GET_SUBVOL_ROOTREF which returns
ROOT_REF information of the subvolume containing this inode except the
subvolume name (this is because to prevent potential name leak). The
subvolume name will be gained by user version of ino_lookup ioctl
(BTRFS_IOC_INO_LOOKUP_USER) which also performs permission check.
The min id of root ref's subvolume to be searched is specified by
@min_id in struct btrfs_ioctl_get_subvol_rootref_args. After the search
ends, @min_id is set to the last searched root ref's subvolid + 1. Also,
if there are more root refs than BTRFS_MAX_ROOTREF_BUFFER_NUM,
-EOVERFLOW is returned. Therefore the caller can just call this ioctl
again without changing the argument to continue search.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
[ style fixes and struct item renames ]
Signed-off-by: David Sterba <dsterba@suse.com>
Add new unprivileged ioctl BTRFS_IOC_GET_SUBVOL_INFO which returns
the information of subvolume containing this inode.
(i.e. returns the information in ROOT_ITEM and ROOT_BACKREF.)
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
[ minor style fixes, update struct comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
Currently, there is a small window where ovl_obtain_alias() can
race with ovl_instantiate() and create two different overlay inodes
with the same underlying real non-dir non-hardlink inode.
The race requires an adversary to guess the file handle of the
yet to be created upper inode and decode the guessed file handle
after ovl_creat_real(), but before ovl_instantiate().
This race does not affect overlay directory inodes, because those
are decoded via ovl_lookup_real() and not with ovl_obtain_alias().
This patch fixes the race, by using inode_insert5() to add a newly
created inode to cache.
If the newly created inode apears to already exist in cache (hashed
by the same real upper inode), we instantiate the dentry with the old
inode and drop the new inode, instead of silently not hashing the new
inode.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
ovl_get_inode() right now has 5 parameters. Soon this patch series will
add 2 more and suddenly argument list starts looking too long.
Hence pass arguments to ovl_get_inode() in a structure and it looks
little cleaner.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Split out common helper for race free insertion of an already allocated
inode into the cache. Use this from iget5_locked() and
insert_inode_locked4(). Make iget5_locked() use new_inode()/iput() instead
of alloc_inode()/destroy_inode() directly.
Also export to modules for use by filesystems which want to preallocate an
inode before file/directory creation.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
vfs_mkdir() may succeed and leave the dentry passed to it unhashed and
negative. ovl_create_real() is the last caller breaking when that
happens.
[amir: split re-factoring of ovl_create_temp() to prep patch
add comment about unhashed dir after mkdir
add pr_warn() if mkdir succeeds and lookup fails]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Also used ovl_create_temp() in ovl_create_index() instead of calling
ovl_do_mkdir() directly, so now all callers of ovl_do_mkdir() are routed
through ovl_create_real(), which paves the way for Al's fix for non-hashed
result from vfs_mkdir().
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Al Viro suggested to simplify callers of ovl_create_real() by
returning the created dentry (or ERR_PTR) from ovl_create_real().
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Overlayfs should cope with online changes to underlying layer
without crashing the kernel, which is what xfstest overlay/019
checks.
This test may sometimes trigger WARN_ON() in ovl_create_or_link()
when linking an overlay inode that has been changed on underlying
layer.
Remove those WARN_ON() to prevent the stress test from failing.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
In inode_init_always(), we clear the inode mapping flags, which clears
any retained error (AS_EIO, AS_ENOSPC) bits. Unfortunately, we do not
also clear wb_err, which means that old mapping errors can leak through
to new inodes.
This is crucial for the XFS inode allocation path because we recycle old
in-core inodes and we do not want error state from an old file to leak
into the new file. This bug was discovered by running generic/036 and
generic/047 in a loop and noticing that the EIOs generated by the
collision of direct and buffered writes in generic/036 would survive the
remount between 036 and 047, and get reported to the fsyncs (on
different files!) in generic/047.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Convert XFS to embedded bio sets.
Acked-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Convert btrfs to embedded bio sets.
Acked-by: Chris Mason <clm@fb.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Convert block DIO code to embedded bio sets.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If "posix" (or synonym "unix" for backward compatibility) specified on mount,
and server advertises support for SMB3.11 POSIX negotiate context, then
enable the new posix extensions on the tcon. This can be viewed by
looking for "posix" in the mount options displayed by /proc/mounts
for that mount (ie if posix extensions allowed by server and the
experimental POSIX extensions also requested on the mount by specifying
"posix" at mount time).
Also add check to warn user if conflicting unix/nounix or posix/noposix specified
on mount.
Signed-off-by: Steve French <smfrench@gmail.com>
Unlike CIFS where UNIX/POSIX extensions had been negotiatable,
SMB3 did not have POSIX extensions yet. Add the new SMB3.11
POSIX negotiate context to ask the server whether it can
support POSIX (and thus whether we can send the new POSIX open
context).
Signed-off-by: Steve French <smfrench@gmail.com>
To improve security it may be helpful to have additional ways to restrict the
ability to override the default dialects (SMB2.1, SMB3 and SMB3.02) on mount
with old dialects (CIFS/SMB1 and SMB2) since vers=1.0 (CIFS/SMB1) and vers=2.0
are weaker and less secure.
Add a module parameter "disable_legacy_dialects"
(/sys/module/cifs/parameters/disable_legacy_dialects) which can be set to
1 (or equivalently Y) to forbid use of vers=1.0 or vers=2.0 on mount.
Also cleans up a few build warnings about globals for various module parms.
Signed-off-by: Steve French <smfrench@gmail.com>
Note which ones of the module params are cifs dialect only
(N/A for default dialect now that has moved to SMB2.1 or later)
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
RHBZ: 1539612
Lets show the "w" bit for those files have a .write interface to set/enable/...
the feature.
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
We really don't want to be encouraging people to use the old
(less secure) cifs dialect (SMB1) and it can be confusing for them
with SMB3 (or later) being recommended but the module name is cifs.
Add a module alias for "smb3" to cifs.ko to make this less confusing.
Signed-off-by: Steve French <smfrench@gmail.com>
RHBZ: 1539617
Check that, if it is not a boolean, the value the user tries
to write to /proc/fs/cifs/cifsFYI is valid and return an error
if not.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reported-by: Xiaoli Feng <xifeng@redhat.com>
RHBZ: 1566345
When truncating a file we always do this synchronously to the server.
Thus we need to make sure that the cached inode metadata is
marked as stale so that on next getattr we will refresh the metadata.
In this particular bug we want to ensure that both ctime and mtime
are updated and become visible to the application after a truncate.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reported-by: Xiaoli Feng <xifeng@redhat.com>
When loooking at the logs for the new trace-cmd tracepoints for cifs,
it would help to know which tid is for which share (UNC name) so
update /proc/fs/cifs/DebugData to display the tid.
Also display Maximal Access which was missing as well.
Now the entry for typical entry for a tcon (in proc/fs/cifs/) looks
like:
1) \\localhost\test Mounts: 1 DevInfo: 0x20 Attributes: 0x1006f
PathComponentMax: 255 Status: 1 type: DISK
Share Capabilities: None Aligned, Partition Aligned, Share Flags: 0x0
tid: 0xe0632a55 Optimal sector size: 0x200 Maximal Access: 0x1f01ff
Signed-off-by: Steve French <smfrench@gmail.com>
Fix a few cases where we were not freeing the xid which led to
active requests being non-zero at unmount time.
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
When direct I/O is used, the data buffer may not always align to page
boundaries. Introduce a page offset in transport data structures to
describe the location of the buffer within the page.
Also change the function to pass the page offset when sending data to
transport.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
btrfs_truncate() uses two variables for error handling, ret and err (if
this sounds familiar, it's because btrfs_truncate_inode_items() did
something similar). This is error prone, as was made evident by "Btrfs:
fix error handling in btrfs_truncate()". We only have err because we
don't want to mask an error if we call btrfs_update_inode() and
btrfs_end_transaction(), so let's make that its own scoped return
variable and use ret everywhere else.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that the read side is extracted into its own function, do the same
to the write side. This leaves btrfs_get_blocks_direct_write with the
sole purpose of handling common locking required. Also flip the
condition in btrfs_get_blocks_direct_write so that the write case
comes first and we check for if (Create) rather than if (!create). This
is purely subjective but I believe makes reading a bit more "linear".
No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently this function handles both the READ and WRITE dio cases. This
is facilitated by a bunch of 'if' statements, a goto short-circuit
statement and a very perverse aliasing of "!created"(READ) case
by setting lockstart = lockend and checking for lockstart < lockend for
detecting the write. Let's simplify this mess by extracting the
READ-only code into a separate __btrfs_get_block_direct_read function.
This is only the first step, the next one will be to factor out the
write side as well. The end goal will be to have the common locking/
unlocking code in btrfs_get_blocks_direct and then it will call either
the read|write subvariants. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The error code does not match the reason of failure and may confuse the
callers.
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the quest to remove all stack VLA usage from the kernel[1], this
allocates the working buffers during regular init, instead of using stack
space. This refactors the allocation code a bit to make it easier
to review.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If one of the backup superblocks is found to differ seriously from
superblock 0, write out a fresh copy from the in-core sb.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>