75646 Commits

Author SHA1 Message Date
c37731301d ntfs: Correct mark_ntfs_record_dirty() folio conversion
We've already done the work of block_dirty_folio() here, leaving
only the work that needs to be done by filemap_dirty_folio().
This was a misconversion where I misread __set_page_dirty_nobuffers()
as __set_page_dirty_buffers().

Fixes: e621900ad28b ("fs: Convert __set_page_dirty_buffers to block_dirty_folio")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 14:40:44 -04:00
29c87793eb f2fs: Get the superblock from the mapping instead of the page
It's slightly more efficient to go directly from the mapping to the
superblock than to go from the page.  Now that these routines have
the mapping passed to them, there's no reason not to use it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 14:40:44 -04:00
0fb5b2ebc0 f2fs: Correct f2fs_dirty_data_folio() conversion
I got the return value wrong.  Very little checks the return value
from set_page_dirty(), so nobody noticed during testing.

Fixes: 4f5e34f71318 ("f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 14:40:44 -04:00
0f25233663 ext4: Correct ext4_journalled_dirty_folio() conversion
This should use the new folio_buffers() instead of page_has_buffers().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 14:40:44 -04:00
d7414ba14a filemap: Remove AOP_FLAG_CONT_EXPAND
This flag is no longer used, so remove it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 14:40:44 -04:00
800ba29547 fs: Pass an iocb to generic_perform_write()
We can extract both the file pointer and the pos from the iocb.
This simplifies each caller as well as allowing generic_perform_write()
to see more of the iocb contents in the future.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 14:40:44 -04:00
2756c818e5 iomap: Simplify is_partially_uptodate a little
Remove the unnecessary variable 'len' and fix a comment to refer to
the folio instead of the page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 14:40:43 -04:00
704528d895 fs: Remove ->readpages address space operation
All filesystems have now been converted to use ->readahead, so
remove the ->readpages operation and fix all the comments that
used to refer to it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 13:45:33 -04:00
a4a3d8c52d exfat: do not clear VolumeDirty in writeback
Before this commit, VolumeDirty will be cleared first in
writeback if 'dirsync' or 'sync' is not enabled. If the power
is suddenly cut off after cleaning VolumeDirty but other
updates are not written, the exFAT filesystem will not be able
to detect the power failure in the next mount.

And VolumeDirty will be set again but not cleared when updating
the parent directory. It means that BootSector will be written at
least once in each write-back, which will shorten the life of the
device.

Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-04-01 10:51:03 +09:00
9ec784bf77 exfat: allow access to paths with trailing dots
The Linux kernel exfat driver currently unconditionally strips
 trailing periods '.' from path components. This isdone intentionally,
 loosely following Windows behaviour and specifications
 which state:

  #exFAT
  The concatenated file name has the same set of illegal characters as
  other FAT-based file systems (see Table 31).

  #FAT
  ...
  Leading and trailing spaces in a long name are ignored.
  Leading and embedded periods are allowed in a name and are stored in
  the long name. Trailing periods are ignored.

Note: Leading and trailing space ' ' characters are currently retained
by Linux kernel exfat, in conflict with the above specification.
On Windows 10, trailing and leading space ' ' characters are stripped
from the filenames.
Some implementations, such as fuse-exfat, don't perform path trailer
removal. When mounting images which contain trailing-dot paths, these
paths are unreachable, e.g.:

  + mount.exfat-fuse /dev/zram0 /mnt/test/
  FUSE exfat 1.3.0
  + cd /mnt/test/
  + touch fuse_created_dots... '  fuse_created_spaces  '
  + ls -l
  total 0
  -rwxrwxrwx 1 root 0 0 Aug 18 09:45 '  fuse_created_spaces  '
  -rwxrwxrwx 1 root 0 0 Aug 18 09:45  fuse_created_dots...
  + cd /
  + umount /mnt/test/
  + mount -t exfat /dev/zram0 /mnt/test
  + cd /mnt/test
  + ls -l
  ls: cannot access 'fuse_created_dots...': No such file or directory
  total 0
  -rwxr-xr-x 1 root 0 0 Aug 18 09:45 '  fuse_created_spaces  '
  -????????? ? ?    ? ?            ?  fuse_created_dots...
  + touch kexfat_created_dots... '  kexfat_created_spaces  '
  + ls -l
  ls: cannot access 'fuse_created_dots...': No such file or directory
  total 0
  -rwxr-xr-x 1 root 0 0 Aug 18 09:45 '  fuse_created_spaces  '
  -rwxr-xr-x 1 root 0 0 Aug 18 09:45 '  kexfat_created_spaces  '
  -????????? ? ?    ? ?            ?  fuse_created_dots...
  -rwxr-xr-x 1 root 0 0 Aug 18 09:45  kexfat_created_dots
  + cd /
  + umount /mnt/test/

This commit adds "keep_last_dots" mount option that controls whether or
not trailing periods '.' are stripped
from path components during file lookup or file creation.
This mount option can be used to access
paths with trailing periods and disallow creating files with names with
trailing periods. E.g. continuing from the previous example:

  + mount -t exfat -o keep_last_dots /dev/zram0 /mnt/test
  + cd /mnt/test
  + ls -l
  total 0
  -rwxr-xr-x 1 root 0 0 Aug 18 10:32 '  fuse_created_spaces  '
  -rwxr-xr-x 1 root 0 0 Aug 18 10:32 '  kexfat_created_spaces  '
  -rwxr-xr-x 1 root 0 0 Aug 18 10:32  fuse_created_dots...
  -rwxr-xr-x 1 root 0 0 Aug 18 10:32  kexfat_created_dots

  + echo > kexfat_created_dots_again...
  sh: kexfat_created_dots_again...: Invalid argument

Link: https://bugzilla.suse.com/show_bug.cgi?id=1188964
Link: https://lore.kernel.org/linux-fsdevel/003b01d755e4$31fb0d80$95f12880$
@samsung.com/
Link: https://docs.microsoft.com/en-us/windows/win32/fileio/exfat-specification
Suggested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Vasant Karasulli <vkarasulli@suse.de>
Co-developed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-04-01 10:51:02 +09:00
a87a08e3bf This pull request contains fixes for JFFS2, UBI and UBIFS
JFFS2:
         - Fixes for various memory issues
 
 UBI:
         - Fix for a race condition in cdev ioctl handler
 
 UBIFS:
 	- Fixes for O_TMPFILE and whiteout handling
 	- Fixes for various memory issues
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmJGFFEWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wcV9D/9DCaOP5Zso6Vi2QlAM3pi8ScgG
 7PXCK/l+iEMqWKv+5l4fOZuGDF9w8mf7MyBnzV6auRKEJqbOW6d3mDs2bPNORuPN
 LKLSOiumqxwrSqmpEFjS9Q3MsVMOWtx0aN57R3d8oCen44a5gGhYyP8iQSCLc8nQ
 pJszRpWnBhoWqxnc0BxDHYOYQhcQ/ckWc5mn0cEd0oN1NYmEbrAo0SpwVOjX9r1B
 TnTwDUzcAVK/M/hEsvHbEOChgibD+cTKfkORz6BkGB/VR5AlduF73UYUt/np4WH4
 6ZTe6RXKp6gqad8pJq6T/aev2m0aM5OHxwd6JTC8qK5KSa0iXnoeungsKy0ouhYv
 8V4yPRWKP9q7WEPJsiTM9KKQRiljXM+C/GOsFLKmme7zw66uu2qPdc2k99XRJzYj
 kO42AuXZ0Wx5Xueb+ZNmbPG3brNJ5ku3EXrzy/iAF6zS+uvmv1ZG5ZraLQ8RNqRT
 f8vO3NbtVnpbB6tvpPHYpjlcXkAGjumOIy9cxpl/vR5D87lRrUVSZiGu3xcW8tYm
 FOohVnLZFk3vgt9odU+TJ6qBUzWW5LVQx90Rr4o0sR7HDG8pFvX3cl3Z98bqCWGw
 wHSqy3tmUYOUaiIHvf94qJmjfY9vNlUyrnjZcoK+hFCqJz0IbrXQlrmq7o7OOvyh
 2mnOksk0XcFd41gsuw==
 =ZU5j
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull JFFS2, UBI and UBIFS updates from Richard Weinberger:
 "JFFS2:
   - Fixes for various memory issues

  UBI:
   - Fix for a race condition in cdev ioctl handler

  UBIFS:
   - Fixes for O_TMPFILE and whiteout handling

   - Fixes for various memory issues"

* tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubifs: rename_whiteout: correct old_dir size computing
  jffs2: fix memory leak in jffs2_scan_medium
  jffs2: fix memory leak in jffs2_do_mount_fs
  jffs2: fix use-after-free in jffs2_clear_xattr_subsystem
  fs/jffs2: fix comments mentioning i_mutex
  ubi: fastmap: Return error code if memory allocation fails in add_aeb()
  ubifs: Fix to add refcount once page is set private
  ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()
  ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
  ubifs: Rectify space amount budget for mkdir/tmpfile operations
  ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work
  ubifs: Rename whiteout atomically
  ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
  ubifs: Fix wrong number of inodes locked by ui_mutex in ubifs_inode comment
  ubifs: Fix deadlock in concurrent rename whiteout and inode writeback
  ubifs: rename_whiteout: Fix double free for whiteout_ui->data
  ubi: Fix race condition between ctrl_cdev_ioctl and ubi_cdev_ioctl
2022-03-31 16:09:41 -07:00
3d198e42ce gfs2 fixes
* To avoid deadlocks, actively cancel dlm locking requests when we give
   up on them.  Further dlm operations on the same lock will return
   -EBUSY until the cancel has been completed, so in that case, wait and
   repeat.  (This is rare.)
 * Lock inversion fixes in gfs2_inode_lookup() and gfs2_create_inode().
 * Some more fallout from the gfs2 mmap + page fault deadlock fixes
   (merge c03098d4b9ad7).
 * Various other minor bug fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmJGCAsUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTrWcg//TEDazop2y7rGMFsMBXI7HPyBu4uD
 BwoclS5IfjoQbBTtkl7cWmQViMk8s3EFGxdEBorfGmMEq65I/krHi4JXG2GETdui
 ORoi8NH1sW9H2GJXmwtE2wYZlJBZtdntoBGdPXWFvt1hLajf6WGpy/CR1Wd4rYak
 8AHQxtd98OtsA6LAPlWl2UaXS4m7rhEt0Iy83mqWtbBOvZsULczuraazawnoQ/m4
 Wf5pvb+73hpwTVUkruH0+If+vi/HF0WVv1nZVyMwrSh3mpvkrsZSkbN0fd0veAhD
 b5XGI1dD5+YPxAOdwDKqnqy8/E3gRekybmpcd48BXoxF4EX/AlLX/Zn9qnrAhY6M
 qEbGzC2UqLIrPe/KjzQ8+0aKPCY5FB1VqoRMAHC/bj7mlmNgGtHxQUXdDmC4LIi6
 GOLpnueI1KtA7Hb4HCgX0BLxSqUEhUuGssBkNIqGet1cRwmM33pt1J4CG4TDLBt/
 VZiERnN3qktSlmukvd3oLSZso4fVbg7PyFTl8YMgiLDNfgcZI9RY5qwIJYrOaucr
 KTNfR6lAL2slFPIVcLwmgJt+axogk6GnCkfDVMX2VLJnMQYqJnDYn6fVG9jngSB+
 F4UBZ/alzhpel08r8xtxjADFJzA+weG1I2jnikSLKlgVN+uiQTBrhqyWdtxtEqFM
 31Nd7piiSVQEvrM=
 =xMYz
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v5.17-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fixes from Andreas Gruenbacher:

 - To avoid deadlocks, actively cancel dlm locking requests when we give
   up on them.

   Further dlm operations on the same lock will return -EBUSY until the
   cancel has been completed, so in that case, wait and repeat. (This is
   rare.)

 - Lock inversion fixes in gfs2_inode_lookup() and gfs2_create_inode().

 - Some more fallout from the gfs2 mmap + page fault deadlock fixes
   (merged in commit c03098d4b9ad7: "Merge tag 'gfs2-v5.15-rc5-mmap-fault'").

 - Various other minor bug fixes and cleanups.

* tag 'gfs2-v5.17-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Make sure FITRIM minlen is rounded up to fs block size
  gfs2: Make sure not to return short direct writes
  gfs2: Remove dead code in gfs2_file_read_iter
  gfs2: Fix gfs2_file_buffered_write endless loop workaround
  gfs2: Minor retry logic cleanup
  gfs2: Disable page faults during lockless buffered reads
  gfs2: Fix should_fault_in_pages() logic
  gfs2: Remove return value for gfs2_indirect_init
  gfs2: Initialize gh_error in gfs2_glock_nq
  gfs2: Make use of list_is_first
  gfs2: Switch lock order of inode and iopen glock
  gfs2: cancel timed-out glock requests
  gfs2: Expect -EBUSY after canceling dlm locking requests
  gfs2: gfs2_setattr_size error path fix
  gfs2: assign rgrp glock before compute_bitstructs
2022-03-31 15:57:50 -07:00
f008b1d6e1 Netfs prep for write helpers
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmI1HOwACgkQ+7dXa6fL
 C2u9mA/+LUdXHqlvET/PAtFTg75bUPeOFGLnuDnYl1Ng2FCKMSodAohpbVtENxsK
 E/gTVS7uiVZFQgC+YmNA00z6eIQkAaDVyvKyEcUbKREBbUgONfJ/HLeaK/NvVKxx
 TY5gx/POdG6yHRQXL6JGBqSJUB8bZrGKwnJm8ebzeKOji9n7GSJBYiMlYBA7EAhs
 Aut/P7Y39ISHLw3y+y5czBeRoubljmTyznbP20xUZEzrRwhTpNwpJVzBGUZU635T
 93Sqcp//0U5LIdn6Pg6DUGHBMBTNDNJChb21ZoBusF/HHswXsOOnf/mcRUBSJUTI
 M1WSpNLk8PRBgajMdIymQpGU1sCZZzJ3krrSA3RcXdN6GPHwZg8kKjoroHsLDL6l
 igPbDSMJ5wfiwA2A2gXbY1CkAl3ik5ccb7ZqhTwS0WBk0vOnHmAsE9cs/bBo7Xii
 GTiWXEFOgtJiXANPMS2P9DiOS3ZQNf+wxotCYdkGPOXuX9wnIo1Kmy8XfujQ1bXf
 pJsEZKfeyROKrzyKWgqLI64/Kg5xNueoFQZfDpOlZYzF1uDstynADPUt0eQD706q
 jcuKaXLN3rn5gSPun5mWOYbRtXVgOLdFL/7zptMVJwFKBFguQENhjG4UMNZcjkVA
 3Mr0kGocsgoCSk1oDBkFlrw1wIsXxWbkRBL1Pww6kovivuGUwoo=
 =j0yx
 -----END PGP SIGNATURE-----

Merge tag 'netfs-prep-20220318' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull netfs updates from David Howells:
 "Netfs prep for write helpers.

  Having had a go at implementing write helpers and content encryption
  support in netfslib, it seems that the netfs_read_{,sub}request
  structs and the equivalent write request structs were almost the same
  and so should be merged, thereby requiring only one set of
  alloc/get/put functions and a common set of tracepoints.

  Merging the structs also has the advantage that if a bounce buffer is
  added to the request struct, a read operation can be performed to fill
  the bounce buffer, the contents of the buffer can be modified and then
  a write operation can be performed on it to send the data wherever it
  needs to go using the same request structure all the way through. The
  I/O handlers would then transparently perform any required crypto.
  This should make it easier to perform RMW cycles if needed.

  The potentially common functions and structs, however, by their names
  all proclaim themselves to be associated with the read side of things.

  The bulk of these changes alter this in the following ways:

   - Rename struct netfs_read_{,sub}request to netfs_io_{,sub}request.

   - Rename some enums, members and flags to make them more appropriate.

   - Adjust some comments to match.

   - Drop "read"/"rreq" from the names of common functions. For
     instance, netfs_get_read_request() becomes netfs_get_request().

   - The ->init_rreq() and ->issue_op() methods become ->init_request()
     and ->issue_read(). I've kept the latter as a read-specific
     function and in another branch added an ->issue_write() method.

  The driver source is then reorganised into a number of files:

        fs/netfs/buffered_read.c        Create read reqs to the pagecache
        fs/netfs/io.c                   Dispatchers for read and write reqs
        fs/netfs/main.c                 Some general miscellaneous bits
        fs/netfs/objects.c              Alloc, get and put functions
        fs/netfs/stats.c                Optional procfs statistics.

  and future development can be fitted into this scheme, e.g.:

        fs/netfs/buffered_write.c       Modify the pagecache
        fs/netfs/buffered_flush.c       Writeback from the pagecache
        fs/netfs/direct_read.c          DIO read support
        fs/netfs/direct_write.c         DIO write support
        fs/netfs/unbuffered_write.c     Write modifications directly back

  Beyond the above changes, there are also some changes that affect how
  things work:

   - Make fscache_end_operation() generally available.

   - In the netfs tracing header, generate enums from the symbol ->
     string mapping tables rather than manually coding them.

   - Add a struct for filesystems that uses netfslib to put into their
     inode wrapper structs to hold extra state that netfslib is
     interested in, such as the fscache cookie. This allows netfslib
     functions to be set in filesystem operation tables and jumped to
     directly without having to have a filesystem wrapper.

   - Add a member to the struct added above to track the remote inode
     length as that may differ if local modifications are buffered. We
     may need to supply an appropriate EOF pointer when storing data (in
     AFS for example).

   - Pass extra information to netfs_alloc_request() so that the
     ->init_request() hook can access it and retain information to
     indicate the origin of the operation.

   - Make the ->init_request() hook return an error, thereby allowing a
     filesystem that isn't allowed to cache an inode (ceph or cifs, for
     example) to skip readahead.

   - Switch to using refcount_t for subrequests and add tracepoints to
     log refcount changes for the request and subrequest structs.

   - Add a function to consolidate dispatching a read request. Similar
     code is used in three places and another couple are likely to be
     added in the future"

Link: https://lore.kernel.org/all/2639515.1648483225@warthog.procyon.org.uk/

* tag 'netfs-prep-20220318' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Maintain netfs_i_context::remote_i_size
  netfs: Keep track of the actual remote file size
  netfs: Split some core bits out into their own file
  netfs: Split fs/netfs/read_helper.c
  netfs: Rename read_helper.c to io.c
  netfs: Prepare to split read_helper.c
  netfs: Add a function to consolidate beginning a read
  netfs: Add a netfs inode context
  ceph: Make ceph_init_request() check caps on readahead
  netfs: Change ->init_request() to return an error code
  netfs: Refactor arguments for netfs_alloc_read_request
  netfs: Adjust the netfs_failure tracepoint to indicate non-subreq lines
  netfs: Trace refcounting on the netfs_io_subrequest struct
  netfs: Trace refcounting on the netfs_io_request struct
  netfs: Adjust the netfs_rreq tracepoint slightly
  netfs: Split netfs_io_* object handling out
  netfs: Finish off rename of netfs_read_request to netfs_io_request
  netfs: Rename netfs_read_*request to netfs_io_*request
  netfs: Generate enums from trace symbol mapping lists
  fscache: export fscache_end_operation()
2022-03-31 15:49:36 -07:00
b8321ed4a4 Kbuild updates for v5.18
- Add new environment variables, USERCFLAGS and USERLDFLAGS to allow
    additional flags to be passed to user-space programs.
 
  - Fix missing fflush() bugs in Kconfig and fixdep
 
  - Fix a minor bug in the comment format of the .config file
 
  - Make kallsyms ignore llvm's local labels, .L*
 
  - Fix UAPI compile-test for cross-compiling with Clang
 
  - Extend the LLVM= syntax to support LLVM=<suffix> form for using a
    particular version of LLVm, and LLVM=<prefix> form for using custom
    LLVM in a particular directory path.
 
  - Clean up Makefiles
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmJFGloVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGH0kP/j6Vx5BqEv3tP2Q+UANxLqITleJs
 IFpbSesz/BhlG7I/IapWmCDSqFbYd5uJTO4ko8CsPmZHcxr6Gw3y+DN5yQACKaG/
 p9xiF6GjPyKR8+VdcT2tV50+dVY8ANe/DxCyzKrJd/uyYxgARPKJh0KRMNz+d9lj
 ixUpCXDhx/XlKzPIlcxrvhhjevKz+NnHmN0fe6rzcOw9KzBGBTsf20Q3PqUuBOKa
 rWHsRGcBPA8eKLfWT1Us1jjic6cT2g4aMpWjF20YgUWKHgWVKcNHpxYKGXASVo/z
 ewdDnNfmwo7f7fKMCDDro9iwFWV/BumGtn43U00tnqdBcTpFojPlEOga37UPbZDF
 nmTblGVUhR0vn4PmfBy8WkAkbW+IpVatKwJGV4J3KjSvdWvZOmVj9VUGLVAR0TXW
 /YcgRs6EtG8Hn0IlCj0fvZ5wRWoDLbP2DSZ67R/44EP0GaNQPwUe4FI1izEE4EYX
 oVUAIxcKixWGj4RmdtmtMMdUcZzTpbgS9uloMUmS3u9LK0Ir/8tcWaf2zfMO6Jl2
 p4Q31s1dUUKCnFnj0xDKRyKGUkxYebrHLfuBqi0RIc0xRpSlxoXe3Dynm9aHEQoD
 ZSV0eouQJxnaxM1ck5Bu4AHLgEebHfEGjWVyUHno7jFU5EI9Wpbqpe4pCYEEDTm1
 +LJMEpdZO0dFvpF+
 =84rW
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Add new environment variables, USERCFLAGS and USERLDFLAGS to allow
   additional flags to be passed to user-space programs.

 - Fix missing fflush() bugs in Kconfig and fixdep

 - Fix a minor bug in the comment format of the .config file

 - Make kallsyms ignore llvm's local labels, .L*

 - Fix UAPI compile-test for cross-compiling with Clang

 - Extend the LLVM= syntax to support LLVM=<suffix> form for using a
   particular version of LLVm, and LLVM=<prefix> form for using custom
   LLVM in a particular directory path.

 - Clean up Makefiles

* tag 'kbuild-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: Make $(LLVM) more flexible
  kbuild: add --target to correctly cross-compile UAPI headers with Clang
  fixdep: use fflush() and ferror() to ensure successful write to files
  arch: syscalls: simplify uapi/kapi directory creation
  usr/include: replace extra-y with always-y
  certs: simplify empty certs creation in certs/Makefile
  certs: include certs/signing_key.x509 unconditionally
  kallsyms: ignore all local labels prefixed by '.L'
  kconfig: fix missing '# end of' for empty menu
  kconfig: add fflush() before ferror() check
  kbuild: replace $(if A,A,B) with $(or A,B)
  kbuild: Add environment variables for userprogs flags
  kbuild: unify cmd_copy and cmd_shipped
2022-03-31 11:59:03 -07:00
27ca8273fd gfs2: Make sure FITRIM minlen is rounded up to fs block size
Per fstrim(8) we must round up the minlen argument to the fs block size.
The current calculation doesn't take into account devices that have a
discard granularity and requested minlen less than 1 fs block, so the
value can get shifted away to zero in the translation to fs blocks.

The zero minlen passed to gfs2_rgrp_send_discards() then allows
sb_issue_discard() to be called with nr_sects == 0 which returns -EINVAL
and results in gfs2_rgrp_send_discards() returning -EIO.

Make sure minlen is never < 1 fs block by taking the max of the
requested minlen and the fs block size before comparing to the device's
discard granularity and shifting to fs blocks.

Fixes: 076f0faa764ab ("GFS2: Fix FITRIM argument handling")
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-03-31 20:35:38 +02:00
999397926a nfsd: Clean up nfsd_file_put()
Make it a little less racy, by removing the refcount_read() test. Then
remove the redundant 'is_hashed' variable.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-31 10:39:59 -04:00
d6f5e35845 cifs: fix NULL ptr dereference in smb2_ioctl_query_info()
When calling smb2_ioctl_query_info() with invalid
smb_query_info::flags, a NULL ptr dereference is triggered when trying
to kfree() uninitialised rqst[n].rq_iov array.

This also fixes leaked paths that are created in SMB2_open_init()
which required SMB2_open_free() to properly free them.

Here is a small C reproducer that triggers it

	#include <stdio.h>
	#include <stdlib.h>
	#include <stdint.h>
	#include <unistd.h>
	#include <fcntl.h>
	#include <sys/ioctl.h>

	#define die(s) perror(s), exit(1)
	#define QUERY_INFO 0xc018cf07

	int main(int argc, char *argv[])
	{
		int fd;

		if (argc < 2)
			exit(1);
		fd = open(argv[1], O_RDONLY);
		if (fd == -1)
			die("open");
		if (ioctl(fd, QUERY_INFO, (uint32_t[]) { 0, 0, 0, 4, 0, 0}) == -1)
			die("ioctl");
		close(fd);
		return 0;
	}

	mount.cifs //srv/share /mnt -o ...
	gcc repro.c && ./a.out /mnt/f0

	[ 1832.124468] CIFS: VFS: \\w22-dc.zelda.test\test Invalid passthru query flags: 0x4
	[ 1832.125043] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI
	[ 1832.125764] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
	[ 1832.126241] CPU: 3 PID: 1133 Comm: a.out Not tainted 5.17.0-rc8 #2
	[ 1832.126630] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
	[ 1832.127322] RIP: 0010:smb2_ioctl_query_info+0x7a3/0xe30 [cifs]
	[ 1832.127749] Code: 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 6c 05 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 74 24 28 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 cb 04 00 00 49 8b 3e e8 bb fc fa ff 48 89 da 48
	[ 1832.128911] RSP: 0018:ffffc90000957b08 EFLAGS: 00010256
	[ 1832.129243] RAX: dffffc0000000000 RBX: ffff888117e9b850 RCX: ffffffffa020580d
	[ 1832.129691] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffa043a2c0
	[ 1832.130137] RBP: ffff888117e9b878 R08: 0000000000000001 R09: 0000000000000003
	[ 1832.130585] R10: fffffbfff4087458 R11: 0000000000000001 R12: ffff888117e9b800
	[ 1832.131037] R13: 00000000ffffffea R14: 0000000000000000 R15: ffff888117e9b8a8
	[ 1832.131485] FS:  00007fcee9900740(0000) GS:ffff888151a00000(0000) knlGS:0000000000000000
	[ 1832.131993] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[ 1832.132354] CR2: 00007fcee9a1ef5e CR3: 0000000114cd2000 CR4: 0000000000350ee0
	[ 1832.132801] Call Trace:
	[ 1832.132962]  <TASK>
	[ 1832.133104]  ? smb2_query_reparse_tag+0x890/0x890 [cifs]
	[ 1832.133489]  ? cifs_mapchar+0x460/0x460 [cifs]
	[ 1832.133822]  ? rcu_read_lock_sched_held+0x3f/0x70
	[ 1832.134125]  ? cifs_strndup_to_utf16+0x15b/0x250 [cifs]
	[ 1832.134502]  ? lock_downgrade+0x6f0/0x6f0
	[ 1832.134760]  ? cifs_convert_path_to_utf16+0x198/0x220 [cifs]
	[ 1832.135170]  ? smb2_check_message+0x1080/0x1080 [cifs]
	[ 1832.135545]  cifs_ioctl+0x1577/0x3320 [cifs]
	[ 1832.135864]  ? lock_downgrade+0x6f0/0x6f0
	[ 1832.136125]  ? cifs_readdir+0x2e60/0x2e60 [cifs]
	[ 1832.136468]  ? rcu_read_lock_sched_held+0x3f/0x70
	[ 1832.136769]  ? __rseq_handle_notify_resume+0x80b/0xbe0
	[ 1832.137096]  ? __up_read+0x192/0x710
	[ 1832.137327]  ? __ia32_sys_rseq+0xf0/0xf0
	[ 1832.137578]  ? __x64_sys_openat+0x11f/0x1d0
	[ 1832.137850]  __x64_sys_ioctl+0x127/0x190
	[ 1832.138103]  do_syscall_64+0x3b/0x90
	[ 1832.138378]  entry_SYSCALL_64_after_hwframe+0x44/0xae
	[ 1832.138702] RIP: 0033:0x7fcee9a253df
	[ 1832.138937] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
	[ 1832.140107] RSP: 002b:00007ffeba94a8a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[ 1832.140606] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcee9a253df
	[ 1832.141058] RDX: 00007ffeba94a910 RSI: 00000000c018cf07 RDI: 0000000000000003
	[ 1832.141503] RBP: 00007ffeba94a930 R08: 00007fcee9b24db0 R09: 00007fcee9b45c4e
	[ 1832.141948] R10: 00007fcee9918d40 R11: 0000000000000246 R12: 00007ffeba94aa48
	[ 1832.142396] R13: 0000000000401176 R14: 0000000000403df8 R15: 00007fcee9b78000
	[ 1832.142851]  </TASK>
	[ 1832.142994] Modules linked in: cifs cifs_arc4 cifs_md4 bpf_preload [last unloaded: cifs]

Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-31 09:39:58 -05:00
b92e358757 cifs: prevent bad output lengths in smb2_ioctl_query_info()
When calling smb2_ioctl_query_info() with
smb_query_info::flags=PASSTHRU_FSCTL and
smb_query_info::output_buffer_length=0, the following would return
0x10

	buffer = memdup_user(arg + sizeof(struct smb_query_info),
			     qi.output_buffer_length);
	if (IS_ERR(buffer)) {
		kfree(vars);
		return PTR_ERR(buffer);
	}

rather than a valid pointer thus making IS_ERR() check fail.  This
would then cause a NULL ptr deference in @buffer when accessing it
later in smb2_ioctl_query_ioctl().  While at it, prevent having a
@buffer smaller than 8 bytes to correctly handle SMB2_SET_INFO
FileEndOfFileInformation requests when
smb_query_info::flags=PASSTHRU_SET_INFO.

Here is a small C reproducer which triggers a NULL ptr in @buffer when
passing an invalid smb_query_info::flags

	#include <stdio.h>
	#include <stdlib.h>
	#include <stdint.h>
	#include <unistd.h>
	#include <fcntl.h>
	#include <sys/ioctl.h>

	#define die(s) perror(s), exit(1)
	#define QUERY_INFO 0xc018cf07

	int main(int argc, char *argv[])
	{
		int fd;

		if (argc < 2)
			exit(1);
		fd = open(argv[1], O_RDONLY);
		if (fd == -1)
			die("open");
		if (ioctl(fd, QUERY_INFO, (uint32_t[]) { 0, 0, 0, 4, 0, 0}) == -1)
			die("ioctl");
		close(fd);
		return 0;
	}

	mount.cifs //srv/share /mnt -o ...
	gcc repro.c && ./a.out /mnt/f0

	[  114.138620] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI
	[  114.139310] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
	[  114.139775] CPU: 2 PID: 995 Comm: a.out Not tainted 5.17.0-rc8 #1
	[  114.140148] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
	[  114.140818] RIP: 0010:smb2_ioctl_query_info+0x206/0x410 [cifs]
	[  114.141221] Code: 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 c8 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 7b 28 4c 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 9c 01 00 00 49 8b 3f e8 58 02 fb ff 48 8b 14 24
	[  114.142348] RSP: 0018:ffffc90000b47b00 EFLAGS: 00010256
	[  114.142692] RAX: dffffc0000000000 RBX: ffff888115503200 RCX: ffffffffa020580d
	[  114.143119] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffa043a380
	[  114.143544] RBP: ffff888115503278 R08: 0000000000000001 R09: 0000000000000003
	[  114.143983] R10: fffffbfff4087470 R11: 0000000000000001 R12: ffff888115503288
	[  114.144424] R13: 00000000ffffffea R14: ffff888115503228 R15: 0000000000000000
	[  114.144852] FS:  00007f7aeabdf740(0000) GS:ffff888151600000(0000) knlGS:0000000000000000
	[  114.145338] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[  114.145692] CR2: 00007f7aeacfdf5e CR3: 000000012000e000 CR4: 0000000000350ee0
	[  114.146131] Call Trace:
	[  114.146291]  <TASK>
	[  114.146432]  ? smb2_query_reparse_tag+0x890/0x890 [cifs]
	[  114.146800]  ? cifs_mapchar+0x460/0x460 [cifs]
	[  114.147121]  ? rcu_read_lock_sched_held+0x3f/0x70
	[  114.147412]  ? cifs_strndup_to_utf16+0x15b/0x250 [cifs]
	[  114.147775]  ? dentry_path_raw+0xa6/0xf0
	[  114.148024]  ? cifs_convert_path_to_utf16+0x198/0x220 [cifs]
	[  114.148413]  ? smb2_check_message+0x1080/0x1080 [cifs]
	[  114.148766]  ? rcu_read_lock_sched_held+0x3f/0x70
	[  114.149065]  cifs_ioctl+0x1577/0x3320 [cifs]
	[  114.149371]  ? lock_downgrade+0x6f0/0x6f0
	[  114.149631]  ? cifs_readdir+0x2e60/0x2e60 [cifs]
	[  114.149956]  ? rcu_read_lock_sched_held+0x3f/0x70
	[  114.150250]  ? __rseq_handle_notify_resume+0x80b/0xbe0
	[  114.150562]  ? __up_read+0x192/0x710
	[  114.150791]  ? __ia32_sys_rseq+0xf0/0xf0
	[  114.151025]  ? __x64_sys_openat+0x11f/0x1d0
	[  114.151296]  __x64_sys_ioctl+0x127/0x190
	[  114.151549]  do_syscall_64+0x3b/0x90
	[  114.151768]  entry_SYSCALL_64_after_hwframe+0x44/0xae
	[  114.152079] RIP: 0033:0x7f7aead043df
	[  114.152306] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
	[  114.153431] RSP: 002b:00007ffc2e0c1f80 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[  114.153890] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7aead043df
	[  114.154315] RDX: 00007ffc2e0c1ff0 RSI: 00000000c018cf07 RDI: 0000000000000003
	[  114.154747] RBP: 00007ffc2e0c2010 R08: 00007f7aeae03db0 R09: 00007f7aeae24c4e
	[  114.155192] R10: 00007f7aeabf7d40 R11: 0000000000000246 R12: 00007ffc2e0c2128
	[  114.155642] R13: 0000000000401176 R14: 0000000000403df8 R15: 00007f7aeae57000
	[  114.156071]  </TASK>
	[  114.156218] Modules linked in: cifs cifs_arc4 cifs_md4 bpf_preload
	[  114.156608] ---[ end trace 0000000000000000 ]---
	[  114.156898] RIP: 0010:smb2_ioctl_query_info+0x206/0x410 [cifs]
	[  114.157792] Code: 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 c8 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 7b 28 4c 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 9c 01 00 00 49 8b 3f e8 58 02 fb ff 48 8b 14 24
	[  114.159293] RSP: 0018:ffffc90000b47b00 EFLAGS: 00010256
	[  114.159641] RAX: dffffc0000000000 RBX: ffff888115503200 RCX: ffffffffa020580d
	[  114.160093] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffa043a380
	[  114.160699] RBP: ffff888115503278 R08: 0000000000000001 R09: 0000000000000003
	[  114.161196] R10: fffffbfff4087470 R11: 0000000000000001 R12: ffff888115503288
	[  114.155642] R13: 0000000000401176 R14: 0000000000403df8 R15: 00007f7aeae57000
	[  114.156071]  </TASK>
	[  114.156218] Modules linked in: cifs cifs_arc4 cifs_md4 bpf_preload
	[  114.156608] ---[ end trace 0000000000000000 ]---
	[  114.156898] RIP: 0010:smb2_ioctl_query_info+0x206/0x410 [cifs]
	[  114.157792] Code: 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 c8 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 7b 28 4c 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 9c 01 00 00 49 8b 3f e8 58 02 fb ff 48 8b 14 24
	[  114.159293] RSP: 0018:ffffc90000b47b00 EFLAGS: 00010256
	[  114.159641] RAX: dffffc0000000000 RBX: ffff888115503200 RCX: ffffffffa020580d
	[  114.160093] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffa043a380
	[  114.160699] RBP: ffff888115503278 R08: 0000000000000001 R09: 0000000000000003
	[  114.161196] R10: fffffbfff4087470 R11: 0000000000000001 R12: ffff888115503288
	[  114.161823] R13: 00000000ffffffea R14: ffff888115503228 R15: 0000000000000000
	[  114.162274] FS:  00007f7aeabdf740(0000) GS:ffff888151600000(0000) knlGS:0000000000000000
	[  114.162853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[  114.163218] CR2: 00007f7aeacfdf5e CR3: 000000012000e000 CR4: 0000000000350ee0
	[  114.163691] Kernel panic - not syncing: Fatal exception
	[  114.164087] Kernel Offset: disabled
	[  114.164316] ---[ end Kernel panic - not syncing: Fatal exception ]---

Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-31 09:39:58 -05:00
6b8a94332e nfsd: Fix a write performance regression
The call to filemap_flush() in nfsd_file_put() is there to ensure that
we clear out any writes belonging to a NFSv3 client relatively quickly
and avoid situations where the file can't be evicted by the garbage
collector. It also ensures that we detect write errors quickly.

The problem is this causes a regression in performance for some
workloads.

So try to improve matters by deferring writeback until we're ready to
close the file, and need to detect errors so that we can force the
client to resend.

Tested-by: Jan Kara <jack@suse.cz>
Fixes: b6669305d35a ("nfsd: Reduce the number of calls to nfsd_file_gc()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Link: https://lore.kernel.org/all/20220330103457.r4xrhy2d6nhtouzk@quack3.lan
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-31 10:39:57 -04:00
c7803b05f7 smb3: fix ksmbd bigendian bug in oplock break, and move its struct to smbfs_common
Fix an endian bug in ksmbd for one remaining use of
Persistent/VolatileFid that unnecessarily converted it (it is an
opaque endian field that does not need to be and should not
be converted) in oplock_break for ksmbd, and move the definitions
for the oplock and lease break protocol requests and responses
to fs/smbfs_common/smb2pdu.h

Also move a few more definitions for various protocol requests
that were duplicated (in fs/cifs/smb2pdu.h and fs/ksmbd/smb2pdu.h)
into fs/smbfs_common/smb2pdu.h including:

- various ioctls and reparse structures
- validate negotiate request and response structs
- duplicate extents structs

Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-31 09:38:53 -05:00
49df342218 fs: fix an infinite loop in iomap_fiemap
when get fiemap starting from MAX_LFS_FILESIZE, (maxbytes - *len) < start
will always true , then *len set zero. because of start offset is beyond
file size, for erofs filesystem it will always return iomap.length with
zero,iomap iterate will enter infinite loop. it is necessary cover this
corner case to avoid this situation.

------------[ cut here ]------------
WARNING: CPU: 7 PID: 905 at fs/iomap/iter.c:35 iomap_iter+0x97f/0xc70
Modules linked in: xfs erofs
CPU: 7 PID: 905 Comm: iomap Tainted: G        W         5.17.0-rc8 #27
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:iomap_iter+0x97f/0xc70
Code: 85 a1 fc ff ff e8 71 be 9c ff 0f 1f 44 00 00 e9 92 fc ff ff e8 62 be 9c ff 0f 0b b8 fb ff ff ff e9 fc f8 ff ff e8 51 be 9c ff <0f> 0b e9 2b fc ff ff e8 45 be 9c ff 0f 0b e9 e1 fb ff ff e8 39 be
RSP: 0018:ffff888060a37ab0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888060a37bb0 RCX: 0000000000000000
RDX: ffff88807e19a900 RSI: ffffffff81a7da7f RDI: ffff888060a37be0
RBP: 7fffffffffffffff R08: 0000000000000000 R09: ffff888060a37c20
R10: ffff888060a37c67 R11: ffffed100c146f8c R12: 7fffffffffffffff
R13: 0000000000000000 R14: ffff888060a37bd8 R15: ffff888060a37c20
FS:  00007fd3cca01540(0000) GS:ffff888108780000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020010820 CR3: 0000000054b92000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 iomap_fiemap+0x1c9/0x2f0
 erofs_fiemap+0x64/0x90 [erofs]
 do_vfs_ioctl+0x40d/0x12e0
 __x64_sys_ioctl+0xaa/0x1c0
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
 </TASK>
---[ end trace 0000000000000000 ]---
watchdog: BUG: soft lockup - CPU#7 stuck for 26s! [iomap:905]

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[djwong: fix some typos]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-30 09:49:28 -07:00
edf5f0548f ksmbd: replace usage of found with dedicated list iterator variable
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean [1].

This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-30 08:17:55 -05:00
56b401fb0c ksmbd: Remove a redundant zeroing of memory
fill_transform_hdr() has only one caller that already clears tr_buf (it is
kzalloc'ed).

So there is no need to clear it another time here.

Remove the superfluous memset() and add a comment to remind that the caller
must clear the buffer.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-30 08:17:53 -05:00
adc3282140 ksmbd: shorten experimental warning on loading the module
ksmbd is continuing to improve.  Shorten the warning message
logged the first time it is loaded to:
   "The ksmbd server is experimental"

Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-30 08:14:59 -05:00
d888c83fce fs: fix fd table size alignment properly
Jason Donenfeld reports that my commit 1c24a186398f ("fs: fd tables have
to be multiples of BITS_PER_LONG") doesn't work, and the reason is an
embarrassing brown-paper-bag bug.

Yes, we want to align the number of fds to BITS_PER_LONG, and yes, the
reason they might not be aligned is because the incoming 'max_fd'
argument might not be aligned.

But aligining the argument - while simple - will cause a "infinitely
big" maxfd (eg NR_OPEN_MAX) to just overflow to zero.  Which most
definitely isn't what we want either.

The obvious fix was always just to do the alignment last, but I had
moved it earlier just to make the patch smaller and the code look
simpler.  Duh.  It certainly made _me_ look simple.

Fixes: 1c24a186398f ("fs: fd tables have to be multiples of BITS_PER_LONG")
Reported-and-tested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Fedor Pchelkin <aissur0002@gmail.com>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-29 23:29:18 -07:00
b243874f6f NFSv4: fix open failure with O_ACCMODE flag
open() with O_ACCMODE|O_DIRECT flags secondly will fail.

Reproducer:
  1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/
  2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT)
  3. close(fd)
  4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT)

Server nfsd4_decode_share_access() will fail with error nfserr_bad_xdr when
client use incorrect share access mode of 0.

Fix this by using NFS4_SHARE_ACCESS_BOTH share access mode in client,
just like firstly opening.

Fixes: ce4ef7c0a8a05 ("NFS: Split out NFS v4 file operations")
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-29 22:14:00 -04:00
ab0fc21bc7 Revert "NFSv4: Handle the special Linux file open access mode"
This reverts commit 44942b4e457beda00981f616402a1a791e8c616e.

After secondly opening a file with O_ACCMODE|O_DIRECT flags,
nfs4_valid_open_stateid() will dereference NULL nfs4_state when lseek().

Reproducer:
  1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/
  2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT)
  3. close(fd)
  4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT)
  5. lseek(fd)

Reported-by: Lyu Tao <tao.lyu@epfl.ch>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-29 22:14:00 -04:00
965181d7ef NFS client updates for Linux 5.18
Highlights include:
 
 Features:
 - Switch NFS to use readahead instead of the obsolete readpages.
 - Readdir fixes to improve cacheability of large directories when there
   are multiple readers and writers.
 - Readdir performance improvements when doing a seekdir() immediately
   after opening the directory (common when re-exporting NFS).
 - NFS swap improvements from Neil Brown.
 - Loosen up memory allocation to permit direct reclaim and write back
   in cases where there is no danger of deadlocking the writeback code or
   NFS swap.
 - Avoid sillyrename when the NFSv4 server claims to support the
   necessary features to recover the unlinked but open file after reboot.
 
 Bugfixes:
 - Patch from Olga to add a mount option to control NFSv4.1 session
   trunking discovery, and default it to being off.
 - Fix a lockup in nfs_do_recoalesce().
 - Two fixes for list iterator variables being used when pointing to the
   list head.
 - Fix a kernel memory scribble when reading from a non-socket transport
   in /sys/kernel/sunrpc.
 - Fix a race where reconnecting to a server could leave the TCP socket
   stuck forever in the connecting state.
 - Patch from Neil to fix a shutdown race which can leave the SUNRPC
   transport timer primed after we free the struct xprt itself.
 - Patch from Xin Xiong to fix reference count leaks in the NFSv4.2 copy
   offload.
 - Sunrpc patch from Olga to avoid resending a task on an offlined
   transport.
 
 Cleanups:
 - Patches from Dave Wysochanski to clean up the fscache code
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmJDLLUACgkQZwvnipYK
 APJZ5Q/9FH5S38RcUOxpFS+XaVdmxltJTMQv9Bz5mcyljq6PwRjTn73Qs7MfBQfD
 k90O9Pn/EXgQpFWNW4o/saQGLAO1MbVaesdNNF92Le9NziMz7lr95fsFXOqpg9EQ
 rEVOImWinv/N0+P0yv5tgjx6Fheu70L4dnwsxepnmtv2R1VjXOVchK7WjsV+9qey
 Xvn15OzhtVCGhRvvKfmGuGW+2I0D29tRNn2X0Pi+LjxAEQ0We/rcsJeGDMGYtbeC
 2U4ADk7KB0kJq04WYCH8k8tTrljL3PUONzO6QP1lltmAZ0FOwXf/AjyaD3SML2Xu
 A1yc2bEn3/S6wCQ3BiMxoE7UCSewg/ZyQD7B9GjZ/nWYNXREhLKvtVZt+ULTCA6o
 UKAI2vlEHljzQI2YfbVVZFyL8KCZqi+BFcJ/H/dtFKNazIkiefkaF+703Pflazis
 HjhA5G6IT0Pk6qZ6OBp+Z/4kCDYZYXXi9ysiNuMPWCiYEfZzt7ocUjm44uRAi+vR
 Qpf0V01Rg4iplTj3YFXqb9TwxrudzC7AcQ7hbLJia0kD/8djG+aGZs3SJWeO+VuE
 Xa41oxAx9jqOSRvSKPvj7nbexrwn8PtQI2NtDRqjPXlg6+Jbk23u8lIEVhsCH877
 xoMUHtTKLyqECskzRN4nh7qoBD3QfiLuI74aDcPtGEuXzU1+Ox8=
 =jeGt
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:

   - Switch NFS to use readahead instead of the obsolete readpages.

   - Readdir fixes to improve cacheability of large directories when
     there are multiple readers and writers.

   - Readdir performance improvements when doing a seekdir() immediately
     after opening the directory (common when re-exporting NFS).

   - NFS swap improvements from Neil Brown.

   - Loosen up memory allocation to permit direct reclaim and write back
     in cases where there is no danger of deadlocking the writeback code
     or NFS swap.

   - Avoid sillyrename when the NFSv4 server claims to support the
     necessary features to recover the unlinked but open file after
     reboot.

  Bugfixes:

   - Patch from Olga to add a mount option to control NFSv4.1 session
     trunking discovery, and default it to being off.

   - Fix a lockup in nfs_do_recoalesce().

   - Two fixes for list iterator variables being used when pointing to
     the list head.

   - Fix a kernel memory scribble when reading from a non-socket
     transport in /sys/kernel/sunrpc.

   - Fix a race where reconnecting to a server could leave the TCP
     socket stuck forever in the connecting state.

   - Patch from Neil to fix a shutdown race which can leave the SUNRPC
     transport timer primed after we free the struct xprt itself.

   - Patch from Xin Xiong to fix reference count leaks in the NFSv4.2
     copy offload.

   - Sunrpc patch from Olga to avoid resending a task on an offlined
     transport.

  Cleanups:

   - Patches from Dave Wysochanski to clean up the fscache code"

* tag 'nfs-for-5.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (91 commits)
  NFSv4/pNFS: Fix another issue with a list iterator pointing to the head
  NFS: Don't loop forever in nfs_do_recoalesce()
  SUNRPC: Don't return error values in sysfs read of closed files
  SUNRPC: Do not dereference non-socket transports in sysfs
  NFSv4.1: don't retry BIND_CONN_TO_SESSION on session error
  SUNRPC don't resend a task on an offlined transport
  NFS: replace usage of found with dedicated list iterator variable
  SUNRPC: avoid race between mod_timer() and del_timer_sync()
  pNFS/files: Ensure pNFS allocation modes are consistent with nfsiod
  pNFS/flexfiles: Ensure pNFS allocation modes are consistent with nfsiod
  NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod
  NFS: Avoid writeback threads getting stuck in mempool_alloc()
  NFS: nfsiod should not block forever in mempool_alloc()
  SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent
  SUNRPC: Fix unx_lookup_cred() allocation
  NFS: Fix memory allocation in rpc_alloc_task()
  NFS: Fix memory allocation in rpc_malloc()
  SUNRPC: Improve accuracy of socket ENOBUFS determination
  SUNRPC: Replace internal use of SOCKWQ_ASYNC_NOSPACE
  SUNRPC: Fix socket waits for write buffer space
  ...
2022-03-29 18:55:37 -07:00
919edbadeb xfs: drop async cache flushes from CIL commits.
Jan Kara reported a performance regression in dbench that he
bisected down to commit bad77c375e8d ("xfs: CIL checkpoint
flushes caches unconditionally").

Whilst developing the journal flush/fua optimisations this cache was
part of, it appeared to made a significant difference to
performance. However, now that this patchset has settled and all the
correctness issues fixed, there does not appear to be any
significant performance benefit to asynchronous cache flushes.

In fact, the opposite is true on some storage types and workloads,
where additional cache flushes that can occur from fsync heavy
workloads have measurable and significant impact on overall
throughput.

Local dbench testing shows little difference on dbench runs with
sync vs async cache flushes on either fast or slow SSD storage, and
no difference in streaming concurrent async transaction workloads
like fs-mark.

Fast NVME storage.

From `dbench -t 30`, CIL scale:

clients		async			sync
		BW	Latency		BW	Latency
1		 935.18   0.855		 915.64   0.903
8		2404.51   6.873		2341.77   6.511
16		3003.42   6.460		2931.57   6.529
32		3697.23   7.939		3596.28   7.894
128		7237.43  15.495		7217.74  11.588
512		5079.24  90.587		5167.08  95.822

fsmark, 32 threads, create w/ 64 byte xattr w/32k logbsize

	create		chown		unlink
async   1m41s		1m16s		2m03s
sync	1m40s		1m19s		1m54s

Slower SATA SSD storage:

From `dbench -t 30`, CIL scale:

clients		async			sync
		BW	Latency		BW	Latency
1		  78.59  15.792		  83.78  10.729
8		 367.88  92.067		 404.63  59.943
16		 564.51  72.524		 602.71  76.089
32		 831.66 105.984		 870.26 110.482
128		1659.76 102.969		1624.73  91.356
512		2135.91 223.054		2603.07 161.160

fsmark, 16 threads, create w/32k logbsize

	create		unlink
async   5m06s		4m15s
sync	5m00s		4m22s

And on Jan's test machine:

                   5.18-rc8-vanilla       5.18-rc8-patched
Amean     1        71.22 (   0.00%)       64.94 *   8.81%*
Amean     2        93.03 (   0.00%)       84.80 *   8.85%*
Amean     4       150.54 (   0.00%)      137.51 *   8.66%*
Amean     8       252.53 (   0.00%)      242.24 *   4.08%*
Amean     16      454.13 (   0.00%)      439.08 *   3.31%*
Amean     32      835.24 (   0.00%)      829.74 *   0.66%*
Amean     64     1740.59 (   0.00%)     1686.73 *   3.09%*

Performance and cache flush behaviour is restored to pre-regression
levels.

As such, we can now consider the async cache flush mechanism an
unnecessary exercise in premature optimisation and hence we can
now remove it and the infrastructure it requires completely.

Fixes: bad77c375e8d ("xfs: CIL checkpoint flushes caches unconditionally")
Reported-and-tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-29 18:22:02 -07:00
5652ef3170 xfs: shutdown during log recovery needs to mark the log shutdown
When a checkpoint writeback is run by log recovery, corruption
propagated from the log can result in writeback verifiers failing
and calling xfs_force_shutdown() from
xfs_buf_delwri_submit_buffers().

This results in the mount being marked as shutdown, but the log does
not get marked as shut down because:

        /*
         * If this happens during log recovery then we aren't using the runtime
         * log mechanisms yet so there's nothing to shut down.
         */
        if (!log || xlog_in_recovery(log))
                return false;

If there are other buffers that then fail (say due to detecting the
mount shutdown), they will now hang in xfs_do_force_shutdown()
waiting for the log to shut down like this:

  __schedule+0x30d/0x9e0
  schedule+0x55/0xd0
  xfs_do_force_shutdown+0x1cd/0x200
  ? init_wait_var_entry+0x50/0x50
  xfs_buf_ioend+0x47e/0x530
  __xfs_buf_submit+0xb0/0x240
  xfs_buf_delwri_submit_buffers+0xfe/0x270
  xfs_buf_delwri_submit+0x3a/0xc0
  xlog_do_recovery_pass+0x474/0x7b0
  ? do_raw_spin_unlock+0x30/0xb0
  xlog_do_log_recovery+0x91/0x140
  xlog_do_recover+0x38/0x1e0
  xlog_recover+0xdd/0x170
  xfs_log_mount+0x17e/0x2e0
  xfs_mountfs+0x457/0x930
  xfs_fs_fill_super+0x476/0x830

xlog_force_shutdown() always needs to mark the log as shut down,
regardless of whether recovery is in progress or not, so that
multiple calls to xfs_force_shutdown() during recovery don't end
up waiting for the log to be shut down like this.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-29 18:22:02 -07:00
3c4cb76bce xfs: xfs_trans_commit() path must check for log shutdown
If a shut races with xfs_trans_commit() and we have shut down the
filesystem but not the log, we will still cancel the transaction.
This can result in aborting dirty log items instead of committing and
pinning them whilst the log is still running. Hence we can end up
with dirty, unlogged metadata that isn't in the AIL in memory that
can be flushed to disk via writeback clustering.

This was discovered from a g/388 trace where an inode log item was
having IO completed on it and it wasn't in the AIL, hence tripping
asserts xfs_ail_check(). Inode cluster writeback started long after
the filesystem shutdown started, and long after the transaction
containing the dirty inode was aborted and the log item marked
XFS_LI_ABORTED. The inode was seen as dirty and unpinned, so it
was flushed. IO completion tried to remove the inode from the AIL,
at which point stuff went bad:

 XFS (pmem1): Log I/O Error (0x6) detected at xfs_fs_goingdown+0xa3/0xf0 (fs/xfs/xfs_fsops.c:500).  Shutting down filesystem.
 XFS: Assertion failed: in_ail, file: fs/xfs/xfs_trans_ail.c, line: 67
 XFS (pmem1): Please unmount the filesystem and rectify the problem(s)
 Workqueue: xfs-buf/pmem1 xfs_buf_ioend_work
 RIP: 0010:assfail+0x27/0x2d
 Call Trace:
  <TASK>
  xfs_ail_check+0xa8/0x180
  xfs_ail_delete_one+0x3b/0xf0
  xfs_buf_inode_iodone+0x329/0x3f0
  xfs_buf_ioend+0x1f8/0x530
  xfs_buf_ioend_work+0x15/0x20
  process_one_work+0x1ac/0x390
  worker_thread+0x56/0x3c0
  kthread+0xf6/0x120
  ret_from_fork+0x1f/0x30
  </TASK>

xfs_trans_commit() needs to check log state for shutdown, not mount
state. It cannot abort dirty log items while the log is still
running as dirty items must remained pinned in memory until they are
either committed to the journal or the log has shut down and they
can be safely tossed away. Hence if the log has not shut down, the
xfs_trans_commit() path must allow completed transactions to commit
to the CIL and pin the dirty items even if a mount shutdown has
started.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-29 18:22:01 -07:00
41e6362183 xfs: xfs_do_force_shutdown needs to block racing shutdowns
When we call xfs_forced_shutdown(), the caller often expects the
filesystem to be completely shut down when it returns. However,
if we have racing xfs_forced_shutdown() calls, the first caller sets
the mount shutdown flag then goes to shutdown the log. The second
caller sees the mount shutdown flag and returns immediately - it
does not wait for the log to be shut down.

Unfortunately, xfs_forced_shutdown() is used in some places that
expect it to completely shut down the filesystem before it returns
(e.g. xfs_trans_log_inode()). As such, returning before the log has
been shut down leaves us in a place where the transaction failed to
complete correctly but we still call xfs_trans_commit(). This
situation arises because xfs_trans_log_inode() does not return an
error and instead calls xfs_force_shutdown() to ensure that the
transaction being committed is aborted.

Unfortunately, we have a race condition where xfs_trans_commit()
needs to check xlog_is_shutdown() because it can't abort log items
before the log is shut down, but it needs to use xfs_is_shutdown()
because xfs_forced_shutdown() does not block waiting for the log to
shut down.

To fix this conundrum, first we make all calls to
xfs_forced_shutdown() block until the log is also shut down. This
means we can then safely use xfs_forced_shutdown() as a mechanism
that ensures the currently running transaction will be aborted by
xfs_trans_commit() regardless of the shutdown check it uses.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-29 18:22:01 -07:00
b5f17bec12 xfs: log shutdown triggers should only shut down the log
We've got a mess on our hands.

1. xfs_trans_commit() cannot cancel transactions because the mount is
shut down - that causes dirty, aborted, unlogged log items to sit
unpinned in memory and potentially get written to disk before the
log is shut down. Hence xfs_trans_commit() can only abort
transactions when xlog_is_shutdown() is true.

2. xfs_force_shutdown() is used in places to cause the current
modification to be aborted via xfs_trans_commit() because it may be
impractical or impossible to cancel the transaction directly, and
hence xfs_trans_commit() must cancel transactions when
xfs_is_shutdown() is true in this situation. But we can't do that
because of #1.

3. Log IO errors cause log shutdowns by calling xfs_force_shutdown()
to shut down the mount and then the log from log IO completion.

4. xfs_force_shutdown() can result in a log force being issued,
which has to wait for log IO completion before it will mark the log
as shut down. If #3 races with some other shutdown trigger that runs
a log force, we rely on xfs_force_shutdown() silently ignoring #3
and avoiding shutting down the log until the failed log force
completes.

5. To ensure #2 always works, we have to ensure that
xfs_force_shutdown() does not return until the the log is shut down.
But in the case of #4, this will result in a deadlock because the
log Io completion will block waiting for a log force to complete
which is blocked waiting for log IO to complete....

So the very first thing we have to do here to untangle this mess is
dissociate log shutdown triggers from mount shutdowns. We already
have xlog_forced_shutdown, which will atomically transistion to the
log a shutdown state. Due to internal asserts it cannot be called
multiple times, but was done simply because the only place that
could call it was xfs_do_force_shutdown() (i.e. the mount shutdown!)
and that could only call it once and once only.  So the first thing
we do is remove the asserts.

We then convert all the internal log shutdown triggers to call
xlog_force_shutdown() directly instead of xfs_force_shutdown(). This
allows the log shutdown triggers to shut down the log without
needing to care about mount based shutdown constraints. This means
we shut down the log independently of the mount and the mount may
not notice this until it's next attempt to read or modify metadata.
At that point (e.g. xfs_trans_commit()) it will see that the log is
shutdown, error out and shutdown the mount.

To ensure that all the unmount behaviours and asserts track
correctly as a result of a log shutdown, propagate the shutdown up
to the mount if it is not already set. This keeps the mount and log
state in sync, and saves a huge amount of hassle where code fails
because of a log shutdown but only checks for mount shutdowns and
hence ends up doing the wrong thing. Cleaning up that mess is
an exercise for another day.

This enables us to address the other problems noted above in
followup patches.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-29 18:22:01 -07:00
cd6f79d1fb xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks
Brian reported a null pointer dereference failure during unmount in
xfs/006. He tracked the problem down to the AIL being torn down
before a log shutdown had completed and removed all the items from
the AIL. The failure occurred in this path while unmount was
proceeding in another task:

 xfs_trans_ail_delete+0x102/0x130 [xfs]
 xfs_buf_item_done+0x22/0x30 [xfs]
 xfs_buf_ioend+0x73/0x4d0 [xfs]
 xfs_trans_committed_bulk+0x17e/0x2f0 [xfs]
 xlog_cil_committed+0x2a9/0x300 [xfs]
 xlog_cil_process_committed+0x69/0x80 [xfs]
 xlog_state_shutdown_callbacks+0xce/0xf0 [xfs]
 xlog_force_shutdown+0xdf/0x150 [xfs]
 xfs_do_force_shutdown+0x5f/0x150 [xfs]
 xlog_ioend_work+0x71/0x80 [xfs]
 process_one_work+0x1c5/0x390
 worker_thread+0x30/0x350
 kthread+0xd7/0x100
 ret_from_fork+0x1f/0x30

This is processing an EIO error to a log write, and it's
triggering a force shutdown. This causes the log to be shut down,
and then it is running attached iclog callbacks from the shutdown
context. That means the fs and log has already been marked as
xfs_is_shutdown/xlog_is_shutdown and so high level code will abort
(e.g. xfs_trans_commit(), xfs_log_force(), etc) with an error
because of shutdown.

The umount would have been blocked waiting for a log force
completion inside xfs_log_cover() -> xfs_sync_sb(). The first thing
for this situation to occur is for xfs_sync_sb() to exit without
waiting for the iclog buffer to be comitted to disk. The
above trace is the completion routine for the iclog buffer, and
it is shutting down the filesystem.

xlog_state_shutdown_callbacks() does this:

{
        struct xlog_in_core     *iclog;
        LIST_HEAD(cb_list);

        spin_lock(&log->l_icloglock);
        iclog = log->l_iclog;
        do {
                if (atomic_read(&iclog->ic_refcnt)) {
                        /* Reference holder will re-run iclog callbacks. */
                        continue;
                }
                list_splice_init(&iclog->ic_callbacks, &cb_list);
>>>>>>           wake_up_all(&iclog->ic_write_wait);
>>>>>>           wake_up_all(&iclog->ic_force_wait);
        } while ((iclog = iclog->ic_next) != log->l_iclog);

        wake_up_all(&log->l_flush_wait);
        spin_unlock(&log->l_icloglock);

>>>>>>  xlog_cil_process_committed(&cb_list);
}

This wakes any thread waiting on IO completion of the iclog (in this
case the umount log force) before shutdown processes all the pending
callbacks.  That means the xfs_sync_sb() waiting on a sync
transaction in xfs_log_force() on iclog->ic_force_wait will get
woken before the callbacks attached to that iclog are run. This
results in xfs_sync_sb() returning an error, and so unmount unblocks
and continues to run whilst the log shutdown is still in progress.

Normally this is just fine because the force waiter has nothing to
do with AIL operations. But in the case of this unmount path, the
log force waiter goes on to tear down the AIL because the log is now
shut down and so nothing ever blocks it again from the wait point in
xfs_log_cover().

Hence it's a race to see who gets to the AIL first - the unmount
code or xlog_cil_process_committed() killing the superblock buffer.

To fix this, we just have to change the order of processing in
xlog_state_shutdown_callbacks() to run the callbacks before it wakes
any task waiting on completion of the iclog.

Reported-by: Brian Foster <bfoster@redhat.com>
Fixes: aad7272a9208 ("xfs: separate out log shutdown callback processing")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-29 18:22:00 -07:00
ab9c81ef32 xfs: shutdown in intent recovery has non-intent items in the AIL
generic/388 triggered a failure in RUI recovery due to a corrupted
btree record and the system then locked up hard due to a subsequent
assert failure while holding a spinlock cancelling intents:

 XFS (pmem1): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_trans.c:964).  Shutting down filesystem.
 XFS (pmem1): Please unmount the filesystem and rectify the problem(s)
 XFS: Assertion failed: !xlog_item_is_intent(lip), file: fs/xfs/xfs_log_recover.c, line: 2632
 Call Trace:
  <TASK>
  xlog_recover_cancel_intents.isra.0+0xd1/0x120
  xlog_recover_finish+0xb9/0x110
  xfs_log_mount_finish+0x15a/0x1e0
  xfs_mountfs+0x540/0x910
  xfs_fs_fill_super+0x476/0x830
  get_tree_bdev+0x171/0x270
  ? xfs_init_fs_context+0x1e0/0x1e0
  xfs_fs_get_tree+0x15/0x20
  vfs_get_tree+0x24/0xc0
  path_mount+0x304/0xba0
  ? putname+0x55/0x60
  __x64_sys_mount+0x108/0x140
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Essentially, there's dirty metadata in the AIL from intent recovery
transactions, so when we go to cancel the remaining intents we assume
that all objects after the first non-intent log item in the AIL are
not intents.

This is not true. Intent recovery can log new intents to continue
the operations the original intent could not complete in a single
transaction. The new intents are committed before they are deferred,
which means if the CIL commits in the background they will get
inserted into the AIL at the head.

Hence if we shut down the filesystem while processing intent
recovery, the AIL may have new intents active at the current head.
Hence this check:

                /*
                 * We're done when we see something other than an intent.
                 * There should be no intents left in the AIL now.
                 */
                if (!xlog_item_is_intent(lip)) {
#ifdef DEBUG
                        for (; lip; lip = xfs_trans_ail_cursor_next(ailp, &cur))
                                ASSERT(!xlog_item_is_intent(lip));
#endif
                        break;
                }

in both xlog_recover_process_intents() and
log_recover_cancel_intents() is simply not valid. It was valid back
when we only had EFI/EFD intents and didn't chain intents, but it
hasn't been valid ever since intent recovery could create and commit
new intents.

Given that crashing the mount task like this pretty much prevents
diagnosing what went wrong that lead to the initial failure that
triggered intent cancellation, just remove the checks altogether.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-29 18:22:00 -07:00
d2d7c04735 xfs: aborting inodes on shutdown may need buffer lock
Most buffer io list operations are run with the bp->b_lock held, but
xfs_iflush_abort() can be called without the buffer lock being held
resulting in inodes being removed from the buffer list while other
list operations are occurring. This causes problems with corrupted
bp->b_io_list inode lists during filesystem shutdown, leading to
traversals that never end, double removals from the AIL, etc.

Fix this by passing the buffer to xfs_iflush_abort() if we have
it locked. If the inode is attached to the buffer, we're going to
have to remove it from the buffer list and we'd have to get the
buffer off the inode log item to do that anyway.

If we don't have a buffer passed in (e.g. from xfs_reclaim_inode())
then we can determine if the inode has a log item and if it is
attached to a buffer before we do anything else. If it does have an
attached buffer, we can lock it safely (because the inode has a
reference to it) and then perform the inode abort.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-29 18:21:59 -07:00
1ec48f9551 A couple bug fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAmJDGQUACgkQNqiEXrVA
 jGSENg/5AT5rY7GwLKUrzGcCrikLVt91iftSKfcEtCuZHQiQn/kdpxGwUWW3GJ6y
 GvYKyyUumd6r1WKZArhGwZ/4R/pu0sNSWB6C/cxWqr5r3vJ74cVtsyc49HZvy2w5
 bBoftcDgJPIDPj2POYFbXR14F9D9MjJbcS6Zie5blWhjwNtxTCfc6AwgV5G5cYbo
 X4r/RawUGuJsOLqGjro48v5fVW1za6g8h03y3iWzOcGPn7lsgUaIWzC+ZxK167C7
 qrc4poWpyVcxtCzA+Fwda1VrPYfQWagZcSp0r0bMe2tNw7rvTJKtWZPiIeDlhIhW
 12kbKA/UeJDTMCI4ZA16gUBW0BUvbKRpDYCGtC8OIj64XUblmSOuXHFj71AG7qPI
 X+Lj/1ZxQ1h31bHw5ay859hr36G5EH2yutH5v7wil9yiuBScj2D/D6HC5z7GVb4O
 B5PLRJCRW+Hc86vWUg0hutfwqWmNfcvtxs6nqY111EPV7HShBKsSVhyOrHL/mQOo
 byiFUcYMhbDcrI2dkl8HvELPGMRwn6juw1qL/GlgmCFG7VfR7GW3ilWS8KmjCaTb
 iUoXeNriab3mUkAH9N1+jC6HbRJQnwj8/Exx66CLecqsLeeAIQryPzvAWE8qEJMY
 cTzAUZqp67baoCZ/dmY7AuExkURFc++bu+ulgvye63QIyRwwCBw=
 =ppBo
 -----END PGP SIGNATURE-----

Merge tag 'jfs-5.18' of https://github.com/kleikamp/linux-shaggy

Pull jfs updates from Dave Kleikamp:
 "A couple bug fixes"

* tag 'jfs-5.18' of https://github.com/kleikamp/linux-shaggy:
  jfs: prevent NULL deref in diFree
  jfs: fix divide error in dbNextAG
2022-03-29 18:17:30 -07:00
1c24a18639 fs: fd tables have to be multiples of BITS_PER_LONG
This has always been the rule: fdtables have several bitmaps in them,
and as a result they have to be sized properly for bitmaps.  We walk
those bitmaps in chunks of 'unsigned long' in serveral cases, but even
when we don't, we use the regular kernel bitops that are defined to work
on arrays of 'unsigned long', not on some byte array.

Now, the distinction between arrays of bytes and 'unsigned long'
normally only really ends up being noticeable on big-endian systems, but
Fedor Pchelkin and Alexey Khoroshilov reported that copy_fd_bitmaps()
could be called with an argument that wasn't even a multiple of
BITS_PER_BYTE.  And then it fails to do the proper copy even on
little-endian machines.

The bug wasn't in copy_fd_bitmap(), but in sane_fdtable_size(), which
didn't actually sanitize the fdtable size sufficiently, and never made
sure it had the proper BITS_PER_LONG alignment.

That's partly because the alignment historically came not from having to
explicitly align things, but simply from previous fdtable sizes, and
from count_open_files(), which counts the file descriptors by walking
them one 'unsigned long' word at a time and thus naturally ends up doing
sizing in the proper 'chunks of unsigned long'.

But with the introduction of close_range(), we now have an external
source of "this is how many files we want to have", and so
sane_fdtable_size() needs to do a better job.

This also adds that explicit alignment to alloc_fdtable(), although
there it is mainly just for documentation at a source code level.  The
arithmetic we do there to pick a reasonable fdtable size already aligns
the result sufficiently.

In fact,clang notices that the added ALIGN() in that function doesn't
actually do anything, and does not generate any extra code for it.

It turns out that gcc ends up confusing itself by combining a previous
constant-sized shift operation with the variable-sized shift operations
in roundup_pow_of_two().  And probably due to that doesn't notice that
the ALIGN() is a no-op.  But that's a (tiny) gcc misfeature that doesn't
matter.  Having the explicit alignment makes sense, and would actually
matter on a 128-bit architecture if we ever go there.

This also adds big comments above both functions about how fdtable sizes
have to have that BITS_PER_LONG alignment.

Fixes: 60997c3d45d9 ("close_range: add CLOSE_RANGE_UNSHARE")
Reported-by: Fedor Pchelkin <aissur0002@gmail.com>
Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Link: https://lore.kernel.org/all/20220326114009.1690-1-aissur0002@gmail.com/
Tested-and-acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-29 15:06:39 -07:00
3f1d52abf0 io_uring: defer msg-ring file validity check until command issue
In preparation for not using the file at prep time, defer checking if this
file refers to a valid io_uring instance until issue time.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-29 14:04:28 -06:00
9666d4206e io_uring: fail links if msg-ring doesn't succeeed
We must always call req_set_fail() if the request is failed, otherwise
we won't sever links for dependent chains correctly.

Fixes: 4f57f06ce218 ("io_uring: add support for IORING_OP_MSG_RING command")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-29 10:51:08 -06:00
1930a6e739 ptrace: Cleanups for v5.18
This set of changes removes tracehook.h, moves modification of all of
 the ptrace fields inside of siglock to remove races, adds a missing
 permission check to ptrace.c
 
 The removal of tracehook.h is quite significant as it has been a major
 source of confusion in recent years.  Much of that confusion was
 around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled
 making the semantics clearer).
 
 For people who don't know tracehook.h is a vestiage of an attempt to
 implement uprobes like functionality that was never fully merged, and
 was later superseeded by uprobes when uprobes was merged.  For many
 years now we have been removing what tracehook functionaly a little
 bit at a time.  To the point where now anything left in tracehook.h is
 some weird strange thing that is difficult to understand.
 
 Eric W. Biederman (15):
       ptrace: Move ptrace_report_syscall into ptrace.h
       ptrace/arm: Rename tracehook_report_syscall report_syscall
       ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
       ptrace: Remove arch_syscall_{enter,exit}_tracehook
       ptrace: Remove tracehook_signal_handler
       task_work: Remove unnecessary include from posix_timers.h
       task_work: Introduce task_work_pending
       task_work: Call tracehook_notify_signal from get_signal on all architectures
       task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
       signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
       resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
       resume_user_mode: Move to resume_user_mode.h
       tracehook: Remove tracehook.h
       ptrace: Move setting/clearing ptrace_message into ptrace_stop
       ptrace: Return the signal to continue with from ptrace_stop
 
 Jann Horn (1):
       ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
 
 Yang Li (1):
       ptrace: Remove duplicated include in ptrace.c
 
  MAINTAINERS                          |   1 -
  arch/Kconfig                         |   5 +-
  arch/alpha/kernel/ptrace.c           |   5 +-
  arch/alpha/kernel/signal.c           |   4 +-
  arch/arc/kernel/ptrace.c             |   5 +-
  arch/arc/kernel/signal.c             |   4 +-
  arch/arm/kernel/ptrace.c             |  12 +-
  arch/arm/kernel/signal.c             |   4 +-
  arch/arm64/kernel/ptrace.c           |  14 +--
  arch/arm64/kernel/signal.c           |   4 +-
  arch/csky/kernel/ptrace.c            |   5 +-
  arch/csky/kernel/signal.c            |   4 +-
  arch/h8300/kernel/ptrace.c           |   5 +-
  arch/h8300/kernel/signal.c           |   4 +-
  arch/hexagon/kernel/process.c        |   4 +-
  arch/hexagon/kernel/signal.c         |   1 -
  arch/hexagon/kernel/traps.c          |   6 +-
  arch/ia64/kernel/process.c           |   4 +-
  arch/ia64/kernel/ptrace.c            |   6 +-
  arch/ia64/kernel/signal.c            |   1 -
  arch/m68k/kernel/ptrace.c            |   5 +-
  arch/m68k/kernel/signal.c            |   4 +-
  arch/microblaze/kernel/ptrace.c      |   5 +-
  arch/microblaze/kernel/signal.c      |   4 +-
  arch/mips/kernel/ptrace.c            |   5 +-
  arch/mips/kernel/signal.c            |   4 +-
  arch/nds32/include/asm/syscall.h     |   2 +-
  arch/nds32/kernel/ptrace.c           |   5 +-
  arch/nds32/kernel/signal.c           |   4 +-
  arch/nios2/kernel/ptrace.c           |   5 +-
  arch/nios2/kernel/signal.c           |   4 +-
  arch/openrisc/kernel/ptrace.c        |   5 +-
  arch/openrisc/kernel/signal.c        |   4 +-
  arch/parisc/kernel/ptrace.c          |   7 +-
  arch/parisc/kernel/signal.c          |   4 +-
  arch/powerpc/kernel/ptrace/ptrace.c  |   8 +-
  arch/powerpc/kernel/signal.c         |   4 +-
  arch/riscv/kernel/ptrace.c           |   5 +-
  arch/riscv/kernel/signal.c           |   4 +-
  arch/s390/include/asm/entry-common.h |   1 -
  arch/s390/kernel/ptrace.c            |   1 -
  arch/s390/kernel/signal.c            |   5 +-
  arch/sh/kernel/ptrace_32.c           |   5 +-
  arch/sh/kernel/signal_32.c           |   4 +-
  arch/sparc/kernel/ptrace_32.c        |   5 +-
  arch/sparc/kernel/ptrace_64.c        |   5 +-
  arch/sparc/kernel/signal32.c         |   1 -
  arch/sparc/kernel/signal_32.c        |   4 +-
  arch/sparc/kernel/signal_64.c        |   4 +-
  arch/um/kernel/process.c             |   4 +-
  arch/um/kernel/ptrace.c              |   5 +-
  arch/x86/kernel/ptrace.c             |   1 -
  arch/x86/kernel/signal.c             |   5 +-
  arch/x86/mm/tlb.c                    |   1 +
  arch/xtensa/kernel/ptrace.c          |   5 +-
  arch/xtensa/kernel/signal.c          |   4 +-
  block/blk-cgroup.c                   |   2 +-
  fs/coredump.c                        |   1 -
  fs/exec.c                            |   1 -
  fs/io-wq.c                           |   6 +-
  fs/io_uring.c                        |  11 +-
  fs/proc/array.c                      |   1 -
  fs/proc/base.c                       |   1 -
  include/asm-generic/syscall.h        |   2 +-
  include/linux/entry-common.h         |  47 +-------
  include/linux/entry-kvm.h            |   2 +-
  include/linux/posix-timers.h         |   1 -
  include/linux/ptrace.h               |  81 ++++++++++++-
  include/linux/resume_user_mode.h     |  64 ++++++++++
  include/linux/sched/signal.h         |  17 +++
  include/linux/task_work.h            |   5 +
  include/linux/tracehook.h            | 226 -----------------------------------
  include/uapi/linux/ptrace.h          |   2 +-
  kernel/entry/common.c                |  19 +--
  kernel/entry/kvm.c                   |   9 +-
  kernel/exit.c                        |   3 +-
  kernel/livepatch/transition.c        |   1 -
  kernel/ptrace.c                      |  47 +++++---
  kernel/seccomp.c                     |   1 -
  kernel/signal.c                      |  62 +++++-----
  kernel/task_work.c                   |   4 +-
  kernel/time/posix-cpu-timers.c       |   1 +
  mm/memcontrol.c                      |   2 +-
  security/apparmor/domain.c           |   1 -
  security/selinux/hooks.c             |   1 -
  85 files changed, 372 insertions(+), 495 deletions(-)
 
 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmJCQkoACgkQC/v6Eiaj
 j0DCWQ/5AZVFU+hX32obUNCLackHTwgcCtSOs3JNBmNA/zL/htPiYYG0ghkvtlDR
 Dw5J5DnxC6P7PVAdAqrpvx2uX2FebHYU0bRlyLx8LYUEP5dhyNicxX9jA882Z+vw
 Ud0Ue9EojwGWS76dC9YoKUj3slThMATbhA2r4GVEoof8fSNJaBxQIqath44t0FwU
 DinWa+tIOvZANGBZr6CUUINNIgqBIZCH/R4h6ArBhMlJpuQ5Ufk2kAaiWFwZCkX4
 0LuuAwbKsCKkF8eap5I2KrIg/7zZVgxAg9O3cHOzzm8OPbKzRnNnQClcDe8perqp
 S6e/f3MgpE+eavd1EiLxevZ660cJChnmikXVVh8ZYYoefaMKGqBaBSsB38bNcLjY
 3+f2dB+TNBFRnZs1aCujK3tWBT9QyjZDKtCBfzxDNWBpXGLhHH6j6lA5Lj+Cef5K
 /HNHFb+FuqedlFZh5m1Y+piFQ70hTgCa2u8b+FSOubI2hW9Zd+WzINV0ANaZ2LvZ
 4YGtcyDNk1q1+c87lxP9xMRl/xi6rNg+B9T2MCo4IUnHgpSVP6VEB3osgUmrrrN0
 eQlUI154G/AaDlqXLgmn1xhRmlPGfmenkxpok1AuzxvNJsfLKnpEwQSc13g3oiZr
 disZQxNY0kBO2Nv3G323Z6PLinhbiIIFez6cJzK5v0YJ2WtO3pY=
 =uEro
 -----END PGP SIGNATURE-----

Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull ptrace cleanups from Eric Biederman:
 "This set of changes removes tracehook.h, moves modification of all of
  the ptrace fields inside of siglock to remove races, adds a missing
  permission check to ptrace.c

  The removal of tracehook.h is quite significant as it has been a major
  source of confusion in recent years. Much of that confusion was around
  task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
  semantics clearer).

  For people who don't know tracehook.h is a vestiage of an attempt to
  implement uprobes like functionality that was never fully merged, and
  was later superseeded by uprobes when uprobes was merged. For many
  years now we have been removing what tracehook functionaly a little
  bit at a time. To the point where anything left in tracehook.h was
  some weird strange thing that was difficult to understand"

* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ptrace: Remove duplicated include in ptrace.c
  ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
  ptrace: Return the signal to continue with from ptrace_stop
  ptrace: Move setting/clearing ptrace_message into ptrace_stop
  tracehook: Remove tracehook.h
  resume_user_mode: Move to resume_user_mode.h
  resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
  signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
  task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
  task_work: Call tracehook_notify_signal from get_signal on all architectures
  task_work: Introduce task_work_pending
  task_work: Remove unnecessary include from posix_timers.h
  ptrace: Remove tracehook_signal_handler
  ptrace: Remove arch_syscall_{enter,exit}_tracehook
  ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
  ptrace/arm: Rename tracehook_report_syscall report_syscall
  ptrace: Move ptrace_report_syscall into ptrace.h
2022-03-28 17:29:53 -07:00
fdf59eb548 smb3: cleanup and clarify status of tree connections
Currently the way the tid (tree connection) status is tracked
is confusing.  The same enum is used for structs cifs_tcon
and cifs_ses and TCP_Server_info, but each of these three has
different states that they transition among.  The current
code also unnecessarily uses camelCase.

Convert from use of statusEnum to a new tid_status_enum for
tree connections.  The valid states for a tid are:

        TID_NEW = 0,
        TID_GOOD,
        TID_EXITING,
        TID_NEED_RECON,
        TID_NEED_TCON,
        TID_IN_TCON,
        TID_NEED_FILES_INVALIDATE, /* unused, considering removing in future */
        TID_IN_FILES_INVALIDATE

It also removes CifsNeedTcon, CifsInTcon, CifsNeedFilesInvalidate and
CifsInFilesInvalidate from the statusEnum used for session and
TCP_Server_Info since they are not relevant for those.

A follow on patch will fix the places where we use the
tcon->need_reconnect flag to be more consistent with the tid->status.

Also fixes a bug that was:
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-28 17:07:30 -05:00
266d17a8c0 Driver core changes for 5.18-rc1
Here is the set of driver core changes for 5.18-rc1.
 
 Not much here, primarily it was a bunch of cleanups and small updates:
 	- kobj_type cleanups for default_groups
 	- documentation updates
 	- firmware loader minor changes
 	- component common helper added and take advantage of it in many
 	  drivers (the largest part of this pull request).
 
 There will be a merge conflict in drivers/power/supply/ab8500_chargalg.c
 with your tree, the merge conflict should be easy (take all the
 changes).
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYkG6PA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylMFwCfSIyAU4oLEgj+/Rfmx4o45cAVIWMAnit3zbdU
 wUUCGqKcOnTJEcW6dMPh
 =1VVi
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the set of driver core changes for 5.18-rc1.

  Not much here, primarily it was a bunch of cleanups and small updates:

   - kobj_type cleanups for default_groups

   - documentation updates

   - firmware loader minor changes

   - component common helper added and take advantage of it in many
     drivers (the largest part of this pull request).

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (54 commits)
  Documentation: update stable review cycle documentation
  drivers/base/dd.c : Remove the initial value of the global variable
  Documentation: update stable tree link
  Documentation: add link to stable release candidate tree
  devres: fix typos in comments
  Documentation: add note block surrounding security patch note
  samples/kobject: Use sysfs_emit instead of sprintf
  base: soc: Make soc_device_match() simpler and easier to read
  driver core: dd: fix return value of __setup handler
  driver core: Refactor sysfs and drv/bus remove hooks
  driver core: Refactor multiple copies of device cleanup
  scripts: get_abi.pl: Fix typo in help message
  kernfs: fix typos in comments
  kernfs: remove unneeded #if 0 guard
  ALSA: hda/realtek: Make use of the helper component_compare_dev_name
  video: omapfb: dss: Make use of the helper component_compare_dev
  power: supply: ab8500: Make use of the helper component_compare_dev
  ASoC: codecs: wcd938x: Make use of the helper component_compare/release_of
  iommu/mediatek: Make use of the helper component_compare/release_of
  drm: of: Make use of the helper component_release_of
  ...
2022-03-28 12:41:28 -07:00
85bcfa26f9 xfs: don't report reserved bnobt space as available
On a modern filesystem, we don't allow userspace to allocate blocks for
data storage from the per-AG space reservations, the user-controlled
reservation pool that prevents ENOSPC in the middle of internal
operations, or the internal per-AG set-aside that prevents unwanted
filesystem shutdowns due to ENOSPC during a bmap btree split.

Since we now consider freespace btree blocks as unavailable for
allocation for data storage, we shouldn't report those blocks via statfs
either.  This makes the numbers that we return via the statfs f_bavail
and f_bfree fields a more conservative estimate of actual free space.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-03-28 08:39:10 -07:00
82be38bcf8 xfs: fix overfilling of reserve pool
Due to cycling of m_sb_lock, it's possible for multiple callers of
xfs_reserve_blocks to race at changing the pool size, subtracting blocks
from fdblocks, and actually putting it in the pool.  The result of all
this is that we can overfill the reserve pool to hilarious levels.

xfs_mod_fdblocks, when called with a positive value, already knows how
to take freed blocks and either fill the reserve until it's full, or put
them in fdblocks.  Use that instead of setting m_resblks_avail directly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-03-28 08:39:02 -07:00
0baa2657dc xfs: always succeed at setting the reserve pool size
Nowadays, xfs_mod_fdblocks will always choose to fill the reserve pool
with freed blocks before adding to fdblocks.  Therefore, we can change
the behavior of xfs_reserve_blocks slightly -- setting the target size
of the pool should always succeed, since a deficiency will eventually
be made up as blocks get freed.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-03-28 08:38:56 -07:00
15f04fdc75 xfs: remove infinite loop when reserving free block pool
Infinite loops in kernel code are scary.  Calls to xfs_reserve_blocks
should be rare (people should just use the defaults!) so we really don't
need to try so hard.  Simplify the logic here by removing the infinite
loop.

Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-03-28 08:38:50 -07:00
c8c5682597 xfs: don't include bnobt blocks when reserving free block pool
xfs_reserve_blocks controls the size of the user-visible free space
reserve pool.  Given the difference between the current and requested
pool sizes, it will try to reserve free space from fdblocks.  However,
the amount requested from fdblocks is also constrained by the amount of
space that we think xfs_mod_fdblocks will give us.  If we forget to
subtract m_allocbt_blks before calling xfs_mod_fdblocks, it will will
return ENOSPC and we'll hang the kernel at mount due to the infinite
loop.

In commit fd43cf600cf6, we decided that xfs_mod_fdblocks should not hand
out the "free space" used by the free space btrees, because some portion
of the free space btrees hold in reserve space for future btree
expansion.  Unfortunately, xfs_reserve_blocks' estimation of the number
of blocks that it could request from xfs_mod_fdblocks was not updated to
include m_allocbt_blks, so if space is extremely low, the caller hangs.

Fix this by creating a function to estimate the number of blocks that
can be reserved from fdblocks, which needs to exclude the set-aside and
m_allocbt_blks.

Found by running xfs/306 (which formats a single-AG 20MB filesystem)
with an fstests configuration that specifies a 1k blocksize and a
specially crafted log size that will consume 7/8 of the space (17920
blocks, specifically) in that AG.

Cc: Brian Foster <bfoster@redhat.com>
Fixes: fd43cf600cf6 ("xfs: set aside allocation btree blocks from block reservation")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-03-28 08:38:43 -07:00
7c9d845f06 NFSv4/pNFS: Fix another issue with a list iterator pointing to the head
In nfs4_callback_devicenotify(), if we don't find a matching entry for
the deviceid, we're left with a pointer to 'struct nfs_server' that
actually points to the list of super blocks associated with our struct
nfs_client.
Furthermore, even if we have a valid pointer, nothing pins the super
block, and so the struct nfs_server could end up getting freed while
we're using it.

Since all we want is a pointer to the struct pnfs_layoutdriver_type,
let's skip all the iteration over super blocks, and just use APIs to
find the layout driver directly.

Reported-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Fixes: 1be5683b03a7 ("pnfs: CB_NOTIFY_DEVICEID")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-28 08:36:34 -04:00
5f7b839d47 SUNRPC: Return true/false (not 1/0) from bool functions
Return boolean values ("true" or "false") instead of 1 or 0 from bool
functions.  This fixes the following warnings from coccicheck:

./fs/nfsd/nfs2acl.c:289:9-10: WARNING: return of 0/1 in function
'nfsaclsvc_encode_accessres' with return type bool
./fs/nfsd/nfs2acl.c:252:9-10: WARNING: return of 0/1 in function
'nfsaclsvc_encode_getaclres' with return type bool

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-27 23:25:52 -04:00