23277 Commits

Author SHA1 Message Date
e038dca803 Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus
Conflicts:
	fs/btrfs/transaction.c

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-17 14:16:13 -04:00
01eff85b09 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: make log devices with write back caches work
  xfs: fix ->mknod() return value on xfs_get_acl() failure
2011-06-17 10:37:41 -07:00
7585717f30 Btrfs: fix relocation races
The recent commit to get rid of our trans_mutex introduced
some races with block group relocation.  The problem is that relocation
needs to do some record keeping about each root, and it was relying
on the transaction mutex to coordinate things in subtle ways.

This fix adds a mutex just for the relocation code and makes sure
it doesn't have a big impact on normal operations.  The race is
really fixed in btrfs_record_root_in_trans, which is where we
step back and wait for the relocation code to finish accounting
setup.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-17 13:36:58 -04:00
879669961b KEYS/DNS: Fix ____call_usermodehelper() to not lose the session keyring
____call_usermodehelper() now erases any credentials set by the
subprocess_inf::init() function.  The problem is that commit
17f60a7da150 ("capabilites: allow the application of capability limits
to usermode helpers") creates and commits new credentials with
prepare_kernel_cred() after the call to the init() function.  This wipes
all keyrings after umh_keys_init() is called.

The best way to deal with this is to put the init() call just prior to
the commit_creds() call, and pass the cred pointer to init().  That
means that umh_keys_init() and suchlike can modify the credentials
_before_ they are published and potentially in use by the rest of the
system.

This prevents request_key() from working as it is prevented from passing
the session keyring it set up with the authorisation token to
/sbin/request-key, and so the latter can't assume the authority to
instantiate the key.  This causes the in-kernel DNS resolver to fail
with ENOKEY unconditionally.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Tested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-17 09:40:48 -07:00
8b97b21e0f Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd:
  proc: Fix Oops on stat of /proc/<zombie pid>/ns/net
2011-06-16 15:02:20 -07:00
ee7b75fc4f NFSv4: Fix a readdir regression
Commit 7ebb9315 (NFS: use secinfo when crossing mountpoints) introduces
a regression when decoding an NFSv4 readdir entry that sets the
rdattr_error field.
By treating the resulting value as if it is a decoding error, the current
code may cause us to skip valid readdir entries.

Reported-by: Andy Adamson <andros@netapp.com>
Cc: stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-16 13:24:47 -04:00
8dac6bee32 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  AFS: Use i_generation not i_version for the vnode uniquifier
  AFS: Set s_id in the superblock to the volume name
  vfs: Fix data corruption after failed write in __block_write_begin()
  afs: afs_fill_page reads too much, or wrong data
  VFS: Fix vfsmount overput on simultaneous automount
  fix wrong iput on d_inode introduced by e6bc45d65d
  Delay struct net freeing while there's a sysfs instance refering to it
  afs: fix sget() races, close leak on umount
  ubifs: fix sget races
  ubifs: split allocation of ubifs_info into a separate function
  fix leak in proc_set_super()
2011-06-16 10:21:59 -07:00
a27a263bae xfs: make log devices with write back caches work
There's no reason not to support cache flushing on external log devices.
The only thing this really requires is flushing the data device first
both in fsync and log commits.  A side effect is that we also have to
remove the barrier write test during mount, which has been superflous
since the new FLUSH+FUA code anyway.  Also use the chance to flush the
RT subvolume write cache before the fsync commit, which is required
for correct semantics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-06-16 10:52:39 -05:00
d6e43f751f AFS: Use i_generation not i_version for the vnode uniquifier
Store the AFS vnode uniquifier in the i_generation field, not the i_version
field of the inode struct.  i_version can then be given the AFS data version
number.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:44:48 -04:00
2e41ae225f AFS: Set s_id in the superblock to the volume name
Set s_id in the superblock to the name of the AFS volume that this superblock
corresponds to.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:44:47 -04:00
f9f07b6c13 vfs: Fix data corruption after failed write in __block_write_begin()
I've got a report of a file corruption from fsxlinux on ext3. The important
operations to the page were:
mapwrite to a hole
partial write to the page
read - found the page zeroed from the end of the normal write

The culprit seems to be that if get_block() fails in __block_write_begin()
(e.g. transient ENOSPC in ext3), the function does ClearPageUptodate(page).
Thus when we retry the write, the logic in __block_write_begin() thinks zeroing
of the page is needed and overwrites old data.  In fact, I don't see why we
should ever need to zero the uptodate bit here - either the page was uptodate
when we entered __block_write_begin() and it should stay so when we leave it,
or it was not uptodate and noone had right to set it uptodate during
__block_write_begin() so it remains !uptodate when we leave as well. So just
remove clearing of the bit.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:44:46 -04:00
5e7f23373b afs: afs_fill_page reads too much, or wrong data
afs_fill_page should read the page that is about to be written but
the current implementation has a number of issues. If we aren't
extending the file we always read PAGE_CACHE_SIZE at offset 0. If we
are extending the file we try to read the entire file.

Change afs_fill_page to read PAGE_CACHE_SIZE at the right offset,
clamped to i_size.

While here, avoid calling afs_fill_page when we are doing a
PAGE_CACHE_SIZE write.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:44:46 -04:00
8aef188452 VFS: Fix vfsmount overput on simultaneous automount
[Kudos to dhowells for tracking that crap down]

If two processes attempt to cause automounting on the same mountpoint at the
same time, the vfsmount holding the mountpoint will be left with one too few
references on it, causing a BUG when the kernel tries to clean up.

The problem is that lock_mount() drops the caller's reference to the
mountpoint's vfsmount in the case where it finds something already mounted on
the mountpoint as it transits to the mounted filesystem and replaces path->mnt
with the new mountpoint vfsmount.

During a pathwalk, however, we don't take a reference on the vfsmount if it is
the same as the one in the nameidata struct, but do_add_mount() doesn't know
this.

The fix is to make sure we have a ref on the vfsmount of the mountpoint before
calling do_add_mount().  However, if lock_mount() doesn't transit, we're then
left with an extra ref on the mountpoint vfsmount which needs releasing.
We can handle that in follow_managed() by not making assumptions about what
we can and what we cannot get from lookup_mnt() as the current code does.

The callers of follow_managed() expect that reference to path->mnt will be
grabbed iff path->mnt has been changed.  follow_managed() and follow_automount()
keep track of whether such reference has been grabbed and assume that it'll
happen in those and only those cases that'll have us return with changed
path->mnt.  That assumption is almost correct - it breaks in case of
racing automounts and in even harder to hit race between following a mountpoint
and a couple of mount --move.  The thing is, we don't need to make that
assumption at all - after the end of loop in follow_manage() we can check
if path->mnt has ended up unchanged and do mntput() if needed.

The BUG can be reproduced with the following test program:

	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/stat.h>
	#include <unistd.h>
	#include <sys/wait.h>
	int main(int argc, char **argv)
	{
		int pid, ws;
		struct stat buf;
		pid = fork();
		stat(argv[1], &buf);
		if (pid > 0) wait(&ws);
		return 0;
	}

and the following procedure:

 (1) Mount an NFS volume that on the server has something else mounted on a
     subdirectory.  For instance, I can mount / from my server:

	mount warthog:/ /mnt -t nfs4 -r

     On the server /data has another filesystem mounted on it, so NFS will see
     a change in FSID as it walks down the path, and will mark /mnt/data as
     being a mountpoint.  This will cause the automount code to be triggered.

     !!! Do not look inside the mounted fs at this point !!!

 (2) Run the above program on a file within the submount to generate two
     simultaneous automount requests:

	/tmp/forkstat /mnt/data/testfile

 (3) Unmount the automounted submount:

	umount /mnt/data

 (4) Unmount the original mount:

	umount /mnt

     At this point the kernel should throw a BUG with something like the
     following:

	BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]

Note that the bug appears on the root dentry of the original mount, not the
mountpoint and not the submount because sys_umount() hasn't got to its final
mntput_no_expire() yet, but this isn't so obvious from the call trace:

 [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82
 [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b
 [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
 [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e
 [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs]
 [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83
 [<ffffffff811629ff>] deactivate_super+0x6f/0x7b
 [<ffffffff81186261>] mntput_no_expire+0x18d/0x199
 [<ffffffff811862a8>] mntput+0x3b/0x44
 [<ffffffff81186d87>] release_mounts+0xa2/0xbf
 [<ffffffff811876af>] sys_umount+0x47a/0x4ba
 [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f
 [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b

as do_umount() is inlined.  However, you can see release_mounts() in there.

Note also that it may be necessary to have multiple CPU cores to be able to
trigger this bug.

Tested-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:28:16 -04:00
50338b889d fix wrong iput on d_inode introduced by e6bc45d65d
Git bisection shows that commit e6bc45d65df8599fdbae73be9cec4ceed274db53 causes
BUG_ONs under high I/O load:

kernel BUG at fs/inode.c:1368!
[ 2862.501007] Call Trace:
[ 2862.501007]  [<ffffffff811691d8>] d_kill+0xf8/0x140
[ 2862.501007]  [<ffffffff81169c19>] dput+0xc9/0x190
[ 2862.501007]  [<ffffffff8115577f>] fput+0x15f/0x210
[ 2862.501007]  [<ffffffff81152171>] filp_close+0x61/0x90
[ 2862.501007]  [<ffffffff81152251>] sys_close+0xb1/0x110
[ 2862.501007]  [<ffffffff814c14fb>] system_call_fastpath+0x16/0x1b

A reliable way to reproduce this bug is:
Login to KDE, run 'rsnapshot sync', and apt-get install openjdk-6-jdk,
and apt-get remove openjdk-6-jdk.

The buggy part of the patch is this:
	struct inode *inode = NULL;
.....
-               if (nd.last.name[nd.last.len])
-                       goto slashes;
                inode = dentry->d_inode;
-               if (inode)
-                       ihold(inode);
+               if (nd.last.name[nd.last.len] || !inode)
+                       goto slashes;
+               ihold(inode)
...
	if (inode)
		iput(inode);	/* truncate the inode here */

If nd.last.name[nd.last.len] is nonzero (and thus goto slashes branch is taken),
and dentry->d_inode is non-NULL, then this code now does an additional iput on
the inode, which is wrong.

Fix this by only setting the inode variable if nd.last.name[nd.last.len] is 0.

Reference: https://lkml.org/lkml/2011/6/15/50
Reported-by: Norbert Preining <preining@logic.at>
Reported-by: Török Edwin <edwintorok@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:27:39 -04:00
13fca640bb Revert "fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP"
This reverts commit 7f81c8890c15a10f5220bebae3b6dfae4961962a.

It turns out that it's not actually a build-time check on x86-64 UML,
which does some seriously crazy stuff with VM_STACK_FLAGS.

The VM_STACK_FLAGS define depends on the arch-supplied
VM_STACK_DEFAULT_FLAGS value, and on x86-64 UML we have

  arch/um/sys-x86_64/shared/sysdep/vm-flags.h:

	#define VM_STACK_DEFAULT_FLAGS \
		(test_thread_flag(TIF_IA32) ? vm_stack_flags32 : vm_stack_flags)

	#define VM_STACK_DEFAULT_FLAGS vm_stack_flags

(yes, seriously: two different #define's for that thing, with the first
one being inside an "#ifdef TIF_IA32")

It's possible that it is UML that should just be fixed in this area, but
for now let's just undo the (very small) optimization.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 21:53:52 -07:00
7f81c8890c fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP
Commit a8bef8ff6ea1 ("mm: migration: avoid race between shift_arg_pages()
and rmap_walk() during migration by not migrating temporary stacks")
introduced a BUG_ON() to ensure that VM_STACK_FLAGS and
VM_STACK_INCOMPLETE_SETUP do not overlap.  The check is a compile time
one, so BUILD_BUG_ON is more appropriate.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:03:59 -07:00
793925334f proc: Fix Oops on stat of /proc/<zombie pid>/ns/net
Don't call iput with the inode half setup to be a namespace filedescriptor.
Instead rearrange the code so that we don't initialize ei->ns_ops until
after I ns_ops->get succeeds, preventing us from invoking ns_ops->put
when ns_ops->get failed.

Reported-by: Ingo Saitz <Ingo.Saitz@stud.uni-hannover.de>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2011-06-15 14:35:29 -07:00
9e2dfdb308 nfs4.1: mark layout as bad on error path in _pnfs_return_layout
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 14:36:33 -04:00
ed0ca14021 Btrfs: set no_trans_join after trying to expand the transaction
We can lockup if we try to allow new writers join the transaction and we have
flushoncommit set or have a pending snapshot.  This is because we set
no_trans_join and then loop around and try to wait for ordered extents again.
The problem is the ordered endio stuff needs to join the transaction, which it
can't do because no_trans_join is set.  So instead wait until after this loop to
set no_trans_join and then make sure to wait for num_writers == 1 in case
anybody got started in between us exiting the loop and setting no_trans_join.
This could easily be reproduced by mounting -o flushoncommit and running xfstest
13.  It cannot be reproduced with this patch.  Thanks,

Reported-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-15 13:24:47 -04:00
8351583e3f Btrfs: protect the pending_snapshots list with trans_lock
Currently there is nothing protecting the pending_snapshots list on the
transaction.  We only hold the directory mutex that we are snapshotting and a
read lock on the subvol_sem, so we could race with somebody else creating a
snapshot in a different directory and end up with list corruption.  So protect
this list with the trans_lock.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-15 13:24:46 -04:00
71d7aed014 Btrfs: fix path leakage on subvol deletion
The delayed ref patch accidently removed the btrfs_free_path in
btrfs_unlink_subvol, this puts it back and means we don't leak a path.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-15 13:24:45 -04:00
ea0ded748b nfs4.1: prevent race that allowed use of freed layout in _pnfs_return_layout
mark_matching_lsegs_invalid could put the last ref to the layout, so
the get_layout_hdr needs to be called first.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 12:39:23 -04:00
1ed3a8539a NFSv4.1: need to put_layout_hdr on _pnfs_return_layout error path
We always get a reference on the layout header and we rely on
nfs4_layoutreturn_release to put it.  If we hit an allocation error
before starting the rpc proc we bail out early without dereferncing
the layout header properly.

Signed-off-by: Benny Halevy <benny@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:52:15 -04:00
c7fd06228b NFS: (d)printks should use %zd for ssize_t arguments
(d)printks should use %zd for ssize_t arguments not %ld, otherwise they might
get a warning.  I see the following with MN10300.

fs/nfs/objlayout/objlayout.c: In function 'objlayout_read_done':
fs/nfs/objlayout/objlayout.c:294: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Trond Myklebust <Trond.Myklebust@netapp.com>
cc: linux-nfs@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:31 -04:00
d771e3a43e NFSv4.1: fix break condition in pnfs_find_lseg
The break condition to skip out of the loop got broken when cmp_layout
was change.  Essentially, we want to stop looking once we know no layout
on the remainder of the list can match the first byte of the looked-up
range.

Reported-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <benny@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:31 -04:00
a2e1d4f2e5 nfs4.1: fix several problems with _pnfs_return_layout
_pnfs_return_layout had the following problems:

- it did not call pnfs_free_lseg_list on all paths
- it unintentionally did a forgetful return when there was no outstanding io
- it raced with concurrent LAYOUTGETS

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:30 -04:00
cec765cf58 NFSv4.1: allow zero fh array in filelayout decode layout
Signed-off-by: Andy Adamson <andros@netapp.com>
cc:stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:30 -04:00
533eb4611c NFSv4.1: allow nfs_fhget to succeed with mounted on fileid
Commit 28331a46d88459788c8fca72dbb0415cd7f514c9 "Ensure we request the
ordinary fileid when doing readdirplus"
changed the meaning of NFS_ATTR_FATTR_FILEID which used to be set when
FATTR4_WORD1_MOUNTED_ON_FILED was requested.

Allow nfs_fhget to succeed with only a mounted on fileid when crossing
a mountpoint or a referral.

Ask for the fileid of the absent file system if mounted_on_fileid is not
supported.

Signed-off-by: Andy Adamson <andros@netapp.com>
cc:stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:29 -04:00
1d92a08da2 NFSv4.1: Fix a refcounting issue in the pNFS device id cache
When we add something to the global device id cache, we need to bump the
reference count, so that the cache itself holds a reference.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:29 -04:00
c9c30dd5f7 NFSv4.1: deprecate headerpadsz in CREATE_SESSION
We don't support header padding yet so better off ditching it

Reported-by: Sid Moore <learnmost@gmail.com>
Signed-off-by: Benny Halevy <benny@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:28 -04:00
0f66b5984d NFS41: do not update isize if inode needs layoutcommit
nfs_update_inode will update isize if there is no queued pages. For pNFS,
layoutcommit is supposed to change file size on server, the same effect as queued
pages. nfs_update_inode may be called when dirty pages are written back (nfsi->npages==0)
but layoutcommit is not sent, and it will change client file size according to server
file size. Then client ends up losing what it just writes back in pNFS path.
So we should skip updating client file size if file needs layoutcommit.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Cc: stable@kernel.org   [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:28 -04:00
0b760113a3 NLM: Don't hang forever on NLM unlock requests
If the NLM daemon is killed on the NFS server, we can currently end up
hanging forever on an 'unlock' request, instead of aborting. Basically,
if the rpcbind request fails, or the server keeps returning garbage, we
really want to quit instead of retrying.

Tested-by: Vasily Averin <vvs@sw.ru>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2011-06-15 11:24:27 -04:00
9e3bd4e24e NFS: fix umount of pnfs filesystems
Unmounting a pnfs filesystem hangs using filelayout and possibly others.
This fixes the use of the rcu protected node by making use of a new 'tmpnode'
for the temporary purge list. Also, the spinlock shouldn't be held when calling
synchronize_rcu().

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:23:02 -04:00
1252b3013b [CIFS] update cifs version to 1.73
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-14 16:19:54 +00:00
c46a131c0c xfs: fix ->mknod() return value on xfs_get_acl() failure
->mknod() should return negative on errors and PTR_ERR() gives
already negative value...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-06-14 11:02:13 -05:00
040d15c867 [CIFS] trivial cleanup fscache cFYI and cERROR messages
... for uniformity and cleaner debug logs.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-14 15:51:18 +00:00
1123d93963 timerfd: Fix wakeup of processes when timer is cancelled on clock change
Currently processes waiting with poll on cancelable timerfd timers are
not woken up when the timers are canceled. When the system time is set
the clock_was_set() function calls timerfd_clock_was_set() to cancel
and wake up processes waiting on potential cancelable timerfd
timers. However the wake up currently has no effect because in the
case of timerfd_read it is dependent on ctx->ticks not being
0. timerfd_poll also requires ctx->ticks being non zero. As a
consequence processes waiting on cancelable timers only get woken up
when the timers expire. This patch fixes this by incrementing
ctx->ticks before calling wake_up.

Signed-off-by: Max Asbock <masbock@linux.vnet.ibm.com>
Cc: kay.sievers@vrfy.org
Cc: virtuoso@slind.org
Cc: johnstul <johnstul@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1307985512.4710.41.camel@w-amax.beaverton.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-06-14 11:46:14 +02:00
d7f124f129 ceph: fix sync and dio writes across stripe boundaries
We were iterating across stripe boundaries properly, but not moving the
write buffer pointer forward.  This caused us to rewrite the same data
after the break.  Fix by adjusting the data pointer forward, and
recalculating the io and buffer alignment after the break.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-06-13 16:26:22 -07:00
773e9b4426 ceph: fix page alignment corrections
dd if=/dev/urandom of=/mnt/fs_depot/dd10 bs=500 seek=8388 count=1
 dd if=/mnt/fs_depot/dd10 of=/root/dd10out bs=500 skip=8388 count=1

Reported-by: Henry C Chang <henry.cy.chang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-06-13 16:26:10 -07:00
8d1bca328b cifs: correctly handle NULL tcon pointer in CIFSTCon
Long ago (in commit 00e485b0), I added some code to handle share-level
passwords in CIFSTCon. That code ignored the fact that it's legit to
pass in a NULL tcon pointer when connecting to the IPC$ share on the
server.

This wasn't really a problem until recently as we only called CIFSTCon
this way when the server returned -EREMOTE. With the introduction of
commit c1508ca2 however, it gets called this way on every mount, causing
an oops when share-level security is in effect.

Fix this by simply treating a NULL tcon pointer as if user-level
security were in effect. I'm not aware of any servers that protect the
IPC$ share with a specific password anyway. Also, add a comment to the
top of CIFSTCon to ensure that we don't make the same mistake again.

Cc: <stable@kernel.org>
Reported-by: Martijn Uffing <mp3project@sarijopen.student.utwente.nl>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-13 20:34:34 +00:00
3e71551364 cifs: show sec= option in /proc/mounts
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-13 20:34:34 +00:00
7fdbaa1b8d cifs: don't allow cifs_reconnect to exit with NULL socket pointer
It's possible for the following set of events to happen:

cifsd calls cifs_reconnect which reconnects the socket. A userspace
process then calls cifs_negotiate_protocol to handle the NEGOTIATE and
gets a reply. But, while processing the reply, cifsd calls
cifs_reconnect again.  Eventually the GlobalMid_Lock is dropped and the
reply from the earlier NEGOTIATE completes and the tcpStatus is set to
CifsGood. cifs_reconnect then goes through and closes the socket and sets the
pointer to zero, but because the status is now CifsGood, the new socket
is not created and cifs_reconnect exits with the socket pointer set to
NULL.

Fix this by only setting the tcpStatus to CifsGood if the tcpStatus is
CifsNeedNegotiate, and by making sure that generic_ip_connect is always
called at least once in cifs_reconnect.

Note that this is not a perfect fix for this issue. It's still possible
that the NEGOTIATE reply is handled after the socket has been closed and
reconnected. In that case, the socket state will look correct but it no
NEGOTIATE was performed on it be for the wrong socket. In that situation
though the server should just shut down the socket on the next attempted
send, rather than causing the oops that occurs today.

Cc: <stable@kernel.org> # .38.x: fd88ce9: [CIFS] cifs: clarify the meaning of tcpStatus == CifsGood
Reported-and-Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-13 20:34:33 +00:00
cd51875d53 CIFS: Fix sparse error
cifs_sb_master_tlink was declared as inline, but without a definition.
Remove the declaration and move the definition up.

Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-13 20:34:33 +00:00
de1b794130 jbd2: Fix oops in jbd2_journal_remove_journal_head()
jbd2_journal_remove_journal_head() can oops when trying to access
journal_head returned by bh2jh(). This is caused for example by the
following race:

	TASK1					TASK2
  jbd2_journal_commit_transaction()
    ...
    processing t_forget list
      __jbd2_journal_refile_buffer(jh);
      if (!jh->b_transaction) {
        jbd_unlock_bh_state(bh);
					jbd2_journal_try_to_free_buffers()
					  jbd2_journal_grab_journal_head(bh)
					  jbd_lock_bh_state(bh)
					  __journal_try_to_free_buffer()
					  jbd2_journal_put_journal_head(jh)
        jbd2_journal_remove_journal_head(bh);

jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and
buffer is not part of any transaction and thus frees journal_head
before TASK1 gets to doing so. Note that even buffer_head can be
released by try_to_free_buffers() after
jbd2_journal_put_journal_head() which adds even larger opportunity for
oops (but I didn't see this happen in reality).

Fix the problem by making transactions hold their own journal_head
reference (in b_jcount). That way we don't have to remove journal_head
explicitely via jbd2_journal_remove_journal_head() and instead just
remove journal_head when b_jcount drops to zero. The result of this is
that [__]jbd2_journal_refile_buffer(),
[__]jbd2_journal_unfile_buffer(), and
__jdb2_journal_remove_checkpoint() can free journal_head which needs
modification of a few callers. Also we have to be careful because once
journal_head is removed, buffer_head might be freed as well. So we
have to get our own buffer_head reference where it matters.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-06-13 15:38:22 -04:00
ffdb8f1bfb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: unwind canceled flock state
  ceph: fix ENOENT logic in striped_read
  ceph: fix short sync reads from the OSD
  ceph: fix sync vs canceled write
  ceph: use ihold when we already have an inode ref
2011-06-13 11:21:50 -07:00
c78a9b9b8e Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  ftrace: Revert 8ab2b7efd ftrace: Remove unnecessary disabling of irqs
  kprobes/trace: Fix kprobe selftest for gcc 4.6
  ftrace: Fix possible undefined return code
  oprofile, dcookies: Fix possible circular locking dependency
  oprofile: Fix locking dependency in sync_start()
  oprofile: Free potentially owned tasks in case of errors
  oprofile, x86: Add comments to IBS LVT offset initialization
2011-06-13 10:45:49 -07:00
33a538833f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix problem in setting checkpoint interval
  nilfs2: fix missing block address termination in btree node shrinking
  nilfs2: fix incorrect block address termination in node concatenation
2011-06-13 10:32:24 -07:00
f4c4401621 Btrfs: drop the delalloc_bytes check in shrink_delalloc
Even when delalloc_bytes is zero, we may need to sleep while waiting
for delalloc space.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-13 11:30:47 -04:00
ac08aedfa5 Btrfs: check the return value from set_anon_super
Al Viro noticed we weren't checking for set_anon_super failures.  This
adds the required checks.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-13 11:28:50 -04:00
d4c208b86b block: use the passed in @bdev when claiming if partno is zero
6b4517a791 (block: implement bd_claiming and claiming block)
introduced claiming block to support O_EXCL blkdev opens properly.

bd_start_claiming() looks up the part 0 bdev and starts claiming
block.  The function assumed that there is only one part 0 bdev and
always used bdget_disk(disk, 0) to look it up; unfortunately, this
isn't true for some drivers (floppy) which use multiple block devices
to denote different operating parameters for the same physical device.
There can be multiple part 0 bdev's for the same device number.

This incorrect assumption caused the wrong bdev to be used during
claiming leading to unbalanced bd_holders as reported in the following
bug.

  https://bugzilla.kernel.org/show_bug.cgi?id=28522

This patch updates bd_start_claiming() such that it uses the bdev
specified as argument if its partno is zero.

Note that this means that different bdev's can be used for the same
device and O_EXCL check can be effectively bypassed.  It has always
been broken that way and floppy is fortunately on its way out.  Leave
that breakage alone.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alex Villacis Lasso <avillaci@ceibo.fiec.espol.edu.ec>
Tested-by: Alex Villacis Lasso <avillaci@ceibo.fiec.espol.edu.ec>
Cc: stable@kernel.org	# >= v2.6.36
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-06-13 12:45:48 +02:00