Commit Graph

23558 Commits

Author SHA1 Message Date
Linus Torvalds
420c1c572d Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (62 commits)
  posix-clocks: Check write permissions in posix syscalls
  hrtimer: Remove empty hrtimer_init_hres_timer()
  hrtimer: Update hrtimer->state documentation
  hrtimer: Update base[CLOCK_BOOTTIME].offset correctly
  timers: Export CLOCK_BOOTTIME via the posix timers interface
  timers: Add CLOCK_BOOTTIME hrtimer base
  time: Extend get_xtime_and_monotonic_offset() to also return sleep
  time: Introduce get_monotonic_boottime and ktime_get_boottime
  hrtimers: extend hrtimer base code to handle more then 2 clockids
  ntp: Remove redundant and incorrect parameter check
  mn10300: Switch do_timer() to xtimer_update()
  posix clocks: Introduce dynamic clocks
  posix-timers: Cleanup namespace
  posix-timers: Add support for fd based clocks
  x86: Add clock_adjtime for x86
  posix-timers: Introduce a syscall for clock tuning.
  time: Splitout compat timex accessors
  ntp: Add ADJ_SETOFFSET mode bit
  time: Introduce timekeeping_inject_offset
  posix-timer: Update comment
  ...

Fix up new system-call-related conflicts in
	arch/x86/ia32/ia32entry.S
	arch/x86/include/asm/unistd_32.h
	arch/x86/include/asm/unistd_64.h
	arch/x86/kernel/syscall_table_32.S
(name_to_handle_at()/open_by_handle_at() vs clock_adjtime()), and some
due to movement of get_jiffies_64() in:
	kernel/time.c
2011-03-15 18:53:35 -07:00
Linus Torvalds
9620639b7e Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
  sched: Resched proper CPU on yield_to()
  sched: Allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy
  sched: Allow SCHED_BATCH to preempt SCHED_IDLE tasks
  sched: Clean up the IRQ_TIME_ACCOUNTING code
  sched: Add #ifdef around irq time accounting functions
  sched, autogroup: Stop claiming ownership of the root task group
  sched, autogroup: Stop going ahead if autogroup is disabled
  sched, autogroup, sysctl: Use proc_dointvec_minmax() instead
  sched: Fix the group_imb logic
  sched: Clean up some f_b_g() comments
  sched: Clean up remnants of sd_idle
  sched: Wholesale removal of sd_idle logic
  sched: Add yield_to(task, preempt) functionality
  sched: Use a buddy to implement yield_task_fair()
  sched: Limit the scope of clear_buddies
  sched: Check the right ->nr_running in yield_task_fair()
  sched: Avoid expensive initial update_cfs_load(), on UP too
  sched: Fix switch_from_fair()
  sched: Simplify the idle scheduling class
  softirqs: Account ksoftirqd time as cpustat softirq
  ...
2011-03-15 18:37:30 -07:00
Linus Torvalds
a926021cb1 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (184 commits)
  perf probe: Clean up probe_point_lazy_walker() return value
  tracing: Fix irqoff selftest expanding max buffer
  tracing: Align 4 byte ints together in struct tracer
  tracing: Export trace_set_clr_event()
  tracing: Explain about unstable clock on resume with ring buffer warning
  ftrace/graph: Trace function entry before updating index
  ftrace: Add .ref.text as one of the safe areas to trace
  tracing: Adjust conditional expression latency formatting.
  tracing: Fix event alignment: skb:kfree_skb
  tracing: Fix event alignment: mce:mce_record
  tracing: Fix event alignment: kvm:kvm_hv_hypercall
  tracing: Fix event alignment: module:module_request
  tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup
  tracing: Remove lock_depth from event entry
  perf header: Stop using 'self'
  perf session: Use evlist/evsel for managing perf.data attributes
  perf top: Don't let events to eat up whole header line
  perf top: Fix events overflow in top command
  ring-buffer: Remove unused #include <linux/trace_irq.h>
  tracing: Add an 'overwrite' trace_option.
  ...
2011-03-15 18:31:30 -07:00
Linus Torvalds
0586bed3e8 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rtmutex: tester: Remove the remaining BKL leftovers
  lockdep/timers: Explain in detail the locking problems del_timer_sync() may cause
  rtmutex: Simplify PI algorithm and make highest prio task get lock
  rwsem: Remove redundant asmregparm annotation
  rwsem: Move duplicate function prototypes to linux/rwsem.h
  rwsem: Unify the duplicate rwsem_is_locked() inlines
  rwsem: Move duplicate init macros and functions to linux/rwsem.h
  rwsem: Move duplicate struct rwsem declaration to linux/rwsem.h
  x86: Cleanup rwsem_count_t typedef
  rwsem: Cleanup includes
  locking: Remove deprecated lock initializers
  cred: Replace deprecated spinlock initialization
  kthread: Replace deprecated spinlock initialization
  xtensa: Replace deprecated spinlock initialization
  um: Replace deprecated spinlock initialization
  sparc: Replace deprecated spinlock initialization
  mips: Replace deprecated spinlock initialization
  cris: Replace deprecated spinlock initialization
  alpha: Replace deprecated spinlock initialization
  rtmutex-tester: Remove BKL tests
2011-03-15 18:28:30 -07:00
Linus Torvalds
b80cd62b7d Merge branch 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  arm: Remove bogus comment in futex_atomic_cmpxchg_inatomic()
  futex: Deobfuscate handle_futex_death()
  plist: Add priority list test
  plist: Shrink struct plist_head
  futex,plist: Remove debug lock assignment from plist_node
  futex,plist: Pass the real head of the priority list to plist_del()
  futex: Sanitize futex ops argument types
  futex: Sanitize cmpxchg_futex_value_locked API
  futex: Remove redundant pagefault_disable in futex_atomic_cmpxchg_inatomic()
  futex: Avoid redudant evaluation of task_pid_vnr()
  futex: Update futex_wait_setup comments about locking
2011-03-15 18:23:52 -07:00
Linus Torvalds
c345f60a5f Merge branch 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  debugobjects: Add hint for better object identification
2011-03-15 18:23:25 -07:00
Linus Torvalds
422e6c4bc4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (57 commits)
  tidy the trailing symlinks traversal up
  Turn resolution of trailing symlinks iterative everywhere
  simplify link_path_walk() tail
  Make trailing symlink resolution in path_lookupat() iterative
  update nd->inode in __do_follow_link() instead of after do_follow_link()
  pull handling of one pathname component into a helper
  fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH
  Allow passing O_PATH descriptors via SCM_RIGHTS datagrams
  readlinkat(), fchownat() and fstatat() with empty relative pathnames
  Allow O_PATH for symlinks
  New kind of open files - "location only".
  ext4: Copy fs UUID to superblock
  ext3: Copy fs UUID to superblock.
  vfs: Export file system uuid via /proc/<pid>/mountinfo
  unistd.h: Add new syscalls numbers to asm-generic
  x86: Add new syscalls for x86_64
  x86: Add new syscalls for x86_32
  fs: Remove i_nlink check from file system link callback
  fs: Don't allow to create hardlink for deleted file
  vfs: Add open by file handle support
  ...
2011-03-15 15:48:13 -07:00
Linus Torvalds
397fae0818 Merge branches 'stable/irq.rework' and 'stable/pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/irq.rework' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/irq: Cleanup up the pirq_to_irq for DomU PV PCI passthrough guests as well.
  xen: Use IRQF_FORCE_RESUME
  xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend.
  xen: Fix compile error introduced by "switch to new irq_chip functions"
  xen: Switch to new irq_chip functions
  xen: Remove stale irq_chip.end
  xen: events: do not free legacy IRQs
  xen: events: allocate GSIs and dynamic IRQs from separate IRQ ranges.
  xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq
  xen:events: move find_unbound_irq inside CONFIG_PCI_MSI
  xen: handled remapped IRQs when enabling a pcifront PCI device.
  genirq: Add IRQF_FORCE_RESUME

* 'stable/pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code.
  pci/xen: Cleanup: convert int** to int[]
  pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq
  xen-pcifront: Sanity check the MSI/MSI-X values
  xen-pcifront: don't use flush_scheduled_work()
2011-03-15 10:47:16 -07:00
Al Viro
1abf0c718f New kind of open files - "location only".
New flag for open(2) - O_PATH.  Semantics:
	* pathname is resolved, but the file itself is _NOT_ opened
as far as filesystem is concerned.
	* almost all operations on the resulting descriptors shall
fail with -EBADF.  Exceptions are:
	1) operations on descriptors themselves (i.e.
		close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD),
		fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD),
		fcntl(fd, F_SETFD, ...))
	2) fcntl(fd, F_GETFL), for a common non-destructive way to
		check if descriptor is open
	3) "dfd" arguments of ...at(2) syscalls, i.e. the starting
		points of pathname resolution
	* closing such descriptor does *NOT* affect dnotify or
posix locks.
	* permissions are checked as usual along the way to file;
no permission checks are applied to the file itself.  Of course,
giving such thing to syscall will result in permission checks (at
the moment it means checking that starting point of ....at() is
a directory and caller has exec permissions on it).

fget() and fget_light() return NULL on such descriptors; use of
fget_raw() and fget_raw_light() is needed to get them.  That protects
existing code from dealing with those things.

There are two things still missing (they come in the next commits):
one is handling of symlinks (right now we refuse to open them that
way; see the next commit for semantics related to those) and another
is descriptor passing via SCM_RIGHTS datagrams.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V
93f1c20bc8 vfs: Export file system uuid via /proc/<pid>/mountinfo
We add a per superblock uuid field. File systems should
update the uuid in the fill_super callback

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V
becfd1f375 vfs: Add open by file handle support
[AV: duplicate of open() guts removed; file_open_root() used instead]

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V
990d6c2d7a vfs: Add name to file handle conversion support
The syscall also return mount id which can be used
to lookup file system specific information such as uuid
in /proc/<pid>/mountinfo

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:37 -04:00
Al Viro
f52e0c1130 New AT_... flag: AT_EMPTY_PATH
For name_to_handle_at(2) we'll want both ...at()-style syscall that
would be usable for non-directory descriptors (with empty relative
pathname).  Introduce new flag (AT_EMPTY_PATH) to deal with that and
corresponding LOOKUP_EMPTY; teach user_path_at() and path_init() to
deal with the latter.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 19:12:20 -04:00
Linus Torvalds
5f40d42094 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: NFSROOT should default to "proto=udp"
  nfs4: remove duplicated #include
  NFSv4: nfs4_state_mark_reclaim_nograce() should be static
  NFSv4: Fix the setlk error handler
  NFSv4.1: Fix the handling of the SEQUENCE status bits
  NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses
  NFSv4.1 reclaim complete must wait for completion
  NFSv4: remove duplicate clientid in struct nfs_client
  NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY
  sunrpc: Propagate errors from xs_bind() through xs_create_sock()
  (try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid
  nfs: fix compilation warning
  nfs: add kmalloc return value check in decode_and_add_ds
  SUNRPC: Remove resource leak in svc_rdma_send_error()
  nfs: close NFSv4 COMMIT vs. CLOSE race
  SUNRPC: Close a race in __rpc_wait_for_completion_task()
2011-03-14 11:19:50 -07:00
Aneesh Kumar K.V
5fe0c23788 exportfs: Return the minimum required handle size
The exportfs encode handle function should return the minimum required
handle size. This helps user to find out the handle size by passing 0
handle size in the first step and then redoing to the call again with
the returned handle size value.

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:28 -04:00
Al Viro
c8b91accfa clean statfs-like syscalls up
New helpers: user_statfs() and fd_statfs(), taking userland pathname and
descriptor resp. and filling struct kstatfs.  Syscalls of statfs family
(native, compat and foreign - osf and hpux on alpha and parisc resp.)
switched to those.  Removes some boilerplate code, simplifies cleanup
on errors...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:28 -04:00
Al Viro
73d049a40f open-style analog of vfs_path_lookup()
new function: file_open_root(dentry, mnt, name, flags) opens the file
vfs_path_lookup would arrive to.

Note that name can be empty; in that case the usual requirement that
dentry should be a directory is lifted.

open-coded equivalents switched to it, may_open() got down exactly
one caller and became static.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:28 -04:00
Al Viro
5b6ca027d8 reduce vfs_path_lookup() to do_path_lookup()
New lookup flag: LOOKUP_ROOT.  nd->root is set (and held) by caller,
path_init() starts walking from that place and all pathname resolution
machinery never drops nd->root if that flag is set.  That turns
vfs_path_lookup() into a special case of do_path_lookup() *and*
gets us down to 3 callers of link_path_walk(), making it finally
feasible to rip the handling of trailing symlink out of link_path_walk().
That will not only simply the living hell out of it, but make life
much simpler for unionfs merge.  Trailing symlink handling will
become iterative, which is a good thing for stack footprint in
a lot of situations as well.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:27 -04:00
Al Viro
70e9b35711 get rid of nd->file
Don't stash the struct file * used as starting point of walk in nameidata;
pass file ** to path_init() instead.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:26 -04:00
Al Viro
47c805dc2d switch do_filp_open() to struct open_flags
take calculation of open_flags by open(2) arguments into new helper
in fs/open.c, move filp_open() over there, have it and do_sys_open()
use that helper, switch exec.c callers of do_filp_open() to explicit
(and constant) struct open_flags.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:25 -04:00
Al Viro
16c2cd7179 untangle the "need_reval_dot" mess
instead of ad-hackery around need_reval_dot(), do the following:
set a flag (LOOKUP_JUMPED) in the beginning of path, on absolute
symlink traversal, on ".." and on procfs-style symlinks.  Clear on
normal components, leave unchanged on ".".  Non-nested callers of
link_path_walk() call handle_reval_path(), which checks that flag
is set and that fs does want the final revalidate thing, then does
->d_revalidate().  In link_path_walk() all the return_reval stuff
is gone.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:24 -04:00
Al Viro
c9c6cac0c2 kill path_lookup()
all remaining callers pass LOOKUP_PARENT to it, so
flags argument can die; renamed to kern_path_parent()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:23 -04:00
Trond Myklebust
0400a6b0cb NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses
nfs4_schedule_state_recovery() should only be used when we need to force
the state manager to check the lease. If we just want to start the
state manager in order to handle a state recovery situation, we should be
using nfs4_schedule_state_manager().

This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing
its use with a set of helper functions that do the right thing.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:18:22 -05:00
Lai Jiangshan
bf6a9b8336 plist: Shrink struct plist_head
struct plist_head is used in struct task_struct as well as struct
rtmutex. If we can make it smaller, it will also make these structures
smaller as well.

The field prio_list in struct plist_head is seldom used and we can get
its information from the plist_nodes. Removing this field will decrease
the size of plist_head by half.

Signed-off-by:  Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4D107982.9090700@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-11 15:13:26 -05:00
Andy Adamson
114f64b5f2 NFSv4: remove duplicate clientid in struct nfs_client
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:05:00 -05:00
Trond Myklebust
bf294b41ce SUNRPC: Close a race in __rpc_wait_for_completion_task()
Although they run as rpciod background tasks, under normal operation
(i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
and nfs4_do_close() want to be fully synchronous. This means that when we
exit, we want all references to the rpc_task to be gone, and we want
any dentry references etc. held by that task to be released.

For this reason these functions call __rpc_wait_for_completion_task(),
followed by rpc_put_task() in the expectation that the latter will be
releasing the last reference to the rpc_task, and thus ensuring that the
callback_ops->rpc_release() has been called synchronously.

This patch fixes a race which exists due to the fact that
rpciod calls rpc_complete_task() (in order to wake up the callers of
__rpc_wait_for_completion_task()) and then subsequently calls
rpc_put_task() without ensuring that these two steps are done atomically.

In order to avoid adding new spin locks, the patch uses the existing
waitqueue spin lock to order the rpc_task reference count releases between
the waiting process and rpciod.
The common case where nobody is waiting for completion is optimised for by
checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task
reference count is 1: in those cases we drop trying to grab the spin lock,
and immediately free up the rpc_task.

Those few processes that need to put the rpc_task from inside an
asynchronous context and that do not care about ordering are given a new
helper: rpc_put_task_async().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:04:52 -05:00
Thomas Gleixner
53370d2e8c hrtimer: Update hrtimer->state documentation
We changed some of the state bits and combinations thereof over time,
but never updated the documentation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-10 19:15:58 +01:00
Steven Rostedt
e6e1e25935 tracing: Remove lock_depth from event entry
The lock_depth field in the event headers was added as a temporary
data point for help in removing the BKL. Now that the BKL is pretty
much been removed, we can remove this field.

This in turn changes the header from 12 bytes to 8 bytes,
removing the 4 byte buffer that gcc would insert if the first field
in the data load was 8 bytes in size.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-10 10:31:48 -05:00
Linus Torvalds
ab02a95405 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules
2011-03-09 16:45:02 -08:00
Stephen Rothwell
684adca4f8 sysctl: the include of rcupdate.h is only needed in the kernel
Fixes this build-check error:

  include/linux/sysctl.h:28: included file 'linux/rcupdate.h' is not exported

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-09 16:43:24 -08:00
Vasiliy Kulikov
8909c9ad8f net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules
Since a8f80e8ff9 any process with
CAP_NET_ADMIN may load any module from /lib/modules/.  This doesn't mean
that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are
limited to /lib/modules/**.  However, CAP_NET_ADMIN capability shouldn't
allow anybody load any module not related to networking.

This patch restricts an ability of autoloading modules to netdev modules
with explicit aliases.  This fixes CVE-2011-1019.

Arnd Bergmann suggested to leave untouched the old pre-v2.6.32 behavior
of loading netdev modules by name (without any prefix) for processes
with CAP_SYS_MODULE to maintain the compatibility with network scripts
that use autoloading netdev modules by aliases like "eth0", "wlan0".

Currently there are only three users of the feature in the upstream
kernel: ipip, ip_gre and sit.

    root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
    root@albatros:~# grep Cap /proc/$$/status
    CapInh:	0000000000000000
    CapPrm:	fffffff800001000
    CapEff:	fffffff800001000
    CapBnd:	fffffff800001000
    root@albatros:~# modprobe xfs
    FATAL: Error inserting xfs
    (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
    root@albatros:~# lsmod | grep xfs
    root@albatros:~# ifconfig xfs
    xfs: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep xfs
    root@albatros:~# lsmod | grep sit
    root@albatros:~# ifconfig sit
    sit: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep sit
    root@albatros:~# ifconfig sit0
    sit0      Link encap:IPv6-in-IPv4
	      NOARP  MTU:1480  Metric:1

    root@albatros:~# lsmod | grep sit
    sit                    10457  0
    tunnel4                 2957  1 sit

For CAP_SYS_MODULE module loading is still relaxed:

    root@albatros:~# grep Cap /proc/$$/status
    CapInh:	0000000000000000
    CapPrm:	ffffffffffffffff
    CapEff:	ffffffffffffffff
    CapBnd:	ffffffffffffffff
    root@albatros:~# ifconfig xfs
    xfs: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep xfs
    xfs                   745319  0

Reference: https://lkml.org/lkml/2011/2/24/203

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
2011-03-10 10:25:19 +11:00
Linus Torvalds
78833dd706 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  nd->inode is not set on the second attempt in path_walk()
  unfuck proc_sysctl ->d_compare()
  minimal fix for do_filp_open() race
2011-03-09 13:55:51 -08:00
David Sharp
750912fa36 tracing: Add an 'overwrite' trace_option.
Add an "overwrite" trace_option for ftrace to control whether the buffer should
be overwritten on overflow or not. The default remains to overwrite old events
when the buffer is full. This patch adds the option to instead discard newest
events when the buffer is full. This is useful to get a snapshot of traces just
after enabling traces. Dropping the current event is also a simpler code path.

Signed-off-by: David Sharp <dhsharp@google.com>
LKML-Reference: <1291844807-15481-1-git-send-email-dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-09 13:52:27 -05:00
Ingo Molnar
86cb2ec7b2 Merge commit 'v2.6.38-rc8' into perf/core
Merge reason: Merge latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-08 17:21:52 +01:00
Stanislaw Gruszka
9977728840 debugobjects: Add hint for better object identification
In complex subsystems like mac80211 structures can contain several
timers and work structs, so identifying a specific instance from the
call trace and object type output of debugobjects can be hard.

Allow the subsystems which support debugobjects to provide a hint
function. This function returns a pointer to a kernel address
(preferrably the objects callback function) which is printed along
with the debugobjects type.

Add hint methods for timer_list, work_struct and hrtimer.

[ tglx: Massaged changelog, made it compile ]

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
LKML-Reference: <20110307085809.GA9334@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-08 16:10:38 +01:00
Al Viro
dfef6dcd35 unfuck proc_sysctl ->d_compare()
a) struct inode is not going to be freed under ->d_compare();
however, the thing PROC_I(inode)->sysctl points to just might.
Fortunately, it's enough to make freeing that sucker delayed,
provided that we don't step on its ->unregistering, clear
the pointer to it in PROC_I(inode) before dropping the reference
and check if it's NULL in ->d_compare().

b) I'm not sure that we *can* walk into NULL inode here (we recheck
dentry->seq between verifying that it's still hashed / fetching
dentry->d_inode and passing it to ->d_compare() and there's no
negative hashed dentries in /proc/sys/*), but if we can walk into
that, we really should not have ->d_compare() return 0 on it!
Said that, I really suspect that this check can be simply killed.
Nick?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-08 02:22:27 -05:00
Linus Torvalds
fb62c00a6d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: no .snap inside of snapped namespace
  libceph: fix msgr standby handling
  libceph: fix msgr keepalive flag
  libceph: fix msgr backoff
  libceph: retry after authorization failure
  libceph: fix handling of short returns from get_user_pages
  ceph: do not clear I_COMPLETE from d_release
  ceph: do not set I_COMPLETE
  Revert "ceph: keep reference to parent inode on ceph_dentry"
2011-03-05 10:43:22 -08:00
Andi Kleen
236344d6b4 mm: add alloc_page_vma_node()
Add a alloc_page_vma_node that allows passing the "local" node in.  Used
in a followon patch.

Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-04 17:53:39 -08:00
Andi Kleen
2f5f9486f8 mm: change alloc_pages_vma to pass down the policy node for local policy
Currently alloc_pages_vma() always uses the local node as policy node for
the LOCAL policy.  Pass this node down as an argument instead.

No behaviour change from this patch, but will be needed for followons.

Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-04 17:53:39 -08:00
Sage Weil
e76661d0a5 libceph: fix msgr keepalive flag
There was some broken keepalive code using a dead variable.  Shift to using
the proper bit flag.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04 12:24:31 -08:00
Sage Weil
60bf8bf881 libceph: fix msgr backoff
With commit f363e45f we replaced a bunch of hacky workqueue mutual
exclusion logic with the WQ_NON_REENTRANT flag.  One pieces of fallout is
that the exponential backoff breaks in certain cases:

 * con_work attempts to connect.
 * we get an immediate failure, and the socket state change handler queues
   immediate work.
 * con_work calls con_fault, we decide to back off, but can't queue delayed
   work.

In this case, we add a BACKOFF bit to make con_work reschedule delayed work
next time it runs (which should be immediately).

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04 12:24:28 -08:00
Linus Torvalds
e3e89cc535 Mark ptrace_{traceme,attach,detach} static
They are only used inside kernel/ptrace.c, and have been for a long
time.  We don't want to go back to the bad-old-days when architectures
did things on their own, so make them static and private.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-04 09:23:30 -08:00
Andi Kleen
a7e3ed1e47 perf: Add support for supplementary event registers
Change logs against Andi's original version:

- Extends perf_event_attr:config to config{,1,2} (Peter Zijlstra)
- Fixed a major event scheduling issue. There cannot be a ref++ on an
  event that has already done ref++ once and without calling
  put_constraint() in between. (Stephane Eranian)
- Use thread_cpumask for percore allocation. (Lin Ming)
- Use MSR names in the extra reg lists. (Lin Ming)
- Remove redundant "c = NULL" in intel_percore_constraints
- Fix comment of perf_event_attr::config1

Intel Nehalem/Westmere have a special OFFCORE_RESPONSE event
that can be used to monitor any offcore accesses from a core.
This is a very useful event for various tunings, and it's
also needed to implement the generic LLC-* events correctly.

Unfortunately this event requires programming a mask in a separate
register. And worse this separate register is per core, not per
CPU thread.

This patch:

- Teaches perf_events that OFFCORE_RESPONSE needs extra parameters.
  The extra parameters are passed by user space in the
  perf_event_attr::config1 field.

- Adds support to the Intel perf_event core to schedule per
  core resources. This adds fairly generic infrastructure that
  can be also used for other per core resources.
  The basic code has is patterned after the similar AMD northbridge
  constraints code.

Thanks to Stephane Eranian who pointed out some problems
in the original version and suggested improvements.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1299119690-13991-2-git-send-email-ming.m.lin@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-04 11:32:53 +01:00
Ingo Molnar
e0a92c1747 Merge branch 'sched/urgent' into sched/core
Merge reason: Add fixes before applying dependent patches.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-04 11:12:26 +01:00
Ingo Molnar
888a8a3e9d Merge branch 'perf/urgent' into perf/core
Merge reason: Pick up updates before queueing up dependent patches.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-04 10:40:25 +01:00
Linus Torvalds
fb4b10ab5f Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: kill loop_mutex
  blktrace: Remove blk_fill_rwbs_rq.
  block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue()
  block: add @force_kblockd to __blk_run_queue()
  block: fix kernel-doc format for blkdev_issue_zeroout
  blk-throttle: Do not use kblockd workqueue for throtl work
2011-03-03 15:42:35 -08:00
Tao Ma
2d3a8497f8 blktrace: Remove blk_fill_rwbs_rq.
If we enable trace events to trace block actions, We use
blk_fill_rwbs_rq to analyze the corresponding actions
in request's cmd_flags, but we only choose the minor 2 bits
from it, so most of other flags(e.g, REQ_SYNC) are missing.
For example, with a sync write we get:
write_test-2409  [001]   160.013869: block_rq_insert: 3,64 W 0 () 258135 + =
8 [write_test]

Since now we have integrated the flags of both bio and request,
it is safe to pass rq->cmd_flags directly to blk_fill_rwbs and
blk_fill_rwbs_rq isn't needed any more.

With this patch, after a sync write we get:
write_test-2417  [000]   226.603878: block_rq_insert: 3,64 WS 0 () 258135 +=
 8 [write_test]

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-03 10:53:20 -05:00
Frederic Weisbecker
c09d7a3d2e Merge branch '/tip/perf/filter' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git into perf/core 2011-03-03 04:29:25 +01:00
Tejun Heo
1654e7411a block: add @force_kblockd to __blk_run_queue()
__blk_run_queue() automatically either calls q->request_fn() directly
or schedules kblockd depending on whether the function is recursed.
blk-flush implementation needs to be able to explicitly choose
kblockd.  Add @force_kblockd.

All the current users are converted to specify %false for the
parameter and this patch doesn't introduce any behavior change.

stable: This is prerequisite for fixing ide oops caused by the new
        blk-flush implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-02 08:48:05 -05:00
Mark Brown
77bd70e900 mfd: Don't suspend WM8994 if the CODEC is not suspended
ASoC supports keeping the audio subsysetm active over suspend in order
to support use cases such as audio passthrough from a cellular modem
with the main CPU suspended. Ensure that we don't power down the CODEC
when this is happening by checking to see if VMID is up and skipping
suspend and resume when it is. If the CODEC has suspended then it'll
turn VMID off before the core suspend() gets called.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-02 10:57:50 +01:00