13495 Commits

Author SHA1 Message Date
916f676f8d x86, efi: Retain boot service code until after switching to virtual mode
UEFI stands for "Unified Extensible Firmware Interface", where "Firmware"
is an ancient African word meaning "Why do something right when you can
do it so wrong that children will weep and brave adults will cower before
you", and "UEI" is Celtic for "We missed DOS so we burned it into your
ROMs". The UEFI specification provides for runtime services (ie, another
way for the operating system to be forced to depend on the firmware) and
we rely on these for certain trivial tasks such as setting up the
bootloader. But some hardware fails to work if we attempt to use these
runtime services from physical mode, and so we have to switch into virtual
mode. So far so dreadful.

The specification makes it clear that the operating system is free to do
whatever it wants with boot services code after ExitBootServices() has been
called. SetVirtualAddressMap() can't be called until ExitBootServices() has
been. So, obviously, a whole bunch of EFI implementations call into boot
services code when we do that. Since we've been charmingly naive and
trusted that the specification may be somehow relevant to the real world,
we've already stuffed a picture of a penguin or something in that address
space. And just to make things more entertaining, we've also marked it
non-executable.

This patch allocates the boot services regions during EFI init and makes
sure that they're executable. Then, after SetVirtualAddressMap(), it
discards them and everyone lives happily ever after. Except for the ones
who have to work on EFI, who live sad lives haunted by the knowledge that
someone's eventually going to write yet another firmware specification.

[ hpa: adding this to urgent with a stable tag since it fixes currently-broken
  hardware.  However, I do not know what the dependencies are and so I do
  not know which -stable versions this may be a candidate for. ]

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1306331593-28715-1-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@kernel.org>
2011-05-25 17:03:53 -07:00
0d098a7d1e x86/ftrace: Fix compiler warning in ftrace.c
Due to commit dc326fca2b64 (x86, cpu: Clean up and unify the NOP selection infrastructure), we get the following warning:

arch/x86/kernel/ftrace.c: In function ‘ftrace_make_nop’:
arch/x86/kernel/ftrace.c:308:6: warning: assignment discards qualifiers from pointer target type
arch/x86/kernel/ftrace.c: In function ‘ftrace_make_call’:
arch/x86/kernel/ftrace.c:318:6: warning: assignment discards qualifiers from pointer target type

ftrace_nop_replace() now returns const unsigned char *, so change its associated function/variable to its compatible type to keep compiler clam.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Link: http://lkml.kernel.org/r/1305221620.7986.4.camel@localhost.localdomain

[ updated for change of const void *src in probe_kernel_write() ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-05-25 19:56:26 -04:00
0c63e38a12 Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  hwmon: New driver for the SMSC EMC6W201
  hwmon: (abituguru) Depend on DMI
  hwmon: (it87) Use request_muxed_region
  hwmon: (sch5627) Trigger Vbat measurements
  hwmon: (sch5627) Add sch5627_send_cmd function
  i8k: Integrate with the hwmon subsystem
  hwmon: (max6650) Properly support the MAX6650
  hwmon: (max6650) Drop device detection
  Move ACPI power meter driver to hwmon
  hwmon: (f71882fg) Add support for F71808A
  hwmon: (f71882fg) Split has_beep in fan_has_beep and temp_has_beep
  hwmon: (asc7621) Drop duplicate dependency
  hwmon: (jc42) Change detection class
  hwmon: Add driver for AMD family 15h processor power information
  hwmon: (k10temp) Add support for Fam15h (Bulldozer)
  hwmon: Use helper functions to set and get driver data
  i8k: Avoid lahf in 64-bit code
2011-05-25 16:52:50 -07:00
8b27f2ff7a x86: Remove unnecessary check in detect_ht()
This patch removes a check that causes incorrect scheduler
domain setup (SMP instead of SMT) and bootlog warning messages
when cpuid extensions for topology enumeration are not supported
and the number of processors reported to the OS is smaller than
smp_num_siblings.

Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Nikhil P Rao <nikhil.rao@intel.com>
Link: http://lkml.kernel.org/r/1306343921.19325.1.camel@fedora13
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-25 23:01:08 +02:00
949a9d7002 i8k: Integrate with the hwmon subsystem
Let i8k create an hwmon class device so that libsensors will expose
the CPU temperature and fan speeds to monitoring applications.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Massimo Dal Zotto <dz@debian.org>
2011-05-25 20:43:33 +02:00
5ca43f6c3b lib: consolidate DEBUG_STACK_USAGE option
Most arches define CONFIG_DEBUG_STACK_USAGE exactly the same way.  Move it
to lib/Kconfig.debug so each arch doesn't have to define it.  This
obviously makes the option generic, but that's fine because the config is
already used in generic code.

It's not obvious to me that sysrq-P actually does anything caution by
keeping the most inclusive wording.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Richard Weinberger <richard@nod.at>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:54 -07:00
44ec7abe35 lib: consolidate DEBUG_PER_CPU_MAPS
DEBUG_PER_CPU_MAPS is used in lib/cpumask.c as well as in
inlcude/linux/cpumask.h and thus it has outgrown its use within x86 and
powerpc alone.  Any arch with SMP support may want to get some more
debugging, so make this option generic.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:53 -07:00
162a7e7500 printk: allocate kernel log buffer earlier
On larger systems, because of the numerous ACPI, Bootmem and EFI messages,
the static log buffer overflows before the larger one specified by the
log_buf_len param is allocated.  Minimize the overflow by allocating the
new log buffer as soon as possible.

On kernels without memblock, a later call to setup_log_buf from
kernel/init.c is the fallback.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix CONFIG_PRINTK=n build]
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:48 -07:00
dbee8a0aff x86: remove 32-bit versions of readq()/writeq()
The presense of a writeq() implementation on 32-bit x86 that splits the
64-bit write into two 32-bit writes turns out to break the mpt2sas driver
(and in general is risky for drivers as was discussed in
<http://lkml.kernel.org/r/adaab6c1h7c.fsf@cisco.com>).  To fix this,
revert 2c5643b1c5c7 ("x86: provide readq()/writeq() on 32-bit too") and
follow-on cleanups.

This unfortunately leads to pushing non-atomic definitions of readq() and
write() to various x86-only drivers that in the meantime started using the
definitions in the x86 version of <asm/io.h>.  However as discussed
exhaustively, this is actually the right thing to do, because the right
way to split a 64-bit transaction is hardware dependent and therefore
belongs in the hardware driver (eg mpt2sas needs a spinlock to make sure
no other accesses occur in between the two halves of the access).

Build tested on 32- and 64-bit x86 allmodconfig.

Link: http://lkml.kernel.org/r/x86-32-writeq-is-broken@mdm.bga.com
Acked-by: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Kashyap Desai <Kashyap.Desai@lsi.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Ravi Anand <ravi.anand@qlogic.com>
Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Acked-by: James Bottomley <James.Bottomley@parallels.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:44 -07:00
1495f230fa vmscan: change shrinker API by passing shrink_control struct
Change each shrinker's API by consolidating the existing parameters into
shrink_control struct.  This will simplify any further features added w/o
touching each file of shrinker.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix warning]
[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
[akpm@linux-foundation.org: fix xfs warning]
[akpm@linux-foundation.org: update gfs2]
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:26 -07:00
de03c72cfc mm: convert mm->cpu_vm_cpumask into cpumask_var_t
cpumask_t is very big struct and cpu_vm_mask is placed wrong position.
It might lead to reduce cache hit ratio.

This patch has two change.
1) Move the place of cpumask into last of mm_struct. Because usually cpumask
   is accessed only front bits when the system has cpu-hotplug capability
2) Convert cpu_vm_mask into cpumask_var_t. It may help to reduce memory
   footprint if cpumask_size() will use nr_cpumask_bits properly in future.

In addition, this patch change the name of cpu_vm_mask with cpu_vm_mask_var.
It may help to detect out of tree cpu_vm_mask users.

This patch has no functional change.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:21 -07:00
3d48ae45e7 mm: Convert i_mmap_lock to a mutex
Straightforward conversion of i_mmap_lock to a mutex.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:18 -07:00
1c39517696 mm: now that all old mmu_gather code is gone, remove the storage
Fold all the mmu_gather rework patches into one for submission

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:16 -07:00
37b23e0525 x86,mm: make pagefault killable
When an oom killing occurs, almost all processes are getting stuck at the
following two points.

	1) __alloc_pages_nodemask
	2) __lock_page_or_retry

1) is not very problematic because TIF_MEMDIE leads to an allocation
failure and getting out from page allocator.

2) is more problematic.  In an OOM situation, zones typically don't have
page cache at all and memory starvation might lead to greatly reduced IO
performance.  When a fork bomb occurs, TIF_MEMDIE tasks don't die quickly,
meaning that a fork bomb may create new process quickly rather than the
oom-killer killing it.  Then, the system may become livelocked.

This patch makes the pagefault interruptible by SIGKILL.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:08 -07:00
af6a25f0e1 x86: Reorder mm_context_t to remove x86_64 alignment padding and thus shrink mm_struct
Reorder mm_context_t to remove alignment padding on 64 bit
builds shrinking its size from 64 to 56 bytes.

This allows mm_struct to shrink from 840 to 832 bytes, so using
one fewer cache lines, and getting more objects per slab when
using slub.

slabinfo mm_struct reports
before :-

    Sizes (bytes)     Slabs
    -----------------------------------
    Object :     840  Total  :       7
    SlabObj:     896  Full   :       1
    SlabSiz:   16384  Partial:       4
    Loss   :      56  CpuSlab:       2
    Align  :      64  Objects:      18

after :-

    Sizes (bytes)     Slabs
    ----------------------------------
    Object :     832  Total  :       7
    SlabObj:     832  Full   :       1
    SlabSiz:   16384  Partial:       4
    Loss   :       0  CpuSlab:       2
    Align  :      64  Objects:      19

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Cc: wilsons@start.ca
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Link: http://lkml.kernel.org/r/1306244999.1999.5.camel@castor.rsk
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-25 16:16:41 +02:00
f073cc8f39 x86, UV: Clean up uv_tlb.c
SGI UV's uv_tlb.c driver has become rather hard to read, with overly large
functions, non-standard coding style and (way) too long variable, constant
and function names and non-obvious code flow sequences.

This patch improves the readability and maintainability of the driver
significantly, by doing the following strict code cleanups with no side
effects:

 - Split long functions into shorter logical functions.

 - Shortened some variable and structure member names.

 - Added special functions for reads and writes of MMR regs with
   very long names.

 - Added the 'tunables' table to shortened tunables_write().

 - Added the 'stat_description' table to shorten uv_ptc_proc_write().

 - Pass fewer 'stat' arguments where it can be derived from the 'bcp'
   argument.

 - Function definitions consistent on one line, and inline in few (short) cases.

 - Moved some small structures and an atomic inline function to the header file.

 - Moved some local variables to the blocks where they are used.

 - Updated the copyright date.

 - Shortened uv_write_global_mmr64() etc. using some aliasing; no
   line breaks. Renamed many uv_.. functions that are not exported.

 - Aligned structure fields.
    [ note that not all structures are aligned the same way though; I'd like
      to keep the extensive commenting in some of them. ]

 - Shortened some long structure names.

 - Standard pass/fail exit from init_per_cpu()

 - Vertical alignment for mass initializations.

 - More separation between blocks of code.

Tested on a 16-processor Altix UV.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: penberg@kernel.org
Link: http://lkml.kernel.org/r/E1QOw12-0004MN-Lp@eag09.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-25 14:20:14 +02:00
2a919596c1 x86, UV: Add support for SGI UV2 hub chip
This patch adds support for a new version of the SGI UV hub
chip. The hub chip is the node controller that connects multiple
blades into a larger coherent SSI.

For the most part, UV2 is compatible with UV1. The majority of
the changes are in the addresses of MMRs and in a few cases, the
contents of MMRs. These changes are the result in changes in the
system topology such as node configuration, processor types,
maximum nodes, physical address sizes, etc.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Link: http://lkml.kernel.org/r/20110511175028.GA18006@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-25 14:20:13 +02:00
7ccafc5f75 x86, cpufeature: Update CPU feature RDRND to RDRAND
The Intel manual changed the name of the CPUID bit to match the
instruction name. We should follow suit for sanity's sake. (See Intel SDM
Volume 2, Table 3-20 "Feature Information Returned in the ECX Register".)

[ hpa: we can only do this at this time because there are currently no CPUs
  with this feature on the market, hence this is pre-hardware enabling.
  However, Cc:'ing stable so that stable can present a consistent ABI. ]

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Link: http://lkml.kernel.org/r/20110524232926.GA27728@outflux.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: <stable@kernel.org> v2.6.36-39
2011-05-24 16:37:27 -07:00
354258011e PM / Hibernate: Remove arch_prepare_suspend()
All architectures supporting hibernation define
arch_prepare_suspend() as an empty function, so remove it.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-05-24 23:35:55 +02:00
5129df03d0 Merge branch 'for-2.6.40' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-2.6.40' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: Unify input section names
  percpu: Avoid extra NOP in percpu_cmpxchg16b_double
  percpu: Cast away printk format warning
  percpu: Always align percpu output section to PAGE_SIZE

Fix up fairly trivial conflict in arch/x86/include/asm/percpu.h as per Tejun
2011-05-24 11:53:42 -07:00
2f344d2e51 x86, ioapic: Restore ioapic entries during resume properly
In mask/restore_ioapic_entries() we should be restoring ioapic
entries when ioapics[apic].saved_registers is not NULL.

Fix the typo and address the resume hang regression reported by
Linus.

This was not found sooner because the systems where these
changes were tested on kept the IO-APIC entries intact over
resume.

Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Daniel J Blueman <daniel.blueman@gmail.com>
Link: http://lkml.kernel.org/r/1306259131.7171.7.camel@sbsiddha-MOBL3.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-24 20:32:41 +02:00
c4dbe54ed7 seqlock: Get rid of SEQLOCK_UNLOCKED
All static seqlock should be initialized with the lockdep friendly
__SEQLOCK_UNLOCKED() macro.

Remove legacy SEQLOCK_UNLOCKED() macro.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Link: http://lkml.kernel.org/r/%3C1306238888.3026.31.camel%40edumazet-laptop%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24 15:22:17 +02:00
973aa8181e x86-64: Optimize vDSO time()
This function just reads a 64-bit variable that's updated
atomically, so we don't need any locks.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Borislav Petkov <bp@amd64.org>
Link: http://lkml.kernel.org/r/%3C40e2700f8cda4d511e5910be1e633025d28b36c2.1306156808.git.luto%40mit.edu%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24 14:51:29 +02:00
f144a6b4d1 x86-64: Add time to vDSO
The only fast implementation of time(2) we expose is through the
vsyscall page and we want to get userspace to stop using the
vsyscall page.  So make it available through the vDSO as well.

This is essentially a cut-n-paste job.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Borislav Petkov <bp@amd64.org>
Link: http://lkml.kernel.org/r/%3Cbf963bac5207de4b29613f27c42705e4371812a8.1306156808.git.luto%40mit.edu%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24 14:51:29 +02:00
6447facba3 x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO
The vDSO isn't part of the kernel, so profiling and kernel
backtraces don't really matter.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Borislav Petkov <bp@amd64.org>
Link: http://lkml.kernel.org/r/%3C23087b738c037342abb53f2f07b9bef89ceaeea3.1306156808.git.luto%40mit.edu%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24 14:51:29 +02:00
44259b1abf x86-64: Move vread_tsc into a new file with sensible options
vread_tsc is short and hot, and it's userspace code so the usual
reasons to enable -pg and turn off sibling calls don't apply.

(OK, turning off sibling calls has no effect.  But it might
someday...)

As an added benefit, tsc.c is profilable now.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Borislav Petkov <bp@amd64.org>
Link: http://lkml.kernel.org/r/%3C99c6d7f5efa3ccb65b4ac6eb443e1ab7bad47d7b.1306156808.git.luto%40mit.edu%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24 14:51:29 +02:00
0f51f2852c x86-64: Vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0
vclock_gettime's do_monotonic helper can't ever generate a negative
nsec value, so it doesn't need to check whether it's negative.  In
the CLOCK_MONOTONIC_COARSE case, ns can't ever exceed 2e9-1, so we
can avoid the loop entirely.  This saves a single easily-predicted
branch.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Borislav Petkov <bp@amd64.org>
Link: http://lkml.kernel.org/r/%3Cd6d528d32c7a21618057cfc9005942a0fe5cb54a.1306156808.git.luto%40mit.edu%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24 14:51:28 +02:00
3729db5ca2 x86-64: Don't generate cmov in vread_tsc
vread_tsc checks whether rdtsc returns something less than
cycle_last, which is an extremely predictable branch.  GCC likes
to generate a cmov anyway, which is several cycles slower than
a predicted branch.  This saves a couple of nanoseconds.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Borislav Petkov <bp@amd64.org>
Link: http://lkml.kernel.org/r/%3C561280649519de41352fcb620684dfb22bad6bac.1306156808.git.luto%40mit.edu%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24 14:51:28 +02:00
057e6a8c66 x86-64: Remove unnecessary barrier in vread_tsc
RDTSC is completely unordered on modern Intel and AMD CPUs.  The
Intel manual says that lfence;rdtsc causes all previous instructions
to complete before the tsc is read, and the AMD manual says to use
mfence;rdtsc to do the same thing.

From a decent amount of testing [1] this is enough to make rdtsc
be ordered with respect to subsequent loads across a wide variety
of CPUs.

On Sandy Bridge (i7-2600), this improves a loop of
clock_gettime(CLOCK_MONOTONIC) by more than 5 ns/iter.

[1] https://lkml.org/lkml/2011/4/18/350

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Borislav Petkov <bp@amd64.org>
Link: http://lkml.kernel.org/r/%3C1c158b9d74338aa5361f96dd473d0e6a58235302.1306156808.git.luto%40mit.edu%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24 14:51:28 +02:00
8c49d9a74b x86-64: Clean up vdso/kernel shared variables
Variables that are shared between the vdso and the kernel are
currently a bit of a mess.  They are each defined with their own
magic, they are accessed differently in the kernel, the vsyscall page,
and the vdso, and one of them (vsyscall_clock) doesn't even really
exist.

This changes them all to use a common mechanism.  All of them are
delcared in vvar.h with a fixed address (validated by the linker
script).  In the kernel (as before), they look like ordinary
read-write variables.  In the vsyscall page and the vdso, they are
accessed through a new macro VVAR, which gives read-only access.

The vdso is now loaded verbatim into memory without any fixups.  As a
side bonus, access from the vdso is faster because a level of
indirection is removed.

While we're at it, pack jiffies and vgetcpu_mode into the same
cacheline.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Borislav Petkov <bp@amd64.org>
Link: http://lkml.kernel.org/r/%3C7357882fbb51fa30491636a7b6528747301b7ee9.1306156808.git.luto%40mit.edu%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24 14:51:28 +02:00
1b4ac2a935 x86: Get rid of asmregparm
As UML does no longer need asmregparm we can remove it.

Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: namhyung@gmail.com
Cc: davem@davemloft.net
Cc: fweisbec@gmail.com
Cc: dhowells@redhat.com
Link: http://lkml.kernel.org/r/%3C1306189085-29896-1-git-send-email-richard%40nod.at%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24 14:33:35 +02:00
6988f20fe0 Merge branch 'fixes-2.6.39' into for-2.6.40 2011-05-24 09:59:36 +02:00
5e152b4c9e Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (27 commits)
  PCI: Don't use dmi_name_in_vendors in quirk
  PCI: remove unused AER functions
  PCI/sysfs: move bus cpuaffinity to class dev_attrs
  PCI: add rescan to /sys/.../pci_bus/.../
  PCI: update bridge resources to get more big ranges when allocating space (again)
  KVM: Use pci_store/load_saved_state() around VM device usage
  PCI: Add interfaces to store and load the device saved state
  PCI: Track the size of each saved capability data area
  PCI/e1000e: Add and use pci_disable_link_state_locked()
  x86/PCI: derive pcibios_last_bus from ACPI MCFG
  PCI: add latency tolerance reporting enable/disable support
  PCI: add OBFF enable/disable support
  PCI: add ID-based ordering enable/disable support
  PCI hotplug: acpiphp: assume device is in state D0 after powering on a slot.
  PCI: Set PCIE maxpayload for card during hotplug insertion
  PCI/ACPI: Report _OSC control mask returned on failure to get control
  x86/PCI: irq and pci_ids patch for Intel Panther Point DeviceIDs
  PCI: handle positive error codes
  PCI: check pci_vpd_pci22_wait() return
  PCI: Use ICH6_GPIO_EN in ich6_lpc_acpi_gpio
  ...

Fix up trivial conflicts in include/linux/pci_ids.h: commit a6e5e2be4461
moved the intel SMBUS ID definitons to the i2c-i801.c driver.
2011-05-23 15:39:34 -07:00
ea2b50ef4c Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, apic: Include module.h header in apic_flat_64.c
  x86, apic: Make apic drivers static
  x86, apic: Clean up bigsmp apic selection code
  x86, apic: Use .apicdrivers section for the apic drivers list
  x86, apic: Introduce .apicdrivers section to find the list of apic drivers
  x86, x2apic: Move the common bits to x2apic.h
  x86, x2apic: Minimize IPI register writes using cluster groups
  x86, x2apic: Track the x2apic cluster sibling map
  x86, x2apic: Remove duplicate code for IPI mask routines
  x86, apic: Use probe routines to simplify apic selection
  x86, ioapic: Consolidate mp_ioapic_routing[] into 'struct ioapic'
  x86, ioapic: Consolidate gsi routing info into 'struct ioapic'
  x86, ioapic: Consolidate mp_ioapics[] into 'struct ioapic'
  x86, ioapic: Consolidate ioapic_saved_data[] into 'struct ioapic'
  x86, ioapic: Add struct ioapic
  x86, ioapic: Remove duplicate code for saving/restoring RTEs
  x86, ioapic: Use ioapic_saved_data while enabling intr-remapping
  x86, ioapic: Allocate ioapic_saved_data early
  x86, ioapic: Fix potential resume deadlock
2011-05-23 12:54:15 -07:00
b18bf0948e x86, apic: Include module.h header in apic_flat_64.c
apic_flat_64.c needs to include module.h because it uses
EXPORT_SYMBOL_GPL().

This fixes these warnings on some !SMP randconfigs:

  arch/x86/kernel/apic/apic_flat_64.c:31: warning: data definition has no type or storage class
  arch/x86/kernel/apic/apic_flat_64.c:31: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
  arch/x86/kernel/apic/apic_flat_64.c:31: warning: parameter names (without types) in function declaration

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/20110523104300.dd532a99.randy.dunlap@oracle.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-23 21:27:35 +02:00
19504828b4 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf tools: Fix sample size bit operations
  perf tools: Fix ommitted mmap data update on remap
  watchdog: Change the default timeout and configure nmi watchdog period based on watchdog_thresh
  watchdog: Disable watchdog when thresh is zero
  watchdog: Only disable/enable watchdog if neccessary
  watchdog: Fix rounding bug in get_sample_period()
  perf tools: Propagate event parse error handling
  perf tools: Robustify dynamic sample content fetch
  perf tools: Pre-check sample size before parsing
  perf tools: Move evlist sample helpers to evlist area
  perf tools: Remove junk code in mmap size handling
  perf tools: Check we are able to read the event size on mmap
2011-05-23 09:25:52 -07:00
57d19e80f4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  b43: fix comment typo reqest -> request
  Haavard Skinnemoen has left Atmel
  cris: typo in mach-fs Makefile
  Kconfig: fix copy/paste-ism for dell-wmi-aio driver
  doc: timers-howto: fix a typo ("unsgined")
  perf: Only include annotate.h once in tools/perf/util/ui/browsers/annotate.c
  md, raid5: Fix spelling error in comment ('Ofcourse' --> 'Of course').
  treewide: fix a few typos in comments
  regulator: change debug statement be consistent with the style of the rest
  Revert "arm: mach-u300/gpio: Fix mem_region resource size miscalculations"
  audit: acquire creds selectively to reduce atomic op overhead
  rtlwifi: don't touch with treewide double semicolon removal
  treewide: cleanup continuations and remove logging message whitespace
  ath9k_hw: don't touch with treewide double semicolon removal
  include/linux/leds-regulator.h: fix syntax in example code
  tty: fix typo in descripton of tty_termios_encode_baud_rate
  xtensa: remove obsolete BKL kernel option from defconfig
  m68k: fix comment typo 'occcured'
  arch:Kconfig.locks Remove unused config option.
  treewide: remove extra semicolons
  ...
2011-05-23 09:12:26 -07:00
d7ef64a9f9 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Eliminate various 'set but not used' warnings
  x86, SMEP: Fix section mismatch warnings
  x86, amd: Use _safe() msr access for GartTlbWlk disable code
2011-05-23 08:51:55 -07:00
f4b10bc60a Merge branch 'kvm-updates/2.6.40' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.40' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (131 commits)
  KVM: MMU: Use ptep_user for cmpxchg_gpte()
  KVM: Fix kvm mmu_notifier initialization order
  KVM: Add documentation for KVM_CAP_NR_VCPUS
  KVM: make guest mode entry to be rcu quiescent state
  KVM: x86 emulator: Make jmp far emulation into a separate function
  KVM: x86 emulator: Rename emulate_grpX() to em_grpX()
  KVM: x86 emulator: Remove unused arg from emulate_pop()
  KVM: x86 emulator: Remove unused arg from writeback()
  KVM: x86 emulator: Remove unused arg from read_descriptor()
  KVM: x86 emulator: Remove unused arg from seg_override()
  KVM: Validate userspace_addr of memslot when registered
  KVM: MMU: Clean up gpte reading with copy_from_user()
  KVM: PPC: booke: add sregs support
  KVM: PPC: booke: save/restore VRSAVE (a.k.a. USPRG0)
  KVM: PPC: use ticks, not usecs, for exit timing
  KVM: PPC: fix exit accounting for SPRs, tlbwe, tlbsx
  KVM: PPC: e500: emulate SVR
  KVM: VMX: Cache vmcs segment fields
  KVM: x86 emulator: consolidate segment accessors
  KVM: VMX: Avoid reading %rip unnecessarily when handling exceptions
  ...
2011-05-23 08:42:08 -07:00
4eec42f392 watchdog: Change the default timeout and configure nmi watchdog period based on watchdog_thresh
Before the conversion of the NMI watchdog to perf event, the
watchdog timeout was 5 seconds. Now it is 60 seconds. For my
particular application, netbooks, 5 seconds was a better
timeout. With a short timeout, we catch faults earlier and are
able to send back a panic. With a 60 second timeout, the user is
unlikely to wait and will instead hit the power button, causing
us to lose the panic info.

This change configures the NMI period to watchdog_thresh and
sets the softlockup_thresh to watchdog_thresh * 2. In addition,
watchdog_thresh was reduced to 10 seconds as suggested by Ingo
Molnar.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1306127423-3347-4-git-send-email-msb@chromium.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <20110517071642.GF22305@elte.hu>
2011-05-23 11:58:59 +02:00
82da65dab5 x86: setup_smep needs to be __cpuinit
The setup_smep function gets calle at resume time too, and is thus not a
pure __init function.  When marked as __init, it gets thrown out after
the kernel has initialized, and when the kernel is suspended and
resumed, the code will no longer be around, and we'll get a nice "kernel
tried to execute NX-protected page" oops because the page is no longer
marked executable.

Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-22 21:44:13 -07:00
c8cfbb555e KVM: MMU: Use ptep_user for cmpxchg_gpte()
The address of the gpte was already calculated and stored in ptep_user
before entering cmpxchg_gpte().

This patch makes cmpxchg_gpte() to use that to make it clear that we
are using the same address during walk_addr_generic().

Note that the unlikely annotations are used to show that the conditions
are something unusual rather than for performance.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-05-22 08:48:14 -04:00
d2f62766d5 KVM: x86 emulator: Make jmp far emulation into a separate function
We introduce em_jmp_far().

We also call this from em_grp45() to stop treating modrm_reg == 5 case
separately in the group 5 emulation.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22 08:48:06 -04:00
51187683cb KVM: x86 emulator: Rename emulate_grpX() to em_grpX()
The prototypes are changed appropriately.

We also replaces "goto grp45;" with simple em_grp45() call.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22 08:48:05 -04:00
3b9be3bf2e KVM: x86 emulator: Remove unused arg from emulate_pop()
The opt of emulate_grp1a() is also removed.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22 08:48:03 -04:00
adddcecf92 KVM: x86 emulator: Remove unused arg from writeback()
Remove inline at this chance.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22 08:48:00 -04:00
509cf9fe11 KVM: x86 emulator: Remove unused arg from read_descriptor()
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22 08:47:59 -04:00
c1ed6dea81 KVM: x86 emulator: Remove unused arg from seg_override()
In addition, one comma at the end of a statement is replaced with a
semicolon.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22 08:47:57 -04:00
fa3d315a4c KVM: Validate userspace_addr of memslot when registered
This way, we can avoid checking the user space address many times when
we read the guest memory.

Although we can do the same for write if we check which slots are
writable, we do not care write now: reading the guest memory happens
more often than writing.

[avi: change VERIFY_READ to VERIFY_WRITE]

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22 08:47:56 -04:00
12cb814f3b KVM: MMU: Clean up gpte reading with copy_from_user()
When we optimized walk_addr_generic() by not using the generic guest
memory reader, we replaced copy_from_user() with get_user():

  commit e30d2a170506830d5eef5e9d7990c5aedf1b0a51
  KVM: MMU: Optimize guest page table walk

  commit 15e2ac9a43d4d7d08088e404fddf2533a8e7d52e
  KVM: MMU: Fix 64-bit paging breakage on x86_32

But as Andi pointed out later, copy_from_user() does the same as
get_user() as long as we give a constant size to it.

So we use copy_from_user() to clean up the code.

The only, noticeable, regression introduced by this is 64-bit gpte
reading on x86_32 hosts needed for PAE guests.

But this can be mitigated by implementing 8-byte get_user() for x86_32,
if needed.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22 08:47:54 -04:00