Commit Graph

95021 Commits

Author SHA1 Message Date
Ingo Molnar
42ebd27bcb Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-25 10:04:22 +02:00
Stephane Eranian
9f7ff8931e perf/x86: Fix RAPL rdmsrl_safe() usage
This patch fixes a bug introduced by:

  2422365780 ("perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU")

The rdmsrl_safe() function returns 0 on success.
The current code was failing to detect the RAPL PMU
on real hardware  (missing /sys/devices/power) because
the return value of rdmsrl_safe() was misinterpreted.

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Venkatesh Srinivas <venkateshs@google.com>
Cc: peterz@infradead.org
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20140423170418.GA12767@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24 08:12:41 +02:00
Linus Torvalds
39bfe90706 Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vdso fix from Peter Anvin:
 "This is a single build fix for building with gold as opposed to GNU
  ld.  It got queued up separately and was expected to be pushed during
  the merge window, but it got left behind"

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, vdso: Make the vdso linker script compatible with Gold
2014-04-22 09:09:06 -07:00
Anton Ivanov
0565103d1a um: Memory corruption on startup
The reverse case of this race (you must msync before read) is
well known. This is the not so common one.

It can be triggered only on systems which do a lot of task
switching and only at UML startup. If you are starting 200+ UMLs
~ 0.5% will always die without this fix.

Signed-off-by: Anton Ivanov <antivano@cisco.com>
[rw: minor whitespace fixes]
Signed-off-by: Richard Weinberger <richard@nod.at>
2014-04-20 23:57:21 +02:00
Anton Ivanov
9fcb663be4 um: Missing pipe handling
UML does not handle sigpipe. As a result when running it under
expect or redirecting the IO from the console to an external program
it will crash if the program stops or exits.

Signed-off-by: Anton Ivanov <antivano@cisco.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2014-04-20 23:13:30 +02:00
Tristan Schmelcher
0d71832e30 uml: Simplify tempdir logic.
Inferring the mount hierarchy correctly from /proc/mounts is hard when MS_MOVE
may have been used, and the previous code did it wrongly. This change simplifies
the logic to only require that /dev/shm be _on_ tmpfs (which can be checked
trivially with statfs) rather than that it be a _mountpoint_ of tmpfs, since
there isn't a compelling reason to be that strict. We also now check for tmpfs
on whatever directory we ultimately use so that the user is better informed.

This change also moves the more standard TMPDIR environment variable check ahead
of the others.

Applies to 3.12.

Signed-off-by: Tristan Schmelcher <tschmelcher@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2014-04-20 23:10:44 +02:00
Linus Torvalds
6d4596905b Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
 "This fixes the preemption-count imbalance crash reported by Owen
  Kibel"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Fix CMCI preemption bugs
2014-04-19 10:41:43 -07:00
Linus Torvalds
8de3f7a705 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two kernel side fixes:

   - an Intel uncore PMU driver potential crash fix
   - a kprobes/perf-call-graph interaction fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU
  kprobes/x86: Fix page-fault handling logic
2014-04-19 10:40:11 -07:00
Linus Torvalds
ea2388f281 Merge branch 'akpm' (incoming from Andrew)
Merge misc fixes from Andrew Morton:
 "13 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  thp: close race between split and zap huge pages
  mm: fix new kernel-doc warning in filemap.c
  mm: fix CONFIG_DEBUG_VM_RB description
  mm: use paravirt friendly ops for NUMA hinting ptes
  mips: export flush_icache_range
  mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()
  wait: explain the shadowing and type inconsistencies
  Shiraz has moved
  Documentation/vm/numa_memory_policy.txt: fix wrong document in numa_memory_policy.txt
  powerpc/mm: fix ".__node_distance" undefined
  kernel/watchdog.c:touch_softlockup_watchdog(): use raw_cpu_write()
  init/Kconfig: move the trusted keyring config option to general setup
  vmscan: reclaim_clean_pages_from_list() must use mod_zone_page_state()
2014-04-18 16:40:31 -07:00
Kees Cook
8229f1a044 mips: export flush_icache_range
The lkdtm module performs tests against executable memory ranges, so it
needs to flush the icache for proper behaviors.  Other architectures
already export this, so do the same for MIPS.

[akpm@linux-foundation.org: relocate export sites]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Cc: John Crispin <blogic@openwrt.org>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-18 16:40:09 -07:00
Viresh Kumar
9cc236827f Shiraz has moved
shiraz.hashim@st.com email-id doesn't exist anymore as he has left the
company.  Replace ST's id with shiraz.linux.kernel@gmail.com.

It also updates .mailmap file to fix address for 'git shortlog'.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Shiraz Hashim <shiraz.linux.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-18 16:40:08 -07:00
Mike Qiu
12c743eb22 powerpc/mm: fix ".__node_distance" undefined
CHK     include/config/kernel.release
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  ...
  Building modules, stage 2.
WARNING: 1 bad relocations
c0000000013d6a30 R_PPC64_ADDR64    uprobes_fetch_type_table
  WRAP    arch/powerpc/boot/zImage.pseries
  WRAP    arch/powerpc/boot/zImage.epapr
  MODPOST 1849 modules
ERROR: ".__node_distance" [drivers/block/nvme.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
make: *** Waiting for unfinished jobs....

The reason is symbol "__node_distance" not been exported in powerpc.

Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: Alistair Popple <alistair@popple.id.au>
Cc: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-18 16:40:08 -07:00
Vineet Gupta
64ee9f32c3 ARC: Delete stale barrier.h
Commit 93ea02bb84 ("arch: Clean up asm/barrier.h implementations")
wired generic barrier.h for ARC, but failed to delete the existing file.

In 3.15, due to rcupdate.h updates, this causes a build breakage on ARC:

      CC      arch/arc/kernel/asm-offsets.s
    In file included from include/linux/sched.h:45:0,
                     from arch/arc/kernel/asm-offsets.c:9:
    include/linux/rculist.h: In function __list_add_rcu:
    include/linux/rculist.h:54:2: error: implicit declaration of function smp_store_release [-Werror=implicit-function-declaration]
      rcu_assign_pointer(list_next_rcu(prev), new);
      ^

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-18 13:49:15 -07:00
Linus Torvalds
674366e90e PCI updates for v3.15:
Host bridge drivers
     - Fix OF interrupt mapping for DesignWare, R-Car, Tegra (Lucas Stach)
     - Fix DesignWare iATU programming (Mohit Kumar)
 
   Miscellaneous
     - Fix powerpc NULL dereference from list_for_each_entry() update (Mike Qiu)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTUUMAAAoJEFmIoMA60/r8OrUQAKXhI6kXjJ36sSHiioQBFxD9
 fgJXDHS8gA4UD7ZV0BKQeA6kDM2v9A80UWLA2yLt+BWr0+DEgljsnRNWZjsrzoEn
 JZantkukqKfdo6LseIwfFJM7LTWqP/q6enTMXIp9UjLxnBK+j0K7qU0+u/VjH0Ab
 O8wouFu1B8REJDL5TvSjiEXO6uDtFPAuHQ1oAs5EeT35ZV54f8gSGyyUxLh1Fk18
 qrD+ERN5VrVI5K2ENJwIOljAoaHMiseSZPFA8nQu/XMZ5N0dZuU2gXGFMt3yuZID
 jTzq88umyEEMcbo+QqAW+rcoljq3GExBnOu2SF+0dvDWlXewsZz0WNdnKkUh5HCi
 vnnwvkSB4bpnv+rtu0QQgOFoFDkQV+EYSy6dUEijtVTOijZ2ZxnzOO8/y4pocpnB
 +hKehz0qNVl/RE9WMbCQ+TAb4K7+uECcilIUqq581BnmYFrH3FarfcDaHdWySI4O
 AIs2RtgwtlZv1MBpejM4OS7REav2IUFj8CWsvRIIrsgfvKvl/1Rxg2eQE8KPFSdc
 pi72CNKNvThzgAvpWjb0F/Wz7+0aWO6Nm0flYCBvcIB472x9dAzhVpoiKMf2BAII
 S/uebuA7hCZtoGEWOSwtEBYm+i3UR07YCL2UFv8Pj523DaOeTFgSYp7YFFJV+WqO
 AosR0V0L8mr2HMBK0KhN
 =VgYj
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.15-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "These are fixes for a powerpc NULL pointer dereference, an OF
  interrupt mapping issue on some of the new host bridges, and a
  DesignWare iATU issue.

  Host bridge drivers
   - Fix OF interrupt mapping for DesignWare, R-Car, Tegra (Lucas Stach)
   - Fix DesignWare iATU programming (Mohit Kumar)

  Miscellaneous
    - Fix powerpc NULL dereference from list_for_each_entry() update (Mike Qiu)"

* tag 'pci-v3.15-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: tegra: Use new OF interrupt mapping when possible
  PCI: rcar: Use new OF interrupt mapping when possible
  PCI: designware: Use new OF interrupt mapping when possible
  PCI: designware: Fix iATU programming for cfg1, io and mem viewport
  PCI: designware: Fix comment for setting number of lanes
  powerpc/PCI: Fix NULL dereference in sys_pciconfig_iobase() list traversal
2014-04-18 10:56:27 -07:00
Yan, Zheng
4a3dc121d3 perf/x86: Export perf_assign_events()
export perf_assign_events to allow building perf Intel uncore driver
as module

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1395133004-23205-3-git-send-email-zheng.z.yan@intel.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 12:54:46 +02:00
Ingo Molnar
1111b680d3 Merge branch 'perf/urgent' into perf/core, to pick up PMU driver fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 12:14:55 +02:00
Venkatesh Srinivas
2422365780 perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU
CPUs which should support the RAPL counters according to
Family/Model/Stepping may still issue #GP when attempting to access
the RAPL MSRs. This may happen when Linux is running under KVM and
we are passing-through host F/M/S data, for example. Use rdmsrl_safe
to first access the RAPL_POWER_UNIT MSR; if this fails, do not
attempt to use this PMU.

Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com
Cc: zheng.z.yan@intel.com
Cc: eranian@google.com
Cc: ak@linux.intel.com
Cc: linux-kernel@vger.kernel.org
[ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 12:14:26 +02:00
Linus Torvalds
81cef0fe19 Merge branch 'parisc-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
 "There are two major changes in this patchset:

  The major fix is that the epoll_pwait() syscall for 32bit userspace
  was not using the compat wrapper on a 64bit kernel.

  Secondly we changed the value of SHMLBA from 4MB to PAGE_SIZE to
  reflect that we can actually mmap to any multiple of PAGE_SIZE.  The
  only thing which needs care is that shared mmaps need to be mapped at
  the same offset inside the 4MB cache window"

* 'parisc-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: fix epoll_pwait syscall on compat kernel
  parisc: change value of SHMLBA from 0x00400000 to PAGE_SIZE
  parisc: Replace __get_cpu_var uses for address calculation
2014-04-17 13:21:35 -07:00
Oleg Nesterov
6cc5e7ff2c uprobes/x86: Emulate relative conditional "near" jmp's
Change branch_setup_xol_ops() to simply use opc1 = OPCODE2(insn) - 0x10
if OPCODE1() == 0x0f; this matches the "short" jmp which checks the same
condition.

Thanks to lib/insn.c, it does the rest correctly. branch->ilen/offs are
correct no matter if this jmp is "near" or "short".

Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:25 +02:00
Oleg Nesterov
8f95505bc1 uprobes/x86: Emulate relative conditional "short" jmp's
Teach branch_emulate_op() to emulate the conditional "short" jmp's which
check regs->flags.

Note: this doesn't support jcxz/jcexz, loope/loopz, and loopne/loopnz.
They all are rel8 and thus they can't trigger the problem, but perhaps
we will add the support in future just for completeness.

Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:23 +02:00
Oleg Nesterov
8e89c0be17 uprobes/x86: Emulate relative call's
See the previous "Emulate unconditional relative jmp's" which explains
why we can not execute "jmp" out-of-line, the same applies to "call".

Emulating of rip-relative call is trivial, we only need to additionally
push the ret-address. If this fails, we execute this instruction out of
line and this should trigger the trap, the probed application should die
or the same insn will be restarted if a signal handler expands the stack.
We do not even need ->post_xol() for this case.

But there is a corner (and almost theoretical) case: another thread can
expand the stack right before we execute this insn out of line. In this
case it hit the same problem we are trying to solve. So we simply turn
the probed insn into "call 1f; 1:" and add ->post_xol() which restores
->sp and restarts.

Many thanks to Jonathan who finally found the standalone reproducer,
otherwise I would never resolve the "random SIGSEGV's under systemtap"
bug-report. Now that the problem is clear we can write the simplified
test-case:

	void probe_func(void), callee(void);

	int failed = 1;

	asm (
		".text\n"
		".align 4096\n"
		".globl probe_func\n"
		"probe_func:\n"
		"call callee\n"
		"ret"
	);

	/*
	 * This assumes that:
	 *
	 *	- &probe_func = 0x401000 + a_bit, aligned = 0x402000
	 *
	 *	- xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
	 *	  as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
	 *
	 * so we can target the non-canonical address from xol_vma using
	 * the simple math below, 100 * 4096 is just the random offset
	 */
	asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1  + 100 * 4096\n");

	void callee(void)
	{
		failed = 0;
	}

	int main(void)
	{
		probe_func();
		return failed;
	}

It SIGSEGV's if you probe "probe_func" (although this is not very reliable,
randomize_va_space/etc can change the placement of xol area).

Note: as Denys Vlasenko pointed out, amd and intel treat "callw" (0x66 0xe8)
differently. This patch relies on lib/insn.c and thus implements the intel's
behaviour: 0x66 is simply ignored. Fortunately nothing sane should ever use
this insn, so we postpone the fix until we decide what should we do; emulate
or not, support or not, etc.

Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:23 +02:00
Oleg Nesterov
d241006354 uprobes/x86: Emulate nop's using ops->emulate()
Finally we can kill the ugly (and very limited) code in __skip_sstep().
Just change branch_setup_xol_ops() to treat "nop" as jmp to the next insn.

Thanks to lib/insn.c, it is clever enough. OPCODE1() == 0x90 includes
"(rep;)+ nop;" at least, and (afaics) much more.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:22 +02:00
Oleg Nesterov
7ba6db2d68 uprobes/x86: Emulate unconditional relative jmp's
Currently we always execute all insns out-of-line, including relative
jmp's and call's. This assumes that even if regs->ip points to nowhere
after the single-step, default_post_xol_op(UPROBE_FIX_IP) logic will
update it correctly.

However, this doesn't work if this regs->ip == xol_vaddr + insn_offset
is not canonical. In this case CPU generates #GP and general_protection()
kills the task which tries to execute this insn out-of-line.

Now that we have uprobe_xol_ops we can teach uprobes to emulate these
insns and solve the problem. This patch adds branch_xol_ops which has
a single branch_emulate_op() hook, so far it can only handle rel8/32
relative jmp's.

TODO: move ->fixup into the union along with rip_rela_target_address.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jonathan Lebon <jlebon@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:22 +02:00
Oleg Nesterov
8faaed1b9f uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and arch_uretprobe_hijack_return_addr()
1. Add the trivial sizeof_long() helper and change other callers of
   is_ia32_task() to use it.

   TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
   not necessarily mean 32bit. Fortunately syscall-like insns can't be
   probed so it actually works, but it would be better to rename and
   use is_ia32_frame().

2. As Jim pointed out "ncopied" in arch_uretprobe_hijack_return_addr()
   and adjust_ret_addr() should be named "nleft". And in fact only the
   last copy_to_user() in arch_uretprobe_hijack_return_addr() actually
   needs to inspect the non-zero error code.

TODO: adjust_ret_addr() should die. We can always calculate the value
we need to write into *regs->sp, just UPROBE_FIX_CALL should record
insn->length.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:21 +02:00
Oleg Nesterov
75f9ef0b7f uprobes/x86: Teach arch_uprobe_post_xol() to restart if possible
SIGILL after the failed arch_uprobe_post_xol() should only be used as
a last resort, we should try to restart the probed insn if possible.

Currently only adjust_ret_addr() can fail, and this can only happen if
another thread unmapped our stack after we executed "call" out-of-line.
Most probably the application if buggy, but even in this case it can
have a handler for SIGSEGV/etc. And in theory it can be even correct
and do something non-trivial with its memory.

Of course we can't restart unconditionally, so arch_uprobe_post_xol()
does this only if ->post_xol() returns -ERESTART even if currently this
is the only possible error.

default_post_xol_op(UPROBE_FIX_CALL) can always restart, but as Jim
pointed out it should not forget to pop off the return address pushed
by this insn executed out-of-line.

Note: this is not "perfect", we do not want the extra handler_chain()
after restart, but I think this is the best solution we can realistically
do without too much uglifications.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:21 +02:00
Oleg Nesterov
014940bad8 uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails
Currently the error from arch_uprobe_post_xol() is silently ignored.
This doesn't look good and this can lead to the hard-to-debug problems.

1. Change handle_singlestep() to loudly complain and send SIGILL.

   Note: this only affects x86, ppc/arm can't fail.

2. Change arch_uprobe_post_xol() to call arch_uprobe_abort_xol() and
   avoid TF games if it is going to return an error.

   This can help to to analyze the problem, if nothing else we should
   not report ->ip = xol_slot in the core-file.

   Note: this means that handle_riprel_post_xol() can be called twice,
   but this is fine because it is idempotent.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:20 +02:00
Oleg Nesterov
e55848a4f8 uprobes/x86: Conditionalize the usage of handle_riprel_insn()
arch_uprobe_analyze_insn() calls handle_riprel_insn() at the start,
but only "0xff" and "default" cases need the UPROBE_FIX_RIP_ logic.
Move the callsite into "default" case and change the "0xff" case to
fall-through.

We are going to add the various hooks to handle the rip-relative
jmp/call instructions (and more), we need this change to enforce the
fact that the new code can not conflict with is_riprel_insn() logic
which, after this change, can only be used by default_xol_ops.

Note: arch_uprobe_abort_xol() still calls handle_riprel_post_xol()
directly. This is fine unless another _xol_ops we may add later will
need to reuse "UPROBE_FIX_RIP_AX|UPROBE_FIX_RIP_CX" bits in ->fixup.
In this case we can add uprobe_xol_ops->abort() hook, which (perhaps)
we will need anyway in the long term.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
2014-04-17 21:58:20 +02:00
Oleg Nesterov
8ad8e9d3fd uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops
Introduce arch_uprobe->ops pointing to the "struct uprobe_xol_ops",
move the current UPROBE_FIX_{RIP*,IP,CALL} code into the default
set of methods and change arch_uprobe_pre/post_xol() accordingly.

This way we can add the new uprobe_xol_ops's to handle the insns
which need the special processing (rip-relative jmp/call at least).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
2014-04-17 21:58:19 +02:00
Oleg Nesterov
34e7317d6a uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooks
No functional changes. Preparation to simplify the review of the next
change. Just reorder the code in arch_uprobe_pre/post_xol() functions
so that UPROBE_FIX_{RIP_*,IP,CALL} logic goes to the end.

Also change arch_uprobe_pre_xol() to use utask instead of autask, to
make the code more symmetrical with arch_uprobe_post_xol().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-17 21:58:18 +02:00
Oleg Nesterov
d20737c07a uprobes/x86: Gather "riprel" functions together
Cosmetic. Move pre_xol_rip_insn() and handle_riprel_post_xol() up to
the closely related handle_riprel_insn(). This way it is simpler to
read and understand this code, and this lessens the number of ifdef's.

While at it, update the comment in handle_riprel_post_xol() as Jim
suggested.

TODO: rename them somehow to make the naming consistent.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:17 +02:00
Oleg Nesterov
59078d4b96 uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove "mm" arg
Kill the "mm->context.ia32_compat" check in handle_riprel_insn(), if
it is true insn_rip_relative() must return false. validate_insn_bits()
passed "ia32_compat" as !x86_64 to insn_init(), and insn_rip_relative()
checks insn->x86_64.

Also, remove the no longer needed "struct mm_struct *mm" argument and
the unnecessary "return" at the end.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-17 21:58:17 +02:00
Oleg Nesterov
ddb69f276c uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn()
No functional changes, preparation.

Shift the code from prepare_fixups() to arch_uprobe_analyze_insn()
with the following modifications:

	- Do not call insn_get_opcode() again, it was already called
	  by validate_insn_bits().

	- Move "case 0xea" up. This way "case 0xff" can fall through
	  to default case.

	- change "case 0xff" to use the nested "switch (MODRM_REG)",
	  this way the code looks a bit simpler.

	- Make the comments look consistent.

While at it, kill the initialization of rip_rela_target_address and
->fixups, we can rely on kzalloc(). We will add the new members into
arch_uprobe, it would be better to assume that everything is zero by
default.

TODO: cleanup/fix the mess in validate_insn_bits() paths:

	- validate_insn_64bits() and validate_insn_32bits() should be
	  unified.

	- "ifdef" is not used consistently; if good_insns_64 depends
	  on CONFIG_X86_64, then probably good_insns_32 should depend
	  on CONFIG_X86_32/EMULATION

	- the usage of mm->context.ia32_compat looks wrong if the task
	  is TIF_X32.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-17 21:58:16 +02:00
Linus Torvalds
88764e0a3e Xen regression and bug fixes for 3.15-rc1.
- Fix completely broken 32-bit PV guests caused by x86 refactoring
   32-bit thread_info.
 - Only enable ticketlock slow path on Xen (not bare metal).
 - Fix two bugs with PV guests not shutting down when requested.
 - Fix a minor memory leak in xen-pciback error path.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQEcBAABAgAGBQJTT/hQAAoJEFxbo/MsZsTR6sMIAJs7mJXSqDQn3Z8O+TemRa53
 p92ZomTNYALjUMglXcxJ2Zua6IsZMWdu7jcV1GoXC70V4YLmUs8KaBgZmI5ayUQy
 bBpK+6WIAJyBkJdNH5fK3wggJ2UZjw0/twPNgd9gACwjUiYhx8iHN/hTGvu4qPBJ
 MGAIlg6wdnGwRydi72uk9Am/xpebEdQy4DRD20vjwA/qUkT4uHVv/AA4hc4AK29w
 ToK8qFSisgAlahcmq8/T4+OBFEKz78b9dQcdsGWyAk0ofWILfwD1l53xhzUin25s
 JUVevWhhLCKRZBOq4Ykc5qyqnLff4m56rm/THQ6f0oRdJn/OR+SWOImda2Qqmvs=
 =Gxpq
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen fixes from David Vrabel:
 "Xen regression and bug fixes for 3.15-rc1:

   - fix completely broken 32-bit PV guests caused by x86 refactoring
     32-bit thread_info.
   - only enable ticketlock slow path on Xen (not bare metal)
   - fix two bugs with PV guests not shutting down when requested
   - fix a minor memory leak in xen-pciback error path"

* tag 'stable/for-linus-3.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/manage: Poweroff forcefully if user-space is not yet up.
  xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
  xen/spinlock: Don't enable them unconditionally.
  xen-pciback: silence an unwanted debug printk
  xen: fix memory leak in __xen_pcibk_add_pci_dev()
  x86/xen: Fix 32-bit PV guests's usage of kernel_stack
2014-04-17 10:54:07 -07:00
Masami Hiramatsu
6381c24cd6 kprobes/x86: Fix page-fault handling logic
Current kprobes in-kernel page fault handler doesn't
expect that its single-stepping can be interrupted by
an NMI handler which may cause a page fault(e.g. perf
with callback tracing).

In that case, the page-fault handled by kprobes and it
misunderstands the page-fault has been caused by the
single-stepping code and tries to recover IP address
to probed address.

But the truth is the page-fault has been caused by the
NMI handler, and do_page_fault failes to handle real
page fault because the IP address is modified and
causes Kernel BUGs like below.

 ----
 [ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
 [ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40

To handle this correctly, I fixed the kprobes fault
handler to ensure the faulted ip address is its own
single-step buffer instead of checking current kprobe
state.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Sandeepa Prabhu <sandeepa.prabhu@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fche@redhat.com
Cc: systemtap@sourceware.org
Link: http://lkml.kernel.org/r/20140417081644.26341.52351.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-17 10:57:02 +02:00
Ingo Molnar
ea431643d6 x86/mce: Fix CMCI preemption bugs
The following commit:

  27f6c573e0 ("x86, CMCI: Add proper detection of end of CMCI storms")

Added two preemption bugs:

 - machine_check_poll() does a get_cpu_var() without a matching
   put_cpu_var(), which causes preemption imbalance and crashes upon
   bootup.

 - it does percpu ops without disabling preemption. Preemption is not
   disabled due to the mistaken use of a raw spinlock.

To fix these bugs fix the imbalance and change
cmci_discover_lock to a regular spinlock.

Reported-by: Owen Kibel <qmewlo@gmail.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Chen, Gong <gong.chen@linux.intel.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Todorov <atodorov@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
--
 arch/x86/kernel/cpu/mcheck/mce.c       |    4 +---
 arch/x86/kernel/cpu/mcheck/mce_intel.c |   18 +++++++++---------
 2 files changed, 10 insertions(+), 12 deletions(-)
2014-04-17 10:28:42 +02:00
Linus Torvalds
6ca2a88ad8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Various fixes:

   - reboot regression fix
   - build message spam fix
   - GPU quirk fix
   - 'make kvmconfig' fix

  plus the wire-up of the renameat2() system call on i386"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Remove the PCI reboot method from the default chain
  x86/build: Supress "Nothing to be done for ..." messages
  x86/gpu: Fix sign extension issue in Intel graphics stolen memory quirks
  x86/platform: Fix "make O=dir kvmconfig"
  i386: Wire up the renameat2() syscall
2014-04-16 16:40:18 -07:00
Linus Torvalds
2a83dc7e37 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Tooling fixes, plus a simple hardware-enablement patch for the Intel
  RAPL PMU (energy use measurement) on Haswell CPUs, which I hope is
  still fine at this stage"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Instead of redirecting flex output, use -o
  perf tools: Fix double free in perf test 21 (code-reading.c)
  perf stat: Initialize statistics correctly
  perf bench: Set more defaults in the 'numa' suite
  perf bench: Fix segfault at the end of an 'all' execution
  perf bench: Update manpage to mention numa and futex
  perf probe: Use dwarf_getcfi_elf() instead of dwarf_getcfi()
  perf probe: Fix to handle errors in line_range searching
  perf probe: Fix --line option behavior
  perf tools: Pick up libdw without explicit LIBDW_DIR
  MAINTAINERS: Change e-mail to kernel.org one
  perf callchains: Disable unwind libraries when libelf isn't found
  tools lib traceevent: Do not call warning() directly
  tools lib traceevent: Print event name when show warning if possible
  perf top: Fix documentation of invalid -s option
  perf/x86: Enable DRAM RAPL support on Intel Haswell
2014-04-16 16:38:57 -07:00
Linus Torvalds
5f63517cbf A first set of pin control fixes for the v3.15 series:
- Fix a couple of barnsjukdomar on the Rockchip driver.
 
 - Remove an idiotic debug print I happened to leave behind
   in the Nomadik driver.
 
 - Fixup the Qualcomm MSM interrupt handling code for the
   TLMM v2.
 
 - Three patches renaming the Broadcom Capri driver to
   BCM28155. This has been falling between the chairs for
   some time due to some cross-tree synchronization
   misunderstandings, now I'm fed up with this and just
   rename it in this -rc1 phase.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTTPPzAAoJEEEQszewGV1zSBgP/iMS9y9LAouPmzkXseCeT4aM
 eb2pEi1wtOSWCeGNMIx4X1GRkbHS+T+5Wk6dmh46RUn/8b1HY4gkLoJLnrQGML1l
 zS9tdDyURGmGYuVAr0ghLq0LTruvcViCQageRzO8yllTkJb8Tf6rfKE2+y9BsGRH
 CdBIE9/XYP2Z2Wwd0fLAPMFpa9wPz8eNeF7XGyQ20+DSuRzNMmDq6AUhmlCfIMnL
 TxlJIT1vpAAt/e3wcRvmr2n/Nrlz28ajP/VmzSm2dSqTajy8ofWqgFwQLiItJJ3q
 VuJto3eKy+xGT4IVO+ozXCZd0kDgaAeiz7PNWHpbFBec0y4QFmVFxtPpw/Zff7RH
 136Vh0KahX47TaJ1GGvB5622OLjsQzwH2TY24Sn9WRzNT8VS0pv2F3RQk8tVrWrd
 fquQksuMEaCay+MHcBhI1mJlQcgTFJNsenmY8KOIjFykeBz5x6bOtBkbOCjzuogr
 yVgaOW/zV7b0zFQ6Vv9eZFf7WYhBXE7w1Im5D2EMR9mJpNBWbj9A8GQpCjc0+VXB
 jN2hWmj5qQtQW6z67VEn3l8Mqzpazsu61zbcB3F4Ma0m247vakIkk5I8GMmLW3/c
 UMr6RG2IHIaECNdS1Ir1UkBnDCFb7CQq3j0Bh2UynyU9jKGtRZU/P7SdD7LqHsi+
 Q56kdNSbYdRPpjPWmZUA
 =i5qY
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pincontrol fixes from Linus Walleij:
 "A first set of pin control fixes for the v3.15 series:

   - Fix a couple of barnsjukdomar on the Rockchip driver.

   - Remove an idiotic debug print I happened to leave behind in the
     Nomadik driver.

   - Fixup the Qualcomm MSM interrupt handling code for the TLMM v2.

   - Three patches renaming the Broadcom Capri driver to BCM28155.  This
     has been falling between the chairs for some time due to some
     cross-tree synchronization misunderstandings, now I'm fed up with
     this and just rename it in this -rc1 phase"

* tag 'pinctrl-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: fix typo in bindings documentation
  Update bcm_defconfig with new pinctrl CONFIG
  pinctrl: Rename Broadcom Capri pinctrl driver
  pinctrl: msm: Correct interrupt code for TLMM v2
  pinctrl: nomadik: delete stray debug print
  pinctrl: rockchip: handle first half of rk3188-bank0 correctly
  pinctrl: rockchip: add return value to rockchip_set_mux
  pinctrl: rockchip: fix offset of mux registers for rk3188
2014-04-16 15:59:16 -07:00
Linus Torvalds
0f689a33ad Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 patches from Martin Schwidefsky:
 "An update to the oops output with additional information about the
  crash.  The renameat2 system call is enabled.  Two patches in regard
  to the PTR_ERR_OR_ZERO cleanup.  And a bunch of bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/sclp_cmd: replace PTR_RET with PTR_ERR_OR_ZERO
  s390/sclp: replace PTR_RET with PTR_ERR_OR_ZERO
  s390/sclp_vt220: Fix kernel panic due to early terminal input
  s390/compat: fix typo
  s390/uaccess: fix possible register corruption in strnlen_user_srst()
  s390: add 31 bit warning message
  s390: wire up sys_renameat2
  s390: show_registers() should not map user space addresses to kernel symbols
  s390/mm: print control registers and page table walk on crash
  s390/smp: fix smp_stop_cpu() for !CONFIG_SMP
  s390: fix control register update
2014-04-16 11:28:25 -07:00
Linus Torvalds
7d38cc0290 Small workaround for a rare, but annoying, erratum
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJTTsl0AAoJEKurIx+X31iBlZgP/1B01Zsqbq/Oh2JFRwjnLJbW
 FSMDMoAYHbIoB+4tIvQdQf6vGxPz++oBtJq95CeDeez2wh/BPvGFj/LrdBfYYgHo
 uutrB4c6P0WBOQJnfRcXS/MC4frihzV5IaA+VXAXTIBTVKlEi3PGUBM3/WgUA8QS
 Aga73Q0ze3XCjbtGeqjwE3YYzuiOzELeb4MJbNjH3a40ZkXCCmZcerZT8Zp0O5Qi
 hYZr6OTPmmoXrgX4MeBX7kFcwhHe6zb86TvdRAoPIY8fwPJt/zrhQlqrTfoUzRH5
 xyX3y4ByU7HU0CEk0g8AMiiR6EZNmUBo8onei0xVwRP8Z2u884RKQ1cC6D059ars
 isyMO/7lH6Rn+48OOTiXb0e2NOmejr/TRZ3asrvtfvb9uyMGFkG+6g2/pYxbnTb2
 QC8jknY07PhCB9J738RTllAFX6OFRvhk8b3DtrtXC5GTfIiA7HkdkTQ3HISNTgif
 C5WalPhyjh6iCJbtFtdFg0uwEpXgQ5QX5sN08NQOkuDXiI9obftD7pw5EIPiZCTt
 GIIxXh1KDWiktoOHrFAiIm4bh+CAxNRG10WvbZQj3CY+m9GYcI30h/XFmkoil4ds
 IoqVMan42yNK5AwqSa8L9Eq6t5Za1Uic1OgOofX2iYEEqp84aklUI0m5b8woN+/u
 t0BK8ThF4msbaWlJgKRK
 =lloU
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-ia64-erratum' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull itanium erratum fix from Tony Luck:
 "Small workaround for a rare, but annoying, erratum #237"

* tag 'please-pull-ia64-erratum' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  [IA64] Change default PSR.ac from '1' to '0' (Fix erratum #237)
2014-04-16 11:22:45 -07:00
Tony Luck
c0b5a64d93 [IA64] Change default PSR.ac from '1' to '0' (Fix erratum #237)
April 2014 Itanium processor specification update:

http://www.intel.com/content/www/us/en/processors/itanium/itanium-specification-update.html

describes this erratum:

=========================================================================
237. Under a complex set of conditions, store to load forwarding for a
sub 8-byte load may complete incorrectly

Problem: A load instruction may complete incorrectly when a code sequence
using 4-byte or smaller load and store operations to the same address
is executed in combination with specific timing of all the following
concurrent conditions: store to load forwarding, alignment checking
enabled, a mis-predicted branch, and complex cache utilization activity.

Implication: The affected sub 8-byte instruction may complete
incorrectly resulting in unpredictable system behavior. There is an
extremely low probability of exposure due to the significant number of
complex microarchitectural concurrent conditions required to encounter
the erratum.

Workaround: Set PSR.ac = 0 to completely avoid the erratum. Disabling
Hyper-Threading will significantly reduce exposure to the conditions
that contribute to encountering the erratum.

Status: See the Summary Table of Changes for the affected steppings.
=========================================================================

[Table of changes essentially lists all models from McKinley to Tukwila]

The PSR.ac bit controls whether the processor will always generate
an unaligned reference trap (0x5a00) for a misaligned data access
(when PSR.ac=1) or if it will let the access succeed when running
on a cpu that implements logic to handle some unaligned accesses.

Way back in 2008 in commit b704882e70
  [IA64] Rationalize kernel mode alignment checking
we made the decision to always enable strict checking. We were
already doing so in trap/interrupt context because the common
preamble code set this bit - but the rest of supervisor code
(and by inheritance user code) ran with PSR.ac=0.

We now reverse that decision and set PSR.ac=0 everywhere in the
kernel (also inherited by user processes). This will avoid the
erratum using the method described in the Itanium specification
update.  Net effect for users is that the processor will handle
unaligned access when it can (typically with a tiny performance
bubble in the pipeline ... but much less invasive than taking a
trap and having the OS perform the access).

Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-04-16 10:20:34 -07:00
Ingo Molnar
5be44a6fb1 x86: Remove the PCI reboot method from the default chain
Steve reported a reboot hang and bisected it back to this commit:

  a4f1987e4c x86, reboot: Add EFI and CF9 reboot methods into the default list

He heroically tested all reboot methods and found the following:

  reboot=t       # triple fault                  ok
  reboot=k       # keyboard ctrl                 FAIL
  reboot=b       # BIOS                          ok
  reboot=a       # ACPI                          FAIL
  reboot=e       # EFI                           FAIL   [system has no EFI]
  reboot=p       # PCI 0xcf9                     FAIL

And I think it's pretty obvious that we should only try PCI 0xcf9 as a
last resort - if at all.

The other observation is that (on this box) we should never try
the PCI reboot method, but close with either the 'triple fault'
or the 'BIOS' (terminal!) reboot methods.

Thirdly, CF9_COND is a total misnomer - it should be something like
CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...

So this patch fixes the worst problems:

 - it orders the actual reboot logic to follow the reboot ordering
   pattern - it was in a pretty random order before for no good
   reason.

 - it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
   BOOT_CF9_SAFE flags to make the code more obvious.

 - it tries the BIOS reboot method before the PCI reboot method.
   (Since 'BIOS' is a terminal reboot method resulting in a hang
    if it does not work, this is essentially equivalent to removing
    the PCI reboot method from the default reboot chain.)

 - just for the miraculous possibility of terminal (resulting
   in hang) reboot methods of triple fault or BIOS returning
   without having done their job, there's an ordering between
   them as well.

Reported-and-bisected-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Aubrey <aubrey.li@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-16 08:56:09 +02:00
Konrad Rzeszutek Wilk
e0fc17a936 xen/spinlock: Don't enable them unconditionally.
The git commit a945928ea2
('xen: Do not enable spinlocks before jump_label_init() has executed')
was added to deal with the jump machinery. Earlier the code
that turned on the jump label was only called by Xen specific
functions. But now that it had been moved to the initcall machinery
it gets called on Xen, KVM, and baremetal - ouch!. And the detection
machinery to only call it on Xen wasn't remembered in the heat
of merge window excitement.

This means that the slowpath is enabled on baremetal while it should
not be.

Reported-by: Waiman Long <waiman.long@hp.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
CC: stable@vger.kernel.org
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-04-15 17:41:28 +01:00
Boris Ostrovsky
4461bbc05b x86/xen: Fix 32-bit PV guests's usage of kernel_stack
Commit 198d208df4 ("x86: Keep
thread_info on thread stack in x86_32") made 32-bit kernels use
kernel_stack to point to thread_info. That change missed a couple of
updates needed by Xen's 32-bit PV guests:

1. kernel_stack needs to be initialized for secondary CPUs

2. GET_THREAD_INFO() now uses %fs register which may not be the
   kernel's version when executing xen_iret().

With respect to the second issue, we don't need GET_THREAD_INFO()
anymore: we used it as an intermediate step to get to per_cpu xen_vcpu
and avoid referencing %fs. Now that we are going to use %fs anyway we
may as well go directly to xen_vcpu.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-04-15 15:00:14 +01:00
Linus Torvalds
55101e2d6c Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Marcelo Tosatti:
 - Fix for guest triggerable BUG_ON (CVE-2014-0155)
 - CR4.SMAP support
 - Spurious WARN_ON() fix

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: remove WARN_ON from get_kernel_ns()
  KVM: Rename variable smep to cr4_smep
  KVM: expose SMAP feature to guest
  KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
  KVM: Add SMAP support when setting CR4
  KVM: Remove SMAP bit from CR4_RESERVED_BITS
  KVM: ioapic: try to recover if pending_eoi goes out of range
  KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)
2014-04-14 16:21:28 -07:00
Mike Qiu
140ab6452c powerpc/PCI: Fix NULL dereference in sys_pciconfig_iobase() list traversal
3bc955987f ("powerpc/PCI: Use list_for_each_entry() for bus traversal")
caused a NULL pointer dereference because the loop body set the iterator to
NULL:

  Unable to handle kernel paging request for data at address 0x00000000
  Faulting instruction address: 0xc000000000041d78
  Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  NIP [c000000000041d78] .sys_pciconfig_iobase+0x68/0x1f0
  LR [c000000000041e0c] .sys_pciconfig_iobase+0xfc/0x1f0
  Call Trace:
  [c0000003b4787db0] [c000000000041e0c] .sys_pciconfig_iobase+0xfc/0x1f0 (unreliable)
  [c0000003b4787e30] [c000000000009ed8] syscall_exit+0x0/0x98

Fix it by using a temporary variable for the iterator.

[bhelgaas: changelog, drop tmp_bus initialization]
Fixes: 3bc955987f powerpc/PCI: Use list_for_each_entry() for bus traversal
Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-04-14 16:33:49 -06:00
Marcelo Tosatti
b351c39cc9 KVM: x86: remove WARN_ON from get_kernel_ns()
Function and callers can be preempted.

https://bugzilla.kernel.org/show_bug.cgi?id=73721

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-14 17:50:43 -03:00
Feng Wu
66386ade2a KVM: Rename variable smep to cr4_smep
Rename variable smep to cr4_smep, which can better reflect the
meaning of the variable.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:40 -03:00
Feng Wu
de935ae15b KVM: expose SMAP feature to guest
This patch exposes SMAP feature to guest

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:37 -03:00
Feng Wu
e1e746b3c5 KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
SMAP is disabled if CPU is in non-paging mode in hardware.
However KVM always uses paging mode to emulate guest non-paging
mode with TDP. To emulate this behavior, SMAP needs to be
manually disabled when guest switches to non-paging mode.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:35 -03:00