Commit Graph

150614 Commits

Author SHA1 Message Date
Paolo Bonzini
2cf7ea9f40 KVM: VMX: hide flexpriority from guest when disabled at the module level
As of commit 8d860bbeed ("kvm: vmx: Basic APIC virtualization controls
have three settings"), KVM will disable VIRTUALIZE_APIC_ACCESSES when
a nested guest writes APIC_BASE MSR and kvm-intel.flexpriority=0,
whereas previously KVM would allow a nested guest to enable
VIRTUALIZE_APIC_ACCESSES so long as it's supported in hardware.  That is,
KVM now advertises VIRTUALIZE_APIC_ACCESSES to a guest but doesn't
(always) allow setting it when kvm-intel.flexpriority=0, and may even
initially allow the control and then clear it when the nested guest
writes APIC_BASE MSR, which is decidedly odd even if it doesn't cause
functional issues.

Hide the control completely when the module parameter is cleared.

reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
Fixes: 8d860bbeed ("kvm: vmx: Basic APIC virtualization controls have three settings")
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-04 13:40:44 +02:00
Sean Christopherson
fd6b6d9b82 KVM: VMX: check for existence of secondary exec controls before accessing
Return early from vmx_set_virtual_apic_mode() if the processor doesn't
support VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE, both of
which reside in SECONDARY_VM_EXEC_CONTROL.  This eliminates warnings
due to VMWRITEs to SECONDARY_VM_EXEC_CONTROL (VMCS field 401e) failing
on processors without secondary exec controls.

Remove the similar check for TPR shadowing as it is incorporated in the
flexpriority_enabled check and the APIC-related code in
vmx_update_msr_bitmap() is further gated by VIRTUALIZE_X2APIC_MODE.

Reported-by: Gerhard Wiesinger <redhat@wiesinger.com>
Fixes: 8d860bbeed ("kvm: vmx: Basic APIC virtualization controls have three settings")
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-04 13:40:21 +02:00
Sean Christopherson
daa07cbc9a KVM: x86: fix L1TF's MMIO GFN calculation
One defense against L1TF in KVM is to always set the upper five bits
of the *legal* physical address in the SPTEs for non-present and
reserved SPTEs, e.g. MMIO SPTEs.  In the MMIO case, the GFN of the
MMIO SPTE may overlap with the upper five bits that are being usurped
to defend against L1TF.  To preserve the GFN, the bits of the GFN that
overlap with the repurposed bits are shifted left into the reserved
bits, i.e. the GFN in the SPTE will be split into high and low parts.
When retrieving the GFN from the MMIO SPTE, e.g. to check for an MMIO
access, get_mmio_spte_gfn() unshifts the affected bits and restores
the original GFN for comparison.  Unfortunately, get_mmio_spte_gfn()
neglects to mask off the reserved bits in the SPTE that were used to
store the upper chunk of the GFN.  As a result, KVM fails to detect
MMIO accesses whose GPA overlaps the repurprosed bits, which in turn
causes guest panics and hangs.

Fix the bug by generating a mask that covers the lower chunk of the
GFN, i.e. the bits that aren't shifted by the L1TF mitigation.  The
alternative approach would be to explicitly zero the five reserved
bits that are used to store the upper chunk of the GFN, but that
requires additional run-time computation and makes an already-ugly
bit of code even more inscrutable.

I considered adding a WARN_ON_ONCE(low_phys_bits-1 <= PAGE_SHIFT) to
warn if GENMASK_ULL() generated a nonsensical value, but that seemed
silly since that would mean a system that supports VMX has less than
18 bits of physical address space...

Reported-by: Sakari Ailus <sakari.ailus@iki.fi>
Fixes: d9b47449c1a1 ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs")
Cc: Junaid Shahid <junaids@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Reviewed-by: Junaid Shahid <junaids@google.com>
Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01 15:41:00 +02:00
Liran Alon
62cf9bd811 KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS
L2 IA32_BNDCFGS should be updated with vmcs12->guest_bndcfgs only
when VM_ENTRY_LOAD_BNDCFGS is specified in vmcs12->vm_entry_controls.

Otherwise, L2 IA32_BNDCFGS should be set to vmcs01->guest_bndcfgs which
is L1 IA32_BNDCFGS.

Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01 15:40:59 +02:00
Liran Alon
503234b3fd KVM: x86: Do not use kvm_x86_ops->mpx_supported() directly
Commit a87036add0 ("KVM: x86: disable MPX if host did not enable
MPX XSAVE features") introduced kvm_mpx_supported() to return true
iff MPX is enabled in the host.

However, that commit seems to have missed replacing some calls to
kvm_x86_ops->mpx_supported() to kvm_mpx_supported().

Complete original commit by replacing remaining calls to
kvm_mpx_supported().

Fixes: a87036add0 ("KVM: x86: disable MPX if host did not enable
MPX XSAVE features")

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01 15:40:57 +02:00
Liran Alon
5f76f6f5ff KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled
Before this commit, KVM exposes MPX VMX controls to L1 guest only based
on if KVM and host processor supports MPX virtualization.
However, these controls should be exposed to guest only in case guest
vCPU supports MPX.

Without this change, a L1 guest running with kernel which don't have
commit 691bd4340b ("kvm: vmx: allow host to access guest
MSR_IA32_BNDCFGS") asserts in QEMU on the following:
	qemu-kvm: error: failed to set MSR 0xd90 to 0x0
	qemu-kvm: .../qemu-2.10.0/target/i386/kvm.c:1801 kvm_put_msrs:
	Assertion 'ret == cpu->kvm_msr_buf->nmsrs failed'
This is because L1 KVM kvm_init_msr_list() will see that
vmx_mpx_supported() (As it only checks MPX VMX controls support) and
therefore KVM_GET_MSR_INDEX_LIST IOCTL will include MSR_IA32_BNDCFGS.
However, later when L1 will attempt to set this MSR via KVM_SET_MSRS
IOCTL, it will fail because !guest_cpuid_has_mpx(vcpu).

Therefore, fix the issue by exposing MPX VMX controls to L1 guest only
when vCPU supports MPX.

Fixes: 36be0b9deb ("KVM: x86: Add nested virtualization support for MPX")

Reported-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01 15:40:57 +02:00
Paolo Bonzini
4679b61f26 KVM: x86: never trap MSR_KERNEL_GS_BASE
KVM has an old optimization whereby accesses to the kernel GS base MSR
are trapped when the guest is in 32-bit and not when it is in 64-bit mode.
The idea is that swapgs is not available in 32-bit mode, thus the
guest has no reason to access the MSR unless in 64-bit mode and
32-bit applications need not pay the price of switching the kernel GS
base between the host and the guest values.

However, this optimization adds complexity to the code for little
benefit (these days most guests are going to be 64-bit anyway) and in fact
broke after commit 678e315e78 ("KVM: vmx: add dedicated utility to
access guest's kernel_gs_base", 2018-08-06); the guest kernel GS base
can be corrupted across SMIs and UEFI Secure Boot is therefore broken
(a secure boot Linux guest, for example, fails to reach the login prompt
about half the time).  This patch just removes the optimization; the
kernel GS base MSR is now never trapped by KVM, similarly to the FS and
GS base MSRs.

Fixes: 678e315e78
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-24 18:34:13 +02:00
Greg Kroah-Hartman
a27fb6d983 This pull request is slightly bigger than usual at this stage, but
I swear I would have sent it the same to Linus!  The main cause for
 this is that I was on vacation until two weeks ago and it took a while
 to sort all the pending patches between 4.19 and 4.20, test them and
 so on.
 
 It's mostly small bugfixes and cleanups, mostly around x86 nested
 virtualization.  One important change, not related to nested
 virtualization, is that the ability for the guest kernel to trap CPUID
 instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is now
 masked by default.  This is because the feature is detected through an
 MSR; a very bad idea that Intel seems to like more and more.  Some
 applications choke if the other fields of that MSR are not initialized
 as on real hardware, hence we have to disable the whole MSR by default,
 as was the case before Linux 4.12.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJbpPo1AAoJEL/70l94x66DdxgH/is0qe6ZBtzb6Qc0W+8mHHD7
 nxIkWAs2V5NsouJ750YwRQ+0Ym407+wlNt30acdBUEoXhrnA5/TvyGq999XvCL96
 upWEIxpIgbvTMX/e2nLhe4wQdhsboUK4r0/B9IFgVFYrdCt5uRXjB2G4ewxcqxL/
 GxxqrAKhaRsbQG9Xv0Fw5Vohh/Ls6fQDJcyuY1EBnbMpVenq2QDLI6cOAPXncyFb
 uLN6ov4GNCWIPckwxejri5XhZesUOsafrmn48sApShh4T6TrisrdtSYdzl+DGza+
 j5vhIEwdFO5kulZ3viuhqKJOnS2+F6wvfZ75IKT0tEKeU2bi+ifGDyGRefSF6Q0=
 =YXLw
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Paolo writes:
  "It's mostly small bugfixes and cleanups, mostly around x86 nested
   virtualization.  One important change, not related to nested
   virtualization, is that the ability for the guest kernel to trap
   CPUID instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is
   now masked by default.  This is because the feature is detected
   through an MSR; a very bad idea that Intel seems to like more and
   more.  Some applications choke if the other fields of that MSR are
   not initialized as on real hardware, hence we have to disable the
   whole MSR by default, as was the case before Linux 4.12."

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits)
  KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs
  kvm: selftests: Add platform_info_test
  KVM: x86: Control guest reads of MSR_PLATFORM_INFO
  KVM: x86: Turbo bits in MSR_PLATFORM_INFO
  nVMX x86: Check VPID value on vmentry of L2 guests
  nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2
  KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv
  KVM: VMX: check nested state and CR4.VMXE against SMM
  kvm: x86: make kvm_{load|put}_guest_fpu() static
  x86/hyper-v: rename ipi_arg_{ex,non_ex} structures
  KVM: VMX: use preemption timer to force immediate VMExit
  KVM: VMX: modify preemption timer bit only when arming timer
  KVM: VMX: immediately mark preemption timer expired only for zero value
  KVM: SVM: Switch to bitmap_zalloc()
  KVM/MMU: Fix comment in walk_shadow_page_lockless_end()
  kvm: selftests: use -pthread instead of -lpthread
  KVM: x86: don't reset root in kvm_mmu_setup()
  kvm: mmu: Don't read PDPTEs when paging is not enabled
  x86/kvm/lapic: always disable MMIO interface in x2APIC mode
  KVM: s390: Make huge pages unavailable in ucontrol VMs
  ...
2018-09-21 16:21:42 +02:00
Liran Alon
26b471c7e2 KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs
The handlers of IOCTLs in kvm_arch_vcpu_ioctl() are expected to set
their return value in "r" local var and break out of switch block
when they encounter some error.
This is because vcpu_load() is called before the switch block which
have a proper cleanup of vcpu_put() afterwards.

However, KVM_{GET,SET}_NESTED_STATE IOCTLs handlers just return
immediately on error without performing above mentioned cleanup.

Thus, change these handlers to behave as expected.

Fixes: 8fcc4b5923 ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")

Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Patrick Colp <patrick.colp@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 18:54:08 +02:00
Drew Schmitt
6fbbde9a19 KVM: x86: Control guest reads of MSR_PLATFORM_INFO
Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access
to reads of MSR_PLATFORM_INFO.

Disabling access to reads of this MSR gives userspace the control to "expose"
this platform-dependent information to guests in a clear way. As it exists
today, guests that read this MSR would get unpopulated information if userspace
hadn't already set it (and prior to this patch series, only the CPUID faulting
information could have been populated). This existing interface could be
confusing if guests don't handle the potential for incorrect/incomplete
information gracefully (e.g. zero reported for base frequency).

Signed-off-by: Drew Schmitt <dasch@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:46 +02:00
Drew Schmitt
d84f1cff90 KVM: x86: Turbo bits in MSR_PLATFORM_INFO
Allow userspace to set turbo bits in MSR_PLATFORM_INFO. Previously, only
the CPUID faulting bit was settable. But now any bit in
MSR_PLATFORM_INFO would be settable. This can be used, for example, to
convey frequency information about the platform on which the guest is
running.

Signed-off-by: Drew Schmitt <dasch@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:46 +02:00
Krish Sadhukhan
ba8e23db59 nVMX x86: Check VPID value on vmentry of L2 guests
According to section "Checks on VMX Controls" in Intel SDM vol 3C, the
following check needs to be enforced on vmentry of L2 guests:

    If the 'enable VPID' VM-execution control is 1, the value of the
    of the VPID VM-execution control field must not be 0000H.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:45 +02:00
Krish Sadhukhan
6de84e581c nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2
According to section "Checks on VMX Controls" in Intel SDM vol 3C,
the following check needs to be enforced on vmentry of L2 guests:

   - Bits 5:0 of the posted-interrupt descriptor address are all 0.
   - The posted-interrupt descriptor address does not set any bits
     beyond the processor's physical-address width.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:44 +02:00
Liran Alon
e6c67d8cf1 KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv
In case L1 do not intercept L2 HLT or enter L2 in HLT activity-state,
it is possible for a vCPU to be blocked while it is in guest-mode.

According to Intel SDM 26.6.5 Interrupt-Window Exiting and
Virtual-Interrupt Delivery: "These events wake the logical processor
if it just entered the HLT state because of a VM entry".
Therefore, if L1 enters L2 in HLT activity-state and L2 has a pending
deliverable interrupt in vmcs12->guest_intr_status.RVI, then the vCPU
should be waken from the HLT state and injected with the interrupt.

In addition, if while the vCPU is blocked (while it is in guest-mode),
it receives a nested posted-interrupt, then the vCPU should also be
waken and injected with the posted interrupt.

To handle these cases, this patch enhances kvm_vcpu_has_events() to also
check if there is a pending interrupt in L2 virtual APICv provided by
L1. That is, it evaluates if there is a pending virtual interrupt for L2
by checking RVI[7:4] > VPPR[7:4] as specified in Intel SDM 29.2.1
Evaluation of Pending Interrupts.

Note that this also handles the case of nested posted-interrupt by the
fact RVI is updated in vmx_complete_nested_posted_interrupt() which is
called from kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() ->
kvm_vcpu_running() -> vmx_check_nested_events() ->
vmx_complete_nested_posted_interrupt().

Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:44 +02:00
Paolo Bonzini
5bea5123cb KVM: VMX: check nested state and CR4.VMXE against SMM
VMX cannot be enabled under SMM, check it when CR4 is set and when nested
virtualization state is restored.

This should fix some WARNs reported by syzkaller, mostly around
alloc_shadow_vmcs.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:43 +02:00
Sebastian Andrzej Siewior
822f312d47 kvm: x86: make kvm_{load|put}_guest_fpu() static
The functions
	kvm_load_guest_fpu()
	kvm_put_guest_fpu()

are only used locally, make them static. This requires also that both
functions are moved because they are used before their implementation.
Those functions were exported (via EXPORT_SYMBOL) before commit
e5bb40251a ("KVM: Drop kvm_{load,put}_guest_fpu() exports").

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:43 +02:00
Vitaly Kuznetsov
a1efa9b700 x86/hyper-v: rename ipi_arg_{ex,non_ex} structures
These structures are going to be used from KVM code so let's make
their names reflect their Hyper-V origin.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:42 +02:00
Sean Christopherson
d264ee0c2e KVM: VMX: use preemption timer to force immediate VMExit
A VMX preemption timer value of '0' is guaranteed to cause a VMExit
prior to the CPU executing any instructions in the guest.  Use the
preemption timer (if it's supported) to trigger immediate VMExit
in place of the current method of sending a self-IPI.  This ensures
that pending VMExit injection to L1 occurs prior to executing any
instructions in the guest (regardless of nesting level).

When deferring VMExit injection, KVM generates an immediate VMExit
from the (possibly nested) guest by sending itself an IPI.  Because
hardware interrupts are blocked prior to VMEnter and are unblocked
(in hardware) after VMEnter, this results in taking a VMExit(INTR)
before any guest instruction is executed.  But, as this approach
relies on the IPI being received before VMEnter executes, it only
works as intended when KVM is running as L0.  Because there are no
architectural guarantees regarding when IPIs are delivered, when
running nested the INTR may "arrive" long after L2 is running e.g.
L0 KVM doesn't force an immediate switch to L1 to deliver an INTR.

For the most part, this unintended delay is not an issue since the
events being injected to L1 also do not have architectural guarantees
regarding their timing.  The notable exception is the VMX preemption
timer[1], which is architecturally guaranteed to cause a VMExit prior
to executing any instructions in the guest if the timer value is '0'
at VMEnter.  Specifically, the delay in injecting the VMExit causes
the preemption timer KVM unit test to fail when run in a nested guest.

Note: this approach is viable even on CPUs with a broken preemption
timer, as broken in this context only means the timer counts at the
wrong rate.  There are no known errata affecting timer value of '0'.

[1] I/O SMIs also have guarantees on when they arrive, but I have
    no idea if/how those are emulated in KVM.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Use a hook for SVM instead of leaving the default in x86.c - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:42 +02:00
Sean Christopherson
f459a707ed KVM: VMX: modify preemption timer bit only when arming timer
Provide a singular location where the VMX preemption timer bit is
set/cleared so that future usages of the preemption timer can ensure
the VMCS bit is up-to-date without having to modify unrelated code
paths.  For example, the preemption timer can be used to force an
immediate VMExit.  Cache the status of the timer to avoid redundant
VMREAD and VMWRITE, e.g. if the timer stays armed across multiple
VMEnters/VMExits.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:41 +02:00
Sean Christopherson
4c008127e4 KVM: VMX: immediately mark preemption timer expired only for zero value
A VMX preemption timer value of '0' at the time of VMEnter is
architecturally guaranteed to cause a VMExit prior to the CPU
executing any instructions in the guest.  This architectural
definition is in place to ensure that a previously expired timer
is correctly recognized by the CPU as it is possible for the timer
to reach zero and not trigger a VMexit due to a higher priority
VMExit being signalled instead, e.g. a pending #DB that morphs into
a VMExit.

Whether by design or coincidence, commit f4124500c2 ("KVM: nVMX:
Fully emulate preemption timer") special cased timer values of '0'
and '1' to ensure prompt delivery of the VMExit.  Unlike '0', a
timer value of '1' has no has no architectural guarantees regarding
when it is delivered.

Modify the timer emulation to trigger immediate VMExit if and only
if the timer value is '0', and document precisely why '0' is special.
Do this even if calibration of the virtual TSC failed, i.e. VMExit
will occur immediately regardless of the frequency of the timer.
Making only '0' a special case gives KVM leeway to be more aggressive
in ensuring the VMExit is injected prior to executing instructions in
the nested guest, and also eliminates any ambiguity as to why '1' is
a special case, e.g. why wasn't the threshold for a "short timeout"
set to 10, 100, 1000, etc...

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:26:46 +02:00
Andy Shevchenko
a101c9d63e KVM: SVM: Switch to bitmap_zalloc()
Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that it returns pointer of bitmap type instead of opaque void *.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:26:45 +02:00
Tianyu Lan
9a9845867c KVM/MMU: Fix comment in walk_shadow_page_lockless_end()
kvm_commit_zap_page() has been renamed to kvm_mmu_commit_zap_page()
This patch is to fix the commit.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:26:45 +02:00
Wei Yang
83b20b28c6 KVM: x86: don't reset root in kvm_mmu_setup()
Here is the code path which shows kvm_mmu_setup() is invoked after
kvm_mmu_create(). Since kvm_mmu_setup() is only invoked in this code path,
this means the root_hpa and prev_roots are guaranteed to be invalid. And
it is not necessary to reset it again.

    kvm_vm_ioctl_create_vcpu()
        kvm_arch_vcpu_create()
            vmx_create_vcpu()
                kvm_vcpu_init()
                    kvm_arch_vcpu_init()
                        kvm_mmu_create()
        kvm_arch_vcpu_setup()
            kvm_mmu_setup()
                kvm_init_mmu()

This patch set reset_roots to false in kmv_mmu_setup().

Fixes: 50c28f21d0
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:26:44 +02:00
Junaid Shahid
d35b34a9a7 kvm: mmu: Don't read PDPTEs when paging is not enabled
kvm should not attempt to read guest PDPTEs when CR0.PG = 0 and
CR4.PAE = 1.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:26:43 +02:00
Vitaly Kuznetsov
d176620277 x86/kvm/lapic: always disable MMIO interface in x2APIC mode
When VMX is used with flexpriority disabled (because of no support or
if disabled with module parameter) MMIO interface to lAPIC is still
available in x2APIC mode while it shouldn't be (kvm-unit-tests):

PASS: apic_disable: Local apic enabled in x2APIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
FAIL: apic_disable: *0xfee00030: 50014

The issue appears because we basically do nothing while switching to
x2APIC mode when APIC access page is not used. apic_mmio_{read,write}
only check if lAPIC is disabled before proceeding to actual write.

When APIC access is virtualized we correctly manipulate with VMX controls
in vmx_set_virtual_apic_mode() and we don't get vmexits from memory writes
in x2APIC mode so there's no issue.

Disabling MMIO interface seems to be easy. The question is: what do we
do with these reads and writes? If we add apic_x2apic_mode() check to
apic_mmio_in_range() and return -EOPNOTSUPP these reads and writes will
go to userspace. When lAPIC is in kernel, Qemu uses this interface to
inject MSIs only (see kvm_apic_mem_write() in hw/i386/kvm/apic.c). This
somehow works with disabled lAPIC but when we're in xAPIC mode we will
get a real injected MSI from every write to lAPIC. Not good.

The simplest solution seems to be to just ignore writes to the region
and return ~0 for all reads when we're in x2APIC mode. This is what this
patch does. However, this approach is inconsistent with what currently
happens when flexpriority is enabled: we allocate APIC access page and
create KVM memory region so in x2APIC modes all reads and writes go to
this pre-allocated page which is, btw, the same for all vCPUs.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:26:43 +02:00
Greg Kroah-Hartman
4ca719a338 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Crypto stuff from Herbert:
  "This push fixes a potential boot hang in ccp and an incorrect
   CPU capability check in aegis/morus on x86."

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/aegis,morus - Do not require OSXSAVE for SSE2
  crypto: ccp - add timeout support in the SEV command
2018-09-19 08:29:42 +02:00
Paolo Bonzini
1795f81f61 Second set of PPC KVM fixes for 4.19
Two fixes for KVM on POWER machines.  Both of these relate to memory
 corruption and host crashes seen when transparent huge pages are
 enabled.  The first fixes a host crash that can occur when a DMA
 mapping is removed by the guest and the page mapped was part of a
 transparent huge page; the second fixes corruption that could occur
 when a hypervisor page fault for a radix guest is being serviced at
 the same time that the backing page is being collapsed or split.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJbmFIyAAoJEJ2a6ncsY3Gf+L8H/jRQ0ONUpv2xrgirXdPmfuVv
 xIVejn5chiygpo3ZY2YkRGjqMoX8usA5pDQONk9duoc48FedSjjmurfAkSA8NESI
 y6DSRGB6pir/reP/7tBVk0eeeMBjbYnHPA7KfI8ijK424VmRpCT5stiUm7gQvSEm
 LSRUSLwWKfCCjU78HVtiTuK865WZifrOCy6wiNEl79F1K6T1A+LeGaKrcDLjeK/Q
 GsNSbwBK37BOvcsm0W1xrlnCmYtR/nVrhjTFMc5noBuc4znQd3wxitgiInFsOH5V
 LUWL6IStFkbGKSxVZuJilkhVF58AAisrJnwvlZsjrExWYf1J42kbyvVoURt0O8I=
 =blkZ
 -----END PGP SIGNATURE-----

Merge tag 'kvm-ppc-fixes-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

Second set of PPC KVM fixes for 4.19

Two fixes for KVM on POWER machines.  Both of these relate to memory
corruption and host crashes seen when transparent huge pages are
enabled.  The first fixes a host crash that can occur when a DMA
mapping is removed by the guest and the page mapped was part of a
transparent huge page; the second fixes corruption that could occur
when a hypervisor page fault for a radix guest is being serviced at
the same time that the backing page is being collapsed or split.
2018-09-18 15:13:00 +02:00
Paolo Bonzini
cb5fb87a2f KVM: s390: Fixes for 4.19
- more fallout from the hugetlbfs enablement
 - bugfix for vma handling
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJbm68DAAoJEBF7vIC1phx8i3cP/R3UwXGrTq3u5+78PfUu+Nvm
 zTSgHID25/pCsytvRcK7SSJQt+h53RjuiQlOcrtSJZvq3Btt23gl+pFltqvWGZDM
 PYgy+LaaaoIXNeOJqwkwcnNsq4usQ0N1jQ8EgNUxEc2EBkffA6mKcTS0lWDm47AE
 Yxsyz0RsvwcbYCOj10s1b4D9h97+ZkqAC2RDnafzQOb54j/aOH7blsStNDXfdkte
 xHJ7mELOwAWre2QhD7OeTX1aCjBfKUJzWK2sPxm7qt5nEtH18o9F6j2Y/d8dnWg8
 kqthyyb+nH0aHgv3G+7sKBOp1Ra7ftHAVrrbh3m9hceaW5+VuEhWSN6zVZKVdWta
 4bjiWa9I1gKS/Wz479ztdbpt+koRAYSg5CLHAD7YsBR3hHtnAZwuRO38S4Xg+Tfb
 Hp4Lhf8gnq9yQqrcyfKWQKtU9mb84oNtEw/wQPUvr2tZyeoB5NbnVy5UDa9V1DRS
 FnF1/MIZ/MszkwbNEnhk+WWFy41h+uduizfXwQ/YO/xciEEdM2TaDHuLIpBegiSO
 4kFfHRSMxqbwNhQSsJI8dIFpSHqXrtn2p6JTsWdi94B0QBYaNm4XMNpgEZIs5CsE
 aNGhQa+IZ5jk6YsGMx4mqAmV2jN9bMEoKR/SaOvrSkPvAO9QxZP4V6W72YGoi6le
 RZK8Z+HGxg6qpeaAiG0t
 =TT85
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-master-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes for 4.19

- more fallout from the hugetlbfs enablement
- bugfix for vma handling
2018-09-18 15:12:51 +02:00
Greg Kroah-Hartman
5211da9ca5 Merge gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net
Dave writes:
  "Various fixes, all over the place:

   1) OOB data generation fix in bluetooth, from Matias Karhumaa.

   2) BPF BTF boundary calculation fix, from Martin KaFai Lau.

   3) Don't bug on excessive frags, to be compatible in situations mixing
      older and newer kernels on each end.  From Juergen Gross.

   4) Scheduling in RCU fix in hv_netvsc, from Stephen Hemminger.

   5) Zero keying information in TLS layer before freeing copies
      of them, from Sabrina Dubroca.

   6) Fix NULL deref in act_sample, from Davide Caratti.

   7) Orphan SKB before GRO in veth to prevent crashes with XDP,
      from Toshiaki Makita.

   8) Fix use after free in ip6_xmit, from Eric Dumazet.

   9) Fix VF mac address regression in bnxt_en, from Micahel Chan.

   10) Fix MSG_PEEK behavior in TLS layer, from Daniel Borkmann.

   11) Programming adjustments to r8169 which fix not being to enter deep
       sleep states on some machines, from Kai-Heng Feng and Hans de
       Goede.

   12) Fix DST_NOCOUNT flag handling for ipv6 routes, from Peter
       Oskolkov."

* gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (45 commits)
  net/ipv6: do not copy dst flags on rt init
  qmi_wwan: set DTR for modems in forced USB2 mode
  clk: x86: Stop marking clocks as CLK_IS_CRITICAL
  r8169: Get and enable optional ether_clk clock
  clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail
  r8169: enable ASPM on RTL8106E
  r8169: Align ASPM/CLKREQ setting function with vendor driver
  Revert "kcm: remove any offset before parsing messages"
  kcm: remove any offset before parsing messages
  net: ethernet: Fix a unused function warning.
  net: dsa: mv88e6xxx: Fix ATU Miss Violation
  tls: fix currently broken MSG_PEEK behavior
  hv_netvsc: pair VF based on serial number
  PCI: hv: support reporting serial number as slot information
  bnxt_en: Fix VF mac address regression.
  ipv6: fix possible use-after-free in ip6_xmit()
  net: hp100: fix always-true check for link up state
  ARM: dts: at91: add new compatibility string for macb on sama5d3
  net: macb: disable scatter-gather for macb on sama5d3
  net: mvpp2: let phylink manage the carrier state
  ...
2018-09-18 09:31:53 +02:00
Nicolas Ferre
321cc359d8 ARM: dts: at91: add new compatibility string for macb on sama5d3
We need this new compatibility string as we experienced different behavior
for this 10/100Mbits/s macb interface on this particular SoC.
Backward compatibility is preserved as we keep the alternative strings.

Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 07:54:40 -07:00
Linus Torvalds
27c5a778df Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingol Molnar:
 "Misc fixes:

   - EFI crash fix

   - Xen PV fixes

   - do not allow PTI on 2-level 32-bit kernels for now

   - documentation fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/APM: Fix build warning when PROC_FS is not enabled
  Revert "x86/mm/legacy: Populate the user page-table with user pgd's"
  x86/efi: Load fixmap GDT in efi_call_phys_epilog() before setting %cr3
  x86/xen: Disable CPU0 hotplug for Xen PV
  x86/EISA: Don't probe EISA bus for Xen PV guests
  x86/doc: Fix Documentation/x86/earlyprintk.txt
2018-09-15 08:02:46 -10:00
Linus Torvalds
c0be92b5b1 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Mostly tooling fixes, but also breakpoint and x86 PMU driver fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  perf tools: Fix maps__find_symbol_by_name()
  tools headers uapi: Update tools's copy of linux/if_link.h
  tools headers uapi: Update tools's copy of linux/vhost.h
  tools headers uapi: Update tools's copies of kvm headers
  tools headers uapi: Update tools's copy of drm/drm.h
  tools headers uapi: Update tools's copy of asm-generic/unistd.h
  tools headers uapi: Update tools's copy of linux/perf_event.h
  perf/core: Force USER_DS when recording user stack data
  perf/UAPI: Clearly mark __PERF_SAMPLE_CALLCHAIN_EARLY as internal use
  perf/x86/intel: Add support/quirk for the MISPREDICT bit on Knights Landing CPUs
  perf annotate: Fix parsing aarch64 branch instructions after objdump update
  perf probe powerpc: Ignore SyS symbols irrespective of endianness
  perf event-parse: Use fixed size string for comms
  perf util: Fix bad memory access in trace info.
  perf tools: Streamline bpf examples and headers installation
  perf evsel: Fix potential null pointer dereference in perf_evsel__new_idx()
  perf arm64: Fix include path for asm-generic/unistd.h
  perf/hw_breakpoint: Simplify breakpoint enable in perf_event_modify_breakpoint
  perf/hw_breakpoint: Enable breakpoint in modify_user_hw_breakpoint
  perf/hw_breakpoint: Remove superfluous bp->attr.disabled = 0
  ...
2018-09-15 06:44:32 -10:00
Randy Dunlap
002b87d2aa x86/APM: Fix build warning when PROC_FS is not enabled
Fix build warning in apm_32.c when CONFIG_PROC_FS is not enabled:

../arch/x86/kernel/apm_32.c:1643:12: warning: 'proc_apm_show' defined but not used [-Wunused-function]
 static int proc_apm_show(struct seq_file *m, void *v)

Fixes: 3f3942aca6 ("proc: introduce proc_create_single{,_data}")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jiri Kosina <jikos@kernel.org>
Link: https://lkml.kernel.org/r/be39ac12-44c2-4715-247f-4dcc3c525b8b@infradead.org
2018-09-15 10:16:25 +02:00
Linus Torvalds
eae4f8851f Xtensa fixes and cleanups for v4.19:
- don't allocate memory in platform_setup as the memory allocator is not
   initialized at that point yet;
 - remove unnecessary ifeq KBUILD_SRC from arch/xtensa/Makefile;
 - enable SG chaining in arch/xtensa/Kconfig.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEK2eFS5jlMn3N6xfYUfnMkfg/oEQFAlucAIETHGpjbXZia2Jj
 QGdtYWlsLmNvbQAKCRBR+cyR+D+gRNAAD/wME84VqljbhQpzUNmYTb7d5Pst8I3Z
 Mwbx4DqDaCAPu8trhMUtSeV6QBal9xxlyl0b7KWrkb5uX1L9ZdaanMC1W9be6f7B
 DiwKLrGOhnv0kyLZ4/Rrni6GN90TUo1/2UKyJfaNKJoaqojbLyOY8vGtdtxcQlvT
 bQPQ1wPRdfiRTDZkW6dygFF6tY2VT6ao3NqapLhvFUqsm7pzKW5Y1jVkY7PTo8t5
 Qk1XyXdCdeoTUqvP12mAe4mst0yGelLPj8phTkXt7OZUXaubky+Rp/enSLhcJdpu
 HJKFC7lInh/C6IroINFYErndckSK19mOTm2SUq6R5saR8R/6OMniilaVJBvfm1BH
 HhO/6Gzy6K9XQ0VscNAClxBEU7AKbalvHyfI8WcQC+y5Zqpa9nSk9jG1V9U4mLs/
 cnL0w+qJ3nh9mJBB/+14S7GwXITu0DlZEOxhsW6xblr/nfmFmvyFSug7i+WAu2Y3
 Y4KD9tM7+tBnztggblUPQQm+orPAfpXZDZkGU+e7du0BWZikwcjWeIar2GtFGODz
 vtxKIIZyvuAOY4M6Ld+2U9GTcIdOhc+orzJFuNAqbzPnez+0chjFp8OYCj/Fdgj2
 YhbiYcMxJRAFdrbcWhHMlmXOCkUrvKyDY+bpx9TzNXyZp0qb290uGshCxM9/q7CR
 UB/JZ/up28v5jw==
 =YJfc
 -----END PGP SIGNATURE-----

Merge tag 'xtensa-20180914' of git://github.com/jcmvbkbc/linux-xtensa

Pull Xtensa fixes and cleanups from Max Filippov:

 - don't allocate memory in platform_setup as the memory allocator is
   not initialized at that point yet;

 - remove unnecessary ifeq KBUILD_SRC from arch/xtensa/Makefile;

 - enable SG chaining in arch/xtensa/Kconfig.

* tag 'xtensa-20180914' of git://github.com/jcmvbkbc/linux-xtensa:
  xtensa: enable SG chaining in Kconfig
  xtensa: remove unnecessary KBUILD_SRC ifeq conditional
  xtensa: ISS: don't allocate memory in platform_setup
2018-09-14 12:56:42 -10:00
Linus Torvalds
3e153256d9 arm64 fixes
- Fix ioport_map() mapping the wrong physical address for some I/O BARs
 
 - Remove direct use of "asm goto", since some compilers don't like that
 
 - Ensure kimage_voffset is always present in vmcoreinfo PT_NOTE
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJbm68LAAoJELescNyEwWM0Zt4H/2nhsMtmQCLNXvA9gd1SEBba
 Jrw4J+apYQBG7F7aY5mUZrNtCXECeNr1ukyxTEDMuSB0oTKWTFsQ52Sz4chsjl5u
 JdNzWRq8LtkWbxxmnzTcYvjHZeDfDx/GQqAHJp047NBI4+6GxXVuEbrPPFDvQuE4
 xvMhGpIg9R/jfkhmRKCstalp6aQPDr+Glz/Z+HwyXtuu8A5Q8uCUrlTflHXNVsrn
 5GtR69eTseFb/UJtLbi7mesKP/awMm8wBub4wSs/5zKLucgBkTdOCWOjPtQZ6u7F
 7hwTBk4ZLihdw2fMNOHGf8iZgPvXV+zVfDPtZWfJGlFJpX8FefbRDVZkSGxXHVw=
 =dLyK
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "The trickle of arm64 fixes continues to come in.

  Nothing that's the end of the world, but we've got a fix for PCI IO
  port accesses, an accidental naked "asm goto" and a fix to the
  vmcoreinfo PT_NOTE merged this time around which we'd like to get
  sorted before it becomes ABI.

   - Fix ioport_map() mapping the wrong physical address for some I/O
     BARs

   - Remove direct use of "asm goto", since some compilers don't like
     that

   - Ensure kimage_voffset is always present in vmcoreinfo PT_NOTE"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  asm-generic: io: Fix ioport_map() for !CONFIG_GENERIC_IOMAP && CONFIG_INDIRECT_PIO
  arm64: kernel: arch_crash_save_vmcoreinfo() should depend on CONFIG_CRASH_CORE
  arm64: jump_label.h: use asm_volatile_goto macro instead of "asm goto"
2018-09-14 12:42:02 -10:00
Joerg Roedel
61a6bd83ab Revert "x86/mm/legacy: Populate the user page-table with user pgd's"
This reverts commit 1f40a46cf4.

It turned out that this patch is not sufficient to enable PTI on 32 bit
systems with legacy 2-level page-tables. In this paging mode the huge-page
PTEs are in the top-level page-table directory, where also the mirroring to
the user-space page-table happens. So every huge PTE exits twice, in the
kernel and in the user page-table.

That means that accessed/dirty bits need to be fetched from two PTEs in
this mode to be safe, but this is not trivial to implement because it needs
changes to generic code just for the sake of enabling PTI with 32-bit
legacy paging. As all systems that need PTI should support PAE anyway,
remove support for PTI when 32-bit legacy paging is used.

Fixes: 7757d607c6 ('x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32')
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: https://lkml.kernel.org/r/1536922754-31379-1-git-send-email-joro@8bytes.org
2018-09-14 17:08:45 +02:00
Ondrej Mosnacek
24568b47d4 crypto: x86/aegis,morus - Do not require OSXSAVE for SSE2
It turns out OSXSAVE needs to be checked only for AVX, not for SSE.
Without this patch the affected modules refuse to load on CPUs with SSE2
but without AVX support.

Fixes: 877ccce7cb ("crypto: x86/aegis,morus - Fix and simplify CPUID checks")
Cc: <stable@vger.kernel.org> # 4.18
Reported-by: Zdenek Kaspar <zkaspar82@gmail.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-09-14 14:08:27 +08:00
Linus Torvalds
72d4c6e589 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel
Pull hexagon fixes from Richard Kuo:
 "Some fixes for compile warnings"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel:
  hexagon: modify ffs() and fls() to return int
  arch/hexagon: fix kernel/dma.c build warning
2018-09-13 16:33:26 -10:00
Linus Torvalds
1d176582c7 s390 fixes for 4.19-rc4
One fix for the zcrypt driver to correctly handle incomplete
 encryption/decryption operations.
 
 A cleanup for the aqmask/apmask parsing to avoid variable
 length arrays on the stack.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJbmkYUAAoJEDjwexyKj9rgrugH/21Uf1S0mkkRfDeYxD6lJva6
 zNmEZ3V+GXc8L/0CisFWQOcIU1fO+jozp9HPGkQxeTvAuqIfVhBRVoMMxiTRaZb3
 xdSBel8EvAGxlZq/6eq6fU59HPWGm+N53rC9J5MMQQqgpSmq8F2QeO5CoidflRh8
 bdio9cliLsjPu+3P2JU3noolhb/f577J3dgP4gKARRpOfh8vUI7NLU3Mham+7886
 ASjG8s/zr9spPnrErJusloOLDJt4M94J8KrIbB/WAT1wZv7GxClaGsCoyCuk4cDt
 TV8zIgMK9TChedwMOO0T8WMaxq+XJV+iI0dC1eZ7xNUlaIBw+UbifxCG335L3E0=
 =dHq1
 -----END PGP SIGNATURE-----

Merge tag 's390-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:

 - One fix for the zcrypt driver to correctly handle incomplete
   encryption/decryption operations.

 - A cleanup for the aqmask/apmask parsing to avoid variable length
   arrays on the stack.

* tag 's390-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/zcrypt: remove VLA usage from the AP bus
  s390/crypto: Fix return code checking in cbc_paes_crypt()
2018-09-13 16:22:24 -10:00
Linus Torvalds
67b076095d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix up several Kconfig dependencies in netfilter, from Martin Willi
    and Florian Westphal.

 2) Memory leak in be2net driver, from Petr Oros.

 3) Memory leak in E-Switch handling of mlx5 driver, from Raed Salem.

 4) mlx5_attach_interface needs to check for errors, from Huy Nguyen.

 5) tipc_release() needs to orphan the sock, from Cong Wang.

 6) Need to program TxConfig register after TX/RX is enabled in r8169
    driver, not beforehand, from Maciej S. Szmigiero.

 7) Handle 64K PAGE_SIZE properly in ena driver, from Netanel Belgazal.

 8) Fix crash regression in ip_do_fragment(), from Taehee Yoo.

 9) syzbot can create conditions where kernel log is flooded with
    synflood warnings due to creation of many listening sockets, fix
    that. From Willem de Bruijn.

10) Fix RCU issues in rds socket layer, from Cong Wang.

11) Fix vlan matching in nfp driver, from Pieter Jansen van Vuuren.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
  nfp: flower: reject tunnel encap with ipv6 outer headers for offloading
  nfp: flower: fix vlan match by checking both vlan id and vlan pcp
  tipc: check return value of __tipc_dump_start()
  s390/qeth: don't dump past end of unknown HW header
  s390/qeth: use vzalloc for QUERY OAT buffer
  s390/qeth: switch on SG by default for IQD devices
  s390/qeth: indicate error when netdev allocation fails
  rds: fix two RCU related problems
  r8169: Clear RTL_FLAG_TASK_*_PENDING when clearing RTL_FLAG_TASK_ENABLED
  erspan: fix error handling for erspan tunnel
  erspan: return PACKET_REJECT when the appropriate tunnel is not found
  tcp: rate limit synflood warnings further
  MIPS: lantiq: dma: add dev pointer
  netfilter: xt_hashlimit: use s->file instead of s->private
  netfilter: nfnetlink_queue: Solve the NFQUEUE/conntrack clash for NF_REPEAT
  netfilter: cttimeout: ctnl_timeout_find_get() returns incorrect pointer to type
  netfilter: conntrack: timeout interface depend on CONFIG_NF_CONNTRACK_TIMEOUT
  netfilter: conntrack: reset tcp maxwin on re-register
  qmi_wwan: Support dynamic config on Quectel EP06
  ethernet: renesas: convert to SPDX identifiers
  ...
2018-09-12 17:32:50 -10:00
Guenter Roeck
cf40361ede x86/efi: Load fixmap GDT in efi_call_phys_epilog() before setting %cr3
Commit eeb89e2bb1 ("x86/efi: Load fixmap GDT in efi_call_phys_epilog()")
moved loading the fixmap in efi_call_phys_epilog() after load_cr3() since
it was assumed to be more logical.

Turns out this is incorrect: In efi_call_phys_prolog(), the gdt with its
physical address is loaded first, and when the %cr3 is reloaded in _epilog
from initial_page_table to swapper_pg_dir again the gdt is no longer
mapped.  This results in a triple fault if an interrupt occurs after
load_cr3() and before load_fixmap_gdt(0). Calling load_fixmap_gdt(0) first
restores the execution order prior to commit eeb89e2bb1 and fixes the
problem.

Fixes: eeb89e2bb1 ("x86/efi: Load fixmap GDT in efi_call_phys_epilog()")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: linux-efi@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/1536689892-21538-1-git-send-email-linux@roeck-us.net
2018-09-12 21:53:34 +02:00
Juergen Gross
999696752d x86/xen: Disable CPU0 hotplug for Xen PV
Xen PV guests don't allow CPU0 hotplug, so disable it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20180912174122.24282-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-12 21:15:02 +02:00
Linus Torvalds
96eddb810b RISC-V: A single fix for 4.19-rc3
This tag contains what I hope to be the last RISC-V patch for 4.19.  It
 fixes a bug in our initramfs support by removing some broken and
 obselete code.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAluPHykTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRDvTKFQLMurQX7iD/41XF5oXHJeRUfhiHVa/kiqFaCw1aEx
 YPp5escHEypGshWQJGd+ite5cEz0nrggsbmXJrQnfpU8fqgpkvguaIbOb9JAtOdj
 Y5hjQ5QgiQcsUrLnhy7yK62fpC27WwWPGfT73cLRgir2oDEI3F7CkaA0uX3y2kLF
 9TEN2v+DL+89Y/Rq9mzRwwPOryZRNXZkxI6tqTVa7wZZzi7fSUMCG2msjeZRszQe
 0IPyBtVR7OECzEaRwSETgC05KTFxCQ2JHMjHz1TatjvJmGU3ToP0uRZ1oYXDXcR3
 AM2QfjBQDmBOjRRKBbwaiUzfX209eGrn/JK3j6BZZredX9MCP+qduuQV+7GsvRT2
 ryCoWN56AAIMZJvmp57lG9jfDptxnS6zZCqw+mufsD1s3c/78zUv7Q5PPUdcfzuP
 qt7iVdUUP5QWDFgM0QumeZ9JuekoA0Kpsmg4Nq6M6YHimW63Y2+CJPqPfh1oY93t
 UoabFgz7FZ0WLo1jHtGVteihq78SKxTe4WYEDzjH++qrVPuYnbNH3Hfqwynj6Wsy
 fvNxmnjg1AVhD9MPSBJLDbQivxW4pEwuxV99MpwLhVdwGXDTAgt9t9mUP5xAaLna
 60jszx1GM8HVMeQ0LNAGRWa8FH0bvn2kpLOBjvdMHl8Y/Oq/IuaINRJCKq59j/4X
 Qx963ajYY5QUHg==
 =XdJZ
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux

Pull RISC-V fix from Palmer Dabbelt:
 "This contains what I hope to be the last RISC-V patch for 4.19.

  It fixes a bug in our initramfs support by removing some broken and
  obselete code"

* tag 'riscv-for-linus-4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
  riscv: Do not overwrite initrd_start and initrd_end
2018-09-12 06:51:27 -10:00
Janosch Frank
40ebdb8e59 KVM: s390: Make huge pages unavailable in ucontrol VMs
We currently do not notify all gmaps when using gmap_pmdp_xchg(), due
to locking constraints. This makes ucontrol VMs, which is the only VM
type that creates multiple gmaps, incompatible with huge pages. Also
we would need to hold the guest_table_lock of all gmaps that have this
vmaddr maped to synchronize access to the pmd.

ucontrol VMs are rather exotic and creating a new locking concept is
no easy task. Hence we return EINVAL when trying to active
KVM_CAP_S390_HPAGE_1M and report it as being not available when
checking for it.

Fixes: a4499382 ("KVM: s390: Add huge page enablement control")
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-Id: <20180801112508.138159-1-frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2018-09-12 14:46:37 +02:00
Janosch Frank
1843abd032 s390/mm: Check for valid vma before zapping in gmap_discard
Userspace could have munmapped the area before doing unmapping from
the gmap. This would leave us with a valid vmaddr, but an invalid vma
from which we would try to zap memory.

Let's check before using the vma.

Fixes: 1e133ab296 ("s390/mm: split arch/s390/mm/pgtable.c")
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Message-Id: <20180816082432.78828-1-frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2018-09-12 14:46:37 +02:00
Hauke Mehrtens
2d946e5bcd MIPS: lantiq: dma: add dev pointer
dma_zalloc_coherent() now crashes if no dev pointer is given.
Add a dev pointer to the ltq_dma_channel structure and fill it in the
driver using it.

This fixes a bug introduced in kernel 4.19.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11 23:33:19 -07:00
Max Filippov
4a7f50f78c xtensa: enable SG chaining in Kconfig
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2018-09-11 22:12:59 -07:00
Masahiro Yamada
8e966fab8e xtensa: remove unnecessary KBUILD_SRC ifeq conditional
You can always prefix variant/platform header search paths with
$(srctree)/ because $(srctree) is '.' for in-tree building.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2018-09-11 22:09:48 -07:00
Nicholas Piggin
71d29f43b6 KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size
THP paths can defer splitting compound pages until after the actual
remap and TLB flushes to split a huge PMD/PUD. This causes radix
partition scope page table mappings to get out of synch with the host
qemu page table mappings.

This results in random memory corruption in the guest when running
with THP. The easiest way to reproduce is use KVM balloon to free up
a lot of memory in the guest and then shrink the balloon to give the
memory back, while some work is being done in the guest.

Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: kvm-ppc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-09-12 08:50:50 +10:00
Alexey Kardashevskiy
425333bf3a KVM: PPC: Avoid marking DMA-mapped pages dirty in real mode
At the moment the real mode handler of H_PUT_TCE calls iommu_tce_xchg_rm()
which in turn reads the old TCE and if it was a valid entry, marks
the physical page dirty if it was mapped for writing. Since it is in
real mode, realmode_pfn_to_page() is used instead of pfn_to_page()
to get the page struct. However SetPageDirty() itself reads the compound
page head and returns a virtual address for the head page struct and
setting dirty bit for that kills the system.

This adds additional dirty bit tracking into the MM/IOMMU API for use
in the real mode. Note that this does not change how VFIO and
KVM (in virtual mode) set this bit. The KVM (real mode) changes include:
- use the lowest bit of the cached host phys address to carry
the dirty bit;
- mark pages dirty when they are unpinned which happens when
the preregistered memory is released which always happens in virtual
mode;
- add mm_iommu_ua_mark_dirty_rm() helper to set delayed dirty bit;
- change iommu_tce_xchg_rm() to take the kvm struct for the mm to use
in the new mm_iommu_ua_mark_dirty_rm() helper;
- move iommu_tce_xchg_rm() to book3s_64_vio_hv.c (which is the only
caller anyway) to reduce the real mode KVM and IOMMU knowledge
across different subsystems.

This removes realmode_pfn_to_page() as it is not used anymore.

While we at it, remove some EXPORT_SYMBOL_GPL() as that code is for
the real mode only and modules cannot call it anyway.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-09-12 08:49:54 +10:00