Commit Graph

1015817 Commits

Author SHA1 Message Date
Ricardo Koller
6f2f86ec28 KVM: selftests: Address extra memslot parameters in vm_vaddr_alloc
Commit a75a895e64 ("KVM: selftests: Unconditionally use memslot 0 for
vaddr allocations") removed the memslot parameters from vm_vaddr_alloc.
It addressed all callers except one under lib/aarch64/, due to a race
with commit e3db7579ef ("KVM: selftests: Add exception handling
support for aarch64")

Fix the vm_vaddr_alloc call in lib/aarch64/processor.c.

Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Message-Id: <20210702201042.4036162-1-ricarkol@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-15 10:19:41 -04:00
Pavel Skripkin
004d62eb4e kvm: debugfs: fix memory leak in kvm_create_vm_debugfs
In commit bc9e9e672d ("KVM: debugfs: Reuse binary stats descriptors")
loop for filling debugfs_stat_data was copy-pasted 2 times, but
in the second loop pointers are saved over pointers allocated
in the first loop.  All this causes is a memory leak, fix it.

Fixes: bc9e9e672d ("KVM: debugfs: Reuse binary stats descriptors")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Reviewed-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210701195500.27097-1-paskripkin@gmail.com>
Reviewed-by: Jing Zhang <jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-15 10:19:40 -04:00
Like Xu
7234c362cc KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM
The AMD platform does not support the functions Ah CPUID leaf. The returned
results for this entry should all remain zero just like the native does:

AMD host:
   0x0000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
(uncanny) AMD guest:
   0x0000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00008000

Fixes: cadbaa039b ("perf/x86/intel: Make anythread filter support conditional")
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20210628074354.33848-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-14 12:17:56 -04:00
Kefeng Wang
23fa2e46a5 KVM: mmio: Fix use-after-free Read in kvm_vm_ioctl_unregister_coalesced_mmio
BUG: KASAN: use-after-free in kvm_vm_ioctl_unregister_coalesced_mmio+0x7c/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:183
Read of size 8 at addr ffff0000c03a2500 by task syz-executor083/4269

CPU: 5 PID: 4269 Comm: syz-executor083 Not tainted 5.10.0 #7
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132
 show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x110/0x164 lib/dump_stack.c:118
 print_address_description+0x78/0x5c8 mm/kasan/report.c:385
 __kasan_report mm/kasan/report.c:545 [inline]
 kasan_report+0x148/0x1e4 mm/kasan/report.c:562
 check_memory_region_inline mm/kasan/generic.c:183 [inline]
 __asan_load8+0xb4/0xbc mm/kasan/generic.c:252
 kvm_vm_ioctl_unregister_coalesced_mmio+0x7c/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:183
 kvm_vm_ioctl+0xe30/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3755
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl fs/ioctl.c:739 [inline]
 __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
 invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
 el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
 do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670

Allocated by task 4269:
 stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
 kasan_save_stack mm/kasan/common.c:48 [inline]
 kasan_set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461
 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475
 kmem_cache_alloc_trace include/linux/slab.h:450 [inline]
 kmalloc include/linux/slab.h:552 [inline]
 kzalloc include/linux/slab.h:664 [inline]
 kvm_vm_ioctl_register_coalesced_mmio+0x78/0x1cc arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:146
 kvm_vm_ioctl+0x7e8/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3746
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl fs/ioctl.c:739 [inline]
 __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
 invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
 el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
 do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670

Freed by task 4269:
 stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
 kasan_save_stack mm/kasan/common.c:48 [inline]
 kasan_set_track+0x38/0x6c mm/kasan/common.c:56
 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355
 __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422
 kasan_slab_free+0x10/0x1c mm/kasan/common.c:431
 slab_free_hook mm/slub.c:1544 [inline]
 slab_free_freelist_hook mm/slub.c:1577 [inline]
 slab_free mm/slub.c:3142 [inline]
 kfree+0x104/0x38c mm/slub.c:4124
 coalesced_mmio_destructor+0x94/0xa4 arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:102
 kvm_iodevice_destructor include/kvm/iodev.h:61 [inline]
 kvm_io_bus_unregister_dev+0x248/0x280 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:4374
 kvm_vm_ioctl_unregister_coalesced_mmio+0x158/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:186
 kvm_vm_ioctl+0xe30/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3755
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl fs/ioctl.c:739 [inline]
 __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
 invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
 el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
 do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670

If kvm_io_bus_unregister_dev() return -ENOMEM, we already call kvm_iodevice_destructor()
inside this function to delete 'struct kvm_coalesced_mmio_dev *dev' from list
and free the dev, but kvm_iodevice_destructor() is called again, it will lead
the above issue.

Let's check the the return value of kvm_io_bus_unregister_dev(), only call
kvm_iodevice_destructor() if the return value is 0.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Message-Id: <20210626070304.143456-1-wangkefeng.wang@huawei.com>
Cc: stable@vger.kernel.org
Fixes: 5d3c4c7938 ("KVM: Stop looking for coalesced MMIO zones if the bus is destroyed", 2021-04-20)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-14 12:17:56 -04:00
Sean Christopherson
76ff371b67 KVM: SVM: Revert clearing of C-bit on GPA in #NPF handler
Don't clear the C-bit in the #NPF handler, as it is a legal GPA bit for
non-SEV guests, and for SEV guests the C-bit is dropped before the GPA
hits the NPT in hardware.  Clearing the bit for non-SEV guests causes KVM
to mishandle #NPFs with that collide with the host's C-bit.

Although the APM doesn't explicitly state that the C-bit is not reserved
for non-SEV, Tom Lendacky confirmed that the following snippet about the
effective reduction due to the C-bit does indeed apply only to SEV guests.

  Note that because guest physical addresses are always translated
  through the nested page tables, the size of the guest physical address
  space is not impacted by any physical address space reduction indicated
  in CPUID 8000_001F[EBX]. If the C-bit is a physical address bit however,
  the guest physical address space is effectively reduced by 1 bit.

And for SEV guests, the APM clearly states that the bit is dropped before
walking the nested page tables.

  If the C-bit is an address bit, this bit is masked from the guest
  physical address when it is translated through the nested page tables.
  Consequently, the hypervisor does not need to be aware of which pages
  the guest has chosen to mark private.

Note, the bogus C-bit clearing was removed from legacy #PF handler in
commit 6d1b867d04 ("KVM: SVM: Don't strip the C-bit from CR2 on #PF
interception").

Fixes: 0ede79e132 ("KVM: SVM: Clear C-bit from the page fault address")
Cc: Peter Gonda <pgonda@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210625020354.431829-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-14 12:17:56 -04:00
Sean Christopherson
fc9bf2e087 KVM: x86/mmu: Do not apply HPA (memory encryption) mask to GPAs
Ignore "dynamic" host adjustments to the physical address mask when
generating the masks for guest PTEs, i.e. the guest PA masks.  The host
physical address space and guest physical address space are two different
beasts, e.g. even though SEV's C-bit is the same bit location for both
host and guest, disabling SME in the host (which clears shadow_me_mask)
does not affect the guest PTE->GPA "translation".

For non-SEV guests, not dropping bits is the correct behavior.  Assuming
KVM and userspace correctly enumerate/configure guest MAXPHYADDR, bits
that are lost as collateral damage from memory encryption are treated as
reserved bits, i.e. KVM will never get to the point where it attempts to
generate a gfn using the affected bits.  And if userspace wants to create
a bogus vCPU, then userspace gets to deal with the fallout of hardware
doing odd things with bad GPAs.

For SEV guests, not dropping the C-bit is technically wrong, but it's a
moot point because KVM can't read SEV guest's page tables in any case
since they're always encrypted.  Not to mention that the current KVM code
is also broken since sme_me_mask does not have to be non-zero for SEV to
be supported by KVM.  The proper fix would be to teach all of KVM to
correctly handle guest private memory, but that's a task for the future.

Fixes: d0ec49d4de ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210623230552.4027702-5-seanjc@google.com>
[Use a new header instead of adding header guards to paging_tmpl.h. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-14 12:17:56 -04:00
Sean Christopherson
e39f00f60e KVM: x86: Use kernel's x86_phys_bits to handle reduced MAXPHYADDR
Use boot_cpu_data.x86_phys_bits instead of the raw CPUID information to
enumerate the MAXPHYADDR for KVM guests when TDP is disabled (the guest
version is only relevant to NPT/TDP).

When using shadow paging, any reductions to the host's MAXPHYADDR apply
to KVM and its guests as well, i.e. using the raw CPUID info will cause
KVM to misreport the number of PA bits available to the guest.

Unconditionally zero out the "Physical Address bit reduction" entry.
For !TDP, the adjustment is already done, and for TDP enumerating the
host's reduction is wrong as the reduction does not apply to GPAs.

Fixes: 9af9b94068 ("x86/cpu/AMD: Handle SME reduction in physical address size")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210623230552.4027702-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-14 12:17:55 -04:00
Sean Christopherson
4bf48e3c0a KVM: x86: Use guest MAXPHYADDR from CPUID.0x8000_0008 iff TDP is enabled
Ignore the guest MAXPHYADDR reported by CPUID.0x8000_0008 if TDP, i.e.
NPT, is disabled, and instead use the host's MAXPHYADDR.  Per AMD'S APM:

  Maximum guest physical address size in bits. This number applies only
  to guests using nested paging. When this field is zero, refer to the
  PhysAddrSize field for the maximum guest physical address size.

Fixes: 24c82e576b ("KVM: Sanitize cpuid")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210623230552.4027702-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-14 12:17:55 -04:00
Sean Christopherson
f0414b078d Revert "KVM: x86: WARN and reject loading KVM if NX is supported but not enabled"
Let KVM load if EFER.NX=0 even if NX is supported, the analysis and
testing (or lack thereof) for the non-PAE host case was garbage.

If the kernel won't be using PAE paging, .Ldefault_entry in head_32.S
skips over the entire EFER sequence.  Hopefully that can be changed in
the future to allow KVM to require EFER.NX, but the motivation behind
KVM's requirement isn't yet merged.  Reverting and revisiting the mess
at a later date is by far the safest approach.

This reverts commit 8bbed95d2c.

Fixes: 8bbed95d2c ("KVM: x86: WARN and reject loading KVM if NX is supported but not enabled")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210625001853.318148-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-14 12:17:55 -04:00
Marc Zyngier
f8f0edabcc KVM: selftests: x86: Address missing vm_install_exception_handler conversions
Commit b78f4a5966 ("KVM: selftests: Rename vm_handle_exception")
raced with a couple of new x86 tests, missing two vm_handle_exception
to vm_install_exception_handler conversions.

Help the two broken tests to catch up with the new world.

Cc: Andrew Jones <drjones@redhat.com>
CC: Ricardo Koller <ricarkol@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20210701071928.2971053-1-maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-14 12:15:05 -04:00
Paolo Bonzini
f3cf800778 Merge tag 'kvm-s390-master-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: selftests: Fixes

- provide memory model for  IBM z196 and zEC12
- do not require 64GB of memory
2021-07-14 12:14:27 -04:00
Christian Borntraeger
cd4220d23b KVM: selftests: do not require 64GB in set_memory_region_test
Unless the user sets overcommit_memory or has plenty of swap, the latest
changes to the testcase will result in ENOMEM failures for hosts with
less than 64GB RAM. As we do not use much of the allocated memory, we
can use MAP_NORESERVE to avoid this error.

Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: vkuznets@redhat.com
Cc: wanghaibin.wang@huawei.com
Cc: stable@vger.kernel.org
Fixes: 309505dd56 ("KVM: selftests: Fix mapping length truncation in m{,un}map()")
Tested-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/kvm/20210701160425.33666-1-borntraeger@de.ibm.com/
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-07-06 10:06:20 +02:00
Christian Borntraeger
fb5dad4084 KVM: selftests: introduce P44V64 for z196 and EC12
Older machines like z196 and zEC12 do only support 44 bits of physical
addresses. Make this the default and check via IBC if we are on a later
machine. We then add P47V64 as an additional model.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lore.kernel.org/kvm/20210701153853.33063-1-borntraeger@de.ibm.com/
Fixes: 1bc603af73 ("KVM: selftests: introduce P47V64 for s390x")
2021-07-06 10:05:16 +02:00
Paolo Bonzini
b8917b4ae4 KVM/arm64 updates for v5.14.
- Add MTE support in guests, complete with tag save/restore interface
 - Reduce the impact of CMOs by moving them in the page-table code
 - Allow device block mappings at stage-2
 - Reduce the footprint of the vmemmap in protected mode
 - Support the vGIC on dumb systems such as the Apple M1
 - Add selftest infrastructure to support multiple configuration
   and apply that to PMU/non-PMU setups
 - Add selftests for the debug architecture
 - The usual crop of PMU fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmDV2bEPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDEr8P/ivwROx5NwGcHGmU5RfUCT3aFqhtVHHwD/lu
 jPcgoO61kz9TelOu6QRaVuK+mVHxcq3iP4R8nPq/QCkUlEXTmK2xkyhXhGXSYpH4
 6jM8+BbC3eG7iAxx6H0UM4JTl4Riwat6ZZtXpWEWs9TKqOHOQYFpMkxSttwVZ1CZ
 SjbtFvXLEdzKn6PzUWnKdBNMV/mHsdAtohZit9oJOc4ttc8072XxETQ4TFQ+MSvA
 j9zY9QPmWzgcZnotqRRu9sbTGO2vxtXuUtY3sjdD8+C9OgSe9qvpnNjymcmfwaMu
 1fBkfh65oaO4ItJBdGOUOoEcFqwN5imPiI7CB/O+ZYkO9sBCuTUPSQwPkyiwXb9r
 bUkTaQw2nZiNWsqR1x07fQ2sGYbMp5mnmgmqiV4MUWkLmFp9LZATCWYTTn24cBNS
 6SjVP6/8S0r3EhLnYjH0Pn1we5PooU1EF6RlCAd3ewYoo+9fPnwjNYwIWH5i5wB7
 +tnei44NACAw9cfbos+BYQQ/dY15OSFzLzIMomlabB7OpXOdDg3H6tJnPbFwWwXb
 9nF8XdHqxeDVVVrDCAx1BSodSXm9xqgnQM2RDGTUnpVcAfqAr3MXX6VsyKQDzj8T
 QXF9qOVCBAABv6BXAvSQ6mvMJZDUVbUPEPhf7kXzF46JsRd6A7wWoU/OnMGHQ/w7
 wjvH8HVy
 =fWBV
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for v5.14.

- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration
  and apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes
2021-06-25 11:24:24 -04:00
Paolo Bonzini
79b1e56509 KVM: s390: Features for 5.14
- new HW facilities for guests
 - make inline assembly more robust with KASAN and co
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJg1ZfMAAoJEBF7vIC1phx8uv0P/0glFasUp3GEUWUzjcTycFFf
 SAPiyrk4ucU/8eOJQVWMLPL0pESTgGZkxaa5+rChJA4K00Pf+KWEDRMqNpZ5/eOY
 SVq4XqHUZtRKHWH1z6B7Sfx3GliIAqsEmJz1dOcXp11CxIzumBD9gAHNaYLBqkKt
 5b5UFn/GkyutnL+CEBYVIOXvd1QBrEKOtiIfnIPDZAJCpjUh68lFBjW4SOsd7fz7
 9VDUjZrIRN+CWb/AfWEnInzlBoyjgIbwfxQIKXcpeZsKWpYzQJ+Oti0ZFoKPtfdV
 G7zzgwyPG5vXbJETxBg58M8NddW0Ft+jttz/GJ7NtzWi2a046Mp02Udk47vpL1AW
 DzZgatOQasFP5PBOBpOn460BhuUdYkSrHOXZbRO3/rlrFd7UbJiTBIaV7lYaeZ6T
 nImP5/Rd8NPFPfJB990inFjqyburfA7rCWv8oB2a2n3YduV4bI4t5d71Giz9ibaH
 gm/zWJdIZYHaMvE7sCWiXnStXEs1DEOeMvZpkBpOUf2/DEvfCUIOiaepOnQrl7GW
 jMFACO471PCh7xDvohNxo0tbs59+Ctfglo/gy12yZMtsfpgq4iHP1BrnfOTB6Xig
 rzJT2rSWgPw1nViuZQOqypd8ZhkrfoHZg1xnjwJ7tiWnlFNhpWJkZpqYmz7qtiUn
 W5Svlo06FWoGKfGwvZf6
 =HiDo
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Features for 5.14

- new HW facilities for guests
- make inline assembly more robust with KASAN and co
2021-06-25 10:50:11 -04:00
Marc Zyngier
188982cda0 Merge branch kvm-arm64/mmu/mte into kvmarm-master/next
Last minute fix for MTE, making sure the pages are
flagged as MTE before they are released.

* kvm-arm64/mmu/mte:
  KVM: arm64: Set the MTE tag bit before releasing the page

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-25 14:25:56 +01:00
Maxim Levitsky
a01b45e9d3 KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled
This better reflects the purpose of this variable on AMD, since
on AMD the AVIC's memory slot can be enabled and disabled dynamically.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210623113002.111448-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:49 -04:00
Aaron Lewis
88213da235 kvm: x86: disable the narrow guest module parameter on unload
When the kvm_intel module unloads the module parameter
'allow_smaller_maxphyaddr' is not cleared because the backing variable is
defined in the kvm module.  As a result, if the module parameter's state
was set before kvm_intel unloads, it will also be set when it reloads.
Explicitly clear the state in vmx_exit() to prevent this from happening.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Message-Id: <20210623203426.1891402-1-aaronlewis@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
2021-06-24 18:00:49 -04:00
Aaron Lewis
39bbcc3a4e selftests: kvm: Allows userspace to handle emulation errors.
This test exercises the feature KVM_CAP_EXIT_ON_EMULATION_FAILURE.  When
enabled, errors in the in-kernel instruction emulator are forwarded to
userspace with the instruction bytes stored in the exit struct for
KVM_EXIT_INTERNAL_ERROR.  So, when the guest attempts to emulate an
'flds' instruction, which isn't able to be emulated in KVM, instead
of failing, KVM sends the instruction to userspace to handle.

For this test to work properly the module parameter
'allow_smaller_maxphyaddr' has to be set.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20210510144834.658457-3-aaronlewis@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:48 -04:00
Aaron Lewis
19238e75bd kvm: x86: Allow userspace to handle emulation errors
Add a fallback mechanism to the in-kernel instruction emulator that
allows userspace the opportunity to process an instruction the emulator
was unable to.  When the in-kernel instruction emulator fails to process
an instruction it will either inject a #UD into the guest or exit to
userspace with exit reason KVM_INTERNAL_ERROR.  This is because it does
not know how to proceed in an appropriate manner.  This feature lets
userspace get involved to see if it can figure out a better path
forward.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210510144834.658457-2-aaronlewis@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:48 -04:00
Sean Christopherson
27de925044 KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on
Let the guest use 1g hugepages if TDP is enabled and the host supports
GBPAGES, KVM can't actively prevent the guest from using 1g pages in this
case since they can't be disabled in the hardware page walker.  While
injecting a page fault if a bogus 1g page is encountered during a
software page walk is perfectly reasonable since KVM is simply honoring
userspace's vCPU model, doing so arguably doesn't provide any meaningful
value, and at worst will be horribly confusing as the guest will see
inconsistent behavior and seemingly spurious page faults.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-55-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:48 -04:00
Sean Christopherson
9a65d0b70f KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault
Use the current MMU instead of vCPU state to query CR4.SMEP when handling
a page fault.  In the nested NPT case, the current CR4.SMEP reflects L2,
whereas the page fault is shadowing L1's NPT, which uses L1's hCR4.
Practically speaking, this is a nop a NPT walks are always user faults,
i.e. this code will never be reached, but fix it up for consistency.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-54-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:48 -04:00
Sean Christopherson
fdaa293598 KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault
Use the current MMU instead of vCPU state to query CR0.WP when handling
a page fault.  In the nested NPT case, the current CR0.WP reflects L2,
whereas the page fault is shadowing L1's NPT.  Practically speaking, this
is a nop a NPT walks are always user faults, but fix it up for
consistency.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-53-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:47 -04:00
Sean Christopherson
f82fdaf536 KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT
Drop the extra reset of shadow_zero_bits in the nested NPT flow now
that shadow_mmu_init_context computes the correct level for nested NPT.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-52-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:47 -04:00
Sean Christopherson
7cd138db5c KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic
Drop the pre-computed last_nonleaf_level, which is arguably wrong and at
best confusing.  Per the comment:

  Can have large pages at levels 2..last_nonleaf_level-1.

the intent of the variable would appear to be to track what levels can
_legally_ have large pages, but that intent doesn't align with reality.
The computed value will be wrong for 5-level paging, or if 1gb pages are
not supported.

The flawed code is not a problem in practice, because except for 32-bit
PSE paging, bit 7 is reserved if large pages aren't supported at the
level.  Take advantage of this invariant and simply omit the level magic
math for 64-bit page tables (including PAE).

For 32-bit paging (non-PAE), the adjustments are needed purely because
bit 7 is ignored if PSE=0.  Retain that logic as is, but make
is_last_gpte() unique per PTTYPE so that the PSE check is avoided for
PAE and EPT paging.  In the spirit of avoiding branches, bump the "last
nonleaf level" for 32-bit PSE paging by adding the PSE bit itself.

Note, bit 7 is ignored or has other meaning in CR3/EPTP, but despite
FNAME(walk_addr_generic) briefly grabbing CR3/EPTP in "pte", they are
not PTEs and will blow up all the other gpte helpers.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-51-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:47 -04:00
Sean Christopherson
616007c866 KVM: x86: Enhance comments for MMU roles and nested transition trickiness
Expand the comments for the MMU roles.  The interactions with gfn_track
PGD reuse in particular are hairy.

Regarding PGD reuse, add comments in the nested virtualization flows to
call out why kvm_init_mmu() is unconditionally called even when nested
TDP is used.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-50-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:47 -04:00
Sean Christopherson
3b77daa5ef KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE
Replace make_spte()'s WARN on a collision with the magic MMIO value with
a generic WARN on reserved bits being set (including EPT's reserved WX
combination).  Warning on any reserved bits covers MMIO, A/D tracking
bits with PAE paging, and in theory any future goofs that are introduced.

Opportunistically convert to ONCE behavior to avoid spamming the kernel
log, odds are very good that if KVM screws up one SPTE, it will botch all
SPTEs for the same MMU.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-49-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:46 -04:00
Sean Christopherson
961f84457c KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU
Extract the reserved SPTE check and print helpers in get_mmio_spte() to
new helpers so that KVM can also WARN on reserved badness when making a
SPTE.

Tag the checking helper with __always_inline to improve the probability
of the compiler generating optimal code for the checking loop, e.g. gcc
appears to avoid using %rbp when the helper is tagged with a vanilla
"inline".

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-48-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:46 -04:00
Sean Christopherson
36f267871e KVM: x86/mmu: Use MMU's role to determine PTTYPE
Use the MMU's role instead of vCPU state or role_regs to determine the
PTTYPE, i.e. which helpers to wire up.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-47-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:46 -04:00
Sean Christopherson
fe660f7244 KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers
Skip paging32E_init_context() and paging64_init_context_common() and go
directly to paging64_init_context() (was the common version) now that
the relevant flows don't need to distinguish between 64-bit PAE and
32-bit PAE for other reasons.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-46-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:46 -04:00
Sean Christopherson
f4bd6f7376 KVM: x86/mmu: Add a helper to calculate root from role_regs
Add a helper to calculate the level for non-EPT page tables from the
MMU's role_regs.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-45-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:45 -04:00
Sean Christopherson
533f9a4b38 KVM: x86/mmu: Add helper to update paging metadata
Consolidate MMU guest metadata updates into a common helper for TDP,
shadow, and nested MMUs.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-44-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:45 -04:00
Sean Christopherson
af0eb17e99 KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0
Don't bother updating the bitmasks and last-leaf information if paging is
disabled as the metadata will never be used.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-43-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:45 -04:00
Sean Christopherson
fa4b558802 KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls
Move calls to reset_rsvds_bits_mask() out of the various mode statements
and under a more generic CR0.PG=1 check.  This will allow for additional
code consolidation in the future.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-42-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:45 -04:00
Sean Christopherson
87e99d7d70 KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper
Get LA57 from the role_regs, which are initialized from the vCPU even
though TDP is enabled, instead of pulling the value directly from the
vCPU when computing the guest's root_level for TDP MMUs.  Note, the check
is inside an is_long_mode() statement, so that requirement is not lost.

Use role_regs even though the MMU's role is available and arguably
"better".  A future commit will consolidate the guest root level logic,
and it needs access to EFER.LMA, which is not tracked in the role (it
can't be toggled on VM-Exit, unlike LA57).

Drop is_la57_mode() as there are no remaining users, and to discourage
pulling MMU state from the vCPU (in the future).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-41-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:45 -04:00
Sean Christopherson
5472fcd4c6 KVM: x86/mmu: Get nested MMU's root level from the MMU's role
Initialize the MMU's (guest) root_level using its mmu_role instead of
redoing the calculations.  The role_regs used to calculate the mmu_role
are initialized from the vCPU, i.e. this should be a complete nop.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-40-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:44 -04:00
Sean Christopherson
a4c93252fe KVM: x86/mmu: Drop "nx" from MMU context now that there are no readers
Drop kvm_mmu.nx as there no consumers left.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-39-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:44 -04:00
Sean Christopherson
90599c2801 KVM: x86/mmu: Use MMU's role to get EFER.NX during MMU configuration
Get the MMU's effective EFER.NX from its role instead of using the
one-off, dedicated flag.  This will allow dropping said flag in a
future commit.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-38-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:44 -04:00
Sean Christopherson
84a1622604 KVM: x86/mmu: Use MMU's role/role_regs to compute context's metadata
Use the MMU's role and role_regs to calculate the MMU's guest root level
and NX bit.  For some flows, the vCPU state may not be correct (or
relevant), e.g. EPT doesn't interact with EFER.NX and nested NPT will
configure the guest_mmu with possibly-stale vCPU state.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-37-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:44 -04:00
Sean Christopherson
cd628f0f1e KVM: x86/mmu: Use MMU's role to detect EFER.NX in guest page walk
Use the NX bit from the MMU's role instead of the MMU itself so that the
redundant, dedicated "nx" flag can be dropped.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-36-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:43 -04:00
Sean Christopherson
b67a93a87e KVM: x86/mmu: Use MMU's roles to compute last non-leaf level
Use the MMU's role to get CR4.PSE when determining the last level at
which the guest _cannot_ create a non-leaf PTE, i.e. cannot create a
huge page.

Note, the existing logic is arguably wrong when considering 5-level
paging and the case where 1gb pages aren't supported.  In practice, the
logic is confusing but not broken, because except for 32-bit non-PAE
paging, bit 7 (_PAGE_PSE) bit is reserved when a huge page isn't supported at
that level.  I.e. setting bit 7 will terminate the guest walk one way or
another.  Furthermore, last_nonleaf_level is only consulted after KVM has
verified there are no reserved bits set.

All that confusion will be addressed in a future patch by dropping
last_nonleaf_level entirely.  For now, massage the code to continue the
march toward using mmu_role for (almost) all MMU computations.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-35-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:43 -04:00
Sean Christopherson
2e4c06618d KVM: x86/mmu: Use MMU's role to compute PKRU bitmask
Use the MMU's role to calculate the Protection Keys (Restrict Userspace)
bitmask instead of pulling bits from current vCPU state.  For some flows,
the vCPU state may not be correct (or relevant), e.g. EPT doesn't
interact with PKRU.  Case in point, the "ept" param simply disappears.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-34-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:43 -04:00
Sean Christopherson
c596f1470a KVM: x86/mmu: Use MMU's role to compute permission bitmask
Use the MMU's role to generate the permission bitmasks for the MMU.
For some flows, the vCPU state may not be correct (or relevant), e.g.
the nested NPT MMU can be initialized with incoherent vCPU state.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-33-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:43 -04:00
Sean Christopherson
b705a277b7 KVM: x86/mmu: Drop vCPU param from reserved bits calculator
Drop the vCPU param from __reset_rsvds_bits_mask() as it's now unused,
and ideally will remain unused in the future.  Any information that's
needed by the low level helper should be explicitly provided as it's used
for both shadow/host MMUs and guest MMUs, i.e. vCPU state may be
meaningless or simply wrong.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-32-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:42 -04:00
Sean Christopherson
4e9c0d80db KVM: x86/mmu: Use MMU's role to get CR4.PSE for computing rsvd bits
Use the MMU's role to get CR4.PSE when calculating reserved bits for the
guest's PTEs.  Practically speaking, this is a glorified nop as the role
always come from vCPU state for the relevant flows, but converting to
the roles will provide consistency once everything else is converted, and
will Just Work if the "always comes from vCPU" behavior were ever to
change (unlikely).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-31-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:42 -04:00
Sean Christopherson
8c985b2d8e KVM: x86/mmu: Don't grab CR4.PSE for calculating shadow reserved bits
Unconditionally pass pse=false when calculating reserved bits for shadow
PTEs.  CR4.PSE is only relevant for 32-bit non-PAE paging, which KVM does
not use for shadow paging (including nested NPT).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-30-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:42 -04:00
Sean Christopherson
18db1b1790 KVM: x86/mmu: Always set new mmu_role immediately after checking old role
Refactor shadow MMU initialization to immediately set its new mmu_role
after verifying it differs from the old role, and so that all flavors
of MMU initialization share the same check-and-set pattern.  Immediately
setting the role will allow future commits to use mmu_role to configure
the MMU without consuming stale state.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-29-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:42 -04:00
Sean Christopherson
84c679f5f5 KVM: x86/mmu: Set CR4.PKE/LA57 in MMU role iff long mode is active
Don't set cr4_pke or cr4_la57 in the MMU role if long mode isn't active,
which is required for protection keys and 5-level paging to be fully
enabled.  Ignoring the bit avoids unnecessary reconfiguration on reuse,
and also means consumers of mmu_role don't need to manually check for
long mode.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-28-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:41 -04:00
Sean Christopherson
ca8d664f50 KVM: x86/mmu: Do not set paging-related bits in MMU role if CR0.PG=0
Don't set CR0/CR4/EFER bits in the MMU role if paging is disabled, paging
modifiers are irrelevant if there is no paging in the first place.
Somewhat arbitrarily clear gpte_is_8_bytes for shadow paging if paging is
disabled in the guest.  Again, there are no guest PTEs to process, so the
size is meaningless.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-27-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:41 -04:00
Sean Christopherson
6066772455 KVM: x86/mmu: Add accessors to query mmu_role bits
Add accessors via a builder macro for all mmu_role bits that track a CR0,
CR4, or EFER bit, abstracting whether the bits are in the base or the
extended role.

Future commits will switch to using mmu_role instead of vCPU state to
configure the MMU, i.e. there are about to be a large number of users.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-26-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:41 -04:00