1. 21 Jul, 2019 1 commit
  2. 09 Jun, 2019 1 commit
  3. 25 May, 2019 1 commit
  4. 16 May, 2019 2 commits
  5. 04 May, 2019 1 commit
    • Marc Zyngier's avatar
      KVM: arm/arm64: vgic-its: Take the srcu lock when parsing the memslots · 423ad0b9
      Marc Zyngier authored
      [ Upstream commit 7494cec6 ]
      Calling kvm_is_visible_gfn() implies that we're parsing the memslots,
      and doing this without the srcu lock is frown upon:
      [12704.164532] =============================
      [12704.164544] WARNING: suspicious RCU usage
      [12704.164560] 5.1.0-rc1-00008-g600025238f51-dirty #16 Tainted: G        W
      [12704.164573] -----------------------------
      [12704.164589] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage!
      [12704.164602] other info that might help us debug this:
      [12704.164616] rcu_scheduler_active = 2, debug_locks = 1
      [12704.164631] 6 locks held by qemu-system-aar/13968:
      [12704.164644]  #0: 000000007ebdae4f (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0
      [12704.164691]  #1: 000000007d751022 (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0
      [12704.164726]  #2: 00000000219d2706 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
      [12704.164761]  #3: 00000000a760aecd (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
      [12704.164794]  #4: 000000000ef8e31d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
      [12704.164827]  #5: 000000007a872093 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
      [12704.164861] stack backtrace:
      [12704.164878] CPU: 2 PID: 13968 Comm: qemu-system-aar Tainted: G        W         5.1.0-rc1-00008-g600025238f51-dirty #16
      [12704.164887] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019
      [12704.164896] Call trace:
      [12704.164910]  dump_backtrace+0x0/0x138
      [12704.164920]  show_stack+0x24/0x30
      [12704.164934]  dump_stack+0xbc/0x104
      [12704.164946]  lockdep_rcu_suspicious+0xcc/0x110
      [12704.164958]  gfn_to_memslot+0x174/0x190
      [12704.164969]  kvm_is_visible_gfn+0x28/0x70
      [12704.164980]  vgic_its_check_id.isra.0+0xec/0x1e8
      [12704.164991]  vgic_its_save_tables_v0+0x1ac/0x330
      [12704.165001]  vgic_its_set_attr+0x298/0x3a0
      [12704.165012]  kvm_device_ioctl_attr+0x9c/0xd8
      [12704.165022]  kvm_device_ioctl+0x8c/0xf8
      [12704.165035]  do_vfs_ioctl+0xc8/0x960
      [12704.165045]  ksys_ioctl+0x8c/0xa0
      [12704.165055]  __arm64_sys_ioctl+0x28/0x38
      [12704.165067]  el0_svc_common+0xd8/0x138
      [12704.165078]  el0_svc_handler+0x38/0x78
      [12704.165089]  el0_svc+0x8/0xc
      Make sure the lock is taken when doing this.
      Fixes: bf308242
       ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock")
      Reviewed-by: default avatarEric Auger <eric.auger@redhat.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarSasha Levin (Microsoft) <sashal@kernel.org>
  6. 03 Apr, 2019 1 commit
  7. 23 Mar, 2019 1 commit
    • Sean Christopherson's avatar
      KVM: Call kvm_arch_memslots_updated() before updating memslots · 89dce6e4
      Sean Christopherson authored
      commit 15248258 upstream.
      kvm_arch_memslots_updated() is at this point in time an x86-specific
      hook for handling MMIO generation wraparound.  x86 stashes 19 bits of
      the memslots generation number in its MMIO sptes in order to avoid
      full page fault walks for repeat faults on emulated MMIO addresses.
      Because only 19 bits are used, wrapping the MMIO generation number is
      possible, if unlikely.  kvm_arch_memslots_updated() alerts x86 that
      the generation has changed so that it can invalidate all MMIO sptes in
      case the effective MMIO generation has wrapped so as to avoid using a
      stale spte, e.g. a (very) old spte that was created with generation==0.
      Given that the purpose of kvm_arch_memslots_updated() is to prevent
      consuming stale entries, it needs to be called before the new generation
      is propagated to memslots.  Invalidating the MMIO sptes after updating
      memslots means that there is a window where a vCPU could dereference
      the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
      spte that was created with (pre-wrap) generation==0.
      Fixes: e59dbe09
       ("KVM: Introduce kvm_arch_memslots_updated()")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  8. 12 Feb, 2019 3 commits
  9. 16 Jan, 2019 1 commit
    • Christoffer Dall's avatar
      KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less · cb754d67
      Christoffer Dall authored
      commit fb544d1c upstream.
      We recently addressed a VMID generation race by introducing a read/write
      lock around accesses and updates to the vmid generation values.
      However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but
      does so without taking the read lock.
      As far as I can tell, this can lead to the same kind of race:
        VM 0, VCPU 0			VM 0, VCPU 1
        ------------			------------
        update_vttbr (vmid 254)
        				update_vttbr (vmid 1) // roll over
        need_new_vmid_gen == false //because vmid gen matches
        enter_guest (vmid 254)
        				kvm_arch.vttbr = <PGD>:<VMID 1>
        				enter_guest (vmid 1)
      Which results in running two VCPUs in the same VM with different VMIDs
      and (even worse) other VCPUs from other VMs could now allocate clashing
      VMID 254 from the new generation as long as VCPU 0 is not exiting.
      Attempt to solve this by making sure vttbr is updated before another CPU
      can observe the updated VMID generation.
      Cc: stable@vger.kernel.org
      Fixes: f0cf47d9
       "KVM: arm/arm64: Close VMID generation race"
      Reviewed-by: default avatarJulien Thierry <julien.thierry@arm.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  10. 09 Jan, 2019 1 commit
  11. 13 Nov, 2018 1 commit
    • Mark Rutland's avatar
      KVM: arm64: Fix caching of host MDCR_EL2 value · f1df7654
      Mark Rutland authored
      commit da5a3ce6 upstream.
      At boot time, KVM stashes the host MDCR_EL2 value, but only does this
      when the kernel is not running in hyp mode (i.e. is non-VHE). In these
      cases, the stashed value of MDCR_EL2.HPMN happens to be zero, which can
      lead to CONSTRAINED UNPREDICTABLE behaviour.
      Since we use this value to derive the MDCR_EL2 value when switching
      to/from a guest, after a guest have been run, the performance counters
      do not behave as expected. This has been observed to result in accesses
      via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant
      counters, resulting in events not being counted. In these cases, only
      the fixed-purpose cycle counter appears to work as expected.
      Fix this by always stashing the host MDCR_EL2 value, regardless of VHE.
      Cc: Christopher Dall <christoffer.dall@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@vger.kernel.org
      Fixes: 1e947bad
       ("arm64: KVM: Skip HYP setup when already running in HYP")
      Tested-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  12. 26 Sep, 2018 2 commits
  13. 05 Sep, 2018 2 commits
  14. 24 Aug, 2018 2 commits
  15. 25 Jul, 2018 1 commit
    • Lan Tianyu's avatar
      KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel. · 3a46a033
      Lan Tianyu authored
      commit b5020a8e
      Syzbot reports crashes in kvm_irqfd_assign(), caused by use-after-free
      when kvm_irqfd_assign() and kvm_irqfd_deassign() run in parallel
      for one specific eventfd. When the assign path hasn't finished but irqfd
      has been added to kvm->irqfds.items list, another thead may deassign the
      eventfd and free struct kvm_kernel_irqfd(). The assign path then uses
      the struct kvm_kernel_irqfd that has been freed by deassign path. To avoid
      such issue, keep irqfd under kvm->irq_srcu protection after the irqfd
      has been added to kvm->irqfds.items list, and call synchronize_srcu()
      in irq_shutdown() to make sure that irqfd has been fully initialized in
      the assign path.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarTianyu Lan <tianyu.lan@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  16. 22 Jul, 2018 4 commits
  17. 20 Jun, 2018 1 commit
  18. 30 May, 2018 1 commit
  19. 22 May, 2018 2 commits
  20. 01 May, 2018 2 commits
    • Marc Zyngier's avatar
      arm/arm64: KVM: Add PSCI version selection API · e5a290c4
      Marc Zyngier authored
      commit 85bd0ba1
      Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1
      or 1.0 to a guest, defaulting to the latest version of the PSCI
      implementation that is compatible with the requested version. This is
      no different from doing a firmware upgrade on KVM.
      But in order to give a chance to hypothetical badly implemented guests
      that would have a fit by discovering something other than PSCI 0.2,
      let's provide a new API that allows userspace to pick one particular
      version of the API.
      This is implemented as a new class of "firmware" registers, where
      we expose the PSCI version. This allows the PSCI version to be
      save/restored as part of a guest migration, and also set to
      any supported version if the guest requires it.
      Cc: stable@vger.kernel.org #4.16
      Reviewed-by: default avatarChristoffer Dall <cdall@kernel.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Marc Zyngier's avatar
      KVM: arm/arm64: Close VMID generation race · 5a5ea340
      Marc Zyngier authored
      commit f0cf47d9
      Before entering the guest, we check whether our VMID is still
      part of the current generation. In order to avoid taking a lock,
      we start with checking that the generation is still current, and
      only if not current do we take the lock, recheck, and update the
      generation and VMID.
      This leaves open a small race: A vcpu can bump up the global
      generation number as well as the VM's, but has not updated
      the VMID itself yet.
      At that point another vcpu from the same VM comes in, checks
      the generation (and finds it not needing anything), and jumps
      into the guest. At this point, we end-up with two vcpus belonging
      to the same VM running with two different VMIDs. Eventually, the
      VMID used by the second vcpu will get reassigned, and things will
      really go wrong...
      A simple solution would be to drop this initial check, and always take
      the lock. This is likely to cause performance issues. A middle ground
      is to convert the spinlock to a rwlock, and only take the read lock
      on the fast path. If the check fails at that point, drop it and
      acquire the write lock, rechecking the condition.
      This ensures that the above scenario doesn't occur.
      Cc: stable@vger.kernel.org
      Reported-by: default avatarMark Rutland <mark.rutland@arm.com>
      Tested-by: default avatarShannon Zhao <zhaoshenglong@huawei.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  21. 26 Apr, 2018 1 commit
  22. 24 Apr, 2018 1 commit
    • Marc Zyngier's avatar
      KVM: arm/arm64: vgic-its: Fix potential overrun in vgic_copy_lpi_list · 8f1a2803
      Marc Zyngier authored
      commit 7d8b44c5 upstream.
      vgic_copy_lpi_list() parses the LPI list and picks LPIs targeting
      a given vcpu. We allocate the array containing the intids before taking
      the lpi_list_lock, which means we can have an array size that is not
      equal to the number of LPIs.
      This is particularly obvious when looking at the path coming from
      vgic_enable_lpis, which is not a command, and thus can run in parallel
      with commands:
      vcpu 0:                                        vcpu 1:
            intids = kmalloc_array(irq_count)
                                                     MAPI(lpi targeting vcpu 0)
              intids[i++] = irq->intid;
      At that stage, we will happily overrun the intids array. Boo. An easy
      fix is is to break once the array is full. The MAPI command will update
      the config anyway, and we won't miss a thing....
  23. 21 Mar, 2018 3 commits
    • Marc Zyngier's avatar
      KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid · e693f133
      Marc Zyngier authored
      commit 16ca6a60 upstream.
      The vgic code is trying to be clever when injecting GICv2 SGIs,
      and will happily populate LRs with the same interrupt number if
      they come from multiple vcpus (after all, they are distinct
      interrupt sources).
      Unfortunately, this is against the letter of the architecture,
      and the GICv2 architecture spec says "Each valid interrupt stored
      in the List registers must have a unique VirtualID for that
      virtual CPU interface.". GICv3 has similar (although slightly
      ambiguous) restrictions.
      This results in guests locking up when using GICv2-on-GICv3, for
      example. The obvious fix is to stop trying so hard, and inject
      a single vcpu per SGI per guest entry. After all, pending SGIs
      with multiple source vcpus are pretty rare, and are mostly seen
      in scenario where the physical CPUs are severely overcomitted.
      But as we now only inject a single instance of a multi-source SGI per
      vcpu entry, we may delay those interrupts for longer than strictly
      necessary, and run the risk of injecting lower priority interrupts
      in the meantime.
      In order to address this, we adopt a three stage strategy:
      - If we encounter a multi-source SGI in the AP list while computing
        its depth, we force the list to be sorted
      - When populating the LRs, we prevent the injection of any interrupt
        of lower priority than that of the first multi-source SGI we've
      - Finally, the injection of a multi-source SGI triggers the request
        of a maintenance interrupt when there will be no pending interrupt
        in the LRs (HCR_NPIE).
      At the point where the last pending interrupt in the LRs switches
      from Pending to Active, the maintenance interrupt will be delivered,
      allowing us to add the remaining SGIs using the same process.
      Cc: stable@vger.kernel.org
      Fixes: 0919e84c
       ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
      Acked-by: default avatarChristoffer Dall <cdall@kernel.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Marc Zyngier's avatar
      kvm: arm/arm64: vgic-v3: Tighten synchronization for guests using v2 on v3 · b85437d0
      Marc Zyngier authored
      commit 27e91ad1 upstream.
      On guest exit, and when using GICv2 on GICv3, we use a dsb(st) to
      force synchronization between the memory-mapped guest view and
      the system-register view that the hypervisor uses.
      This is incorrect, as the spec calls out the need for "a DSB whose
      required access type is both loads and stores with any Shareability
      attribute", while we're only synchronizing stores.
      We also lack an isb after the dsb to ensure that the latter has
      actually been executed before we start reading stuff from the sysregs.
      The fix is pretty easy: turn dsb(st) into dsb(sy), and slap an isb()
      just after.
      Cc: stable@vger.kernel.org
      Fixes: f68d2b1b
       ("arm64: KVM: Implement vgic-v3 save/restore")
      Acked-by: default avatarChristoffer Dall <cdall@kernel.org>
      Reviewed-by: André Przywara's avatarAndre Przywara <andre.przywara@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Ard Biesheuvel's avatar
      KVM: arm/arm64: Reduce verbosity of KVM init log · 2ffe95e3
      Ard Biesheuvel authored
      commit 76600428
      On my GICv3 system, the following is printed to the kernel log at boot:
         kvm [1]: 8-bit VMID
         kvm [1]: IDMAP page: d20e35000
         kvm [1]: HYP VA range: 800000000000:ffffffffffff
         kvm [1]: vgic-v2@2c020000
         kvm [1]: GIC system register CPU interface enabled
         kvm [1]: vgic interrupt IRQ1
         kvm [1]: virtual timer IRQ4
         kvm [1]: Hyp mode initialized successfully
      The KVM IDMAP is a mapping of a statically allocated kernel structure,
      and so printing its physical address leaks the physical placement of
      the kernel when physical KASLR in effect. So change the kvm_info() to
      kvm_debug() to remove it from the log output.
      While at it, trim the output a bit more: IRQ numbers can be found in
      /proc/interrupts, and the HYP VA and vgic-v2 lines are not highly
      informational either.
      Cc: <stable@vger.kernel.org>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Acked-by: default avatarChristoffer Dall <cdall@kernel.org>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  24. 09 Mar, 2018 1 commit
    • Wanpeng Li's avatar
      KVM: mmu: Fix overlap between public and private memslots · 7135aaf3
      Wanpeng Li authored
      commit b28676bb
      Reported by syzkaller:
          pte_list_remove: ffff9714eb1f8078 0->BUG
          ------------[ cut here ]------------
          kernel BUG at arch/x86/kvm/mmu.c:1157!
          invalid opcode: 0000 [#1] SMP
          RIP: 0010:pte_list_remove+0x11b/0x120 [kvm]
          Call Trace:
           drop_spte+0x83/0xb0 [kvm]
           mmu_page_zap_pte+0xcc/0xe0 [kvm]
           kvm_mmu_prepare_zap_page+0x81/0x4a0 [kvm]
           kvm_mmu_invalidate_zap_all_pages+0x159/0x220 [kvm]
           kvm_arch_flush_shadow_all+0xe/0x10 [kvm]
           kvm_mmu_notifier_release+0x6c/0xa0 [kvm]
           ? kvm_mmu_notifier_release+0x5/0xa0 [kvm]
           ? __mmu_notifier_release+0x5/0x110
           ? do_exit+0x281/0xcb0
           ? __context_tracking_exit.part.5+0x4a/0x150
      The reason is that when creates new memslot, there is no guarantee for new
      memslot not overlap with private memslots. This can be triggered by the
      following program:
         #include <fcntl.h>
         #include <pthread.h>
         #include <setjmp.h>
         #include <signal.h>
         #include <stddef.h>
         #include <stdint.h>
         #include <stdio.h>
         #include <stdlib.h>
         #include <string.h>
         #include <sys/ioctl.h>
         #include <sys/stat.h>
         #include <sys/syscall.h>
         #include <sys/types.h>
         #include <unistd.h>
         #include <linux/kvm.h>
         long r[16];
         int main()
      	void *p = valloc(0x4000);
      	r[2] = open("/dev/kvm", 0);
      	r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul);
      	uint64_t addr = 0xf000;
      	ioctl(r[3], KVM_SET_IDENTITY_MAP_ADDR, &addr);
      	r[6] = ioctl(r[3], KVM_CREATE_VCPU, 0x0ul);
      	ioctl(r[3], KVM_SET_TSS_ADDR, 0x0ul);
      	ioctl(r[6], KVM_RUN, 0);
      	ioctl(r[6], KVM_RUN, 0);
      	struct kvm_userspace_memory_region mr = {
      		.slot = 0,
      		.flags = KVM_MEM_LOG_DIRTY_PAGES,
      		.guest_phys_addr = 0xf000,
      		.memory_size = 0x4000,
      		.userspace_addr = (uintptr_t) p
      	ioctl(r[3], KVM_SET_USER_MEMORY_REGION, &mr);
      	return 0;
      This patch fixes the bug by not adding a new memslot even if it
      overlaps with private memslots.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
  25. 25 Feb, 2018 2 commits
  26. 16 Feb, 2018 1 commit