1. 02 Dec, 2012 1 commit
  2. 30 Nov, 2012 2 commits
    • Will Auld's avatar
      KVM: x86: Emulate IA32_TSC_ADJUST MSR · ba904635
      Will Auld authored
      
      
      CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
      
      Basic design is to emulate the MSR by allowing reads and writes to a guest
      vcpu specific location to store the value of the emulated MSR while adding
      the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will
      be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This
      is of course as long as the "use TSC counter offsetting" VM-execution control
      is enabled as well as the IA32_TSC_ADJUST control.
      
      However, because hardware will only return the TSC + IA32_TSC_ADJUST +
      vmsc tsc_offset for a guest process when it does and rdtsc (with the correct
      settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one
      of these three locations. The argument against storing it in the actual MSR
      is performance. This is likely to be seldom used while the save/restore is
      required on every transition. IA32_TSC_ADJUST was created as a way to solve
      some issues with writing TSC itself so that is not an option either.
      
      The remaining option, defined above as our solution has the problem of
      returning incorrect vmcs tsc_offset values (unless we intercept and fix, not
      done here) as mentioned above. However, more problematic is that storing the
      data in vmcs tsc_offset will have a different semantic effect on the system
      than does using the actual MSR. This is illustrated in the following example:
      
      The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest
      process performs a rdtsc. In this case the guest process will get
      TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including
      IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics
      as seen by the guest do not and hence this will not cause a problem.
      
      Signed-off-by: default avatarWill Auld <will.auld@intel.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      ba904635
    • Will Auld's avatar
      KVM: x86: Add code to track call origin for msr assignment · 8fe8ab46
      Will Auld authored
      
      
      In order to track who initiated the call (host or guest) to modify an msr
      value I have changed function call parameters along the call path. The
      specific change is to add a struct pointer parameter that points to (index,
      data, caller) information rather than having this information passed as
      individual parameters.
      
      The initial use for this capability is for updating the IA32_TSC_ADJUST msr
      while setting the tsc value. It is anticipated that this capability is
      useful for other tasks.
      
      Signed-off-by: default avatarWill Auld <will.auld@intel.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      8fe8ab46
  3. 28 Nov, 2012 7 commits
  4. 14 Nov, 2012 3 commits
  5. 29 Oct, 2012 1 commit
  6. 18 Oct, 2012 1 commit
  7. 08 Oct, 2012 2 commits
  8. 23 Sep, 2012 2 commits
    • Jan Kiszka's avatar
      KVM: x86: Fix guest debug across vcpu INIT reset · c8639010
      Jan Kiszka authored
      
      
      If we reset a vcpu on INIT, we so far overwrote dr7 as provided by
      KVM_SET_GUEST_DEBUG, and we also cleared switch_db_regs unconditionally.
      
      Fix this by saving the dr7 used for guest debugging and calculating the
      effective register value as well as switch_db_regs on any potential
      change. This will change to focus of the set_guest_debug vendor op to
      update_dp_bp_intercept.
      
      Found while trying to stop on start_secondary.
      
      Signed-off-by: Jan Kiszka's avatarJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      c8639010
    • Alex Williamson's avatar
      KVM: Add resampling irqfds for level triggered interrupts · 7a84428a
      Alex Williamson authored
      
      
      To emulate level triggered interrupts, add a resample option to
      KVM_IRQFD.  When specified, a new resamplefd is provided that notifies
      the user when the irqchip has been resampled by the VM.  This may, for
      instance, indicate an EOI.  Also in this mode, posting of an interrupt
      through an irqfd only asserts the interrupt.  On resampling, the
      interrupt is automatically de-asserted prior to user notification.
      This enables level triggered interrupts to be posted and re-enabled
      from vfio with no userspace intervention.
      
      All resampling irqfds can make use of a single irq source ID, so we
      reserve a new one for this interface.
      
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      7a84428a
  9. 21 Sep, 2012 1 commit
    • Suresh Siddha's avatar
      x86, kvm: fix kvm's usage of kernel_fpu_begin/end() · b1a74bf8
      Suresh Siddha authored
      
      
      Preemption is disabled between kernel_fpu_begin/end() and as such
      it is not a good idea to use these routines in kvm_load/put_guest_fpu()
      which can be very far apart.
      
      kvm_load/put_guest_fpu() routines are already called with
      preemption disabled and KVM already uses the preempt notifier to save
      the guest fpu state using kvm_put_guest_fpu().
      
      So introduce __kernel_fpu_begin/end() routines which don't touch
      preemption and use them instead of kernel_fpu_begin/end()
      for KVM's use model of saving/restoring guest FPU state.
      
      Also with this change (and with eagerFPU model), fix the host cr0.TS vm-exit
      state in the case of VMX. For eagerFPU case, host cr0.TS is always clear.
      So no need to worry about it. For the traditional lazyFPU restore case,
      change the cr0.TS bit for the host state during vm-exit to be always clear
      and cr0.TS bit is set in the __vmx_load_host_state() when the FPU
      (guest FPU or the host task's FPU) state is not active. This ensures
      that the host/guest FPU state is properly saved, restored
      during context-switch and with interrupts (using irq_fpu_usable()) not
      stomping on the active FPU state.
      
      Signed-off-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/r/1348164109.26695.338.camel@sbsiddha-desk.sc.intel.com
      
      
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      b1a74bf8
  10. 20 Sep, 2012 2 commits
    • Gleb Natapov's avatar
      KVM: optimize apic interrupt delivery · 1e08ec4a
      Gleb Natapov authored
      
      
      Most interrupt are delivered to only one vcpu. Use pre-build tables to
      find interrupt destination instead of looping through all vcpus. In case
      of logical mode loop only through vcpus in a logical cluster irq is sent
      to.
      
      Signed-off-by: default avatarGleb Natapov <gleb@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      1e08ec4a
    • Avi Kivity's avatar
      KVM: MMU: Optimize pte permission checks · 97d64b78
      Avi Kivity authored
      
      
      walk_addr_generic() permission checks are a maze of branchy code, which is
      performed four times per lookup.  It depends on the type of access, efer.nxe,
      cr0.wp, cr4.smep, and in the near future, cr4.smap.
      
      Optimize this away by precalculating all variants and storing them in a
      bitmap.  The bitmap is recalculated when rarely-changing variables change
      (cr0, cr4) and is indexed by the often-changing variables (page fault error
      code, pte access permissions).
      
      The permission check is moved to the end of the loop, otherwise an SMEP
      fault could be reported as a false positive, when PDE.U=1 but PTE.U=0.
      Noted by Xiao Guangrong.
      
      The result is short, branch-free code.
      
      Reviewed-by: default avatarXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      97d64b78
  11. 18 Sep, 2012 1 commit
  12. 17 Sep, 2012 1 commit
    • Michael S. Tsirkin's avatar
      KVM: make processes waiting on vcpu mutex killable · 9fc77441
      Michael S. Tsirkin authored
      
      
      vcpu mutex can be held for unlimited time so
      taking it with mutex_lock on an ioctl is wrong:
      one process could be passed a vcpu fd and
      call this ioctl on the vcpu used by another process,
      it will then be unkillable until the owner exits.
      
      Call mutex_lock_killable instead and return status.
      Note: mutex_lock_interruptible would be even nicer,
      but I am not sure all users are prepared to handle EINTR
      from these ioctls. They might misinterpret it as an error.
      
      Cleanup paths expect a vcpu that can't be used by
      any userspace so this will always succeed - catch bugs
      by calling BUG_ON.
      
      Catch callers that don't check return state by adding
      __must_check.
      
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      9fc77441
  13. 10 Sep, 2012 1 commit
    • Xiao Guangrong's avatar
      KVM: fix error paths for failed gfn_to_page() calls · 4484141a
      Xiao Guangrong authored
      
      
      This bug was triggered:
      [ 4220.198458] BUG: unable to handle kernel paging request at fffffffffffffffe
      [ 4220.203907] IP: [<ffffffff81104d85>] put_page+0xf/0x34
      ......
      [ 4220.237326] Call Trace:
      [ 4220.237361]  [<ffffffffa03830d0>] kvm_arch_destroy_vm+0xf9/0x101 [kvm]
      [ 4220.237382]  [<ffffffffa036fe53>] kvm_put_kvm+0xcc/0x127 [kvm]
      [ 4220.237401]  [<ffffffffa03702bc>] kvm_vcpu_release+0x18/0x1c [kvm]
      [ 4220.237407]  [<ffffffff81145425>] __fput+0x111/0x1ed
      [ 4220.237411]  [<ffffffff8114550f>] ____fput+0xe/0x10
      [ 4220.237418]  [<ffffffff81063511>] task_work_run+0x5d/0x88
      [ 4220.237424]  [<ffffffff8104c3f7>] do_exit+0x2bf/0x7ca
      
      The test case:
      
      	printf(fmt, ##args);		\
      	exit(-1);} while (0)
      
      static int create_vm(void)
      {
      	int sys_fd, vm_fd;
      
      	sys_fd = open("/dev/kvm", O_RDWR);
      	if (sys_fd < 0)
      		die("open /dev/kvm fail.\n");
      
      	vm_fd = ioctl(sys_fd, KVM_CREATE_VM, 0);
      	if (vm_fd < 0)
      		die("KVM_CREATE_VM fail.\n");
      
      	return vm_fd;
      }
      
      static int create_vcpu(int vm_fd)
      {
      	int vcpu_fd;
      
      	vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0);
      	if (vcpu_fd < 0)
      		die("KVM_CREATE_VCPU ioctl.\n");
      	printf("Create vcpu.\n");
      	return vcpu_fd;
      }
      
      static void *vcpu_thread(void *arg)
      {
      	int vm_fd = (int)(long)arg;
      
      	create_vcpu(vm_fd);
      	return NULL;
      }
      
      int main(int argc, char *argv[])
      {
      	pthread_t thread;
      	int vm_fd;
      
      	(void)argc;
      	(void)argv;
      
      	vm_fd = create_vm();
      	pthread_create(&thread, NULL, vcpu_thread, (void *)(long)vm_fd);
      	printf("Exit.\n");
      	return 0;
      }
      
      It caused by release kvm->arch.ept_identity_map_addr which is the
      error page.
      
      The parent thread can send KILL signal to the vcpu thread when it was
      exiting which stops faulting pages and potentially allocating memory.
      So gfn_to_pfn/gfn_to_page may fail at this time
      
      Fixed by checking the page before it is used
      
      Signed-off-by: default avatarXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      4484141a
  14. 06 Sep, 2012 4 commits
  15. 05 Sep, 2012 3 commits
  16. 30 Aug, 2012 1 commit
  17. 27 Aug, 2012 2 commits
  18. 22 Aug, 2012 2 commits
  19. 13 Aug, 2012 1 commit
  20. 09 Aug, 2012 1 commit
  21. 06 Aug, 2012 1 commit