  1. 11 Dec, 2009 1 commit
  2. 10 Dec, 2009 2 commits
  3. 09 Dec, 2009 1 commit
  4. 05 Dec, 2009 1 commit
  5. 04 Dec, 2009 2 commits
  6. 03 Dec, 2009 17 commits
    • KVM: VMX: Fix comparison of guest efer with stale host value · d5696725
      Avi Kivity authored

      update_transition_efer() masks out some efer bits when deciding whether
      to switch the msr during guest entry; for example, NX is emulated using the
      mmu so we don't need to disable it, and LMA/LME are handled by the hardware.
      
      However, with shared msrs, the comparison is made against a stale value;
      at the time of the guest switch we may be running with another guest's efer.
      
      Fix by deferring the mask/compare to the actual point of guest entry.
      
      Noted by Marcelo.
      Signed-off-by: Avi Kivity <avi@redhat.com>
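
      [editor's sketch: a minimal illustration of the deferred mask/compare;
      names such as percpu_shared_msrs are hypothetical, not the exact kernel
      code. The fixed-up guest value and a mask are recorded up front, and the
      comparison happens at guest entry against what the CPU really holds.]

        /* Record the fixed-up guest EFER and the bits we care about. */
        static void update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
        {
                u64 ignore_bits = EFER_NX | EFER_LMA | EFER_LME; /* emulated / hw-managed */
                u64 guest_efer  = vmx->vcpu.arch.shadow_efer;

                guest_efer &= ~ignore_bits;
                guest_efer |= host_efer & ignore_bits;
                vmx->guest_msrs[efer_offset].data = guest_efer;
                vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
        }

        /* At guest entry: compare against the live value, never a stale cache. */
        void kvm_set_shared_msr(unsigned slot, u64 value, u64 mask)
        {
                struct shared_msr *m = &percpu_shared_msrs[slot];  /* hypothetical */

                if (!((value ^ m->curr) & mask))
                        return;                 /* same bits already loaded */
                m->curr = value;
                wrmsrl(m->msr, value);
        }
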
    • KVM: x86 emulator: limit instructions to 15 bytes · eb3c79e6
      Avi Kivity authored

      While we are never normally passed an instruction that exceeds 15 bytes,
      smp games can cause us to attempt to interpret one, which will cause
      large latencies in non-preempt hosts.
      
      Cc: stable@kernel.org
      Signed-off-by: Avi Kivity <avi@redhat.com>
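
      [editor's sketch: the kind of guard involved, with hypothetical
      decode_ctxt fields and a guest_read_byte helper; the real check lives in
      the emulator's instruction fetch path.]

        #define X86_MAX_INSN_SIZE 15    /* architectural limit */

        static int do_insn_fetch_byte(struct decode_ctxt *ctxt, u8 *byte)
        {
                /* Bail out instead of looping over a byte stream that can
                 * never be one valid instruction (possible via SMP games). */
                if (ctxt->fetch_rip - ctxt->start_rip >= X86_MAX_INSN_SIZE)
                        return X86EMUL_UNHANDLEABLE;
                *byte = guest_read_byte(ctxt, ctxt->fetch_rip++);
                return X86EMUL_CONTINUE;
        }
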
    • KVM: x86: Add KVM_GET/SET_VCPU_EVENTS · 3cfc3092
      Jan Kiszka authored

      This new IOCTL exports all hitherto user-invisible state related to
      exceptions, interrupts, and NMIs. Together with the corresponding user
      space changes, it fixes sporadic problems with vmsave/restore, live
      migration, and system reset.
      
      [avi: future-proof abi by adding a flags field]
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
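
      [editor's sketch: typical userspace usage, assuming src_vcpu_fd and
      dst_vcpu_fd are open vcpu file descriptors; the two ioctls and struct
      kvm_vcpu_events are what this patch adds.]

        #include <linux/kvm.h>
        #include <stdio.h>
        #include <sys/ioctl.h>

        int migrate_vcpu_events(int src_vcpu_fd, int dst_vcpu_fd)
        {
                struct kvm_vcpu_events events;

                /* Read pending exception/interrupt/NMI state on the source. */
                if (ioctl(src_vcpu_fd, KVM_GET_VCPU_EVENTS, &events) < 0) {
                        perror("KVM_GET_VCPU_EVENTS");
                        return -1;
                }
                /* Restore it verbatim on the destination vcpu. */
                if (ioctl(dst_vcpu_fd, KVM_SET_VCPU_EVENTS, &events) < 0) {
                        perror("KVM_SET_VCPU_EVENTS");
                        return -1;
                }
                return 0;
        }
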
    • KVM: x86 shared msr infrastructure · 18863bdd
      Avi Kivity authored

      The various syscall-related MSRs are fairly expensive to switch.  Currently
      we switch them on every vcpu preemption, which is far too often:
      
      - if we're switching to a kernel thread (idle task, threaded interrupt,
        kernel-mode virtio server (vhost-net), for example) and back, then
        there's no need to switch those MSRs since kernel threads won't
        be exiting to userspace.
      
      - if we're switching to another guest running an identical OS, most likely
        those MSRs will have the same value, so there's little point in reloading
        them.
      
      - if we're running the same OS on the guest and host, the MSRs will have
        identical values and reloading is unnecessary.
      
      This patch uses the new user return notifiers to implement last-minute
      switching, and checks the msr values to avoid unnecessary reloading.
      Signed-off-by: Avi Kivity <avi@redhat.com>
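
      [editor's sketch: the lazy-switching idea in simplified form; struct and
      function names are illustrative. The host value is restored by a user
      return notifier only when the CPU actually heads back to userspace, and
      identical values never touch the MSR at all.]

        struct shared_msr {
                u32 msr;     /* index, e.g. MSR_LSTAR */
                u64 host;    /* value to restore before returning to userspace */
                u64 curr;    /* value currently loaded in hardware */
        };

        static void set_shared_msr(struct shared_msr *m, u64 value)
        {
                if (value == m->curr)
                        return;             /* identical guest/host OS: skip wrmsr */
                m->curr = value;
                wrmsrl(m->msr, value);
                /* Restore m->host lazily, only on an actual return to
                 * userspace; kernel threads never trigger the notifier. */
                user_return_notifier_register(&shared_msr_urn);
        }
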
    • KVM: allow userspace to adjust kvmclock offset · afbcf7ab
      Glauber Costa authored

      When we migrate a kvm guest that uses pvclock between two hosts, we may
      suffer a large skew. This is because there can be significant differences
      between the monotonic clock of the hosts involved. When a new host with
      a much larger monotonic time starts running the guest, the view of time
      will be significantly impacted.
      
      The situation is much worse when we do the opposite and migrate to a host
      with a smaller monotonic clock.
      
      This proposed ioctl allows userspace to inform us of the monotonic clock
      value on the source host, so we can keep the time skew small and, more
      importantly, ensure that guest time never goes backwards. Userspace may
      also need to retrieve the current data, since from the first migration
      onwards it will no longer be reflected by a simple call to clock_gettime().
      
      [marcelo: future-proof abi with a flags field]
      [jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]
      Signed-off-by: Glauber Costa <glommer@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
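
      [editor's sketch: userspace migration flow, assuming src_vm_fd and
      dst_vm_fd are VM file descriptors; KVM_GET/SET_CLOCK and struct
      kvm_clock_data (with its flags field) are what this patch introduces.]

        #include <linux/kvm.h>
        #include <stdio.h>
        #include <sys/ioctl.h>

        int migrate_kvmclock(int src_vm_fd, int dst_vm_fd)
        {
                struct kvm_clock_data clock;

                /* Source host: read the current kvmclock value. */
                if (ioctl(src_vm_fd, KVM_GET_CLOCK, &clock) < 0) {
                        perror("KVM_GET_CLOCK");
                        return -1;
                }
                /* 'clock' travels with the migration stream; re-injecting it
                 * keeps guest time continuous and never moving backwards. */
                if (ioctl(dst_vm_fd, KVM_SET_CLOCK, &clock) < 0) {
                        perror("KVM_SET_CLOCK");
                        return -1;
                }
                return 0;
        }
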
    • KVM: SVM: Cleanup NMI singlestep · 6be7d306
      Jan Kiszka authored

      Push the NMI-related singlestep variable into vcpu_svm. It's dealing
      with an AMD-specific deficit, nothing generic for x86.
      Acked-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      
       arch/x86/include/asm/kvm_host.h |    1 -
       arch/x86/kvm/svm.c              |   12 +++++++-----
       2 files changed, 7 insertions(+), 6 deletions(-)
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: x86: Fix guest single-stepping while interruptible · 94fe45da
      Jan Kiszka authored

      Commit 705c5323 opened the doors of hell by unconditionally injecting
      single-step flags as long as guest_debug signaled this. This doesn't
      work when the guest branches into some interrupt or exception handler
      and triggers a vmexit with flag reloading.
      
      Fix it by saving cs:rip when user space requests single-stepping and
      restricting the trace flag injection to this guest code position.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
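
      [editor's sketch: the fix in outline; singlestep_cs/singlestep_rip and
      guest_cs_selector() are illustrative names, kvm_rip_read() is the real
      accessor. The position is recorded when single-stepping is requested,
      and TF is forced only while the guest is still at that position.]

        /* On KVM_SET_GUEST_DEBUG, remember where single-stepping started. */
        if (dbg->control & KVM_GUESTDBG_SINGLESTEP) {
                vcpu->arch.singlestep_cs  = guest_cs_selector(vcpu);
                vcpu->arch.singlestep_rip = kvm_rip_read(vcpu);
        }

        /* Inject TF only while the guest is still at that cs:rip, so a
         * branch into an interrupt handler no longer inherits the flag. */
        static bool at_singlestep_pos(struct kvm_vcpu *vcpu)
        {
                return vcpu->arch.singlestep_cs  == guest_cs_selector(vcpu) &&
                       vcpu->arch.singlestep_rip == kvm_rip_read(vcpu);
        }
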
    • KVM: Xen PV-on-HVM guest support · ffde22ac
      Ed Swierk authored

      Support for Xen PV-on-HVM guests can be implemented almost entirely in
      userspace, except for handling one annoying MSR that maps a Xen
      hypercall blob into guest address space.
      
      A generic mechanism to delegate MSR writes to userspace seems overkill
      and risks encouraging similar MSR abuse in the future.  Thus this patch
      adds special support for the Xen HVM MSR.
      
      I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
      KVM which MSR the guest will write to, as well as the starting address
      and size of the hypercall blobs (one each for 32-bit and 64-bit) that
      userspace has loaded from files.  When the guest writes to the MSR, KVM
      copies one page of the blob from userspace to the guest.
      
      I've tested this patch with a hacked-up version of Gerd's userspace
      code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
      FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.
      
      [jan: fix i386 build warning]
      [avi: future proof abi with a flags field]
      Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
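
      [editor's sketch: userspace setup, assuming vm_fd is the VM file
      descriptor and blob32/blob64 point at hypercall blobs loaded from files;
      struct kvm_xen_hvm_config is the structure this patch adds, and the MSR
      index shown is only a placeholder for whichever MSR the guest uses.]

        #include <linux/kvm.h>
        #include <stdint.h>
        #include <stdio.h>
        #include <sys/ioctl.h>

        int setup_xen_hvm(int vm_fd, void *blob32, uint8_t pages32,
                          void *blob64, uint8_t pages64)
        {
                struct kvm_xen_hvm_config cfg = {
                        .msr          = 0x40000000,        /* placeholder index */
                        .blob_addr_32 = (uintptr_t)blob32,
                        .blob_size_32 = pages32,           /* size in pages */
                        .blob_addr_64 = (uintptr_t)blob64,
                        .blob_size_64 = pages64,
                };

                /* Tell KVM which MSR write should copy blob pages into the guest. */
                if (ioctl(vm_fd, KVM_XEN_HVM_CONFIG, &cfg) < 0) {
                        perror("KVM_XEN_HVM_CONFIG");
                        return -1;
                }
                return 0;
        }
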
    • KVM: SVM: Support Pause Filter in AMD processors · 565d0998
      Mark Langsdorf authored

      New AMD processors (Family 0x10 models 8+) support the Pause
      Filter Feature.  This feature creates a new field in the VMCB
      called Pause Filter Count.  If Pause Filter Count is greater
      than 0 and intercepting PAUSEs is enabled, the processor will
      increment an internal counter when a PAUSE instruction occurs
      instead of intercepting.  When the internal counter reaches the
      Pause Filter Count value, a PAUSE intercept will occur.
      
      This feature can be used to detect contended spinlocks,
      especially when the lock holding VCPU is not scheduled.
      Rescheduling another VCPU prevents the VCPU seeking the
      lock from wasting its quantum by spinning idly.
      
      Experimental results show that most spinlocks are held
      for less than 1000 PAUSE cycles or more than a few
      thousand.  Default the Pause Filter Counter to 3000 to
      detect the contended spinlocks.
      
      Processor support for this feature is indicated by a CPUID
      bit.
      
      On a 24 core system running 4 guests each with 16 VCPUs,
      this patch improved overall performance of each guest's
      32 job kernbench by approximately 3-5% when combined
      with a scheduler algorithm that caused the VCPU to
      sleep for a brief period. Further performance improvement
      may be possible with a more sophisticated yield algorithm.
      Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
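
      [editor's sketch: the VMCB setup this description implies, with
      approximate names modeled on the SVM code of the period.]

        /* Enable pause filtering when the CPUID bit advertises it. */
        if (svm_has(SVM_FEATURE_PAUSE_FILTER)) {
                control->pause_filter_count = 3000;  /* default from this patch */
                control->intercept |= (1ULL << INTERCEPT_PAUSE);
        }
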
    • KVM: VMX: Add support for Pause-Loop Exiting · 4b8d54f9
      Zhai, Edwin authored

      New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
      control fields:
      PLE_Gap    - upper bound on the amount of time between two successive
                   executions of PAUSE in a loop.
      PLE_Window - upper bound on the amount of time a guest is allowed to execute in
                   a PAUSE loop
      
      If the time between this execution of PAUSE and the previous one exceeds
      PLE_Gap, the processor considers the PAUSE to belong to a new loop.
      Otherwise, the processor determines the total execution time of this loop
      (since the first PAUSE in the loop) and triggers a VM exit if the total
      time exceeds PLE_Window.
      * See SDM volume 3B, sections 21.6.13 & 22.1.3.
      
      Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where
      one VP is scheduled out while holding a spinlock and other VPs contending
      for the same lock are scheduled in, wasting CPU time.
      
      Our tests indicate that most spinlocks are held for less than 212 cycles.
      Performance tests show that with 2X LP over-commitment we can get a +2%
      performance improvement for a kernel build (with even more gain at higher
      LP counts).
      Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
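
      [editor's sketch: how the two VMCS fields get programmed; ple_gap and
      ple_window stand for the module parameters the patch introduces, and
      vmcs_write32()/PLE_GAP/PLE_WINDOW are the VMX accessor and field names.]

        /* Program Pause-Loop Exiting; a gap of 0 disables the feature.
         * Units for both fields are TSC cycles. */
        if (ple_gap) {
                vmcs_write32(PLE_GAP, ple_gap);
                vmcs_write32(PLE_WINDOW, ple_window);
        }
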
    • KVM: x86: Rework guest single-step flag injection and filtering · 91586a3b
      Jan Kiszka authored

      Push TF and RF injection and filtering on guest single-stepping into the
      vendor get/set_rflags callbacks. This makes the whole mechanism more
      robust with respect to user space IOCTL ordering and instruction
      emulation.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
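
      [editor's sketch: the reworked wrappers in simplified form; reads filter
      the injected flags back out, writes re-inject them whenever
      single-stepping is active, so no IOCTL ordering or emulation path can
      leak or lose them.]

        unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
        {
                unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);

                /* Hide the TF/RF bits we injected from userspace. */
                if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
                        rflags &= ~(unsigned long)(X86_EFLAGS_TF | X86_EFLAGS_RF);
                return rflags;
        }

        void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
        {
                /* Re-inject on every write so emulation can't drop the flags. */
                if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
                        rflags |= X86_EFLAGS_TF | X86_EFLAGS_RF;
                kvm_x86_ops->set_rflags(vcpu, rflags);
        }
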
    • KVM: x86: Refactor guest debug IOCTL handling · 355be0b9
      Jan Kiszka authored

      Much of the hitherto vendor-specific code for setting up guest debug can
      actually be handled by generic code. This also fixes a minor deficit in
      the SVM part with respect to processing KVM_GUESTDBG_ENABLE.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Activate Virtualization On Demand · 10474ae8
      Alexander Graf authored

      X86 CPUs need to have some magic happening to enable the virtualization
      extensions on them. This magic can have unpleasant consequences for
      users, like blocking other VMMs from working (vmx) or using invalid TLB
      entries (svm).
      
      Currently KVM activates virtualization when the respective kernel module
      is loaded. This blocks us from autoloading KVM modules without breaking
      other VMMs.
      
      To circumvent this problem at least a bit, this patch introduces
      on-demand activation of virtualization: virtualization is enabled on
      creation of the first virtual machine and disabled on destruction of
      the last one.
      
      So using this, KVM can be easily autoloaded, while keeping other
      hypervisors usable.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
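
      [editor's sketch: the refcounting this implies, simplified; kvm_lock and
      on_each_cpu() are real kernel primitives, the rest is illustrative.]

        static int kvm_usage_count;     /* number of live VMs */

        static int hardware_enable_all(void)
        {
                spin_lock(&kvm_lock);
                if (kvm_usage_count++ == 0)
                        on_each_cpu(hardware_enable, NULL, 1);  /* first VM */
                spin_unlock(&kvm_lock);
                return 0;
        }

        static void hardware_disable_all(void)
        {
                spin_lock(&kvm_lock);
                if (--kvm_usage_count == 0)
                        on_each_cpu(hardware_disable, NULL, 1); /* last VM gone */
                spin_unlock(&kvm_lock);
        }
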
    • KVM: Move irq ack notifier list to arch independent code · 136bdfee
      Gleb Natapov authored

      Mask irq notifier list is already there.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Maintain back mapping from irqchip/pin to gsi · 3e71f88b
      Gleb Natapov authored

      Maintain a back mapping from irqchip/pin to gsi to speed up
      interrupt acknowledgment notifications.
      
      [avi: build fix on non-x86/ia64]
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
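
      [editor's sketch: the shape of the reverse map, with illustrative array
      bounds; the routing table gains a per-chip, per-pin gsi entry so ack
      notification no longer has to scan every routing entry.]

        struct kvm_irq_routing_table {
                /* (irqchip, pin) -> gsi, filled in when routing is updated */
                int chip[KVM_NR_IRQCHIPS][KVM_IOAPIC_NUM_PINS];
                /* ... existing gsi -> entry mapping ... */
        };

        static int irqchip_pin_to_gsi(struct kvm_irq_routing_table *t,
                                      int irqchip, int pin)
        {
                return t->chip[irqchip][pin];   /* O(1) instead of a scan */
        }
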
    • KVM: Move irq sharing information to irqchip level · 1a6e4a8c
      Gleb Natapov authored

      This removes the assumption that the maximum number of GSIs is smaller
      than the number of pins. Sharing is tracked at the pin level, not the
      GSI level.
      
      [avi: no PIC on ia64]
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Don't pass kvm_run arguments · 851ba692
      Avi Kivity authored

      They're just copies of vcpu->run, which is readily accessible.
      Signed-off-by: Avi Kivity <avi@redhat.com>
  7. 02 Dec, 2009 3 commits
  8. 01 Dec, 2009 1 commit
    • x86, mm: Correct the implementation of is_untracked_pat_range() · ccef0864
      H. Peter Anvin authored

      The semantics the PAT code expects of is_untracked_pat_range() is "is
      this range completely contained inside the untracked region."  This
      means that checkin 8a271389 was technically wrong, because the
      implementation was needlessly confusing.
      
      The sane interface is for it to take a semiclosed range like just about
      everything else (as evidenced by the sheer number of "- 1"'s removed by
      that patch), so change the actual implementation to match.
      Reported-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jack Steiner <steiner@sgi.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      LKML-Reference: <20091119202341.GA4420@sgi.com>
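
      [editor's sketch: the containment test on a semiclosed [start, end)
      range; region_start/region_end are hypothetical stand-ins for the
      untracked region's bounds.]

        /* Untracked only if [start, end) lies wholly inside the region. */
        static bool is_untracked_pat_range(u64 start, u64 end)
        {
                return start >= region_start && end <= region_end;
        }
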
  9. 27 Nov, 2009 12 commits