- 20 Mar, 2013 1 commit
-
-
Paolo Bonzini authored
The CS base was initialized to 0 on VMX (wrong, but usually overridden by userspace before starting) or 0xf0000 on SVM. The correct value is 0xffff0000, and VMX is able to emulate it now, so use it. Reviewed-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
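The address arithmetic behind the corrected value can be sketched as follows. This is an illustration only, not code from the patch: the constants are the architectural x86 reset values, and the function name is hypothetical.

```python
# Architectural x86 reset state of the code segment.
RESET_CS_SELECTOR = 0xF000
RESET_CS_BASE = 0xFFFF0000
RESET_IP = 0xFFF0


def first_fetch_address(cs_base: int, ip: int) -> int:
    """Linear address of the first instruction fetch: CS base + IP (32-bit)."""
    return (cs_base + ip) & 0xFFFFFFFF


# With the correct base, the first fetch hits the reset vector just below 4 GiB.
reset_vector = first_fetch_address(RESET_CS_BASE, RESET_IP)

# The old SVM value is what the plain real-mode rule (selector << 4) would give;
# it places the first fetch in the legacy BIOS area below 1 MiB instead.
legacy_base = RESET_CS_SELECTOR << 4
legacy_vector = first_fetch_address(legacy_base, RESET_IP)

print(hex(reset_vector), hex(legacy_base), hex(legacy_vector))
```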
-
- 18 Mar, 2013 1 commit
-
-
Jan Kiszka authored
Very old user space (namely qemu-kvm before kvm-49) didn't set the TSS base before running the VCPU. We always warned about this bug, but no reports of users actually seeing it are known. Time to finally remove the workaround that effectively prevented calling vmx_vcpu_reset while already holding the KVM srcu lock. Reviewed-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- 14 Mar, 2013 2 commits
-
-
Jan Kiszka authored
Provided the host has this feature, it's straightforward to offer it to the guest as well. We just need to load the timer value on L2 entry if the feature was enabled by L1 and watch out for the corresponding exit reason. Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Jan Kiszka authored
We will need EFER.LMA saving to provide unrestricted guest mode. All that is missing for this is picking up EFER.LMA from VM_ENTRY_CONTROLS on L2->L1 switches. If the host does not support EFER.LMA saving, no change is performed; otherwise we properly emulate for L1 what the hardware does for L0. Advertise the support, depending on the host feature. Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
- 13 Mar, 2013 2 commits
-
-
Jan Kiszka authored
Only interrupt and NMI exiting are mandatory for KVM to work and can thus be exposed to the guest unconditionally; virtual NMI exiting is optional, so we must not advertise it unless the host supports it. Introduce the symbolic constant PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR while at it. Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
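A minimal sketch of the advertising rule described above, assuming the pin-based control bit positions from the Intel SDM. The function and its `host_allowed1` parameter are hypothetical and only model the idea; they are not the kernel's code.

```python
from typing import Tuple

PIN_BASED_EXT_INTR_MASK = 1 << 0   # external-interrupt exiting
PIN_BASED_NMI_EXITING = 1 << 3
PIN_BASED_VIRTUAL_NMIS = 1 << 5
# Default-1 reserved bits (1, 2 and 4) of the pin-based controls.
PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR = 0x16


def nested_pinbased_ctls(host_allowed1: int) -> Tuple[int, int]:
    """Return (must-be-1 bits, may-be-1 bits) offered to the L1 guest.

    host_allowed1 is assumed to hold the bits the host CPU allows to be set.
    """
    low = PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR
    # Mandatory for KVM itself, so always safe to expose.
    high = low | PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING
    # Optional feature: only advertise it if the host has it.
    high |= host_allowed1 & PIN_BASED_VIRTUAL_NMIS
    return low, high
```

Masking with `host_allowed1` rather than testing a flag keeps the rule extensible: any further optional bit can be added to the mask without another branch.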
-
Jan Kiszka authored
A VCPU sending INIT or SIPI to some other VCPU races for setting the remote VCPU's mp_state. When we were unlucky, KVM_MP_STATE_INIT_RECEIVED was overwritten by kvm_emulate_halt and, thus, got lost. This introduces APIC events for those two signals, keeping them in kvm_apic until kvm_apic_accept_events is run over the target vcpu context. kvm_apic_has_events reports to kvm_arch_vcpu_runnable if there are pending events, and thus whether vcpu blocking should end. The patch comes with the side effect of effectively obsoleting KVM_MP_STATE_SIPI_RECEIVED. We still accept it from user space, but immediately translate it to KVM_MP_STATE_INIT_RECEIVED + KVM_APIC_SIPI. The vcpu itself will no longer enter the KVM_MP_STATE_SIPI_RECEIVED state. That also means we no longer exit to user space after receiving a SIPI event. Furthermore, we already reset the VCPU on INIT, only fixing up the code segment later on when SIPI arrives. Moreover, we fix INIT handling for the BSP: it never enters wait-for-SIPI but directly starts over on INIT. Tested-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
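The event-latching idea can be modelled roughly as below. This is a simplified, hypothetical model (class, method, and constant names are invented for illustration); it only shows why recording events and consuming them in the target's own context avoids the lost-INIT race.

```python
from enum import Enum, auto


class MpState(Enum):
    RUNNABLE = auto()
    HALTED = auto()
    INIT_RECEIVED = auto()


APIC_INIT = 1 << 0
APIC_SIPI = 1 << 1


class Vcpu:
    def __init__(self, is_bsp: bool):
        self.is_bsp = is_bsp
        self.mp_state = MpState.RUNNABLE
        self.pending_events = 0
        self.sipi_vector = 0
        self.cs_selector = 0xF000

    # Called by the *sending* vcpu: only record the event, never touch mp_state.
    def post_init(self) -> None:
        self.pending_events |= APIC_INIT

    def post_sipi(self, vector: int) -> None:
        self.sipi_vector = vector
        self.pending_events |= APIC_SIPI

    def has_events(self) -> bool:
        return self.pending_events != 0

    # Called in the *target* vcpu's own context, so nothing can race with it.
    def accept_events(self) -> None:
        if self.pending_events & APIC_INIT:
            self.pending_events &= ~APIC_INIT
            self.cs_selector = 0xF000  # stand-in for a full vcpu reset
            # The BSP starts over directly; APs wait for a SIPI.
            self.mp_state = (MpState.RUNNABLE if self.is_bsp
                             else MpState.INIT_RECEIVED)
        if (self.pending_events & APIC_SIPI
                and self.mp_state is MpState.INIT_RECEIVED):
            self.pending_events &= ~APIC_SIPI
            # A SIPI vector selects the start segment: selector = vector << 8.
            self.cs_selector = self.sipi_vector << 8
            self.mp_state = MpState.RUNNABLE
```

In this model a halt on the target can overwrite `mp_state` at any time without losing the INIT, because the INIT lives in `pending_events` until `accept_events` runs.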
-
- 12 Mar, 2013 1 commit
-
-
Jan Kiszka authored
Neither vmx nor svm nor the common part may generate an error on kvm_vcpu_reset. So drop the return code. Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
- 11 Mar, 2013 1 commit
-
-
Ioan Orghici authored
Signed-off-by:
Ioan Orghici <ioan.orghici@gmail.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
- 07 Mar, 2013 2 commits
-
-
Jan Kiszka authored
The logic for calculating the value with which we call kvm_set_cr0/4 was broken (will definitely be visible with nested unrestricted guest mode support). Also, we performed the check regarding CR0_ALWAYSON too early when in guest mode. What really needs to be done on both CR0 and CR4 is to mask out L1-owned bits and merge them in from L1's guest_cr0/4. In contrast, arch.cr0/4 and arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus, are not suited as input. For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and refuse the update if it fails. To be fully consistent, we implement this check now also for CR4. For CR4, we move the check into vmx_set_cr4 while we keep it in handle_set_cr0. This is because the CR0 checks for vmxon vs. guest mode will diverge soon when adding unrestricted guest mode support. Finally, we have to set the shadow to the value L2 wanted to write originally. Reviewed-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
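The mask-and-merge step for CR0 can be sketched like this. It is a minimal illustration under stated assumptions: the function names and the returned tuple are hypothetical, and the always-on set is taken to be PE, NE and PG.

```python
X86_CR0_PE = 1 << 0
X86_CR0_NE = 1 << 5
X86_CR0_PG = 1 << 31
# Bits assumed to be required while in VMX operation (illustrative set).
VMXON_CR0_ALWAYSON = X86_CR0_PE | X86_CR0_PG | X86_CR0_NE


def merge_cr0(l2_val: int, guest_host_mask: int, l1_guest_cr0: int) -> int:
    """Drop the L1-owned bits from L2's value and take them from L1's guest_cr0."""
    return (l2_val & ~guest_host_mask) | (l1_guest_cr0 & guest_host_mask)


def handle_l2_cr0_write(l2_val: int, guest_host_mask: int, l1_guest_cr0: int):
    """Return (effective_cr0, read_shadow), or None if the update is refused."""
    val = merge_cr0(l2_val, guest_host_mask, l1_guest_cr0)
    if (val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON:
        return None
    # The shadow keeps what L2 originally wanted to write.
    return val, l2_val
```

Keeping the original `l2_val` as the read shadow means L2 reads back the value it wrote, even though the effective register contains L1's bits.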
-
Jan Kiszka authored
Properly set those bits to 1 that the spec demands in case bit 55 of VMX_BASIC is 0 - like in our case. Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- 05 Mar, 2013 2 commits
-
-
Jan Kiszka authored
Ouch, how could this work so well so far? We need to clear RFLAGS to the reset value as specified by the SDM. Particularly, IF must be off after VM-exit! Reviewed-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
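As a small illustration of the value in question: the architectural RFLAGS reset value has only the reserved bit 1 set, so IF (bit 9) is clear. The function name below is hypothetical.

```python
X86_EFLAGS_FIXED = 1 << 1  # reserved bit 1, always reads as 1
X86_EFLAGS_IF = 1 << 9     # interrupt enable flag


def rflags_after_vmexit(rflags_before: int) -> int:
    """RFLAGS seen by the host after VM-exit: the previous value is discarded."""
    return X86_EFLAGS_FIXED


# Whatever the guest had, interrupts are disabled once the host state is loaded.
host_rflags = rflags_after_vmexit(X86_EFLAGS_FIXED | X86_EFLAGS_IF)
print(hex(host_rflags))
```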
-
Jan Kiszka authored
First of all, do not blindly overwrite GUEST_DR7 on L2 entry. The host may have guest debugging enabled. Then properly reset DR7 and DEBUG_CTL on L2->L1 switch as specified in the SDM. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- 04 Mar, 2013 1 commit
-
-
Takuya Yoshikawa authored
Except for ia64's stale code (KVM_SET_MEMORY_REGION support), this is only used for sanity checks in __kvm_set_memory_region(), which can easily be changed to use the slot id instead. Signed-off-by:
Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- 28 Feb, 2013 2 commits
-
-
Jan Kiszka authored
Cleanup: __vmx_complete_interrupts has no use for the vmx structure. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Jan Kiszka authored
IDT_VECTORING_INFO_FIELD was already read right after vmexit. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
- 27 Feb, 2013 4 commits
-
-
Jan Kiszka authored
No need to re-read what vmx_vcpu_run already picked up for us. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Jan Kiszka authored
Switching the VMCS obviously invalidates what may have been cached about the guest segments. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Jan Kiszka authored
These exits have no preconditions, and we already process the corresponding reasons in nested_vmx_exit_handled correctly. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Jan Kiszka authored
Both are only used locally. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
- 22 Feb, 2013 2 commits
-
-
Jan Kiszka authored
This avoids basing decisions on uninitialized variables, potentially leaking kernel data to the L1 guest. Reviewed-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Jan Kiszka authored
This prevents trapping L2 I/O exits if L1 has neither unconditional nor bitmap-based exiting enabled. Furthermore, it implements I/O bitmap handling. Reviewed-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
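The exit decision can be sketched as follows, assuming the SDM layout of the I/O bitmaps (bitmap A covers ports 0x0000-0x7fff, bitmap B covers 0x8000-0xffff, one bit per port). The function and parameter names are hypothetical; the bitmaps are modelled as 4 KiB byte arrays.

```python
CPU_BASED_UNCOND_IO_EXITING = 1 << 24
CPU_BASED_USE_IO_BITMAPS = 1 << 25


def l1_wants_io_exit(exec_ctl, bitmap_a, bitmap_b, port, size):
    """Decide whether an L2 I/O access must be reflected to L1."""
    if not (exec_ctl & CPU_BASED_USE_IO_BITMAPS):
        # Without bitmaps, only unconditional I/O exiting matters.
        return bool(exec_ctl & CPU_BASED_UNCOND_IO_EXITING)
    # Every byte of the access is checked against its own bitmap bit.
    for p in range(port, port + size):
        if p > 0xFFFF:
            # An access wrapping around the 16-bit port space always exits.
            return True
        bitmap = bitmap_a if p < 0x8000 else bitmap_b
        index = p & 0x7FFF
        if bitmap[index // 8] & (1 << (index % 8)):
            return True
    return False
```

For example, with only the bit for port 0x3f8 set in bitmap A, a two-byte access at 0x3f7 is reflected to L1 because its second byte touches 0x3f8.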
-
- 14 Feb, 2013 1 commit
-
-
Jan Kiszka authored
We already pass vmcs12 as argument. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
- 11 Feb, 2013 1 commit
-
-
Yang Zhang authored
Without Posted Interrupt, current code is broken. Just disable by default until Posted Interrupt is ready. Signed-off-by:
Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
- 07 Feb, 2013 1 commit
-
-
Gleb Natapov authored
When calculating hw_cr0 the current code masks bits that should be always on and re-adds them immediately after. Clean up the code by masking only those bits that should be dropped from hw_cr0. This allows us to get rid of some defines. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- 06 Feb, 2013 1 commit
-
-
Dongxiao Xu authored
SMEP is disabled in hardware if the CPU is in non-paging mode. However, KVM always uses paging mode to emulate guest non-paging mode with TDP. To emulate this behavior, SMEP needs to be manually disabled when the guest switches to non-paging mode. We hit an issue where an SMP Linux guest with a recent kernel (with SMEP support enabled, for example 3.5.3) would crash with a triple fault when unrestricted_guest=0 is set. This is because KVM uses an identity mapping page table to emulate the non-paging mode, where the page table is set with the USER flag. If SMEP is still enabled in this case, the guest hits an unhandleable page fault and then crashes. Reviewed-by:
Gleb Natapov <gleb@redhat.com> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by:
Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
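A minimal sketch of the fix-up, assuming the case where KVM emulates guest non-paging mode through an identity-mapped page table. The function name is hypothetical, and only the SMEP bit is modelled.

```python
X86_CR0_PG = 1 << 31
X86_CR4_SMEP = 1 << 20


def hw_cr4_for_guest(guest_cr0: int, guest_cr4: int) -> int:
    """CR4 value loaded into hardware for a guest whose paging is emulated."""
    hw_cr4 = guest_cr4
    if not (guest_cr0 & X86_CR0_PG):
        # The guest believes paging is off, where real hardware ignores SMEP.
        # The identity map uses USER pages, so SMEP must be off here too.
        hw_cr4 &= ~X86_CR4_SMEP
    return hw_cr4
```

The guest-visible CR4 is left untouched in this model; only the value handed to the hardware changes, so SMEP takes effect again as soon as the guest enables paging.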
-
- 29 Jan, 2013 3 commits
-
-
Yang Zhang authored
Virtual interrupt delivery relieves KVM of injecting vAPIC interrupts manually; injection is fully taken care of by the hardware. This needs some special awareness in the existing interrupt injection path: - For a pending interrupt, instead of direct injection, we may need to update architecture-specific indicators before resuming the guest. - A pending interrupt that is masked by ISR should also be considered in the above update action, since the hardware will decide when to inject it at the right time. The current has_interrupt and get_interrupt only return a valid vector from the injection p.o.v. Reviewed-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Kevin Tian <kevin.tian@intel.com> Signed-off-by:
Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Yang Zhang authored
Basically, to benefit from apicv, we need to enable virtualized x2apic mode. Currently, we only enable it when the guest is really using x2apic. Also, clear the MSR bitmap for the corresponding x2apic MSRs when the guest has enabled x2apic: 0x800 - 0x8ff: no read intercept for apicv register virtualization, except APIC ID and TMCCT, which need software's assistance to get the right value. Reviewed-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Kevin Tian <kevin.tian@intel.com> Signed-off-by:
Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
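The MSR numbers involved can be worked out with the x2APIC rule that a register at xAPIC MMIO offset `off` maps to MSR `0x800 + (off >> 4)`. The sketch below is illustrative only; the helper names are hypothetical.

```python
APIC_BASE_MSR = 0x800
APIC_ID = 0x20       # xAPIC MMIO offset of the APIC ID register
APIC_TMCCT = 0x390   # xAPIC MMIO offset of the timer current count register


def x2apic_msr(mmio_offset: int) -> int:
    """Map an xAPIC MMIO register offset to its x2APIC MSR index."""
    return APIC_BASE_MSR + (mmio_offset >> 4)


def read_intercepted_msrs():
    """MSRs whose reads still need software assistance."""
    return {x2apic_msr(APIC_ID), x2apic_msr(APIC_TMCCT)}


def read_passthrough_msrs():
    """All other MSRs in 0x800-0x8ff: reads are not intercepted."""
    intercepted = read_intercepted_msrs()
    return [msr for msr in range(0x800, 0x900) if msr not in intercepted]


print(sorted(hex(m) for m in read_intercepted_msrs()))
```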
-
Yang Zhang authored
- APIC read doesn't cause VM-Exit - APIC write becomes trap-like Reviewed-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Kevin Tian <kevin.tian@intel.com> Signed-off-by:
Yang Zhang <yang.z.zhang@intel.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
- 24 Jan, 2013 8 commits
-
-
Gleb Natapov authored
If emulate_invalid_guest_state=false, vmx->emulation_required is never actually used, but it ends up always being set to true since handle_invalid_guest_state(), the only place where it is reset back to false, is never called. Besides being not very clean, this makes the vmexit and vmentry paths check emulate_invalid_guest_state needlessly. The patch fixes that by keeping emulation_required coherent with the emulate_invalid_guest_state setting. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Gleb Natapov authored
The function deals with code segment too. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Gleb Natapov authored
Usability is returned in the unusable field, so there is no need to clobber the entire AR. Callers already have to know how to deal with unusable segments, since AR is not zeroed if emulate_invalid_guest_state=true. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Gleb Natapov authored
vmx->rmode.vm86_active is never true if unrestricted guest is enabled. Make it more explicit that neither enter_pmode() nor enter_rmode() is called in this case. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Gleb Natapov authored
There is no reason for it. If state is suitable for vmentry it will be detected during guest entry and no emulation will happen. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Gleb Natapov authored
Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Gleb Natapov authored
Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Gleb Natapov authored
Since vmx_get_cpl() always returns 0 when the VCPU is in real mode, it is no longer needed. Also reset the CPL cache to zero during the transition to protected mode, since the transition may happen while CS.selectors & 3 != 0, but in reality CPL is 0. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- 08 Jan, 2013 1 commit
-
-
Marcelo Tosatti authored
CPL is always 0 when in real mode, and always 3 when in virtual 8086 mode. Using values other than those can cause failures on operations that check CPL. Reviewed-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
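The rule can be written down as a short sketch. The function is a hypothetical model, not the kernel's vmx_get_cpl(), and it ignores long mode for brevity.

```python
X86_CR0_PE = 1 << 0      # protected mode enable
X86_EFLAGS_VM = 1 << 17  # virtual 8086 mode flag


def get_cpl(cr0: int, rflags: int, cs_selector: int) -> int:
    """Current privilege level derived from the mode, then from CS."""
    if not (cr0 & X86_CR0_PE):
        return 0  # real mode: always CPL 0
    if rflags & X86_EFLAGS_VM:
        return 3  # virtual 8086 mode: always CPL 3
    return cs_selector & 3  # protected mode: RPL bits of the CS selector
```

Checking the mode before looking at the selector matters because a real-mode CS value such as 0xf003 has its low bits set without implying any privilege level.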
-
- 02 Jan, 2013 3 commits
-
-
Gleb Natapov authored
With emulate_invalid_guest_state=0, if a vcpu is in real mode, VMX can enter the vcpu with a smaller segment limit than the guest configured. If the guest tries to access past this limit, it will get a #GP, at which point the instruction will be emulated with the correct segment limit applied. If IO is detected during the emulation, it is not handled correctly. The vcpu thread should exit to userspace to serve the IO, but it returns to the guest instead. Since emulation is not completed until userspace completes the IO, the faulting instruction is re-executed ad infinitum. The patch fixes that by exiting to userspace if IO happens during instruction emulation. Reported-by:
Alex Williamson <alex.williamson@redhat.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Gleb Natapov authored
Segment registers will be fixed according to current emulation policy during switching to real mode for the first time. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Gleb Natapov authored
Currently, when emulation of invalid guest state is enabled (emulate_invalid_guest_state=1), segment registers are still sometimes fixed for entry to vm86 mode. Segment register fixing is avoided in enter_rmode(), but vmx_set_segment() still does it unconditionally. The patch fixes it. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-