- 06 Sep, 2015 1 commit
Wanpeng Li authored

Change halt_poll_ns into a per-VCPU variable, seeded from the module parameter, to allow greater flexibility.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
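For illustration, the change has roughly this shape; a minimal sketch assuming the per-VCPU field keeps the halt_poll_ns name and is seeded at vcpu init time (the init site and parameter permissions are assumptions, not the exact upstream diff):

    /* module parameter still provides the system-wide default */
    unsigned int halt_poll_ns;
    module_param(halt_poll_ns, uint, 0644);

    struct kvm_vcpu {
            /* ... existing fields ... */
            unsigned int halt_poll_ns;      /* per-VCPU polling window */
    };

    int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
    {
            /* seed each VCPU from the module parameter */
            vcpu->halt_poll_ns = halt_poll_ns;
            /* ... rest of init unchanged ... */
            return 0;
    }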
- 30 Jul, 2015 1 commit
Paolo Bonzini authored

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- 29 Jul, 2015 1 commit
Paolo Bonzini authored

This is another remnant of ia64 support.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- 23 Jul, 2015 2 commits
Andrey Smetanin authored

The notification is sent by exiting the vcpu to user space if KVM_REQ_HV_CRASH is enabled for the vcpu. On exit to user space, the kvm_run structure contains a system_event with type KVM_SYSTEM_EVENT_CRASH, notifying user space that a guest crash occurred.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
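On the user space side, consuming this notification looks roughly as follows (a sketch; the handler name is hypothetical, while KVM_EXIT_SYSTEM_EVENT and KVM_SYSTEM_EVENT_CRASH are the uapi names):

    #include <linux/kvm.h>

    /* 'run' is the mmap'ed struct kvm_run of a vcpu fd */
    static void handle_vcpu_exit(struct kvm_run *run)
    {
            if (run->exit_reason == KVM_EXIT_SYSTEM_EVENT &&
                run->system_event.type == KVM_SYSTEM_EVENT_CRASH) {
                    /* guest reported a crash: save a dump, log it,
                     * or restart the guest per VMM policy */
            }
    }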
Andrey Smetanin authored

vcpu_debug is a useful macro, similar to kvm_debug, that additionally includes the vcpu context in the output.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
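The macro boils down to roughly the following (a sketch consistent with the description; the exact prefix format is an assumption):

    #define vcpu_debug(vcpu, fmt, ...)                                  \
            kvm_debug("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)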
- 10 Jul, 2015 1 commit
Paolo Bonzini authored

If there are no assigned devices, the guest PAT does not provide any useful information and can be overridden to writeback; VMX always does this because it has the "IPAT" bit in its extended page table entries, but SVM does not have anything similar. Hook into VFIO and legacy device assignment so that they provide this information to KVM.

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- 05 Jun, 2015 2 commits
Paolo Bonzini authored

Only two ioctls have to be modified; the address space id is placed in the higher 16 bits of their slot id argument. As of this patch, no architecture defines more than one address space; x86 will be the first.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
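From user space the encoding looks like this (a sketch; as_id, slot_id and the mapping variables are illustrative, and address space 0 remains the default):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    struct kvm_userspace_memory_region region = {
            .slot            = (as_id << 16) | slot_id, /* as id in bits 16..31 */
            .flags           = 0,
            .guest_phys_addr = gpa,
            .memory_size     = size,
            .userspace_addr  = (unsigned long)hva,
    };
    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);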
Paolo Bonzini authored

We need to hide SMRAM from guests not running in SMM. Therefore, all uses of kvm_read_guest* and kvm_write_guest* must be changed to use different address spaces, depending on whether the VCPU is in system management mode. We need to introduce a new family of functions for this purpose. For now, the VCPU-based functions have the same behavior as the existing per-VM ones; they just accept a different type for the first argument. Later, however, they will be changed to use one of many "struct kvm_memslots" stored in struct kvm, through an architecture hook. VM-based functions will unconditionally use the first memslots pointer. Whenever possible, this patch introduces slot-based functions with an __ prefix, with two wrappers for generic and vcpu-based actions. The exceptions are kvm_read_guest and kvm_write_guest, which are copied into the new functions kvm_vcpu_read_guest and kvm_vcpu_write_guest.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
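A hedged sketch of the wrapper pattern described above (bodies simplified; the upstream helpers differ in detail):

    /* slot-based worker: __ prefix, takes the memslot directly */
    static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
                                     void *data, int offset, int len)
    {
            unsigned long addr = gfn_to_hva_memslot(slot, gfn);

            if (kvm_is_error_hva(addr))
                    return -EFAULT;
            if (copy_from_user(data, (void __user *)addr + offset, len))
                    return -EFAULT;
            return 0;
    }

    /* VM-based wrapper: always resolves via the first memslots pointer */
    int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data,
                            int offset, int len)
    {
            return __kvm_read_guest_page(gfn_to_memslot(kvm, gfn), gfn,
                                         data, offset, len);
    }

    /* VCPU-based wrapper: same behavior for now; later it picks the
     * memslots matching the VCPU's address space (e.g. SMM) */
    int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data,
                                 int offset, int len)
    {
            return __kvm_read_guest_page(kvm_vcpu_gfn_to_memslot(vcpu, gfn),
                                         gfn, data, offset, len);
    }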
- 04 Jun, 2015 1 commit
Paolo Bonzini authored

This patch adds the interface between x86.c and the emulator: the SMBASE register, a new emulator flag, the RSM instruction. It also adds a new request bit that will be used by the KVM_SMI ioctl.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- 28 May, 2015 2 commits
Paolo Bonzini authored

The memory slot is already available from gfn_to_memslot_dirty_bitmap. Isn't it a shame to look it up again? Plus, it makes gfn_to_page_many_atomic agnostic of multiple VCPU address spaces.

Reviewed-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini authored

This lets the function access the new memory slot without going through kvm_memslots and id_to_memslot. It will simplify the code when more than one address space is supported. Unfortunately, the "const"ness of the new argument must be cast away in two places. Fixing KVM to accept const struct kvm_memory_slot pointers would require modifications in pretty much all architectures, and is left for later.

Reviewed-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- 26 May, 2015 2 commits
Paolo Bonzini authored

Prepare for the case of multiple address spaces.

Reviewed-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini authored

Architecture-specific helpers are not supposed to muck with struct kvm_userspace_memory_region contents. Add const to enforce this. In order to eliminate the only write in __kvm_set_memory_region, the cleaning of deleted slots is pulled up from update_memslots to __kvm_set_memory_region.

Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Reviewed-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
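After this change the arch hook takes a const region, roughly as follows (a sketch; the exact argument order is an assumption):

    int kvm_arch_prepare_memory_region(struct kvm *kvm,
                                       struct kvm_memory_slot *memslot,
                                       const struct kvm_userspace_memory_region *mem,
                                       enum kvm_mr_change change);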
- 19 May, 2015 1 commit
Paolo Bonzini authored

gfn_to_pfn_async is used in just one place, and because of x86-specific treatment that place will need to look at the memory slot. Hence inline it into try_async_pf and export __gfn_to_pfn_memslot. The patch also switches the subsequent call to gfn_to_pfn_prot to use __gfn_to_pfn_memslot. This is a small optimization. Finally, remove the now-unused async argument of __gfn_to_pfn.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- 07 May, 2015 2 commits
Rik van Riel authored

Currently KVM will clear the FPU bits in CR0.TS in the VMCS, and trap to re-load them every time the guest accesses the FPU after a switch back into the guest from the host. This patch copies the x86 task switch semantics for FPU loading, with the FPU loaded eagerly after first use if the system uses eager fpu mode, or if the guest uses the FPU frequently. In the latter case, after loading the FPU for 255 times, the fpu_counter will roll over, and we will revert to loading the FPU on demand, until it has been established that the guest is still actively using the FPU. This mirrors the x86 task switch policy, which seems to work.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
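A hedged sketch of the counter policy described above (the helper name is hypothetical; upstream folds this into the FPU put path, and fpu_counter is a u8, so incrementing past 255 wraps to 0):

    static void fpu_deactivation_policy(struct kvm_vcpu *vcpu)
    {
            vcpu->fpu_counter++;    /* u8: wraps to 0 after 255 loads */

            /*
             * While the guest keeps using the FPU, leave it loaded
             * eagerly.  On wrap-around, revert to on-demand loading
             * until fresh FPU use is observed again.
             */
            if (vcpu->fpu_counter == 0)
                    kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
    }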
Christian Borntraeger authored

Several kvm architectures disable interrupts before kvm_guest_enter. kvm_guest_enter then uses local_irq_save/restore to disable interrupts again or for the first time. Let's provide underscore versions of kvm_guest_{enter|exit} that assume being called locked. kvm_guest_enter now disables interrupts for the full function and thus we can remove the check for preemptible. This patch then adapts s390/kvm to use local_irq_disable/enable calls, which are slightly cheaper than local_irq_save/restore, and to call these new functions.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
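A hedged sketch of the locked/unlocked split (bodies simplified relative to the upstream diff):

    /* caller guarantees interrupts are disabled */
    static inline void __kvm_guest_enter(void)
    {
            guest_enter();          /* vtime/context-tracking side */
    }

    static inline void __kvm_guest_exit(void)
    {
            guest_exit();
    }

    /* the unlocked variants keep the old calling convention */
    static inline void kvm_guest_enter(void)
    {
            unsigned long flags;

            local_irq_save(flags);
            __kvm_guest_enter();
            local_irq_restore(flags);
    }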
- 08 Apr, 2015 1 commit
Nadav Amit authored

After reset, the CPU can change the BSP, which will be used upon INIT. Reset should return the BSP to the one which QEMU asked for, and it should therefore be handled accordingly. To quote: "If the MP protocol has completed and a BSP is chosen, subsequent INITs (either to a specific processor or system wide) do not cause the MP protocol to be repeated." [Intel SDM 8.4.2: MP Initialization Protocol Requirements and Restrictions]

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427933438-12782-3-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- 26 Mar, 2015 1 commit
Nikolay Nikolaev authored

This is needed in e.g. ARM vGIC emulation, where the MMIO handling depends on the VCPU that does the access.

Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
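The resulting callback shape, sketched (consistent with the description; the destructor member is shown for completeness and may differ):

    struct kvm_io_device_ops {
            int (*read)(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
                        gpa_t addr, int len, void *val);
            int (*write)(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
                         gpa_t addr, int len, const void *val);
            void (*destructor)(struct kvm_io_device *this);
    };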
- 12 Mar, 2015 1 commit
Eric Auger authored

Introduce a __KVM_HAVE_ARCH_INTC_INITIALIZED define and an associated kvm_arch_intc_initialized function. The latter allows testing whether the virtual interrupt controller is initialized and ready to accept virtual IRQ injection. On some architectures, the virtual interrupt controller is dynamically instantiated, justifying that kind of check. The new function can now be used by irqfd to check whether the virtual interrupt controller is ready on KVM_IRQFD request. If not, KVM_IRQFD returns -EAGAIN.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
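Sketched, assuming architectures without the define get a stub that returns true:

    #ifdef __KVM_HAVE_ARCH_INTC_INITIALIZED
    bool kvm_arch_intc_initialized(struct kvm *kvm);
    #else
    static inline bool kvm_arch_intc_initialized(struct kvm *kvm)
    {
            return true;
    }
    #endif

    /* irqfd side, per the description above */
    static int kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
    {
            if (!kvm_arch_intc_initialized(kvm))
                    return -EAGAIN;
            /* ... normal irqfd setup ... */
    }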
- 10 Mar, 2015 1 commit
Thomas Huth authored

kvm_kvfree() provides exactly the same functionality as the new common kvfree() function - so let's simply replace the kvm function with the common function.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
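For reference, the common helper being switched to is equivalent to:

    void kvfree(const void *addr)
    {
            if (is_vmalloc_addr(addr))
                    vfree(addr);
            else
                    kfree(addr);
    }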
- 09 Mar, 2015 1 commit
Rik van Riel authored

The host kernel is not doing anything while the CPU is executing a KVM guest VCPU, so it can be marked as being in an extended quiescent state, identical to that used when running user space code. The only exception to that rule is when the host handles an interrupt, which is already handled by the irq code, which calls rcu_irq_enter and rcu_irq_exit. The guest_enter and guest_exit functions already switch vtime accounting independent of context tracking. Leave those calls where they are, instead of moving them into the context tracking code.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
- 12 Feb, 2015 1 commit
Andrea Arcangeli authored

Use the more generic get_user_pages_unlocked, which has the additional benefit of passing FAULT_FLAG_ALLOW_RETRY at the very first page fault (which allows the first page fault in an unmapped area to always be able to block indefinitely by being allowed to release the mmap_sem).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Peter Feiner <pfeiner@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
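In KVM's slow-path pfn lookup the call ends up looking roughly like this (a sketch; that the helper still took tsk/mm arguments and separate write/force flags in this era is an assumption about the exact call site):

    /* inside the slow path of hva->pfn conversion */
    struct page *page[1];
    int npages;

    npages = get_user_pages_unlocked(current, current->mm, addr, 1,
                                     write_fault, 0, page);
    if (npages == 1)
            pfn = page_to_pfn(page[0]);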
- 05 Feb, 2015 1 commit
Tiejun Chen authored

After f78146b0 ("KVM: Fix page-crossing MMIO") and 87da7e66 ("KVM: x86: fix vcpu->mmio_fragments overflow"), KVM_MMIO_SIZE is actually no longer used, so remove it.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- 29 Jan, 2015 1 commit
Kai Huang authored

We don't have to write protect guest memory for dirty logging if the architecture supports hardware dirty logging, such as PML on VMX, so rename it to be more generic.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
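The renamed hook, sketched (treat both names as an assumption based on this history; the old name conveyed write protection only):

    /* was kvm_arch_mmu_write_protect_pt_masked; no longer implies write
     * protection, since hardware such as PML can log dirty pages without it */
    void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
                                                 struct kvm_memory_slot *slot,
                                                 gfn_t gfn_offset,
                                                 unsigned long mask);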
- 23 Jan, 2015 1 commit
Dominik Dingel authored

The return value of kvm_arch_vcpu_postcreate is not checked in its caller. This is okay, because only x86 provides vcpu_postcreate right now and it could only fail if vcpu_load failed. But that is not possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so just get rid of the unchecked return value.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
- 20 Jan, 2015 2 commits
André Przywara authored

With everything separated and prepared, we implement a model of a GICv3 distributor and redistributors by using the existing framework to provide handler functions for each register group. Currently we limit the emulation to a model enforcing a single security state, with SRE==1 (forcing system register access) and ARE==1 (allowing more than 8 VCPUs). We share some of the functions provided for GICv2 emulation, but take the different ways of addressing (v)CPUs into account. Save and restore is currently not implemented. Similar to the split-off of the GICv2 specific code, the new emulation code goes into a new file (vgic-v3-emul.c).

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
André Przywara authored

Currently we unconditionally register the GICv2 emulation device during the host's KVM initialization. Since with GICv3 support we may end up with only v2 or only v3 or both supported, we move the registration into the GIC probing function, where we will later know which combination is valid.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
- 16 Jan, 2015 1 commit
Mario Smarduch authored

kvm_get_dirty_log() provides generic handling of the dirty bitmap and is currently reused by several architectures. Building on that, we introduce kvm_get_dirty_log_protect(), which additionally write protects the pages reported as dirty so that future write accesses are caught, before the next KVM_GET_DIRTY_LOG ioctl call from user space.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
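A hedged sketch of how an architecture's KVM_GET_DIRTY_LOG handler would use it (lock and flush placement follow the description; the details are assumptions):

    int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
    {
            bool is_dirty = false;
            int r;

            mutex_lock(&kvm->slots_lock);
            r = kvm_get_dirty_log_protect(kvm, log, &is_dirty);
            if (is_dirty)
                    kvm_flush_remote_tlbs(kvm);     /* drop stale writable entries */
            mutex_unlock(&kvm->slots_lock);
            return r;
    }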
- 04 Dec, 2014 2 commits
Igor Mammedov authored

Current linear search doesn't scale well when a large amount of memslots is used and the looked-up slot is not at the beginning of the memslots array. Taking into account that memslots don't overlap, it's possible to switch the sorting order of the memslots array from 'npages' to 'base_gfn' and use binary search for memslot lookup by GFN. As a result of switching to binary search, lookup times are reduced with a large amount of memslots. Following is a table of search_memslot() cycles during WS2008R2 guest boot:

                              boot,         boot + ~10 min
                              mostly same   of using it,
                              slot lookup   randomized lookup
                  max         average       average
                  cycles      cycles        cycles
    13 slots   :  1450        28            30
    13 slots   :  1400        30            40      binary search
    117 slots  :  13000       30            460
    117 slots  :  2000        35            180     binary search

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
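A hedged sketch of the resulting lookup, assuming the array is kept sorted by base_gfn in descending order and a used_slots count is maintained elsewhere:

    static struct kvm_memory_slot *
    search_memslots(struct kvm_memslots *slots, gfn_t gfn)
    {
            struct kvm_memory_slot *memslots = slots->memslots;
            int start = 0, end = slots->used_slots;
            int slot;

            while (start < end) {
                    slot = start + (end - start) / 2;

                    if (gfn >= memslots[slot].base_gfn)
                            end = slot;     /* descending order: go left */
                    else
                            start = slot + 1;
            }

            if (start < slots->used_slots &&
                gfn >= memslots[start].base_gfn &&
                gfn < memslots[start].base_gfn + memslots[start].npages)
                    return &memslots[start];

            return NULL;
    }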
Igor Mammedov authored

In a typical guest boot workload only 2-3 memslots are used extensively, and at that it's mostly the same memslot lookup operation. Adding an LRU cache improves average lookup time from 46 to 28 cycles (~40%) for this workload.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
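Sketched as a fast path at the top of search_memslots() from the previous commit (the lru_slot field name and atomic type are assumptions consistent with the description):

    /* fast path: re-check the slot that matched last time */
    int slot = atomic_read(&slots->lru_slot);

    if (gfn >= memslots[slot].base_gfn &&
        gfn < memslots[slot].base_gfn + memslots[slot].npages)
            return &memslots[slot];

    /* slow path: binary search as before, then remember the hit with
     * atomic_set(&slots->lru_slot, start) */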
- 26 Nov, 2014 1 commit
Ard Biesheuvel authored

This reverts commit 85c8555f ("KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn. The problem being addressed by the patch above was that some ARM code based the memory mapping attributes of a pfn on the return value of kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should be mapped as device memory. However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin, and the existing non-ARM users were already using it in a way which suggests that its name should probably have been 'kvm_is_reserved_pfn' from the beginning, e.g., whether or not to call get_page/put_page on it etc. This means that returning false for the zero page is a mistake and the patch above should be reverted.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
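After the revert and rename, the helper is equivalent to roughly this sketch (pfns without a valid struct page are treated as reserved):

    bool kvm_is_reserved_pfn(pfn_t pfn)
    {
            if (pfn_valid(pfn))
                    return PageReserved(pfn_to_page(pfn));

            return true;    /* no struct page: treat as reserved */
    }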
- 25 Nov, 2014 2 commits
Ard Biesheuvel authored

Memory regions may be incoherent with the caches, typically when the guest has mapped a host system RAM backed memory region as uncached. Add a flag KVM_MEMSLOT_INCOHERENT so that we can tag these memslots and handle them appropriately when mapping them.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Ard Biesheuvel authored

This reverts commit 85c8555f ("KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn. The problem being addressed by the patch above was that some ARM code based the memory mapping attributes of a pfn on the return value of kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should be mapped as device memory. However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin, and the existing non-ARM users were already using it in a way which suggests that its name should probably have been 'kvm_is_reserved_pfn' from the beginning, e.g., whether or not to call get_page/put_page on it etc. This means that returning false for the zero page is a mistake and the patch above should be reverted.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
- 24 Nov, 2014 1 commit
Paolo Bonzini authored

Create a new header, and hide the device assignment functions there. Move struct kvm_assigned_dev_kernel to assigned-dev.c by modifying arch/x86/kvm/iommu.c to take a PCI device struct. Based on a patch by Radim Krcmar <rkrcmark@redhat.com>.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- 23 Nov, 2014 1 commit
Radim Krčmář authored

Now that ia64 is gone, we can hide deprecated device assignment in x86.

Notable changes:
 - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl()

The easy parts were removed from generic kvm code; remaining:
 - kvm_iommu_(un)map_pages() would require new code to be moved
 - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- 21 Nov, 2014 1 commit
Paolo Bonzini authored

ia64 does not need them anymore. Ack notifiers become x86-specific too.

Suggested-by: Gleb Natapov <gleb@kernel.org>
Reviewed-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- 24 Oct, 2014 1 commit
Wanpeng Li authored

After commit 80ce1639 ("KVM: VFIO: register kvm_device_ops dynamically"), kvm_device_ops of vfio can be registered dynamically. Commit 3c3c29fd ("kvm-vfio: do not use module_init") moved the dynamic registration so that it is invoked by kvm_init, in order to fix broken unloading of the kvm module. However, kvm_device_ops of vfio is not unregistered on rmmod of the kvm-intel module, which leads to a device type collision detection warning when the kvm-intel module is inserted again:

    WARNING: CPU: 1 PID: 10358 at /root/cathy/kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:3289 kvm_init+0x234/0x282 [kvm]()
    Modules linked in: kvm_intel(O+) kvm(O) nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache lockd sunrpc pci_stub bridge stp llc autofs4 8021q cpufreq_ondemand ipv6 joydev microcode pcspkr igb i2c_algo_bit ehci_pci ehci_hcd e1000e i2c_i801 ixgbe ptp pps_core hwmon mdio tpm_tis tpm ipmi_si ipmi_msghandler acpi_cpufreq isci libsas scsi_transport_sas button dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kvm_intel]
    CPU: 1 PID: 10358 Comm: insmod Tainted: G W O 3.17.0-rc1 #2
    Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
     0000000000000cd9 ffff880ff08cfd18 ffffffff814a61d9 0000000000000cd9
     0000000000000000 ffff880ff08cfd58 ffffffff810417b7 ffff880ff08cfd48
     ffffffffa045bcac ffffffffa049c420 0000000000000040 00000000000000ff
    Call Trace:
     [<ffffffff814a61d9>] dump_stack+0x49/0x60
     [<ffffffff810417b7>] warn_slowpath_common+0x7c/0x96
     [<ffffffffa045bcac>] ? kvm_init+0x234/0x282 [kvm]
     [<ffffffff810417e6>] warn_slowpath_null+0x15/0x17
     [<ffffffffa045bcac>] kvm_init+0x234/0x282 [kvm]
     [<ffffffffa016e995>] vmx_init+0x1bf/0x42a [kvm_intel]
     [<ffffffffa016e7d6>] ? vmx_check_processor_compat+0x64/0x64 [kvm_intel]
     [<ffffffff810002ab>] do_one_initcall+0xe3/0x170
     [<ffffffff811168a9>] ? __vunmap+0xad/0xb8
     [<ffffffff8109c58f>] do_init_module+0x2b/0x174
     [<ffffffff8109d414>] load_module+0x43e/0x569
     [<ffffffff8109c6d8>] ? do_init_module+0x174/0x174
     [<ffffffff8109c75a>] ? copy_module_from_user+0x39/0x82
     [<ffffffff8109b7dd>] ? module_sect_show+0x20/0x20
     [<ffffffff8109d65f>] SyS_init_module+0x54/0x81
     [<ffffffff814a9a12>] system_call_fastpath+0x16/0x1b
    ---[ end trace 0626f4a3ddea56f3 ]---

The bug can be reproduced by:

    rmmod kvm_intel.ko
    insmod kvm_intel.ko

without rmmod/insmod of kvm.ko. This patch fixes the bug by unregistering kvm_device_ops of vfio when the kvm-intel module is removed.

Reported-by: Liu Rongrong <rongrongx.liu@intel.com>
Fixes: 3c3c29fd
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- 24 Sep, 2014 3 commits
Tang Chen authored

Currently, the APIC access page is pinned by KVM for the entire life of the guest. We want to make it migratable in order to make memory hot-unplug available for machines that run KVM. This patch prepares to handle this in generic code, through a new request bit (that will be set by the MMU notifier) and a new hook that is called whenever the request bit is processed.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tang Chen authored

Different architectures need different requests, and in fact we will use this function in architecture-specific code later. This will be outside kvm_main.c, so make it non-static and rename it to kvm_make_all_cpus_request().

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Andres Lagar-Cavilla authored

When KVM handles a tdp fault it uses FOLL_NOWAIT. If the guest memory has been swapped out or is behind a filemap, this will trigger async readahead and return immediately. The rationale is that KVM will kick back the guest with an "async page fault" and allow for some other guest process to take over. If async PFs are enabled, the fault is retried asap from an async workqueue. If not, it's retried immediately in the same code path. In either case the retry will not relinquish the mmap semaphore and will block on the IO. This is a bad thing, as other mmap semaphore users now stall as a function of swap or filemap latency. This patch ensures both the regular and async PF path re-enter the fault, allowing for the mmap semaphore to be relinquished in the case of IO wait.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>