1. 30 Nov, 2018 2 commits
    • Andrea Arcangeli's avatar
      userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem · 5b51072e
      Andrea Arcangeli authored
      Userfaultfd did not create private memory when UFFDIO_COPY was invoked
      on a MAP_PRIVATE shmem mapping.  Instead it wrote to the shmem file,
      even when that had not been opened for writing.  Though, fortunately,
      that could only happen where there was a hole in the file.
      
      Fix the shmem-backed implementation of UFFDIO_COPY to create private
      memory for MAP_PRIVATE mappings.  The hugetlbfs-backed implementation
      was already correct.
      
      This change is visible to userland, if userfaultfd has been used in
      unintended ways: so it introduces a small risk of incompatibility, but
      is necessary in order to respect file permissions.
      
      An app that uses UFFDIO_COPY for anything like postcopy live migration
      won't notice the difference, and in fact it'll run faster because there
      will be no copy-on-write and memory waste in the tmpfs pagecache
      anymore.
      
      Userfaults on MAP_PRIVATE shmem keep triggering only on file holes like
      before.
      
      The real zeropage can also be built on a MAP_PRIVATE shmem mapping
      through UFFDIO_ZEROPAGE and that's safe because the zeropage pte is
      never dirty, in turn even an mprotect upgrading the vma permission from
      PROT_READ to PROT_READ|PROT_WRITE won't make the zeropage pte writable.
      
      Link: http://lkml.kernel.org/r/20181126173452.26955-3-aarcange@redhat.com
      Fixes: 4c27fe4c
      
       ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: default avatarHugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5b51072e
    • Andrea Arcangeli's avatar
      userfaultfd: use ENOENT instead of EFAULT if the atomic copy user fails · 9e368259
      Andrea Arcangeli authored
      Patch series "userfaultfd shmem updates".
      
      Jann found two bugs in the userfaultfd shmem MAP_SHARED backend: the
      lack of the VM_MAYWRITE check and the lack of i_size checks.
      
      Then looking into the above we also fixed the MAP_PRIVATE case.
      
      Hugh by source review also found a data loss source if UFFDIO_COPY is
      used on shmem MAP_SHARED PROT_READ mappings (the production usages
      incidentally run with PROT_READ|PROT_WRITE, so the data loss couldn't
      happen in those production usages like with QEMU).
      
      The whole patchset is marked for stable.
      
      We verified QEMU postcopy live migration with guest running on shmem
      MAP_PRIVATE run as well as before after the fix of shmem MAP_PRIVATE.
      Regardless if it's shmem or hugetlbfs or MAP_PRIVATE or MAP_SHARED, QEMU
      unconditionally invokes a punch hole if the guest mapping is filebacked
      and a MADV_DONTNEED too (needed to get rid of the MAP_PRIVATE COWs and
      for the anon backend).
      
      This patch (of 5):
      
      We internally used EFAULT to communicate with the caller, switch to
      ENOENT, so EFAULT can be used as a non internal retval.
      
      Link: http://lkml.kernel.org/r/20181126173452.26955-2-aarcange@redhat.com
      Fixes: 4c27fe4c
      
       ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: <stable@vger.kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9e368259
  2. 08 Jun, 2018 1 commit
    • Mike Rapoport's avatar
      userfaultfd: prevent non-cooperative events vs mcopy_atomic races · df2cc96e
      Mike Rapoport authored
      If a process monitored with userfaultfd changes it's memory mappings or
      forks() at the same time as uffd monitor fills the process memory with
      UFFDIO_COPY, the actual creation of page table entries and copying of
      the data in mcopy_atomic may happen either before of after the memory
      mapping modifications and there is no way for the uffd monitor to
      maintain consistent view of the process memory layout.
      
      For instance, let's consider fork() running in parallel with
      userfaultfd_copy():
      
      process        		         |	uffd monitor
      ---------------------------------+------------------------------
      fork()        		         | userfaultfd_copy()
      ...        		         | ...
          dup_mmap()        	         |     down_read(mmap_sem)
          down_write(mmap_sem)         |     /* create PTEs, copy data */
              dup_uffd()               |     up_read(mmap_sem)
              copy_page_range()        |
              up_write(mmap_sem)       |
              dup_uffd_complete()      |
                  /* notify monitor */ |
      
      If the userfaultfd_copy() takes the mmap_sem first, the new page(s) will
      be present by the time copy_page_range() is called and they will appear
      in the child's memory mappings.  However, if the fork() is the first to
      take the mmap_sem, the new pages won't be mapped in the child's address
      space.
      
      If the pages are not present and child tries to access them, the monitor
      will get page fault notification and everything is fine.  However, if
      the pages *are present*, the child can access them without uffd
      noticing.  And if we copy them into child it'll see the wrong data.
      Since we are talking about background copy, we'd need to decide whether
      the pages should be copied or not regardless #PF notifications.
      
      Since userfaultfd monitor has no way to determine what was the order,
      let's disallow userfaultfd_copy in parallel with the non-cooperative
      events.  In such case we return -EAGAIN and the uffd monitor can
      understand that userfaultfd_copy() clashed with a non-cooperative event
      and take an appropriate action.
      
      Link: http://lkml.kernel.org/r/1527061324-19949-1-git-send-email-rppt@linux.vnet.ibm.com
      
      Signed-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: default avatarPavel Emelyanov <xemul@virtuozzo.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      df2cc96e
  3. 07 Feb, 2018 1 commit
  4. 07 Sep, 2017 2 commits
  5. 09 Mar, 2017 1 commit
  6. 02 Mar, 2017 1 commit
  7. 25 Feb, 2017 1 commit
  8. 23 Feb, 2017 6 commits
  9. 04 Apr, 2016 1 commit
    • Kirill A. Shutemov's avatar
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patc...
      09cbfeaf
  10. 17 Mar, 2016 1 commit
  11. 16 Jan, 2016 2 commits
  12. 04 Sep, 2015 2 commits