    hugetlb reservations: fix hugetlb MAP_PRIVATE reservations across vma splits · 84afd99b
    Andy Whitcroft authored
    
    
    When a hugetlb mapping with a reservation is split, a new VMA is cloned
    from the original.  This new VMA is a direct copy of the original,
    including the reservation count.  When this pair of VMAs is unmapped we
    incorrectly double-account the unused reservation, and the overall
    reservation count becomes incorrect; in extreme cases it wraps.
    
    The problem occurs when we split an existing VMA, say to unmap a page in
    the middle.  split_vma() creates a new VMA, copying all fields from the
    original.  As we store our reservation count in vm_private_data, it is
    also copied, endowing the new VMA with a duplicate of the original VMA's
    reservation.  Neither VMA can consume its copy of the reservation in
    full, as each is now smaller than the original mapping, but when we
    unmap and close these VMAs we incorrectly credit the remainder twice and
    resv_huge_pages goes out of sync.  This can lead to allocation failures
    on mappings with reservations, and even to resv_huge_pages wrapping,
    which prevents all subsequent hugepage allocations.
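    The double credit can be reproduced with a small user-space model.  This
    is a hypothetical sketch, not kernel code: struct buggy_vma and the
    buggy_* helpers are invented for illustration, the resv field stands in
    for the count kept in vm_private_data, and sizes are in huge pages.

    ```c
    /* Hypothetical model of the pre-fix scheme; NOT kernel code.
     * All offsets and counts are in huge pages. */
    struct buggy_vma {
        unsigned long start, end;   /* covers [start, end) */
        unsigned long resv;         /* reservation count held in the VMA
                                       (stands in for vm_private_data) */
    };

    /* Global count of outstanding reservations. */
    static unsigned long resv_huge_pages;

    /* mmap(): reserve the whole mapping up front. */
    static struct buggy_vma buggy_mmap(unsigned long npages)
    {
        resv_huge_pages += npages;
        return (struct buggy_vma){ 0, npages, npages };
    }

    /* Faulting a page in consumes one reservation. */
    static void buggy_fault(struct buggy_vma *vma)
    {
        vma->resv--;
        resv_huge_pages--;
    }

    /* Like split_vma(): the new VMA is a field-by-field copy of the
     * original, so it inherits the full remaining reservation too. */
    static struct buggy_vma buggy_split(struct buggy_vma *vma,
                                        unsigned long at)
    {
        struct buggy_vma new_vma = *vma;
        new_vma.start = at;
        vma->end = at;
        return new_vma;
    }

    /* On close, each VMA credits back what it believes it still holds. */
    static void buggy_close(struct buggy_vma *vma)
    {
        resv_huge_pages -= vma->resv;
        vma->resv = 0;
    }

    static unsigned long buggy_scenario(void)
    {
        struct buggy_vma a = buggy_mmap(4);       /* resv_huge_pages = 4 */
        buggy_fault(&a);                          /* 3, and a.resv = 3   */
        struct buggy_vma b = buggy_split(&a, 2);  /* b.resv = 3 as well  */
        buggy_close(&a);                          /* resv_huge_pages = 0 */
        buggy_close(&b);                          /* 0 - 3 wraps around  */
        return resv_huge_pages;
    }
    ```

    Calling buggy_scenario() returns ULONG_MAX - 2 rather than 0: the three
    unused reservations were credited back twice.
    
    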
    
    The simple fix would be to apportion the remaining reservation count
    correctly when the split is made.  However, the only hook we have,
    vm_ops->open, receives only the new VMA; we do not know the identity of
    the preceding VMA.  Even if we did have that VMA to hand, we would not
    know how much of the reservation was consumed on each side of the split.
    
    This patch therefore takes a different tack.  We know that any private
    mapping which has a reservation has one covering its whole size.  Any
    instantiated pages represent consumed reservation.  Therefore, if we
    track the instantiated pages, we can calculate the remaining
    reservation.
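    As a worked equation (the symbols are introduced here for illustration):
    for a VMA covering huge-page offsets [s, e), with C the set of offsets
    already instantiated, the unused reservation is the size minus the
    consumed part.  Because C is indexed by offset, evaluating it separately
    for each side of a split sums to the correct total; below, an assumed
    4-page mapping with only page 1 faulted is split at offset 2.

    ```latex
    \begin{aligned}
    R_{\text{unused}}(s, e) &= (e - s) - \bigl|\, C \cap [s, e) \,\bigr| \\
    R_{\text{unused}}(0, 2) + R_{\text{unused}}(2, 4)
      &= (2 - 1) + (2 - 0) = 3 = 4 - |C|
    \end{aligned}
    ```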
    
    This patch reuses the existing regions code to track the regions for
    which we have consumed reservation (i.e. the instantiated pages); as
    each page is faulted in we record the consumption of reservation for
    the new page.  When we need to return unused reservations at unmap time,
    we simply count the consumed reservation within the mapping being closed
    and subtract that from its size.  During a VMA split the newly opened
    VMA points to the same region map; as this map is indexed by offset, it
    remains valid for both of the split VMAs.  The map is reference counted
    so that it is freed only when all VMAs which are part of the mmap are
    gone.
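    A minimal user-space sketch of this scheme follows, under stated
    assumptions.  All toy_* names are hypothetical; a fixed-size bool array
    replaces the kernel's list of regions for brevity; toy_split() performs
    the reference-taking that the open hook would do; and mappings are
    assumed to be at most TOY_MAX_PAGES huge pages long.

    ```c
    #include <stdbool.h>
    #include <stdlib.h>

    #define TOY_MAX_PAGES 64  /* hypothetical cap so a fixed array suffices */

    /* Shared, offset-indexed record of consumed reservation. */
    struct toy_resv_map {
        int refs;                      /* one reference per VMA using it */
        bool consumed[TOY_MAX_PAGES];  /* offset i has been instantiated */
    };

    struct toy_vma {
        unsigned long start_off, end_off;  /* [start_off, end_off) */
        struct toy_resv_map *resv;
    };

    static unsigned long resv_huge_pages;  /* outstanding reservations */

    /* mmap(): reserve the whole mapping and create an empty map. */
    static struct toy_vma toy_mmap(unsigned long npages)
    {
        struct toy_resv_map *map = calloc(1, sizeof(*map));
        if (!map || npages > TOY_MAX_PAGES)
            abort();
        map->refs = 1;
        resv_huge_pages += npages;
        return (struct toy_vma){ 0, npages, map };
    }

    /* Fault: record the offset as consumed; one reservation is used up. */
    static void toy_fault(struct toy_vma *vma, unsigned long off)
    {
        if (!vma->resv->consumed[off]) {
            vma->resv->consumed[off] = true;
            resv_huge_pages--;
        }
    }

    /* Split: the new VMA shares the same map and takes a reference. */
    static struct toy_vma toy_split(struct toy_vma *vma, unsigned long off)
    {
        struct toy_vma new_vma = *vma;
        new_vma.start_off = off;
        vma->end_off = off;
        new_vma.resv->refs++;
        return new_vma;
    }

    /* Count consumed offsets in [from, to). */
    static unsigned long toy_region_count(const struct toy_resv_map *map,
                                          unsigned long from,
                                          unsigned long to)
    {
        unsigned long n = 0;
        for (unsigned long i = from; i < to; i++)
            n += map->consumed[i];
        return n;
    }

    /* Close: return only the unused part, then drop our reference. */
    static void toy_close(struct toy_vma *vma)
    {
        unsigned long size = vma->end_off - vma->start_off;
        unsigned long used = toy_region_count(vma->resv, vma->start_off,
                                              vma->end_off);
        resv_huge_pages -= size - used;
        if (--vma->resv->refs == 0)
            free(vma->resv);
        vma->resv = NULL;
    }
    ```

    Mapping 4 pages, faulting offset 1, splitting at offset 2 and closing
    both halves returns 1 and then 2 reservations, leaving resv_huge_pages
    at 0; the map is freed by the second close.
    
    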
    
    Thanks to Adam Litke and Mel Gorman for their review feedback.
    Signed-off-by: Andy Whitcroft <apw@shadowen.org>
    Acked-by: Mel Gorman <mel@csn.ul.ie>
    Cc: Adam Litke <agl@us.ibm.com>
    Cc: Johannes Weiner <hannes@saeurebad.de>
    Cc: Andy Whitcroft <apw@shadowen.org>
    Cc: William Lee Irwin III <wli@holomorphy.com>
    Cc: Hugh Dickins <hugh@veritas.com>
    Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
    Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>