    hugetlb: move hugetlb_acct_memory() · fc1b8a73
    Mel Gorman authored
    This is a patchset to give reliable behaviour to a process that
    successfully calls mmap(MAP_PRIVATE) on a hugetlbfs file.  Currently, it
    is possible for the process to be killed due to a small hugepage pool size
    even if it calls mlock().
    MAP_SHARED mappings on hugetlbfs reserve huge pages at mmap() time.  This
    guarantees all future faults against the mapping will succeed.  This
    allows local allocations at first use improving NUMA locality whilst
    retaining reliability.
    MAP_PRIVATE mappings do not reserve pages.  This can result in an
    application being SIGKILLed later if a huge page is not available at fault
    time.  This makes huge page usage very ill-advised in some cases, as the
    unexpected application failure cannot be detected and handled: it is
    immediately fatal.  Although an application may force instantiation of the
    pages using mlock(), this may lead to poor memory placement and the
    process may still be killed when performing COW.
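    The difference between reserving at mmap() time and allocating at fault
    time can be sketched with a toy userspace model.  This is not kernel
    code: struct toy_pool, toy_reserve() and toy_fault() are hypothetical
    names invented purely for illustration, and the real accounting is more
    involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a huge page pool. NOT kernel code; all names are hypothetical. */
struct toy_pool {
    long free_pages;     /* pages currently unallocated */
    long reserved_pages; /* free pages already promised to existing mappings */
};

/*
 * Reserve nr pages at mmap() time. Returns true on success, or false if the
 * pool cannot back the whole mapping, in which case mmap() would fail up
 * front instead of the process failing later.
 */
static bool toy_reserve(struct toy_pool *p, long nr)
{
    if (p->free_pages - p->reserved_pages < nr)
        return false;
    p->reserved_pages += nr;
    return true;
}

/*
 * Fault in one page. A mapping that holds a reservation consumes it and
 * cannot fail. A mapping without one may only take unreserved free pages,
 * so it can fail at fault time (the fatal case described above).
 */
static bool toy_fault(struct toy_pool *p, bool has_reservation)
{
    if (has_reservation) {
        assert(p->reserved_pages > 0 && p->free_pages > 0);
        p->reserved_pages--;
        p->free_pages--;
        return true;
    }
    if (p->free_pages - p->reserved_pages <= 0)
        return false;
    p->free_pages--;
    return true;
}
```

    In this model, a two-page pool that is fully reserved by one mapping
    refuses a fault from an unreserved mapping, while both faults from the
    reserving mapping succeed.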
    This patchset introduces a reliability guarantee for the process which
    creates a private mapping, i.e.  the process that calls mmap() on a
    hugetlbfs file successfully.  The first patch of the set is a purely
    mechanical code move to make later diffs easier to read.  The second patch
    will guarantee faults up until the process calls fork().  After patch two,
    as long as the child keeps the mappings, the parent is no longer
    guaranteed to be reliable.  Patch 3 guarantees that the parent will always
    successfully COW by unmapping the pages from the child in the event there
    are insufficient pages in the hugepage pool to allocate a new page, be it
    via a static or dynamic pool.
    Existing hugepage-aware applications are unlikely to be affected by this
    change.  For much of hugetlbfs's history, pages were pre-faulted at mmap()
    time or mmap() failed, which acts in a reserve-like manner.  If the pool is
    sized correctly already so that parent and child can fault reliably, the
    application will not even notice the reserves.  It's only when the pool is
    too small for the application to function perfectly reliably that the
    reserves come into play.
    Credit goes to Andy Whitcroft for cleaning up a number of mistakes during
    review before the patches were released.
    This patch:
    A later patch in this set needs to call hugetlb_acct_memory() before it is
    defined.  This patch moves the function without modification.  This makes
    later diffs easier to read.
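    Why definition order matters can be shown with a minimal C sketch: a
    static function must be declared or defined before its first call, so
    placing the definition earlier in the file avoids a forward declaration.
    The names acct_memory(), reserve_region() and pool_delta are
    hypothetical stand-ins, not the kernel's code.

```c
#include <assert.h>

/* Hypothetical stand-in for pool accounting state; not the kernel's. */
static long pool_delta;

/* Defined first, so later callers need no forward declaration. */
static int acct_memory(long delta)
{
    pool_delta += delta;
    return 0;
}

/* A function placed later in the file can now call the helper directly. */
static int reserve_region(long pages)
{
    assert(pages > 0); /* this toy caller only handles positive requests */
    return acct_memory(pages);
}
```

    Had acct_memory() been defined after reserve_region(), the file would
    need a separate prototype ahead of the caller; moving the definition
    keeps the later change smaller.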
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Acked-by: Adam Litke <agl@us.ibm.com>
    Cc: Andy Whitcroft <apw@shadowen.org>
    Cc: William Lee Irwin III <wli@holomorphy.com>
    Cc: Hugh Dickins <hugh@veritas.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>