    [PATCH] hugepage: serialize hugepage allocation and instantiation · 3935baa9
    David Gibson authored
    
    
    Currently, no lock or mutex is held between allocating a hugepage and
    inserting it into the pagetables / page cache.  When we do go to insert the
    page into pagetables or page cache, we recheck and may free the newly
    allocated hugepage.  However, since the number of hugepages in the system
    is strictly limited, and it's usual to want to use all of them, this can
    still lead to spurious allocation failures.
    
    For example, suppose two processes are both mapping (MAP_SHARED) the same
    hugepage file, large enough to consume the entire available hugepage pool.
    If they race instantiating the last page in the mapping, they will both
    attempt to allocate the last available hugepage.  One will fail, of course,
    returning OOM from the fault and thus causing the process to be killed,
    despite the fact that the entire mapping can, in fact, be instantiated.
    
    The patch fixes this race by the simple method of adding a (sleeping) mutex
    to serialize the hugepage fault path between allocation and insertion into
    pagetables and/or page cache.  It would be possible to avoid the
    serialization by catching the allocation failures, waiting on some
    condition, then rechecking to see if someone else has instantiated the page
    for us.  Given the likely frequency of hugepage instantiations, it seems
    very doubtful it's worth the extra complexity.
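
    As a rough illustration of the serialization described above, here is a
    minimal userspace sketch using pthreads.  It is a model only, not the
    kernel code: struct hugepage_model, handle_fault(), simulate_faults() and
    MAX_THREADS are hypothetical names invented for this example.  The point
    it shows is that the "is the page already there?" check, the allocation
    and the insertion all happen under one sleeping mutex, so a thread that
    loses the race finds the page present instead of failing to allocate.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 64  /* arbitrary cap for this sketch */

/* Hypothetical model of one page of a shared hugepage mapping. */
struct hugepage_model {
    pthread_mutex_t instantiation_mutex; /* stands in for the new mutex */
    int free_pages;     /* hugepages left in the (strictly limited) pool */
    bool page_present;  /* has the page been inserted into the page cache? */
    int oom_failures;   /* faults that would have returned OOM */
};

/* Fault handler model: check, allocate and insert under a single mutex. */
static void *handle_fault(void *arg)
{
    struct hugepage_model *m = arg;

    pthread_mutex_lock(&m->instantiation_mutex);
    if (!m->page_present) {
        if (m->free_pages > 0) {
            m->free_pages--;        /* allocate from the pool */
            m->page_present = true; /* insert into the page cache */
        } else {
            m->oom_failures++;      /* pool really is exhausted */
        }
    }
    /* Otherwise another faulter already instantiated the page for us. */
    pthread_mutex_unlock(&m->instantiation_mutex);
    return NULL;
}

/*
 * Let nthreads threads fault on the same page with pool_size hugepages
 * available.  Returns the number of faults that failed.
 */
int simulate_faults(int nthreads, int pool_size)
{
    struct hugepage_model m = {
        .free_pages = pool_size,
        .page_present = false,
        .oom_failures = 0,
    };
    pthread_t threads[MAX_THREADS];
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_mutex_init(&m.instantiation_mutex, NULL);

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&threads[i], NULL, handle_fault, &m) != 0)
            break;
        started++;
    }
    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_destroy(&m.instantiation_mutex);
    return m.oom_failures;
}
```

    In this model, simulate_faults(2, 1) corresponds to the two-process
    example above: both faulters race for the last hugepage, but because the
    recheck happens under the mutex, neither fault fails.  Failures are only
    reported when the pool cannot back the page at all, for instance with
    simulate_faults(2, 0).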
    
    This patch causes no regressions on the libhugetlbfs testsuite, and one
    test, which can trigger this race, now passes where it previously failed.
    
    Actually, the test still sometimes fails, though less often and only as a
    shmat() failure, rather than processes getting OOM killed by the VM.  The
    dodgy heuristic tests in fs/hugetlbfs/inode.c for whether there's enough
    hugepage space aren't protected by the new mutex, and protecting them
    would be ugly, so there's still a race there.  Another patch to replace
    those tests with something saner, for this reason as well as others, is
    coming...
    
    Signed-off-by: David Gibson <dwg@au1.ibm.com>
    Cc: William Lee Irwin III <wli@holomorphy.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>