    mm, hugetlb: unclutter hugetlb allocation layers · aaf14e40
    Michal Hocko authored
    Patch series "mm, hugetlb: allow proper node fallback dequeue".
    
    While working on a hugetlb migration issue addressed in a separate
    patchset[1] I have noticed that the hugetlb allocations from the
    preallocated pool are quite suboptimal.
    
     [1] //lkml.kernel.org/r/20170608074553.22152-1-mhocko@kernel.org
    
    There is no fallback mechanism implemented and no notion of preferred
    node.  I have tried to work around it but Vlastimil was right to push
    back for a more robust solution.  It seems that such a solution is to
    reuse the zonelist approach we use for the page allocator.
    
    This series has 3 patches.  The first one tries to make hugetlb
    allocation layers more clear.  The second one implements the zonelist
    hugetlb pool allocation and introduces a preferred node semantic which
    is used by the migration callbacks.  The last patch is a clean up.
    
    This patch (of 3):
    
    Hugetlb allocation path for fresh huge pages is unnecessarily complex
    and it mixes different interfaces between layers.
    
    __alloc_buddy_huge_page is the central place to perform a new
    allocation.  It checks for the hugetlb overcommit and then relies on
    __hugetlb_alloc_buddy_huge_page to invoke the page allocator.  This is
    all good except that __alloc_buddy_huge_page pushes vma and address down
    the callchain and so __hugetlb_alloc_buddy_huge_page has to deal with
    two different allocation modes - one for memory policy and the other for
    node specific (or to make it more obscure node non-specific) requests.
    
    This just screams for a reorganization.
    
    This patch pulls out all the vma specific handling up to
    __alloc_buddy_huge_page_with_mpol where it belongs.
    __alloc_buddy_huge_page will get nodemask argument and
    __hugetlb_alloc_buddy_huge_page will become a trivial wrapper over the
    page allocator.
    
    In short:
    __alloc_buddy_huge_page_with_mpol - memory policy handling
      __alloc_buddy_huge_page - overcommit handling and accounting
        __hugetlb_alloc_buddy_huge_page - page allocator layer
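
    The layering above can be modelled in a small user-space sketch.  This
    is not the kernel code: every type and function name below is
    hypothetical, a node mask is reduced to a plain bitmask, and a "page"
    is represented only by the id of the node it came from.  It is meant
    to show only that each layer has a single responsibility and that the
    lower two layers see nothing but a preferred node and a node mask.

```c
#include <assert.h>

#define MAX_NODES 4

/* Toy state, all names hypothetical (not kernel structures). */
struct pool_sketch {
    int free_on_node[MAX_NODES]; /* pages the toy page allocator can still hand out */
    int surplus;                 /* surplus huge pages allocated so far */
    int overcommit_limit;        /* models the hugetlb overcommit limit */
};

/* Stand-in for what a memory policy resolves to for a given vma/address. */
struct policy_sketch {
    int preferred_nid;
    unsigned int nodemask;       /* bit n set: node n is allowed */
};

/*
 * Page allocator layer: a trivial wrapper that walks the nodes starting
 * at the preferred one and only considers nodes allowed by the mask.
 * Returns the node id the page came from, or -1 on failure.
 */
static int page_allocator_layer(struct pool_sketch *p, int preferred_nid,
                                unsigned int nodemask)
{
    for (int i = 0; i < MAX_NODES; i++) {
        int nid = (preferred_nid + i) % MAX_NODES;

        if ((nodemask & (1u << nid)) && p->free_on_node[nid] > 0) {
            p->free_on_node[nid]--;
            return nid;
        }
    }
    return -1;
}

/* Overcommit layer: checks the limit, allocates, then does the accounting. */
static int overcommit_layer(struct pool_sketch *p, int preferred_nid,
                            unsigned int nodemask)
{
    int nid;

    if (p->surplus >= p->overcommit_limit)
        return -1;

    nid = page_allocator_layer(p, preferred_nid, nodemask);
    if (nid >= 0)
        p->surplus++;
    return nid;
}

/* Memory policy layer: the only place that knows about the policy. */
static int mpol_layer(struct pool_sketch *p, const struct policy_sketch *pol)
{
    return overcommit_layer(p, pol->preferred_nid, pol->nodemask);
}
```

    In this model a caller with no policy at all can enter at
    overcommit_layer directly with an explicit node and mask, which is
    the point of keeping the vma handling in the top layer only.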
    
    Also note that the cpuset retry loop in __hugetlb_alloc_buddy_huge_page
    is not really needed because the page allocator already handles
    cpuset updates.
    
    Finally __hugetlb_alloc_buddy_huge_page had a special case for node
    specific allocations (when no policy is applied and there is a node
    given).  This has relied on __GFP_THISNODE to not fall back to a different
    node.  alloc_huge_page_node is the only caller which relies on this
    behavior so move the __GFP_THISNODE there.
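
    A minimal user-space sketch of that move, assuming a toy allocator
    that falls back to other nodes unless a "this node only" flag is set.
    SKETCH_GFP_THISNODE and all function names are hypothetical stand-ins,
    not the kernel definitions.  The lowest layer no longer decides
    whether fallback is allowed; the one caller that needs strict
    placement passes the flag itself.

```c
#include <assert.h>

#define SKETCH_MAX_NODES    4
#define SKETCH_GFP_THISNODE 0x1u /* hypothetical stand-in for __GFP_THISNODE */

/* Free pages per node in the toy allocator. */
static int sketch_free_pages[SKETCH_MAX_NODES];

/*
 * Lowest layer: try nid first and fall back to the following nodes,
 * unless the caller asked for this node only.  nid must be a valid
 * node id.  Returns the node the page came from, or -1 on failure.
 */
static int sketch_alloc_pages(unsigned int gfp, int nid)
{
    for (int i = 0; i < SKETCH_MAX_NODES; i++) {
        int cand = (nid + i) % SKETCH_MAX_NODES;

        if (sketch_free_pages[cand] > 0) {
            sketch_free_pages[cand]--;
            return cand;
        }
        if (gfp & SKETCH_GFP_THISNODE)
            break; /* no fallback allowed */
    }
    return -1;
}

/* Generic path: a preferred node, fallback permitted. */
static int sketch_alloc_any_node(int preferred_nid)
{
    return sketch_alloc_pages(0, preferred_nid);
}

/* Node-specific caller: sets the flag here rather than in the lower layer. */
static int sketch_alloc_on_node(int nid)
{
    return sketch_alloc_pages(SKETCH_GFP_THISNODE, nid);
}
```

    With node 0 empty and node 1 populated, sketch_alloc_on_node(0) fails
    while sketch_alloc_any_node(0) falls back to node 1.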
    
    Not only does this remove quite some code, it should also make those
    layers easier to follow and clearer wrt responsibilities.
    
    Link: http://lkml.kernel.org/r/20170622193034.28972-2-mhocko@kernel.org
    
    
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: Mel Gorman <mgorman@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>