    numa: slab: use numa_mem_id() for slab local memory node · 7d6e6d09
    Lee Schermerhorn authored
    
    
    Example usage of generic "numa_mem_id()":
    
    The mainline slab code, since ~2.6.19, does not handle memoryless nodes
    well.  Specifically, the "fast path"--____cache_alloc()--will never
    succeed, as slab doesn't cache off-node objects on the per-cpu queues,
    and for memoryless nodes all memory is "off node" relative to
    numa_node_id().  This adds considerable overhead to every kmem cache
    allocation, a significant regression relative to earlier kernels [from
    before slab.c was reorganized].
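
    For context, here is a much-simplified sketch of the shape of that fast
    path.  It is illustrative only (the helper name is made up); see
    ____cache_alloc() and cache_alloc_refill() in mm/slab.c for the real
    code:

        /*
         * Much-simplified sketch, not the literal mm/slab.c code.  The
         * per-cpu array_cache only ever holds objects from the "local"
         * node, so when that node is memoryless it never gets filled.
         */
        static void *sketch_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
        {
                struct array_cache *ac = cpu_cache_get(cachep);

                if (likely(ac->avail))                  /* fast path: pop a cached object */
                        return ac->entry[--ac->avail];

                /*
                 * Slow path: refill from the "local" node's slab lists.
                 * With a memoryless local node this falls through to an
                 * off-node allocation that is not cached in ac, so
                 * ac->avail stays 0 and every allocation misses again.
                 */
                return cache_alloc_refill(cachep, flags);
        }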
    
    This patch uses the generic topology function "numa_mem_id()" to return
    the "effective local memory node" for the calling context.  This is the
    first node in the local node's generic fallback zonelist--the same node
    that "local" mempolicy-based allocations would use.  This lets slab cache
    these "local" allocations and avoid fallback/refill on every allocation.
    
    N.B.: Slab will need to handle node and memory hotplug events that could
    change the value returned by numa_mem_id() for any given node, if recent
    memory hotplug changes don't already cover this.  E.g., flush all
    per-cpu slab queues before rebuilding the zonelists while the "machine"
    is held in the stopped state.
    
    Performance impact on "hackbench 400 process 200"
    
    2.6.34-rc3-mmotm-100405-1609 (seconds)          no-patch    this-patch
    ia64, no memoryless nodes        [avg of 10]:     11.713        11.637   ~0.65% diff
    ia64, cpus all on memoryless nodes     [10]:     228.259        26.484   ~8.6x speedup
    
    The slowdown of the patched kernel from ~12 seconds to ~28 seconds when
    configured with memoryless nodes is the result of all CPUs allocating
    from a single node's mm pagepool.  The cache lines of the single node
    are distributed/interleaved over the memory of the real physical nodes,
    but the zone lock, list heads, etc. of the single node with memory still
    each live in a single cache line that is accessed from all processors.
    
    x86_64 [8x6 AMD] [avg of 40]:		2.883	   2.845
    
    Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Mel Gorman <mel@csn.ul.ie>
    Cc: Christoph Lameter <cl@linux-foundation.org>
    Cc: Nick Piggin <npiggin@suse.de>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Eric Whitney <eric.whitney@hp.com>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: "Luck, Tony" <tony.luck@intel.com>
    Cc: Pekka Enberg <penberg@cs.helsinki.fi>
    Cc: <linux-arch@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>