Skip to content
  • Hugh Dickins's avatar
    shmem: writepage directly to swap · 9fab5619
    Hugh Dickins authored
    
    
    Synopsis: if shmem_writepage calls swap_writepage directly, most shmem
    swap loads benefit, and a catastrophic interaction between SLUB and some
    flash storage is avoided.
    
    shmem_writepage() has always been peculiar in making no attempt to write:
    it has just transferred a shmem page from file cache to swap cache, then
    let that page make its way around the LRU again before being written and
    freed.
    
    The idea was that people use tmpfs because they want those pages to stay
    in RAM; so although we give it an overflow to swap, we should resist
    writing too soon, giving those pages a second chance before they can be
    reclaimed.
    
    That was always questionable, and I've toyed with this patch for years;
    but never had a clear justification to depart from the original design.
    
    It became more questionable in 2.6.28, when the split LRU patches classed
    shmem and tmpfs pages as SwapBacked rather than as file_cache: that in
    itself gives them more resistance to reclaim than normal file pages.  I
    prepared this patch for 2.6.29, but the merge window arrived before I'd
    completed gathering statistics to justify sending it in.
    
    Then while comparing SLQB against SLUB, running SLUB on a laptop I'd
    habitually used with SLAB, I found SLUB to run my tmpfs kbuild swapping
    tests five times slower than SLAB or SLQB - other machines slower too, but
    nowhere near so bad.  Simpler "cp -a" swapping tests showed the same.
    
    slub_max_order=0 brings sanity to all, but heavy swapping is too far from
    normal to justify such a tuning.  The crucial factor on that laptop turns
    out to be that I'm using an SD card for swap.  What happens is this:
    
    By default, SLUB uses order-2 pages for shmem_inode_cache (and many other
    fs inodes), so creating tmpfs files under memory pressure brings lumpy
    reclaim into play.  One subpage of the order is chosen from the bottom of
    the LRU as usual, then the other three picked out from their random
    positions on the LRUs.
    
    In a tmpfs load, many of these pages will be ones which already passed
    through shmem_writepage, so already have swap allocated.  And though their
    offsets on swap were probably allocated sequentially, now that the pages
    are picked off at random, their swap offsets are scattered.
    
    But the flash storage on the SD card is very sensitive to having its
    writes merged: once swap is written at scattered offsets, performance
    falls apart.  Rotating disk seeks increase too, but less disastrously.
    
    So: stop giving shmem/tmpfs pages a second pass around the LRU, write them
    out to swap as soon as their swap has been allocated.
    
    It's surely possible to devise an artificial load which runs faster the
    old way, one whose sizing is such that the tmpfs pages on their second
    pass are the ones that are wanted again, and other pages not.
    
    But I've not yet found such a load: on all machines, under the loads I've
    tried, immediate swap_writepage speeds up shmem swapping: especially when
    using the SLUB allocator (and more effectively than slub_max_order=0), but
    also with the others; and it also reduces the variance between runs.  How
    much faster varies widely: a factor of five is rare, 5% is common.
    
    One load which might have suffered: imagine a swapping shmem load in a
    limited mem_cgroup on a machine with plenty of memory.  Before 2.6.29 the
    swapcache was not charged, and such a load would have run quickest with
    the shmem swapcache never written to swap.  But now swapcache is charged,
    so even this load benefits from shmem_writepage directly to swap.
    
    Apologies for the #ifndef CONFIG_SWAP swap_writepage() stub in swap.h:
    it's silly because that will never get called; but refactoring shmem.c
    sensibly according to CONFIG_SWAP will be a separate task.
    
    Signed-off-by: default avatarHugh Dickins <hugh@veritas.com>
    Acked-by: default avatarPekka Enberg <penberg@cs.helsinki.fi>
    Acked-by: default avatarRik van Riel <riel@redhat.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    9fab5619