    slab.h: sprinkle __assume_aligned attributes · 94a58c36
    Rasmus Villemoes authored
    
    
    The various allocators return aligned memory.  Telling the compiler that
    allows it to generate better code in many cases, for example when the
    return value is immediately passed to memset().
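
    As a sketch of what this means in practice (my_alloc() and struct foo
    below are made up for illustration; the patch itself only annotates the
    slab allocator prototypes), gcc's assume_aligned function attribute on
    an allocator declaration lets callers be compiled as if the returned
    pointer were already aligned:

        #include <stdlib.h>
        #include <string.h>

        /* Promise gcc that the returned pointer is at least 8-byte aligned. */
        __attribute__((assume_aligned(8)))
        void *my_alloc(size_t size)
        {
                return malloc(size);    /* malloc returns suitably aligned memory */
        }

        struct foo { long a, b, c; };

        struct foo *new_foo(void)
        {
                struct foo *p = my_alloc(sizeof(*p));

                /*
                 * Knowing p's alignment, gcc can zero it with aligned word
                 * stores (e.g. a straight rep stosq) instead of emitting
                 * run-time alignment fixups first.
                 */
                if (p)
                        memset(p, 0, sizeof(*p));
                return p;
        }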
    
    Some code does become larger, but at least we win twice as much as we lose:
    
    $ scripts/bloat-o-meter /tmp/vmlinux vmlinux
    add/remove: 0/0 grow/shrink: 13/52 up/down: 995/-2140 (-1145)
    
    An example of the different (and smaller) code can be seen in mm_alloc(). Before:
    
    :       48 8d 78 08             lea    0x8(%rax),%rdi
    :       48 89 c1                mov    %rax,%rcx
    :       48 89 c2                mov    %rax,%rdx
    :       48 c7 00 00 00 00 00    movq   $0x0,(%rax)
    :       48 c7 80 48 03 00 00    movq   $0x0,0x348(%rax)
    :       00 00 00 00
    :       31 c0                   xor    %eax,%eax
    :       48 83 e7 f8             and    $0xfffffffffffffff8,%rdi
    :       48 29 f9                sub    %rdi,%rcx
    :       81 c1 50 03 00 00       add    $0x350,%ecx
    :       c1 e9 03                shr    $0x3,%ecx
    :       f3 48 ab                rep stos %rax,%es:(%rdi)
    
    After:
    
    :       48 89 c2                mov    %rax,%rdx
    :       b9 6a 00 00 00          mov    $0x6a,%ecx
    :       31 c0                   xor    %eax,%eax
    :       48 89 d7                mov    %rdx,%rdi
    :       f3 48 ab                rep stos %rax,%es:(%rdi)
    
So, without the alignment information, gcc's strategy is to do two possibly
    (but not really, of course) unaligned stores to the first and last word,
    then do an aligned rep stos covering the middle part with a little overlap.
    Architectures which do not allow unaligned stores may gain even more.
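
    In C-like terms, the "before" sequence corresponds to something along
    these lines (a sketch of what the generated code does for
    memset(p, 0, 0x350), not actual kernel source):

        #include <stddef.h>

        /* What gcc emits when it cannot assume p is aligned. */
        void zero_0x350(void *p)
        {
                char *start = p;
                unsigned long *dst;
                size_t words;

                /* possibly unaligned stores to the first and the last word */
                *(unsigned long *)start = 0;
                *(unsigned long *)(start + 0x348) = 0;

                /* round up to the next 8-byte boundary ... */
                dst = (unsigned long *)(((unsigned long)start + 8) & ~7UL);

                /*
                 * ... then zero the middle with aligned stores (the rep stos
                 * in the listing), overlapping the two stores above.
                 */
                words = (size_t)((start + 0x350) - (char *)dst) / 8;
                while (words--)
                        *dst++ = 0;
        }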
    
    I don't know if gcc can actually make use of alignments greater than 8 for
    anything, so one could probably drop the __assume_xyz_alignment macros and
    just use __assume_aligned(8).
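
    For reference, the macros being discussed are thin wrappers around that
    gcc attribute, roughly along these lines (a sketch; the exact names and
    the per-arch alignment constants are whatever the patch and the
    architecture define):

        /* compiler-gcc.h: wrap gcc's function attribute */
        #define __assume_aligned(a, ...) \
                __attribute__((__assume_aligned__(a, ## __VA_ARGS__)))

        /* slab.h: per-allocator aliases used on the allocation prototypes */
        #define __assume_kmalloc_alignment __assume_aligned(ARCH_KMALLOC_MINALIGN)
        #define __assume_slab_alignment    __assume_aligned(ARCH_SLAB_MINALIGN)

        void *kmem_cache_alloc(struct kmem_cache *s, gfp_t flags)
                        __assume_slab_alignment;
        void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment;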
    
The increases in code size are mostly caused by gcc deciding to
    open-code strlen() using the check-four-bytes-at-a-time trick when it
    knows the buffer is sufficiently aligned (one function grew by 200
    bytes). Now it turns out that many of these strlen() calls showing up
    were in fact redundant, and they're gone from -next. Applying the two
    patches to next-20151001, bloat-o-meter instead says:
    
    add/remove: 0/0 grow/shrink: 6/52 up/down: 244/-2140 (-1896)
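
    The kind of call site responsible for the growth looks roughly like this
    (reusing the illustrative my_alloc() from the first sketch; with the
    alignment of buf known, gcc may expand the strlen() inline rather than
    emit a call):

        #include <stdio.h>
        #include <string.h>

        /* declaration of the illustrative allocator from the sketch above */
        __attribute__((assume_aligned(8))) void *my_alloc(size_t size);

        size_t make_label(char **out)
        {
                char *buf = my_alloc(64);       /* known 8-byte aligned */

                if (!buf)
                        return 0;
                snprintf(buf, 64, "label-%d", 42);
                *out = buf;
                /* candidate for the check-four-bytes-at-a-time expansion */
                return strlen(buf);
        }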
    
    Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Acked-by: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>