Skip to content
  • Linus Torvalds's avatar
    vfs: reorganize dput() memory accesses · 8aab6a27
    Linus Torvalds authored
    
    
    This is me being a bit OCD after all the dentry optimization work this
    merge window: profiles end up showing 'dput()' as a rather expensive
    operation, and there were two unrelated bad reasons for that.
    
    The first reason was reading d_lockref.count for debugging purposes,
    which touches the lockref cacheline (for reads) before really need to.
    More importantly, the debugging test in question is _wrong_, and has
    hidden bugs.  It's true that we can only sleep when the count goes down
    to zero, but the test as-is hides the much more subtle bug that happens
    if we race with somebody else deleting the file.
    
    Anyway we _will_ touch that cacheline, but let's do it for a write and
    in the right routine (ie in "lockref_put_or_lock()") which annotates the
    costs better.  So remove the misleading debug code.
    
    The other was an unnecessary access to the cacheline that contains the
    d_lru list, just to check whether we already were on the LRU list or
    not.  This is exactly what we have d_flags for, so that we can avoid
    touching extra cache lines for the common case.  So just add another bit
    for "is this dentry on the LRU".
    
    Finally, mark the tests properly likely/unlikely, so that the common
    fast-paths are dense in the instruction stream.
    
    This makes the profiles look much saner.
    
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    8aab6a27