1. 22 Feb, 2015 3 commits
  2. 26 Jan, 2015 1 commit
  3. 19 Nov, 2014 2 commits
  4. 03 Nov, 2014 1 commit
  5. 12 Oct, 2014 2 commits
  6. 09 Oct, 2014 2 commits
  7. 15 Sep, 2014 1 commit
    • Linus Torvalds's avatar
      vfs: avoid non-forwarding large load after small store in path lookup · 9226b5b4
      Linus Torvalds authored
      The performance regression that Josef Bacik reported in the pathname
      lookup (see commit 99d263d4
       "vfs: fix bad hashing of dentries") made
      me look at performance stability of the dcache code, just to verify that
      the problem was actually fixed.  That turned up a few other problems in
      this area.
      There are a few cases where we exit RCU lookup mode and go to the slow
      serializing case when we shouldn't, Al has fixed those and they'll come
      in with the next VFS pull.
      But my performance verification also shows that link_path_walk() turns
      out to have a very unfortunate 32-bit store of the length and hash of
      the name we look up, followed by a 64-bit read of the combined hash_len
      field.  That screws up the processor store to load forwarding, causing
      an unnecessary hickup in this critical routine.
      It's caused by the ugly calling convention for the "hash_name()"
      function, and easily fixed by just making hash_name() fill in the whole
      'struct qstr' rather than passing it a pointer to just the hash value.
      With that, the profile for this function looks much smoother.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  8. 07 Aug, 2014 1 commit
    • J. Bruce Fields's avatar
      dcache: d_obtain_alias callers don't all want DISCONNECTED · 1a0a397e
      J. Bruce Fields authored
      There are a few d_obtain_alias callers that are using it to get the
      root of a filesystem which may already have an alias somewhere else.
      This is not the same as the filehandle-lookup case, and none of them
      actually need DCACHE_DISCONNECTED set.
      It isn't really a serious problem, but it would really be clearer if we
      reserved DCACHE_DISCONNECTED for those cases where it's actually needed.
      In the btrfs case this was causing a spurious printk from
      nfsd/nfsfh.c:fh_verify when it found an unexpected DCACHE_DISCONNECTED
      dentry.  Josef worked around this by unsetting DCACHE_DISCONNECTED
      manually in 3a0dfa6a
       "Btrfs: unset DCACHE_DISCONNECTED when mounting
      default subvol", and this replaces that workaround.
      Cc: Josef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  9. 01 May, 2014 1 commit
    • Al Viro's avatar
      dentry_kill(): don't try to remove from shrink list · 41edf278
      Al Viro authored
      If the victim in on the shrink list, don't remove it from there.
      If shrink_dentry_list() manages to remove it from the list before
      we are done - fine, we'll just free it as usual.  If not - mark
      it with new flag (DCACHE_MAY_FREE) and leave it there.
      Eventually, shrink_dentry_list() will get to it, remove the sucker
      from shrink list and call dentry_kill(dentry, 0).  Which is where
      we'll deal with freeing.
      Since now dentry_kill(dentry, 0) may happen after or during
      dentry_kill(dentry, 1), we need to recognize that (by seeing
      DCACHE_DENTRY_KILLED already set), unlock everything
      and either free the sucker (in case DCACHE_MAY_FREE has been
      set) or leave it for ongoing dentry_kill(dentry, 1) to deal with.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  10. 01 Apr, 2014 2 commits
  11. 12 Dec, 2013 1 commit
  12. 09 Nov, 2013 1 commit
    • David Howells's avatar
      VFS: Put a small type field into struct dentry::d_flags · b18825a7
      David Howells authored
      Put a type field into struct dentry::d_flags to indicate if the dentry is one
      of the following types that relate particularly to pathwalk:
      	Miss (negative dentry)
      	"Automount" directory (defective - no i_op->lookup())
      	Other (regular, socket, fifo, device)
      The type field is set to one of the first five types on a dentry by calls to
      __d_instantiate() and d_obtain_alias() from information in the inode (if one is
      The type is cleared by dentry_unlink_inode() when it reconstitutes an existing
      dentry as a negative dentry.
      Accessors provided are:
      	d_set_type(dentry, type)
      A bunch of checks in pathname resolution switched to those.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  13. 25 Oct, 2013 1 commit
  14. 10 Sep, 2013 2 commits
    • Glauber Costa's avatar
      super: fix calculation of shrinkable objects for small numbers · 55f841ce
      Glauber Costa authored
      The sysctl knob sysctl_vfs_cache_pressure is used to determine which
      percentage of the shrinkable objects in our cache we should actively try
      to shrink.
      It works great in situations in which we have many objects (at least more
      than 100), because the aproximation errors will be negligible.  But if
      this is not the case, specially when total_objects < 100, we may end up
      concluding that we have no objects at all (total / 100 = 0, if total <
      This is certainly not the biggest killer in the world, but may matter in
      very low kernel memory situations.
      Signed-off-by: default avatarGlauber Costa <glommer@openvz.org>
      Reviewed-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Acked-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
    • Glauber Costa's avatar
      fs: bump inode and dentry counters to long · 3942c07c
      Glauber Costa authored
      This series reworks our current object cache shrinking infrastructure in
      two main ways:
       * Noticing that a lot of users copy and paste their own version of LRU
         lists for objects, we put some effort in providing a generic version.
         It is modeled after the filesystem users: dentries, inodes, and xfs
         (for various tasks), but we expect that other users could benefit in
         the near future with little or no modification.  Let us know if you
         have any issues.
       * The underlying list_lru being proposed automatically and
         transparently keeps the elements in per-node lists, and is able to
         manipulate the node lists individually.  Given this infrastructure, we
         are able to modify the up-to-now hammer called shrink_slab to proceed
         with node-reclaim instead of always searching memory from all over like
         it has been doing.
      Per-node lru lists are also expected to lead to less contention in the lru
      locks on multi-node scans, since we are now no longer fighting for a
      global lock.  The locks usually disappear from the profilers with this
      Although we have no official benchmarks for this version - be our guest to
      independently evaluate this - earlier versions of this series were
      performance tested (details at
      http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no
      visible performance regressions while yielding a better qualitative
      behavior in NUMA machines.
      With this infrastructure in place, we can use the list_lru entry point to
      provide memcg isolation and per-memcg targeted reclaim.  Historically,
      those two pieces of work have been posted together.  This version presents
      only the infrastructure work, deferring the memcg work for a later time,
      so we can focus on getting this part tested.  You can see more about the
      history of such work at http://lwn.net/Articles/552769/
      Dave Chinner (18):
        dcache: convert dentry_stat.nr_unused to per-cpu counters
        dentry: move to per-sb LRU locks
        dcache: remove dentries from LRU before putting on dispose list
        mm: new shrinker API
        shrinker: convert superblock shrinkers to new API
        list: add a new LRU list type
        inode: convert inode lru list to generic lru list code.
        dcache: convert to use new lru list infrastructure
        list_lru: per-node list infrastructure
        shrinker: add node awareness
        fs: convert inode and dentry shrinking to be node aware
        xfs: convert buftarg LRU to generic code
        xfs: rework buffer dispose list tracking
        xfs: convert dquot cache lru to list_lru
        fs: convert fs shrinkers to new scan/count API
        drivers: convert shrinkers to new count/scan API
        shrinker: convert remaining shrinkers to count/scan API
        shrinker: Kill old ->shrink API.
      Glauber Costa (7):
        fs: bump inode and dentry counters to long
        super: fix calculation of shrinkable objects for small numbers
        list_lru: per-node API
        vmscan: per-node deferred work
        i915: bail out earlier when shrinker cannot acquire mutex
        hugepage: convert huge zero page shrinker to new shrinker API
        list_lru: dynamically adjust node arrays
      This patch:
      There are situations in very large machines in which we can have a large
      quantity of dirty inodes, unused dentries, etc.  This is particularly true
      when umounting a filesystem, where eventually since every live object will
      eventually be discarded.
      Dave Chinner reported a problem with this while experimenting with the
      shrinker revamp patchset.  So we believe it is time for a change.  This
      patch just moves int to longs.  Machines where it matters should have a
      big long anyway.
      Signed-off-by: default avatarGlauber Costa <glommer@openvz.org>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  15. 08 Sep, 2013 1 commit
    • Linus Torvalds's avatar
      vfs: reorganize dput() memory accesses · 8aab6a27
      Linus Torvalds authored
      This is me being a bit OCD after all the dentry optimization work this
      merge window: profiles end up showing 'dput()' as a rather expensive
      operation, and there were two unrelated bad reasons for that.
      The first reason was reading d_lockref.count for debugging purposes,
      which touches the lockref cacheline (for reads) before really need to.
      More importantly, the debugging test in question is _wrong_, and has
      hidden bugs.  It's true that we can only sleep when the count goes down
      to zero, but the test as-is hides the much more subtle bug that happens
      if we race with somebody else deleting the file.
      Anyway we _will_ touch that cacheline, but let's do it for a write and
      in the right routine (ie in "lockref_put_or_lock()") which annotates the
      costs better.  So remove the misleading debug code.
      The other was an unnecessary access to the cacheline that contains the
      d_lru list, just to check whether we already were on the LRU list or
      not.  This is exactly what we have d_flags for, so that we can avoid
      touching extra cache lines for the common case.  So just add another bit
      for "is this dentry on the LRU".
      Finally, mark the tests properly likely/unlikely, so that the common
      fast-paths are dense in the instruction stream.
      This makes the profiles look much saner.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  16. 05 Sep, 2013 2 commits
    • Al Viro's avatar
    • Miklos Szeredi's avatar
      vfs: check submounts and drop atomically · 848ac114
      Miklos Szeredi authored
      We check submounts before doing d_drop() on a non-empty directory dentry in
      NFS (have_submounts()), but we do not exclude a racing mount.
       Process A: have_submounts() -> returns false
       Process B: mount() -> success
       Process A: d_drop()
      This patch prepares the ground for the fix by doing the following
      operations all under the same rename lock:
      This is actually an optimization since have_submounts() and
      shrink_dcache_parent() both traverse the same dentry tree separately.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      CC: David Howells <dhowells@redhat.com>
      CC: Steven Whitehouse <swhiteho@redhat.com>
      CC: Trond Myklebust <Trond.Myklebust@netapp.com>
      CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  17. 02 Sep, 2013 1 commit
    • Linus Torvalds's avatar
      vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock() · 15570086
      Linus Torvalds authored
      This moves __d_rcu_to_refcount() from <linux/dcache.h> into fs/namei.c
      and re-implements it using the lockref infrastructure instead.  It also
      adds a lot of comments about what is actually going on, because turning
      a dentry that was looked up using RCU into a long-lived reference
      counted entry is one of the more subtle parts of the rcu walk.
      We also used to be _particularly_ subtle in unlazy_walk() where we
      re-validate both the dentry and its parent using the same sequence
      count.  We used to do it by nesting the locks and then verifying the
      sequence count just once.
      That was silly, because nested locking is expensive, but the sequence
      count check is not.  So this just re-validates the dentry and the parent
      separately, avoiding the nested locking, and making the lockref lookup
      Acked-by: default avatarWaiman Long <waiman.long@hp.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  18. 29 Aug, 2013 1 commit
    • Waiman Long's avatar
      vfs: make the dentry cache use the lockref infrastructure · 98474236
      Waiman Long authored
      This just replaces the dentry count/lock combination with the lockref
      structure that contains both a count and a spinlock, and does the
      mechanical conversion to use the lockref infrastructure.
      There are no semantic changes here, it's purely syntactic.  The
      reference lockref implementation uses the spinlock exactly the same way
      that the old dcache code did, and the bulk of this patch is just
      expanding the internal "d_count" use in the dcache code to use
      "d_lockref.count" instead.
      This is purely preparation for the real change to make the reference
      count updates be lockless during the 3.12 merge window.
      [ As with the previous commit, this is a rewritten version of a concept
        originally from Waiman, so credit goes to him, blame for any errors
        goes to me.
        Waiman's patch had some semantic differences for taking advantage of
        the lockless update in dget_parent(), while this patch is
        intentionally a pure search-and-replace change with no semantic
        changes.     - Linus ]
      Signed-off-by: default avatarWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  19. 24 Aug, 2013 1 commit
  20. 20 Jul, 2013 1 commit
  21. 05 Jul, 2013 1 commit
  22. 29 Jun, 2013 2 commits
  23. 26 Feb, 2013 1 commit
    • Jeff Layton's avatar
      vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op · ecf3d1f1
      Jeff Layton authored
      The following set of operations on a NFS client and server will cause
          server# mkdir a
          client# cd a
          server# mv a a.bak
          client# sleep 30  # (or whatever the dir attrcache timeout is)
          client# stat .
          stat: cannot stat `.': Stale NFS file handle
      Obviously, we should not be getting an ESTALE error back there since the
      inode still exists on the server. The problem is that the lookup code
      will call d_revalidate on the dentry that "." refers to, because NFS has
      FS_REVAL_DOT set.
      nfs_lookup_revalidate will see that the parent directory has changed and
      will try to reverify the dentry by redoing a LOOKUP. That of course
      fails, so the lookup code returns ESTALE.
      The problem here is that d_revalidate is really a bad fit for this case.
      What we really want to know at this point is whether the inode is still
      good or not, but we don't really care what name it goes by or whether
      the dcache is still valid.
      Add a new d_op->d_weak_revalidate operation and have complete_walk call
      that instead of d_revalidate. The intent there is to allow for a
      "weaker" d_revalidate that just checks to see whether the inode is still
      good. This is also gives us an opportunity to kill off the FS_REVAL_DOT
      special casing.
      [AV: changed method name, added note in porting, fixed confusion re
      having it possibly called from RCU mode (it won't be)]
      Cc: NeilBrown <neilb@suse.de>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  24. 23 Feb, 2013 3 commits
  25. 20 Dec, 2012 1 commit
  26. 18 Sep, 2012 1 commit
    • Miklos Szeredi's avatar
      vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill() · b161dfa6
      Miklos Szeredi authored
      IBM reported a soft lockup after applying the fix for the rename_lock
      deadlock.  Commit c83ce989
       ("VFS: Fix the nfs sillyrename regression
      in kernel 2.6.38") was found to be the culprit.
      The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the
      dentry was killed.  This flag can be set on non-killed dentries too,
      which results in infinite retries when trying to traverse the dentry
      This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is
      only set in d_kill() and makes try_to_ascend() test only this flag.
      IBM reported successful test results with this patch.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  27. 14 Jul, 2012 2 commits
  28. 11 May, 2012 1 commit
    • Linus Torvalds's avatar
      vfs: make it possible to access the dentry hash/len as one 64-bit entry · 26fe5750
      Linus Torvalds authored
      This allows comparing hash and len in one operation on 64-bit
      architectures.  Right now only __d_lookup_rcu() takes advantage of this,
      since that is the case we care most about.
      The use of anonymous struct/unions hides the alternate 64-bit approach
      from most users, the exception being a few cases where we initialize a
      'struct qstr' with a static initializer.  This makes the problematic
      cases use a new QSTR_INIT() helper function for that (but initializing
      just the name pointer with a "{ .name = xyzzy }" initializer remains
      valid, as does just copying another qstr structure).
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>