1. 22 Sep, 2016 1 commit
  2. 31 Jul, 2016 1 commit
  3. 28 May, 2016 1 commit
  4. 02 May, 2016 4 commits
    • Al Viro's avatar
    • Al Viro's avatar
      introduce a parallel variant of ->iterate() · 61922694
      Al Viro authored
      
      
      New method: ->iterate_shared().  Same arguments as in ->iterate(),
      called with the directory locked only shared.  Once all filesystems
      switch, the old one will be gone.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      61922694
    • Al Viro's avatar
      parallel lookups: actual switch to rwsem · 9902af79
      Al Viro authored
      
      
      ta-da!
      
      The main issue is the lack of down_write_killable(), so the places
      like readdir.c switched to plain inode_lock(); once killable
      variants of rwsem primitives appear, that'll be dealt with.
      
      lockdep side also might need more work
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      9902af79
    • Al Viro's avatar
      parallel lookups machinery, part 2 · 84e710da
      Al Viro authored
      
      
      We'll need to verify that there's neither a hashed nor in-lookup
      dentry with desired parent/name before adding to in-lookup set.
      
      One possible solution would be to hold the parent's ->d_lock through
      both checks, but while the in-lookup set is relatively small at any
      time, dcache is not.  And holding the parent's ->d_lock through
      something like __d_lookup_rcu() would suck too badly.
      
      So we leave the parent's ->d_lock alone, which means that we watch
      out for the following scenario:
      	* we verify that there's no hashed match
      	* existing in-lookup match gets hashed by another process
      	* we verify that there's no in-lookup matches and decide
      that everything's fine.
      
      Solution: per-directory kinda-sorta seqlock, bumped around the times
      we hash something that used to be in-lookup or move (and hash)
      something in place of in-lookup.  Then the above would turn into
      	* read the counter
      	* do dcache lookup
      	* if no matches found, check for in-lookup matches
      	* if there had been none of those either, check if the
      counter has changed; repeat if it has.
      
      The "kinda-sorta" part is due to the fact that we don't have much spare
      space in inode.  There is a spare word (shared with i_bdev/i_cdev/i_pipe),
      so the counter part is not a problem, but spinlock is a different story.
      
      We could use the parent's ->d_lock, and it would be less painful in
      terms of contention, for __d_add() it would be rather inconvenient to
      grab; we could do that (using lock_parent()), but...
      
      Fortunately, we can get serialization on the counter itself, and it
      might be a good idea in general; we can use cmpxchg() in a loop to
      get from even to odd and smp_store_release() from odd to even.
      
      This commit adds the counter and updating logics; the readers will be
      added in the next commit.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      84e710da
  5. 11 Apr, 2016 1 commit
  6. 14 Jan, 2016 1 commit
    • Al Viro's avatar
      Make sure that highmem pages are not added to symlink page cache · e8ecde25
      Al Viro authored
      
      
      inode_nohighmem() is sufficient to make sure that page_get_link()
      won't try to allocate a highmem page.  Moreover, it is sufficient
      to make sure that page_symlink/__page_symlink won't do the same
      thing.  However, any filesystem that manually preseeds the symlink's
      page cache upon symlink(2) needs to make sure that the page it
      inserts there won't be a highmem one.
      
      Fortunately, only nfs and shmem have run afoul of that...
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      e8ecde25
  7. 30 Dec, 2015 1 commit
  8. 09 Dec, 2015 2 commits
    • Al Viro's avatar
      replace ->follow_link() with new method that could stay in RCU mode · 6b255391
      Al Viro authored
      
      
      new method: ->get_link(); replacement of ->follow_link().  The differences
      are:
      	* inode and dentry are passed separately
      	* might be called both in RCU and non-RCU mode;
      the former is indicated by passing it a NULL dentry.
      	* when called that way it isn't allowed to block
      and should return ERR_PTR(-ECHILD) if it needs to be called
      in non-RCU mode.
      
      It's a flagday change - the old method is gone, all in-tree instances
      converted.  Conversion isn't hard; said that, so far very few instances
      do not immediately bail out when called in RCU mode.  That'll change
      in the next commits.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      6b255391
    • Al Viro's avatar
      don't put symlink bodies in pagecache into highmem · 21fc61c7
      Al Viro authored
      
      
      kmap() in page_follow_link_light() needed to go - allowing to hold
      an arbitrary number of kmaps for long is a great way to deadlocking
      the system.
      
      new helper (inode_nohighmem(inode)) needs to be used for pagecache
      symlinks inodes; done for all in-tree cases.  page_follow_link_light()
      instrumented to yell about anything missed.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      21fc61c7
  9. 01 Jul, 2015 1 commit
  10. 04 Jun, 2015 1 commit
  11. 15 May, 2015 1 commit
  12. 12 Apr, 2015 2 commits
  13. 19 Nov, 2014 2 commits
  14. 03 Apr, 2014 1 commit
    • Johannes Weiner's avatar
      mm + fs: store shadow entries in page cache · 91b0abe3
      Johannes Weiner authored
      
      
      Reclaim will be leaving shadow entries in the page cache radix tree upon
      evicting the real page.  As those pages are found from the LRU, an
      iput() can lead to the inode being freed concurrently.  At this point,
      reclaim must no longer install shadow pages because the inode freeing
      code needs to ensure the page tree is really empty.
      
      Add an address_space flag, AS_EXITING, that the inode freeing code sets
      under the tree lock before doing the final truncate.  Reclaim will check
      for this flag before installing shadow pages.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Reviewed-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      91b0abe3
  15. 09 Nov, 2013 1 commit
  16. 10 Sep, 2013 2 commits
  17. 29 Jun, 2013 2 commits
  18. 26 Feb, 2013 1 commit
    • Jeff Layton's avatar
      vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op · ecf3d1f1
      Jeff Layton authored
      
      
      The following set of operations on a NFS client and server will cause
      
          server# mkdir a
          client# cd a
          server# mv a a.bak
          client# sleep 30  # (or whatever the dir attrcache timeout is)
          client# stat .
          stat: cannot stat `.': Stale NFS file handle
      
      Obviously, we should not be getting an ESTALE error back there since the
      inode still exists on the server. The problem is that the lookup code
      will call d_revalidate on the dentry that "." refers to, because NFS has
      FS_REVAL_DOT set.
      
      nfs_lookup_revalidate will see that the parent directory has changed and
      will try to reverify the dentry by redoing a LOOKUP. That of course
      fails, so the lookup code returns ESTALE.
      
      The problem here is that d_revalidate is really a bad fit for this case.
      What we really want to know at this point is whether the inode is still
      good or not, but we don't really care what name it goes by or whether
      the dcache is still valid.
      
      Add a new d_op->d_weak_revalidate operation and have complete_walk call
      that instead of d_revalidate. The intent there is to allow for a
      "weaker" d_revalidate that just checks to see whether the inode is still
      good. This is also gives us an opportunity to kill off the FS_REVAL_DOT
      special casing.
      
      [AV: changed method name, added note in porting, fixed confusion re
      having it possibly called from RCU mode (it won't be)]
      
      Cc: NeilBrown <neilb@suse.de>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      ecf3d1f1
  19. 20 Dec, 2012 1 commit
  20. 03 Aug, 2012 1 commit
  21. 14 Jul, 2012 4 commits
  22. 06 May, 2012 1 commit
  23. 21 Mar, 2012 1 commit
  24. 25 Jul, 2011 1 commit
  25. 21 Jul, 2011 2 commits
    • Josef Bacik's avatar
      fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82
      Josef Bacik authored
      
      
      Btrfs needs to be able to control how filemap_write_and_wait_range() is called
      in fsync to make it less of a painful operation, so push down taking i_mutex and
      the calling of filemap_write_and_wait() down into the ->fsync() handlers.  Some
      file systems can drop taking the i_mutex altogether it seems, like ext3 and
      ocfs2.  For correctness sake I just pushed everything down in all cases to make
      sure that we keep the current behavior the same for everybody, and then each
      individual fs maintainer can make up their mind about what to do from there.
      Thanks,
      Acked-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      02c24a82
    • Josef Bacik's avatar
      fs: add SEEK_HOLE and SEEK_DATA flags · 982d8165
      Josef Bacik authored
      
      
      This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags.  Turns out
      using fiemap in things like cp cause more problems than it solves, so lets try
      and give userspace an interface that doesn't suck.  We need to match solaris
      here, and the definitions are
      
      *o* If /whence/ is SEEK_HOLE, the offset of the start of the
      next hole greater than or equal to the supplied offset
      is returned. The definition of a hole is provided near
      the end of the DESCRIPTION.
      
      *o* If /whence/ is SEEK_DATA, the file pointer is set to the
      start of the next non-hole file region greater than or
      equal to the supplied offset.
      
      So in the generic case the entire file is data and there is a virtual hole at
      the end.  That means we will just return i_size for SEEK_HOLE and will return
      the same offset for SEEK_DATA.  This is how Solaris does it so we have to do it
      the same way.
      
      Thanks,
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      982d8165
  26. 20 Jul, 2011 1 commit
  27. 25 Mar, 2011 1 commit
    • Dave Chinner's avatar
      fs: remove inode_lock from iput_final and prune_icache · f283c86a
      Dave Chinner authored
      
      
      Now that inode state changes are protected by the inode->i_lock and
      the inode LRU manipulations by the inode_lru_lock, we can remove the
      inode_lock from prune_icache and the initial part of iput_final().
      
      instead of using the inode_lock to protect the inode during
      iput_final, use the inode->i_lock instead. This protects the inode
      against new references being taken while we change the inode state
      to I_FREEING, as well as preventing prune_icache from grabbing the
      inode while we are manipulating it. Hence we no longer need the
      inode_lock in iput_final prior to setting I_FREEING on the inode.
      
      For prune_icache, we no longer need the inode_lock to protect the
      LRU list, and the inodes themselves are protected against freeing
      races by the inode->i_lock. Hence we can lift the inode_lock from
      prune_icache as well.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      f283c86a
  28. 16 Mar, 2011 1 commit