1. 01 Oct, 2016 5 commits
    • Miklos Szeredi's avatar
      fuse: handle killpriv in userspace fs · 5e940c1d
      Miklos Szeredi authored
      
      
      Only userspace filesystem can do the killing of suid/sgid without races.
      So introduce an INIT flag and negotiate support for this.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      5e940c1d
    • Miklos Szeredi's avatar
      fuse: fix killing s[ug]id in setattr · a09f99ed
      Miklos Szeredi authored
      
      
      Fuse allowed VFS to set mode in setattr in order to clear suid/sgid on
      chown and truncate, and (since writeback_cache) write.  The problem with
      this is that it'll potentially restore a stale mode.
      
      The poper fix would be to let the filesystems do the suid/sgid clearing on
      the relevant operations.  Possibly some are already doing it but there's no
      way we can detect this.
      
      So fix this by refreshing and recalculating the mode.  Do this only if
      ATTR_KILL_S[UG]ID is set to not destroy performance for writes.  This is
      still racy but the size of the window is reduced.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Cc: <stable@vger.kernel.org>
      a09f99ed
    • Miklos Szeredi's avatar
      fuse: invalidate dir dentry after chmod · 5e2b8828
      Miklos Szeredi authored
      
      
      Without "default_permissions" the userspace filesystem's lookup operation
      needs to perform the check for search permission on the directory.
      
      If directory does not allow search for everyone (this is quite rare) then
      userspace filesystem has to set entry timeout to zero to make sure
      permissions are always performed.
      
      Changing the mode bits of the directory should also invalidate the
      (previously cached) dentry to make sure the next lookup will have a chance
      of updating the timeout, if needed.
      Reported-by: default avatarJean-Pierre André <jean-pierre.andre@wanadoo.fr>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Cc: <stable@vger.kernel.org>
      5e2b8828
    • Seth Forshee's avatar
      fuse: Use generic xattr ops · 703c7362
      Seth Forshee authored
      
      
      In preparation for posix acl support, rework fuse to use xattr handlers and
      the generic setxattr/getxattr/listxattr callbacks.  Split the xattr code
      out into it's own file, and promote symbols to module-global scope as
      needed.
      
      Functionally these changes have no impact, as fuse still uses a single
      handler for all xattrs which uses the old callbacks.
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      703c7362
    • Miklos Szeredi's avatar
      fuse: listxattr: verify xattr list · cb3ae6d2
      Miklos Szeredi authored
      
      
      Make sure userspace filesystem is returning a well formed list of xattr
      names (zero or more nonzero length, null terminated strings).
      
      [Michael Theall: only verify in the nonzero size case]
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Cc: <stable@vger.kernel.org>
      cb3ae6d2
  2. 24 Aug, 2016 1 commit
  3. 30 Jul, 2016 1 commit
  4. 29 Jul, 2016 4 commits
  5. 28 Jul, 2016 1 commit
  6. 19 Jul, 2016 1 commit
  7. 05 Jul, 2016 1 commit
  8. 30 Jun, 2016 2 commits
    • Ashish Sangwan's avatar
      fuse: improve aio directIO write performance for size extending writes · 7879c4e5
      Ashish Sangwan authored
      
      
      While sending the blocking directIO in fuse, the write request is broken
      into sub-requests, each of default size 128k and all the requests are sent
      in non-blocking background mode if async_dio mode is supported by libfuse.
      The process which issue the write wait for the completion of all the
      sub-requests. Sending multiple requests parallely gives a chance to perform
      parallel writes in the user space fuse implementation if it is
      multi-threaded and hence improves the performance.
      
      When there is a size extending aio dio write, we switch to blocking mode so
      that we can properly update the size of the file after completion of the
      writes. However, in this situation all the sub-requests are sent in
      serialized manner where the next request is sent only after receiving the
      reply of the current request. Hence the multi-threaded user space
      implementation is not utilized properly.
      
      This patch changes the size extending aio dio behavior to exactly follow
      blocking dio. For multi threaded fuse implementation having 10 threads and
      using buffer size of 64MB to perform async directIO, we are getting double
      the speed.
      Signed-off-by: default avatarAshish Sangwan <ashishsangwan2@gmail.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      7879c4e5
    • Miklos Szeredi's avatar
      fuse: serialize dirops by default · 5c672ab3
      Miklos Szeredi authored
      
      
      Negotiate with userspace filesystems whether they support parallel readdir
      and lookup.  Disable parallelism by default for fear of breaking fuse
      filesystems.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 9902af79 ("parallel lookups: actual switch to rwsem")
      Fixes: d9b3dbdc ("fuse: switch to ->iterate_shared()")
      5c672ab3
  9. 11 Jun, 2016 1 commit
    • Linus Torvalds's avatar
      vfs: make the string hashes salt the hash · 8387ff25
      Linus Torvalds authored
      
      
      We always mixed in the parent pointer into the dentry name hash, but we
      did it late at lookup time.  It turns out that we can simplify that
      lookup-time action by salting the hash with the parent pointer early
      instead of late.
      
      A few other users of our string hashes also wanted to mix in their own
      pointers into the hash, and those are updated to use the same mechanism.
      
      Hash users that don't have any particular initial salt can just use the
      NULL pointer as a no-salt.
      
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8387ff25
  10. 28 May, 2016 1 commit
  11. 02 May, 2016 1 commit
  12. 01 May, 2016 2 commits
  13. 25 Apr, 2016 1 commit
  14. 11 Apr, 2016 1 commit
  15. 04 Apr, 2016 1 commit
    • Kirill A. Shutemov's avatar
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      
      
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  16. 16 Mar, 2016 1 commit
  17. 14 Mar, 2016 2 commits
    • Seth Forshee's avatar
      fuse: Add reference counting for fuse_io_priv · 744742d6
      Seth Forshee authored
      The 'reqs' member of fuse_io_priv serves two purposes. First is to track
      the number of oustanding async requests to the server and to signal that
      the io request is completed. The second is to be a reference count on the
      structure to know when it can be freed.
      
      For sync io requests these purposes can be at odds.  fuse_direct_IO() wants
      to block until the request is done, and since the signal is sent when
      'reqs' reaches 0 it cannot keep a reference to the object. Yet it needs to
      use the object after the userspace server has completed processing
      requests. This leads to some handshaking and special casing that it
      needlessly complicated and responsible for at least one race condition.
      
      It's much cleaner and safer to maintain a separate reference count for the
      object lifecycle and to let 'reqs' just be a count of outstanding requests
      to the userspace server. Then we can know for sure when it is safe to free
      the object without any handshaking or special cases.
      
      The catch here is that most of the time these objects are stack allocated
      and should not be freed. Initializing these objects with a single reference
      that is never released prevents accidental attempts to free the objects.
      
      Fixes: 9d5722b7
      
       ("fuse: handle synchronous iocbs internally")
      Cc: stable@vger.kernel.org # v4.1+
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      744742d6
    • Robert Doebbelin's avatar
      fuse: do not use iocb after it may have been freed · 7cabc61e
      Robert Doebbelin authored
      
      
      There's a race in fuse_direct_IO(), whereby is_sync_kiocb() is called on an
      iocb that could have been freed if async io has already completed.  The fix
      in this case is simple and obvious: cache the result before starting io.
      
      It was discovered by KASan:
      
      kernel: ==================================================================
      kernel: BUG: KASan: use after free in fuse_direct_IO+0xb1a/0xcc0 at addr ffff88036c414390
      Signed-off-by: default avatarRobert Doebbelin <robert@quobyte.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Fixes: bcba24cc ("fuse: enable asynchronous processing direct IO")
      Cc: <stable@vger.kernel.org> # 3.10+
      7cabc61e
  18. 22 Jan, 2016 1 commit
    • Al Viro's avatar
      wrappers for ->i_mutex access · 5955102c
      Al Viro authored
      
      
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  19. 15 Jan, 2016 1 commit
    • Vladimir Davydov's avatar
      kmemcg: account certain kmem allocations to memcg · 5d097056
      Vladimir Davydov authored
      Mark those kmem allocations that are known to be easily triggered from
      userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
      memcg.  For the list, see below:
      
       - threadinfo
       - task_struct
       - task_delay_info
       - pid
       - cred
       - mm_struct
       - vm_area_struct and vm_region (nommu)
       - anon_vma and anon_vma_chain
       - signal_struct
       - sighand_struct
       - fs_struct
       - files_struct
       - fdtable and fdtable->full_fds_bits
       - dentry and external_name
       - inode for all filesystems. This is the most tedious part, because
         most filesystems overwrite the alloc_inode method.
      
      The list is far from complete, so feel free to add more objects.
      Nevertheless, it should be close to "account everything" approach and
      keep most workloads within bounds.  Malevolent users will be able to
      breach the limit, but this was possible even with the former "account
      everything" approach (simply because it did not account everything in
      fact).
      
      [akpm@linux-foundation.org: coding-style ...
      5d097056
  20. 30 Dec, 2015 1 commit
  21. 29 Dec, 2015 1 commit
  22. 09 Dec, 2015 1 commit
    • Al Viro's avatar
      replace ->follow_link() with new method that could stay in RCU mode · 6b255391
      Al Viro authored
      
      
      new method: ->get_link(); replacement of ->follow_link().  The differences
      are:
      	* inode and dentry are passed separately
      	* might be called both in RCU and non-RCU mode;
      the former is indicated by passing it a NULL dentry.
      	* when called that way it isn't allowed to block
      and should return ERR_PTR(-ECHILD) if it needs to be called
      in non-RCU mode.
      
      It's a flagday change - the old method is gone, all in-tree instances
      converted.  Conversion isn't hard; said that, so far very few instances
      do not immediately bail out when called in RCU mode.  That'll change
      in the next commits.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      6b255391
  23. 10 Nov, 2015 3 commits
  24. 22 Oct, 2015 1 commit
  25. 16 Aug, 2015 1 commit
  26. 01 Jul, 2015 3 commits