1. 03 Sep, 2013 1 commit
    • fuse: postpone end_page_writeback() in fuse_writepage_locked() · 4a4ac4eb
      Maxim Patlasov authored
      
      
      The patch fixes a race between ftruncate(2), mmap-ed write and write(2):
      
      1) A user makes a page dirty via an mmap-ed write.
      2) The user performs a shrinking truncate(2) intended to purge the page.
      3) Before fuse_do_setattr calls truncate_pagecache, the page goes to
         writeback. fuse_writepage_locked fills a FUSE_WRITE request and releases
         the original page via end_page_writeback.
      4) fuse_do_setattr() completes and returns successfully. From this point,
         i_mutex is free.
      5) An ordinary write(2) extends i_size back to cover the page. Note that
         fuse_send_write_pages does wait for fuse writeback, but for a different
         page->index.
      6) fuse_writepage_locked proceeds by queueing the FUSE_WRITE request.
         fuse_send_writepage is supposed to crop inarg->size of the request,
         but it doesn't because i_size has already been extended back.
      
      Moving end_page_writeback to the end of fuse_writepage_locked fixes the
      race because now the fact that truncate_pagecache has returned successfully
      implies that fuse_writepage_locked has already called end_page_writeback.
      And this, in turn, implies that fuse_flush_writepages has already called
      fuse_send_writepage, and the latter used a valid (shrunk) i_size. write(2)
      could not extend it because i_mutex was held by ftruncate(2).
      Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: stable@vger.kernel.org
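The cropping step that the race defeats can be illustrated with a small userspace model. This is only a sketch under assumptions: `crop_write_size` and its parameters are hypothetical names, and the three-way split (fully inside the file, straddling EOF, beyond EOF) is a simplified reading of what the message says fuse_send_writepage is supposed to do with inarg->size.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model, not kernel code. Returns how many bytes of a one-page
 * write starting at `offset` would be sent, given the file size `i_size`
 * that is observed at the moment the request is queued.
 */
static size_t crop_write_size(uint64_t offset, size_t page_size, uint64_t i_size)
{
    if (offset + page_size <= i_size)
        return page_size;                  /* page lies fully inside the file */
    if (offset < i_size)
        return (size_t)(i_size - offset);  /* page straddles EOF: send the head only */
    return 0;                              /* page lies beyond EOF: nothing to send */
}
```

With a dirty page at offset 8192 and a truncate to 4096, the model returns 0, so the stale page is dropped. If i_size has been extended back to 16384 before the size is sampled (step 5 above), the same call returns 4096 and the stale page is written in full.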
  2. 29 Jun, 2013 1 commit
  3. 17 Jun, 2013 1 commit
    • fuse: hold i_mutex in fuse_file_fallocate() · 14c14414
      Maxim Patlasov authored
      
      
      Changing the size of a file on the server and updating it locally
      (fuse_write_update_size) should always be protected by inode->i_mutex.
      Otherwise a race like this is possible:
      
      1. Process 'A' calls fallocate(2) to extend the file (without
      FALLOC_FL_KEEP_SIZE). fuse_file_fallocate() sends a FUSE_FALLOCATE request
      to the server.
      2. Process 'B' calls ftruncate(2), shrinking the file. fuse_do_setattr()
      sends a shrinking FUSE_SETATTR request to the server and updates the local
      i_size by i_size_write(inode, outarg.attr.size).
      3. Process 'A' resumes execution of fuse_file_fallocate() and calls
      fuse_write_update_size(inode, offset + length). But 'offset + length' was
      made obsolete by the ftruncate in the previous step.
      
      Changed in v2 (thanks to Brian and Anand for suggestions):
       - made the relation between mutex_lock() and fuse_set_nowrite(inode) more
         explicit.
       - updated the patch description to use ftruncate(2) in the example
      Signed-off-by: Maxim V. Patlasov <MPatlasov@parallels.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
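The three-step interleaving can be replayed deterministically in a toy userspace model. Everything below is an illustrative assumption rather than kernel code: `struct model` and the helper names are hypothetical, and the "only grow i_size" rule is a simplified stand-in for the local size update that the message refers to as fuse_write_update_size().

```c
#include <stdint.h>

/* Toy model: the size known to the server and the client's cached i_size. */
struct model {
    uint64_t server_size;
    uint64_t i_size;
};

/* Step 1: the fallocate request reaches the server; local update still pending. */
static void fallocate_send(struct model *m, uint64_t offset, uint64_t length)
{
    if (offset + length > m->server_size)
        m->server_size = offset + length;
}

/* Step 2: a shrinking truncate updates the server and the local size together. */
static void truncate_file(struct model *m, uint64_t new_size)
{
    m->server_size = new_size;
    m->i_size = new_size;
}

/* Step 3: the delayed local update, which only ever grows the cached size. */
static void write_update_size(struct model *m, uint64_t pos)
{
    if (pos > m->i_size)
        m->i_size = pos;
}

/* Unserialized order from the message: returns 1 if client and server disagree. */
static int racy_order_diverges(void)
{
    struct model m = { 0, 0 };
    fallocate_send(&m, 0, 8192);
    truncate_file(&m, 4096);       /* slips in between the two halves of fallocate */
    write_update_size(&m, 8192);   /* applies a size that is now obsolete */
    return m.i_size != m.server_size;
}

/* Order enforced when both halves of fallocate run under one lock. */
static int serialized_order_diverges(void)
{
    struct model m = { 0, 0 };
    fallocate_send(&m, 0, 8192);
    write_update_size(&m, 8192);
    truncate_file(&m, 4096);
    return m.i_size != m.server_size;
}
```

In the racy order the cached size ends at 8192 while the server holds 4096. Keeping the request and the local update inside one critical section, which is the role the message assigns to i_mutex, rules that interleaving out.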
  4. 03 Jun, 2013 2 commits
    • fuse: fix alignment in short read optimization for async_dio · e5c5f05d
      Maxim Patlasov authored
      
      
      The bug was introduced with the async_dio feature: trying to optimize short
      reads, we cut the number of bytes to read at the i_size boundary. Hence the
      following example:
      
      	truncate --size=300 /mnt/file
      	dd if=/mnt/file of=/dev/null iflag=direct
      
      led to a FUSE_READ request of 300 bytes. This turned out to be a problem
      for userspace fuse implementations that rely on the assumption that kernel
      fuse does not change the alignment of requests from the client FS.
      
      The patch turns off the optimization if async_dio is disabled. And, if it's
      enabled, the patch fixes the adjustment of the number of bytes to read so
      that alignment is preserved.
      
      Note that we cannot throw out the short read optimization entirely because
      otherwise a direct read of a huge size issued on a tiny file would generate
      a huge number of fuse requests, and most of them would be ACKed by userspace
      with zero bytes read.
      Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
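The alignment-preserving adjustment can be sketched as plain arithmetic. This is a userspace illustration under assumptions: `clamp_read_aligned`, `round_up_to` and the `align` parameter are hypothetical, and the actual rounding granularity used by the kernel is not stated in the message.

```c
#include <stdint.h>

/* Round `n` up to the next multiple of `align` (align must be non-zero). */
static uint64_t round_up_to(uint64_t n, uint64_t align)
{
    return (n + align - 1) / align * align;
}

/*
 * Limit a direct read of `count` bytes at `offset` so that it does not run
 * far past EOF, while keeping the request length a multiple of `align`
 * whenever the caller's own length was.
 */
static uint64_t clamp_read_aligned(uint64_t offset, uint64_t count,
                                   uint64_t i_size, uint64_t align)
{
    uint64_t limit;

    if (offset >= i_size)
        return 0;                       /* entirely beyond EOF: nothing to read */
    if (offset + count <= i_size)
        return count;                   /* entirely inside the file: unchanged */

    limit = round_up_to(i_size - offset, align);
    return count < limit ? count : limit;
}
```

For the 300-byte file from the example, a 512-byte read with a 512-byte alignment stays at 512 bytes instead of being cut to 300, while a 1 MiB read with a 4096-byte alignment is reduced to a single 4096-byte request.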
    • fuse: return -EIOCBQUEUED from fuse_direct_IO() for all async requests · c9ecf989
      Brian Foster authored
      
      
      If request submission fails for an async request (e.g.,
      get_user_pages() returns -ERESTARTSYS), we currently skip the
      -EIOCBQUEUED return and drop into wait_for_sync_kiocb() forever.
      
      Avoid this by always returning -EIOCBQUEUED for async requests. If
      an error occurs, the error is passed into fuse_aio_complete(),
      returned via aio_complete() and thus propagated to userspace via
      io_getevents().
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Maxim Patlasov <MPatlasov@parallels.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
  5. 20 May, 2013 2 commits
    • fuse: update inode size and invalidate attributes on fallocate · bee6c307
      Brian Foster authored
      
      
      An fallocate request without FALLOC_FL_KEEP_SIZE set can extend the
      size of a file. Update the inode size after a successful fallocate.
      
      Also invalidate the inode attributes after a successful fallocate
      to ensure we pick up the latest attribute values (e.g., i_blocks).
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: truncate pagecache range on hole punch · 3634a632
      Brian Foster authored
      
      
      fuse supports hole punch via the fallocate() FALLOC_FL_PUNCH_HOLE
      interface. When a hole punch is passed through, the page cache
      is not cleared and thus allows reading stale data from the cache.
      
      This is easily demonstrable (using FOPEN_KEEP_CACHE) by reading a
      smallish random data file into cache, punching a hole and creating
      a copy of the file. Drop caches or remount and observe that the
      original file no longer matches the file copied after the hole
      punch. The original file contains a zeroed range and the latter
      file contains stale data.
      
      Protect against writepage requests in progress and punch out the
      associated page cache range after a successful client fs hole
      punch.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
  6. 14 May, 2013 1 commit
  7. 08 May, 2013 1 commit
  8. 01 May, 2013 1 commit
  9. 18 Apr, 2013 1 commit
  10. 17 Apr, 2013 6 commits
    • fuse: optimize short direct reads · 439ee5f0
      Maxim Patlasov authored
      
      
      If the user requests a direct read beyond EOF, we can skip sending fuse
      requests for positions beyond EOF because userspace would ACK them with zero
      bytes read anyway. We can trust i_size in fuse_direct_IO in such cases
      because it is called from fuse_file_aio_read(), and the latter updates fuse
      attributes, including i_size.
      Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
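The saving from this optimization can be estimated with a short calculation. The sketch below is a userspace model under assumptions: `short_read_count`, `requests_needed` and the `max_req` parameter (the largest payload one request may carry) are hypothetical names, and the real request size limit is not given in the message.

```c
#include <stdint.h>

/* Bytes worth requesting for a read of `count` bytes at `offset`. */
static uint64_t short_read_count(uint64_t offset, uint64_t count, uint64_t i_size)
{
    if (offset >= i_size)
        return 0;                   /* the whole read is beyond EOF */
    if (offset + count > i_size)
        return i_size - offset;     /* drop the part beyond EOF */
    return count;
}

/* Number of requests needed to cover `count` bytes, `max_req` bytes each. */
static uint64_t requests_needed(uint64_t count, uint64_t max_req)
{
    return (count + max_req - 1) / max_req;
}
```

Assuming 128 KiB per request, a 1 GiB direct read of a 300-byte file would take 8192 requests without the cut and a single request with it.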
    • fuse: enable asynchronous processing direct IO · bcba24cc
      Maxim Patlasov authored
      
      
      In the case of a synchronous DIO request (i.e. read(2) or write(2) on a file
      opened with O_DIRECT), the patch submits fuse requests asynchronously, but
      waits for their completion before returning from fuse_direct_IO().
      
      In the case of an asynchronous DIO request (i.e. libaio io_submit() on a file
      opened with O_DIRECT), the patch submits fuse requests asynchronously and
      returns -EIOCBQUEUED immediately.
      
      The only special case is async DIO extending the file. Here the patch falls
      back to the old behaviour because we can't return -EIOCBQUEUED and update
      i_size later without holding i_mutex. And we have no method to wait on real
      async I/O requests.
      
      The patch also cleans up __fuse_direct_write(): it's better to update i_size
      in its callers. Thanks to Brian for the suggestion.
      Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: make fuse_direct_io() aware about AIO · 36cf66ed
      Maxim Patlasov authored
      
      
      The patch passes "struct fuse_io_priv *io" down the stack to
      fuse_send_read/write, where it is used to submit requests asynchronously.
      io->async==0 designates synchronous processing.
      
      The non-trivial part of the patch is the changes in fuse_direct_io():
      resources like fuse requests and user pages cannot be released immediately
      in the async case.
      Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: add support of async IO · 01e9d11a
      Maxim Patlasov authored
      
      
      The patch implements a framework to process an IO request asynchronously. The
      idea is to associate several fuse requests with a single kiocb by means of
      fuse_io_priv structure. The structure plays the same role for FUSE as 'struct
      dio' for direct-io.c.
      
      The framework is supposed to be used like this:
       - someone (who wants to process an IO asynchronously) allocates fuse_io_priv
         and initializes it, setting the 'async' field to a non-zero value.
       - as soon as a fuse request is filled, it can be submitted (in a
         non-blocking way) by fuse_async_req_send()
       - when all submitted requests are ACKed by userspace, io->reqs drops to zero,
         triggering aio_complete()
      
      In the case of IO initiated by libaio, aio_complete() will finish processing
      the same way as when dio_complete() calls aio_complete(). But the
      framework may also be used internally by FUSE when the initial IO request
      was synchronous (from the user's perspective), but it's beneficial to process
      it asynchronously. Then the caller should wait on the kiocb explicitly, and
      aio_complete() will wake the caller up.
      Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
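The counting scheme described in the usage steps can be modelled in a few lines of single-threaded userspace C. This is a sketch under stated assumptions, not the kernel structure: `struct io_model` and its helpers are hypothetical, locking is omitted, and the extra reference held by the submitter is a modelling choice made here so that an early ACK cannot complete the IO while more requests are still being submitted.

```c
/* One logical IO that is split into several requests (illustrative model). */
struct io_model {
    int async;        /* non-zero: requests are submitted without waiting */
    unsigned reqs;    /* references: in-flight requests plus the submitter */
    long bytes;       /* bytes acknowledged so far */
    int err;          /* first error reported, 0 if none */
    int completed;    /* set when the last reference is dropped */
    long result;      /* value a real implementation would pass to completion */
};

static void io_start(struct io_model *io)
{
    io->async = 1;
    io->reqs = 1;     /* submitter's own reference (assumption of this model) */
    io->bytes = 0;
    io->err = 0;
    io->completed = 0;
    io->result = 0;
}

/* Account for one more request in flight. */
static void io_submit(struct io_model *io)
{
    io->reqs++;
}

/* Drop one reference; the last drop completes the whole IO. */
static void io_put(struct io_model *io, int err, long nbytes)
{
    if (err) {
        if (!io->err)
            io->err = err;          /* remember only the first error */
    } else {
        io->bytes += nbytes;
    }
    if (--io->reqs == 0) {
        io->completed = 1;
        io->result = io->err ? io->err : io->bytes;
    }
}

/* Two 4096-byte requests; the second may fail. Returns 1 on a model bug. */
static long run_two_requests(int second_err)
{
    struct io_model io;

    io_start(&io);
    io_submit(&io);
    io_submit(&io);
    io_put(&io, 0, 4096);                           /* first ACK */
    if (io.completed)
        return 1;
    io_put(&io, second_err, second_err ? 0 : 4096); /* second ACK */
    if (io.completed)
        return 1;                                   /* submitter still holds a reference */
    io_put(&io, 0, 0);                              /* submitter is done submitting */
    return io.completed ? io.result : 1;
}
```

In this model the completion fires exactly once, after the last reference is dropped, and reports either the total byte count or the first error seen.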
    • fuse: move fuse_release_user_pages() up · 187c5c36
      Maxim Patlasov authored
      
      
      fuse_release_user_pages() will be indirectly used by fuse_send_read/write
      in future patches.
      Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: make request allocations for background processing explicit · 8b41e671
      Maxim Patlasov authored
      
      
      There are two types of request processing in FUSE: synchronous (via
      fuse_request_send()) and asynchronous (via adding to fc->bg_queue).
      
      Fortunately, the type of processing is always known in advance, at the time
      of request allocation. This preparatory patch utilizes this fact, making
      fuse_get_req() aware of the type. Subsequent patches will use it.
      Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
  11. 09 Apr, 2013 1 commit
  12. 27 Feb, 2013 1 commit
  13. 04 Feb, 2013 1 commit
  14. 31 Jan, 2013 1 commit
  15. 24 Jan, 2013 11 commits
  16. 17 Jan, 2013 1 commit
  17. 18 Dec, 2012 1 commit
  18. 09 Oct, 2012 1 commit
    • mm: kill vma flag VM_CAN_NONLINEAR · 0b173bc4
      Konstantin Khlebnikov authored
      
      
      Move actual pte filling for non-linear file mappings into the new special
      vma operation: ->remap_pages().
      
      Filesystems must implement this method to get non-linear mapping support;
      if a filesystem uses filemap_fault(), then generic_file_remap_pages() can
      be used.
      
      Now device drivers can implement this method and obtain nonlinear vma support.
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>	#arch/tile
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 06 Aug, 2012 1 commit
    • fuse: verify all ioctl retry iov elements · fb6ccff6
      Zach Brown authored
      Commit 7572777e attempted to verify that
      the total iovec from the client doesn't overflow iov_length() but it
      only checked the first element.  The iovec could still overflow by
      starting with a small element.  The obvious fix is to check all the
      elements.
      
      The overflow case doesn't look dangerous to the kernel as the copy is
      limited by the length after the overflow.  This fix restores the
      intention of returning an error instead of successfully copying less
      than the iovec represented.
      
      I found this by code inspection.  I built it but don't have a test case.
      I'm cc:ing stable because the initial commit did as well.
      Signed-off-by: Zach Brown <zab@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      CC: <stable@vger.kernel.org>         [2.6.37+]
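Checking every element, rather than only the first, can be sketched in userspace C as follows. This is an illustration under assumptions: `verify_iov_total` and the `max_total` parameter are hypothetical, and returning -ENOMEM on overflow is a choice made for this sketch. The running budget is decremented as elements are accepted, so the sum itself is never computed and cannot wrap around.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Return 0 if the lengths of all `count` elements fit into `max_total`
 * bytes in total, or -ENOMEM as soon as one element exceeds what is left.
 */
static int verify_iov_total(const struct iovec *iov, size_t count, size_t max_total)
{
    size_t remaining = max_total;
    size_t n;

    for (n = 0; n < count; n++) {
        if (iov[n].iov_len > remaining)
            return -ENOMEM;             /* this element alone breaks the budget */
        remaining -= iov[n].iov_len;    /* cannot underflow after the check above */
    }
    return 0;
}
```

An iovec that starts with a small element followed by a huge one, which is the case the message describes, passes a first-element-only check but is rejected here.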
  20. 31 Jul, 2012 1 commit
  21. 18 Jul, 2012 1 commit
    • fuse: update attributes on aio_read · a8894274
      Brian Foster authored
      
      
      A fuse-based network filesystem might allow for the inode
      and/or file data to change unexpectedly. A local client
      that opens and repeatedly reads a file might never pick
      up on such changes and indefinitely return stale data.
      
      Always invoke fuse_update_attributes() in the read path
      to cause an attr revalidation when the attributes expire.
      This leads to a page cache invalidation if necessary and
      ensures fuse issues new read requests to the fuse client.
      
      The original logic (reval only on reads beyond EOF) is
      preserved unless the client specifies FUSE_AUTO_INVAL_DATA
      on init.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
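The resulting decision in the read path reduces to a single predicate, sketched here in userspace C. The function name and parameters are hypothetical; `auto_inval_data` stands for whether the client negotiated FUSE_AUTO_INVAL_DATA on init.

```c
#include <stdint.h>

/*
 * Decide whether a read of `len` bytes at `pos` should first revalidate
 * the attributes. With auto_inval_data set, every read does; otherwise
 * only reads that reach beyond the cached EOF do (the original logic).
 */
static int should_update_attributes(int auto_inval_data, uint64_t pos,
                                    uint64_t len, uint64_t i_size)
{
    return auto_inval_data || pos + len > i_size;
}
```

Without the flag, a read that stays inside the cached file size skips revalidation, which is how a local client could keep returning stale data indefinitely.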
  22. 01 Jun, 2012 1 commit
    • fs: introduce inode operation ->update_time · c3b2da31
      Josef Bacik authored
      
      
      Btrfs has to make sure we have space to allocate new blocks in order to modify
      the inode, so updating time can fail.  We've gotten around this by having our
      own file_update_time but this is kind of a pain, and Christoph has indicated he
      would like to make xfs do something different with atime updates.  So introduce
      ->update_time, where we will deal with i_version and a/m/c time updates and
      indicate which changes need to be made.  The normal version just does what it
      has always done, updates the time and marks the inode dirty, and then
      filesystems can choose to do something different.
      
      I've gone through all of the users of file_update_time and made them check for
      errors with the exception of the fault code since it's complicated and I wasn't
      quite sure what to do there, also Jan is going to be pushing the file time
      updates into page_mkwrite for those who have it so that should satisfy btrfs and
      make it not a big deal to check the file_update_time() return code in the
      generic fault path. Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
  23. 26 Apr, 2012 1 commit