1. 09 Feb, 2013 5 commits
    • Theodore Ts'o's avatar
      ext4: start handle at the last possible moment when creating inodes · 1139575a
      Theodore Ts'o authored
      
      
      In ext4_{create,mknod,mkdir,symlink}(), don't start the journal handle
      until the inode has been succesfully allocated.  In order to do this,
      we need to start the handle in the ext4_new_inode().  So create a new
      variant of this function, ext4_new_inode_start_handle(), so the handle
      can be created at the last possible minute, before we need to modify
      the inode allocation bitmap block.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      1139575a
    • Theodore Ts'o's avatar
      ext4: fix the number of credits needed for ext4_unlink() and ext4_rmdir() · 64044abf
      Theodore Ts'o authored
      
      
      The ext4_unlink() and ext4_rmdir() don't actually release the blocks
      associated with the file/directory.  This gets done in a separate jbd2
      handle called via ext4_evict_inode().  Thus, we don't need to reserve
      lots of journal credits for the truncate.
      
      Note that using too many journal credits is non-optimal because it can
      leading to the journal transmit getting closed too early, before it is
      strictly necessary.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      64044abf
    • Theodore Ts'o's avatar
      ext4: start handle at the last possible moment in ext4_rmdir() · 8dcfaad2
      Theodore Ts'o authored
      
      
      Don't start the jbd2 transaction handle until after the directory
      entry has been found, to minimize the amount of time that a handle is
      held active.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      8dcfaad2
    • Theodore Ts'o's avatar
      ext4: start handle at the last possible moment in ext4_unlink() · 931b6864
      Theodore Ts'o authored
      
      
      Don't start the jbd2 transaction handle until after the directory
      entry has been found, to minimize the amount of time that a handle is
      held active.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      931b6864
    • Theodore Ts'o's avatar
      ext4: pass context information to jbd2__journal_start() · 9924a92a
      Theodore Ts'o authored
      
      
      So we can better understand what bits of ext4 are responsible for
      long-running jbd2 handles, use jbd2__journal_start() so we can pass
      context information for logging purposes.
      
      The recommended way for finding the longer-running handles is:
      
         T=/sys/kernel/debug/tracing
         EVENT=$T/events/jbd2/jbd2_handle_stats
         echo "interval > 5" > $EVENT/filter
         echo 1 > $EVENT/enable
      
         ./run-my-fs-benchmark
      
         cat $T/trace > /tmp/problem-handles
      
      This will list handles that were active for longer than 20ms.  Having
      longer-running handles is bad, because a commit started at the wrong
      time could stall for those 20+ milliseconds, which could delay an
      fsync() or an O_SYNC operation.  Here is an example line from the
      trace file describing a handle which lived on for 311 jiffies, or over
      1.2 seconds:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      9924a92a
  2. 29 Jan, 2013 4 commits
  3. 07 Jan, 2013 2 commits
  4. 27 Dec, 2012 1 commit
    • Theodore Ts'o's avatar
      ext4: avoid hang when mounting non-journal filesystems with orphan list · 0e9a9a1a
      Theodore Ts'o authored
      
      
      When trying to mount a file system which does not contain a journal,
      but which does have a orphan list containing an inode which needs to
      be truncated, the mount call with hang forever in
      ext4_orphan_cleanup() because ext4_orphan_del() will return
      immediately without removing the inode from the orphan list, leading
      to an uninterruptible loop in kernel code which will busy out one of
      the CPU's on the system.
      
      This can be trivially reproduced by trying to mount the file system
      found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs
      source tree.  If a malicious user were to put this on a USB stick, and
      mount it on a Linux desktop which has automatic mounts enabled, this
      could be considered a potential denial of service attack.  (Not a big
      deal in practice, but professional paranoids worry about such things,
      and have even been known to allocate CVE numbers for such problems.)
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Cc: stable@vger.kernel.org
      0e9a9a1a
  5. 10 Dec, 2012 11 commits
  6. 13 Nov, 2012 1 commit
  7. 11 Nov, 2012 1 commit
  8. 27 Sep, 2012 2 commits
    • Carlos Maiolino's avatar
      ext4: ext4_bread usage audit · 6d1ab10e
      Carlos Maiolino authored
      When ext4_bread() returns NULL and err is set to zero, this means
      there is no phyical block mapped to the specified logical block
      number.  (Previous to commit 90b0a973
      
      , err was uninitialized in this
      case, which caused other problems.)
      
      The directory handling routines use ext4_bread() in many places, the
      fact that ext4_bread() now returns NULL with err set to zero could
      cause problems since a number of these functions will simply return
      the value of err if the result of ext4_bread() was the NULL pointer,
      causing the caller of the function to think that the function was
      successful.
      
      Since directories should never contain holes, this case can only
      happen if the file system is corrupted.  This commit audits all of the
      callers of ext4_bread(), and makes sure they do the right thing if a
      hole in a directory is found by ext4_bread().
      
      Some ext4_bread() callers did not need any changes either because they
      already had its own hole detector paths.
      Signed-off-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      6d1ab10e
    • Bernd Schubert's avatar
      ext4: always set i_op in ext4_mknod() · 6a08f447
      Bernd Schubert authored
      
      
      ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR
      to mask those methods. And ext4_iget also always sets it, so there is
      an inconsistency.
      Signed-off-by: default avatarBernd Schubert <bernd.schubert@itwm.fraunhofer.de>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      6a08f447
  9. 18 Sep, 2012 2 commits
    • Anatol Pomozov's avatar
      ext4: make orphan functions be no-op in no-journal mode · c9b92530
      Anatol Pomozov authored
      
      
      Instead of checking whether the handle is valid, we check if journal
      is enabled. This avoids taking the s_orphan_lock mutex in all cases
      when there is no journal in use, including the error paths where
      ext4_orphan_del() is called with a handle set to NULL.
      Signed-off-by: default avatarAnatol Pomozov <anatol.pomozov@gmail.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      c9b92530
    • Carlos Maiolino's avatar
      ext4: fix possible non-initialized variable in htree_dirblock_to_tree() · 90b0a973
      Carlos Maiolino authored
      
      
      htree_dirblock_to_tree() declares a non-initialized 'err' variable,
      which is passed as a reference to another functions expecting them to
      set this variable with their error codes.
      
      It's passed to ext4_bread(), which then passes it to ext4_getblk(). If
      ext4_map_blocks() returns 0 due to a lookup failure, leaving the
      ext4_getblk() buffer_head uninitialized, it will make ext4_getblk()
      return to ext4_bread() without initialize the 'err' variable, and
      ext4_bread() will return to htree_dirblock_to_tree() with this variable
      still uninitialized.  htree_dirblock_to_tree() will pass this variable
      with garbage back to ext4_htree_fill_tree(), which expects a number of
      directory entries added to the rb-tree. which, in case, might return a
      fake non-zero value due the garbage left in the 'err' variable, leading
      the kernel to an Oops in ext4_dx_readdir(), once this is expecting a
      filled rb-tree node, when in turn it will have a NULL-ed one, causing an
      invalid page request when trying to get a fname struct from this NULL-ed
      rb-tree node in this line:
      
      fname = rb_entry(info->curr_node, struct fname, rb_hash);
      
      The patch itself initializes the err variable in
      htree_dirblock_to_tree() to avoid usage mistakes by the called
      functions, and also fix ext4_getblk() to return a initialized 'err'
      variable when ext4_map_blocks() fails a lookup.
      Signed-off-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      90b0a973
  10. 17 Aug, 2012 1 commit
    • Theodore Ts'o's avatar
      ext4: add max_dir_size_kb mount option · df981d03
      Theodore Ts'o authored
      
      
      Very large directories can cause significant performance problems, or
      perhaps even invoke the OOM killer, if the process is running in a
      highly constrained memory environment (whether it is VM's with a small
      amount of memory or in a small memory cgroup).
      
      So it is useful, in cloud server/data center environments, to be able
      to set a filesystem-wide cap on the maximum size of a directory, to
      ensure that directories never get larger than a sane size.  We do this
      via a new mount option, max_dir_size_kb.  If there is an attempt to
      grow the directory larger than max_dir_size_kb, the system call will
      return ENOSPC instead.
      
      Google-Bug-Id: 6863013
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      
      
      
      df981d03
  11. 23 Jul, 2012 1 commit
  12. 22 Jul, 2012 1 commit
  13. 14 Jul, 2012 2 commits
  14. 09 Jul, 2012 1 commit
  15. 28 May, 2012 1 commit
  16. 11 May, 2012 1 commit
    • Linus Torvalds's avatar
      vfs: make it possible to access the dentry hash/len as one 64-bit entry · 26fe5750
      Linus Torvalds authored
      
      
      This allows comparing hash and len in one operation on 64-bit
      architectures.  Right now only __d_lookup_rcu() takes advantage of this,
      since that is the case we care most about.
      
      The use of anonymous struct/unions hides the alternate 64-bit approach
      from most users, the exception being a few cases where we initialize a
      'struct qstr' with a static initializer.  This makes the problematic
      cases use a new QSTR_INIT() helper function for that (but initializing
      just the name pointer with a "{ .name = xyzzy }" initializer remains
      valid, as does just copying another qstr structure).
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      26fe5750
  17. 30 Apr, 2012 1 commit
  18. 29 Apr, 2012 2 commits