1. 21 May, 2011 1 commit
    • GFS2: Wipe directory hash table metadata when deallocating a directory · 6d3117b4
      Steven Whitehouse authored

      The deallocation code for directories in GFS2 is largely divided into
      two parts. The first part deallocates any directory leaf blocks and
      marks the directory as being a regular file when that is complete. The
      second stage was identical to deallocating regular files.
      
      Regular files keep their data blocks in a different address space
      from directories, so what would have been normal data blocks in a
      regular file (the hash table in a GFS2 directory) were deallocated
      correctly. However, a reference to these blocks was left in the
      journal (assuming, of course, that some previous activity had put
      those blocks into the journal or the AIL).
      
      This patch uses i_depth to test whether the inode is an exhash
      directory. (We cannot test the inode type, since it has already
      been changed to a regular file at this stage of deallocation.)
      
      The original issue was reported by Chris Hertel, who encountered it
      while running bonnie++.
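
      The test described above can be sketched as follows. This is a
      minimal, hypothetical model, not the GFS2 code: the struct layout
      and dealloc_needs_hash_wipe() are illustrative names. It assumes
      only what the message states, namely that a non-zero i_depth still
      marks an exhash directory after the inode type has been rewritten.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified inode: only the fields this sketch needs. */
struct sketch_inode {
    bool is_dir;      /* already false: stage one rewrote the type */
    uint32_t i_depth; /* hash table depth; 0 for non-exhash inodes */
};

/*
 * Stage two of directory deallocation cannot trust the inode type.
 * A non-zero i_depth still identifies an exhash directory, whose
 * "data" blocks are really hash table metadata that must also be
 * wiped from the journal.
 */
bool dealloc_needs_hash_wipe(const struct sketch_inode *ip)
{
    return ip->i_depth != 0;
}
```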
      
      Reported-by: Christopher R. Hertel <crh@samba.org>
      Cc: Abhijith Das <adas@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  2. 20 Apr, 2011 2 commits
  3. 18 Apr, 2011 1 commit
    • GFS2: filesystem hang caused by incorrect lock order · 44ad37d6
      Bob Peterson authored

      This patch fixes a deadlock in GFS2 where two processes are trying
      to reclaim the same unlinked dinode. One holds the inode glock and
      calls gfs2_lookup_by_inum to look up the inode, which it cannot do
      because the inode is marked I_FREEING. The other has set I_FREEING
      from the VFS and is at the beginning of gfs2_delete_inode, waiting
      for the glock held by the first. The solution is to add a new
      non_block parameter to gfs2_iget that causes it to return -ENOENT
      if the inode is being freed.
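
      The effect of the non_block parameter can be modelled in a few
      lines. This is a hypothetical sketch, not gfs2_iget itself: the
      struct, the freeing flag (standing in for I_FREEING) and
      sketch_iget() are illustrative, and -EAGAIN is only a placeholder
      for "this caller would have waited".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified inode state for illustration only. */
struct sketch_inode {
    bool freeing; /* stands in for the VFS I_FREEING state */
};

/*
 * A caller that already holds the inode glock passes non_block = true.
 * Instead of waiting for an inode that is being freed (whose freer is
 * itself waiting for that glock), it gets -ENOENT and can back off.
 */
int sketch_iget(struct sketch_inode *ip, bool non_block,
                struct sketch_inode **out)
{
    if (ip->freeing) {
        if (non_block)
            return -ENOENT;
        /* A blocking caller would wait here; modelled as -EAGAIN. */
        return -EAGAIN;
    }
    *out = ip;
    return 0;
}
```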
      
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  4. 24 Feb, 2011 1 commit
    • GFS2: deallocation performance patch · 4c16c36a
      Bob Peterson authored

      This patch improves the performance of GFS2's dealloc code.
      Rather than updating the quota file and statfs file for every
      single block that is stripped off in the unlink function do_strip,
      this patch keeps a running count and updates them once for every
      layer that is stripped. This is done entirely inside the existing
      transaction, so there should be no risk of corruption.
      The other functions that deallocate blocks are unaffected
      because they use wrapper functions that do the same
      thing that they do today.
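
      The batching idea can be illustrated with a toy model. This is a
      hypothetical sketch, not do_strip: struct sketch_sb and both
      functions are illustrative, and a counter stands in for each write
      to the quota and statfs files.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters standing in for the quota/statfs files. */
struct sketch_sb {
    int64_t free_blocks;
    unsigned int statfs_updates; /* times the "file" was touched */
};

/* Old behaviour, modelled: one update per freed block. */
void strip_per_block(struct sketch_sb *sb, size_t nblocks)
{
    for (size_t i = 0; i < nblocks; i++) {
        sb->free_blocks += 1;
        sb->statfs_updates++;
    }
}

/* New behaviour, modelled: count blocks while stripping one layer,
 * then apply the total in a single update in the same transaction. */
void strip_per_layer(struct sketch_sb *sb, size_t nblocks)
{
    int64_t freed = 0;
    for (size_t i = 0; i < nblocks; i++)
        freed++; /* free the block; here we only count it */
    if (freed) {
        sb->free_blocks += freed;
        sb->statfs_updates++;
    }
}
```

      Both paths leave the same free block count; only the number of
      writes to the shared files differs.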
      
      I tested this code on my roth cluster by creating 200 files of
      100MB each in a directory and then deleting them simultaneously
      from four nodes. Each node deleted a different subset of the
      files, but all four competed for GFS2 resources. The commands
      I used were:
      
      [root@roth-01]# time for i in `seq 1 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
      [root@roth-02]# time for i in `seq 2 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
      [root@roth-03]# time for i in `seq 3 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
      [root@roth-05]# time for i in `seq 4 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
      
      The performance increase was significant:
      
                   roth-01     roth-02     roth-03     roth-05
                   ---------   ---------   ---------   ---------
      old: real    0m34.027s   0m25.021s   0m23.906s   0m35.646s
      new: real    0m22.379s   0m24.362s   0m24.133s   0m18.562s

      Total time spent deleting:
      old: 118.6s
      new:  89.4s
      
      For this particular case, the total time spent on GFS2 unlinks
      dropped by about 25%.
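
      The totals and the percentage can be checked with a few lines of
      C. The per-node figures are copied from the table above; the
      helper names are illustrative. The result (about 24.6%) is a
      reduction in total elapsed time.

```c
#include <stddef.h>

/* Per-node wall-clock times from the table above, in seconds. */
static const double old_times[] = { 34.027, 25.021, 23.906, 35.646 };
static const double new_times[] = { 22.379, 24.362, 24.133, 18.562 };

double sum_seconds(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Percentage reduction in total deletion time. */
double pct_reduction(double old_total, double new_total)
{
    return 100.0 * (old_total - new_total) / old_total;
}
```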
      
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  5. 07 Dec, 2010 1 commit
    • GFS2: fsck.gfs2 reported statfs error after gfs2_grow · bcd7278d
      Bob Peterson authored

      When gfs2_grow was run, it failed to take the very last rgrp into
      account when adding up the new free space, due to an off-by-one
      error. It was not reading the last rgrp from the rindex because of
      a check for "<=" that should have been "<". As a result, fsck.gfs2
      found (and fixed) an error in the system statfs file.
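
      The class of bug can be shown on a loop that counts fixed-size
      entries in a file and stops when no whole entry remains. This is
      a hypothetical illustration, not the rindex code: the function
      names are invented, and the stop condition is written so that the
      "<=" versus "<" difference from the message is visible.

```c
#include <stddef.h>
#include <stdint.h>

/* Buggy: stops one entry early, so the last entry is never counted. */
size_t count_entries_buggy(uint64_t file_size, uint64_t entry_size)
{
    size_t n = 0;
    for (uint64_t pos = 0;; pos += entry_size) {
        if (file_size <= pos + entry_size) /* off by one */
            break;
        n++;
    }
    return n;
}

/* Fixed: stop only when the next entry would run past the end. */
size_t count_entries_fixed(uint64_t file_size, uint64_t entry_size)
{
    size_t n = 0;
    for (uint64_t pos = 0;; pos += entry_size) {
        if (file_size < pos + entry_size)
            break;
        n++;
    }
    return n;
}
```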
      
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  6. 30 Nov, 2010 2 commits
  7. 15 Nov, 2010 1 commit
    • GFS2: Fix inode deallocation race · 044b9414
      Steven Whitehouse authored

      This area of the code has always been a bit delicate due to the
      subtleties of lock ordering. The problem is that for "normal"
      alloc/dealloc, we always grab the inode locks first and the rgrp lock
      later.
      
      To ensure there are no races when looking up unlinked, but still
      allocated, inodes, we need to hold the rgrp lock during the lookup,
      which means that we cannot take the inode glock.
      
      The solution is to borrow the technique already used by NFS to solve
      what is essentially the same problem (given an inode number, look up
      the inode carefully, checking that it really is in the expected
      state).
      
      We cannot do that directly from the allocation code (lock ordering
      again) so we give the job to the pre-existing delete workqueue and
      carry on with the allocation as normal.
      
      If we find there is no space, we do a journal flush (required anyway
      if space from a deallocation is to be released), which should block
      against the pending deallocations, so we should always get the space
      back.
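
      The hand-off to the delete workqueue can be sketched with a small
      FIFO. This is a hypothetical model: a fixed-size ring of inode
      numbers stands in for the kernel workqueue, and both function
      names are illustrative. The allocator only records the inode
      number. The worker runs later, without the rgrp lock, and can take
      the inode glock first and re-check the inode's state.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QUEUE_LEN 16

/* Hypothetical stand-in for the delete workqueue. */
struct sketch_delete_queue {
    uint64_t inums[SKETCH_QUEUE_LEN];
    size_t head, tail;
};

/* Allocator side, rgrp lock held: must not take the inode glock
 * (wrong lock order), so it only queues the inode number. */
int queue_unlinked_inode(struct sketch_delete_queue *q, uint64_t no_addr)
{
    if (q->tail - q->head == SKETCH_QUEUE_LEN)
        return -1; /* full: the allocator just carries on */
    q->inums[q->tail++ % SKETCH_QUEUE_LEN] = no_addr;
    return 0;
}

/* Worker side, no rgrp lock held: fetch the next inode to examine. */
int next_inode_to_check(struct sketch_delete_queue *q, uint64_t *no_addr)
{
    if (q->head == q->tail)
        return -1; /* nothing pending */
    *no_addr = q->inums[q->head++ % SKETCH_QUEUE_LEN];
    return 0;
}
```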
      
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  8. 30 Sep, 2010 1 commit
    • GFS2 fatal: filesystem consistency error on rename · 46290341
      Bob Peterson authored

      This patch fixes a GFS2 problem whereby the first rename after a
      mount can improperly flag a file system consistency error and
      cause the file system to withdraw. The problem is that the rename
      code walks the rgrp list with gfs2_blk2rgrpd before the rgrp list
      is guaranteed to have been read in from disk. The patch makes the
      rename function hold the rindex glock (as the gfs2_unlink code
      already does), which reads in the rgrp list if needed. There were
      three places in the rename code that referenced the rgrp list
      without holding the rindex glock, and this patch fixes all three.
      
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  9. 20 Sep, 2010 3 commits
    • GFS2: fallocate support · 3921120e
      Benjamin Marzinski authored

      This patch adds fallocate support to GFS2. Since GFS2 does not support
      uninitialized data blocks, it must write zeros to all the blocks. However,
      since it does not need to lock any pages to read from, GFS2 can write out
      the zero blocks much more efficiently. On a moderately full filesystem,
      fallocate is about 5 times faster on average. The fallocate call also
      allows GFS2 to add blocks to a file without changing the file size, which
      will make it possible for GFS2 to preallocate space for the rindex file,
      so that GFS2 can grow a completely full filesystem.
      
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Add a bug trap in allocation code · 9a3f236d
      Steven Whitehouse authored

      This adds a check to ensure that, if we reach the block allocator,
      we do not try to proceed when there is no alloc structure hanging
      off the inode. This should only happen if there is a bug in GFS2.
      The error return code is distinctive so that it will be easily
      spotted.
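
      A bug trap of this kind can be sketched as an early return. The
      structures and sketch_alloc_block() are hypothetical, and
      -ECANCELED is an arbitrary choice for this example, picked only
      because it is unlikely to come from a normal allocation failure.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified structures for illustration only. */
struct sketch_alloc { unsigned int requested; };
struct sketch_inode { struct sketch_alloc *alloc; };

/*
 * Refuse to enter the allocator without the per-inode alloc structure.
 * Reaching this point without one indicates a bug elsewhere, so the
 * sketch returns a distinctive error instead of something like ENOSPC.
 */
int sketch_alloc_block(struct sketch_inode *ip, unsigned long long *block)
{
    if (ip->alloc == NULL)
        return -ECANCELED;
    *block = 0; /* real block selection omitted in this sketch */
    return 0;
}
```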
      
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Remove i_disksize · a2e0f799
      Steven Whitehouse authored

      With the update of the truncate code, ip->i_disksize and
      inode->i_size are merely copies of each other. This means
      we can remove ip->i_disksize and use inode->i_size exclusively,
      reducing the size of a GFS2 inode by 8 bytes.
      
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  10. 16 Sep, 2010 1 commit
    • block: remove BLKDEV_IFL_WAIT · dd3932ed
      Christoph Hellwig authored

      All the blkdev_issue_* helpers can only sanely be used by synchronous
      callers. To issue cache flushes or barriers asynchronously, the caller
      needs to set up a bio itself, with a completion callback to move the
      asynchronous state machine ahead. So drop the BLKDEV_IFL_WAIT flag that
      is always specified when calling blkdev_issue_*, and also remove the now
      unused flags argument to blkdev_issue_flush and blkdev_issue_zeroout.
      For blkdev_issue_discard we need to keep it for the secure discard flag,
      which gains a more descriptive name and loses the bitops vs. flag
      confusion.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  11. 10 Sep, 2010 1 commit
  12. 21 May, 2010 1 commit
  13. 12 May, 2010 1 commit
  14. 28 Apr, 2010 1 commit
  15. 14 Apr, 2010 1 commit
    • GFS2: glock livelock · 1a0eae88
      Bob Peterson authored

      This patch fixes several GFS2 problems with the reclaiming of
      unlinked dinodes. First, there were a couple of livelocks where
      everything would come to a halt waiting for a glock that was
      seemingly held by a process that no longer existed. In fact, the
      process did exist; it just had the wrong pid number in the holder
      information. Second, there was a lock ordering problem between
      inode locking and glock locking. Third, glock/inode contention
      could sometimes cause inodes to be improperly marked invalid by
      iget_failed.
      
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  16. 01 Feb, 2010 3 commits
  17. 03 Dec, 2009 1 commit
  18. 21 Sep, 2009 1 commit
  19. 14 Sep, 2009 2 commits
  20. 08 Sep, 2009 1 commit
    • GFS2: Be extra careful about deallocating inodes · acf7e244
      Steven Whitehouse authored

      There is a potential race in the inode deallocation code if two
      nodes try to deallocate the same inode at the same time. Most of
      the issue is solved by the iopen locking, but there is still a
      small window which is not covered by the iopen lock. This patch
      fixes that and also makes the deallocation code more robust in the
      face of any errors in the rgrp bitmaps, or erroneous iopen
      callbacks from other nodes.

      This does introduce one extra disk read, but that is generally not
      an issue since it's the same block that must be written to later
      in the deallocation process. The total number of disk accesses
      therefore stays the same.
      
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  21. 27 Aug, 2009 1 commit
    • GFS2: Remove no_formal_ino generating code · 8d8291ae
      Steven Whitehouse authored

      The inum structure used throughout GFS2 has two fields. One,
      no_addr, is the disk block number of the inode in question and
      is used everywhere as the inode number. The other, no_formal_ino,
      is used only as the generation number for NFS.
      
      Historically the no_formal_ino field was set using a complicated
      system of one global and one per-node file containing inode numbers
      in order to ensure that each no_formal_ino was unique. Also this
      code made no provision for what would happen when eventually the
      (64 bit) numbers ran out. Now I know that is pretty unlikely to
      happen given the large space of numbers, but it is possible
      nevertheless.
      
      The only guarantee required for no_formal_ino is that, for any
      single inode, the same number doesn't get reused too quickly.
      
      We already have a generation number which is kept in the inode
      and initialised from a counter in the resource group (almost
      no overhead, since we have to touch the resource group anyway
      in order to allocate an inode in the first place). Aside from
      ensuring that we never use the value 0 in the no_formal_ino
      field, we can use that counter directly.
      
      As a result of that change, we lose about 200 lines of code and
      also gain about 10 creates/sec on the postmark benchmark (on
      my test machine).
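
      Deriving no_formal_ino from a per-rgrp counter while never handing
      out 0 can be sketched as follows. This is a hypothetical model:
      struct sketch_rgrp and next_formal_ino() are illustrative names,
      not the patch itself. Unsigned wrap-around is well defined in C,
      so the zero check also covers the case where the counter runs out.

```c
#include <stdint.h>

/* Hypothetical per-resource-group generation counter. */
struct sketch_rgrp {
    uint64_t igeneration;
};

/*
 * Return the next generation number for a new inode in this rgrp.
 * The counter is already touched when allocating the inode, so this
 * is nearly free; the only special case is skipping 0, including
 * after the 64-bit counter wraps.
 */
uint64_t next_formal_ino(struct sketch_rgrp *rgd)
{
    uint64_t gen = rgd->igeneration++;

    if (gen == 0)
        gen = rgd->igeneration++;
    return gen;
}
```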
      
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  22. 17 Aug, 2009 2 commits
  23. 30 Jul, 2009 2 commits
  24. 12 Jun, 2009 1 commit
    • GFS2: Add tracepoints · 63997775
      Steven Whitehouse authored

      This patch adds the ability to trace various aspects of the GFS2
      filesystem. The tracepoints are divided into three groups: glocks,
      logging and bmap. These points were chosen because they allow
      inspection of the major internal functions of GFS2, and they are
      also generic enough that they are unlikely to need any major
      changes as the filesystem evolves.
      
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  25. 22 May, 2009 2 commits
  26. 21 May, 2009 2 commits
    • GFS2: Be more aggressive in reclaiming unlinked inodes · 1ce97e56
      Steven Whitehouse authored

      This patch increases the frequency with which GFS2 looks for
      unlinked, but still allocated, inodes. It's the equivalent
      operation to ext3's orphan list, but done with bitmaps in
      the resource groups.

      This also fixes a bug where a field in the rgrp was too small.
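
      Scanning a resource group bitmap for unlinked but still allocated
      inodes can be sketched as below. The encoding is an assumption for
      this example: two bits per block, four blocks per byte, lowest
      block in the low-order bits, and a distinct "unlinked" state. The
      enum values and function names are illustrative, not the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed 2-bit block states used by this sketch. */
enum sketch_blkst { BLK_FREE = 0, BLK_USED = 1, BLK_UNLINKED = 2, BLK_DINODE = 3 };

/* Extract the 2-bit state of block blk (4 blocks per byte). */
static unsigned int block_state(const uint8_t *bitmap, size_t blk)
{
    return (bitmap[blk / 4] >> ((blk % 4) * 2)) & 0x3;
}

/* Return the first block at or after start that is unlinked but
 * still allocated, or (size_t)-1 if none exists below nblocks. */
size_t find_unlinked(const uint8_t *bitmap, size_t nblocks, size_t start)
{
    for (size_t blk = start; blk < nblocks; blk++)
        if (block_state(bitmap, blk) == BLK_UNLINKED)
            return blk;
    return (size_t)-1;
}
```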
      
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Add a rgrp bitmap full flag · 60a0b8f9
      Steven Whitehouse authored

      During block allocation, it is useful to know if sections of disk
      are full on a finer grained basis than a single resource group.
      This can make a performance difference when resource groups have
      larger numbers of bitmap blocks, since we no longer have to search
      them all block by block in each individual bitmap.
      
      The full flag is set on a per-bitmap basis when it has been
      searched and found to have no free space. It is then skipped in
      subsequent searches until the flag is reset. The resetting
      occurs if we have to drop the glock on the resource group for any
      reason, or if we deallocate some blocks within that resource
      group and thus free up some space.
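
      The full-flag behaviour described above can be modelled as
      follows. This is a hypothetical sketch: struct sketch_bitmap and
      both functions are illustrative, and free_count stands in for
      actually searching the bits of each bitmap block.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor for one bitmap block in a resource group. */
struct sketch_bitmap {
    size_t free_count; /* stands in for searching the bits */
    bool full;         /* set once a search found no free space */
};

/* Return the index of the first bitmap with free space, marking any
 * searched-and-empty bitmaps full so later searches skip them. */
int find_bitmap_with_space(struct sketch_bitmap *bi, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (bi[i].full)
            continue;          /* skipped without searching */
        if (bi[i].free_count > 0)
            return (int)i;
        bi[i].full = true;     /* searched, nothing free */
    }
    return -1;
}

/* Reset the flags, e.g. when the rgrp glock is dropped or when
 * blocks in this rgrp are deallocated. */
void clear_full_flags(struct sketch_bitmap *bi, size_t n)
{
    for (size_t i = 0; i < n; i++)
        bi[i].full = false;
}
```

      Note that a bitmap marked full stays skipped even if space appears
      in it, until the flags are cleared; that is why the reset points
      matter.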
      
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  27. 20 May, 2009 1 commit
    • GFS2: Improve resource group error handling · 09010978
      Steven Whitehouse authored

      This patch improves the error handling in the case where we
      discover that the summary information in the resource group
      doesn't match the bitmap information while in the process of
      allocating blocks. Originally this resulted in a kernel bug,
      but this patch changes that so that we return -EIO and print
      some messages explaining what went wrong, and how to fix it.
      
      We also remember locally not to try to allocate from the
      same rgrp again, so that a subsequent allocation in a
      different rgrp should succeed.
      
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  28. 23 Apr, 2009 2 commits
    • GFS2: Ensure that the inode goal block settings are updated · d9ba7615
      Steven Whitehouse authored

      GFS2 has a goal block associated with each inode indicating the
      search start position for future block allocations (in fact there
      are two, but that's for backward compatibility with GFS1; they
      are set to identical locations in GFS2).
      
      In some circumstances, depending on the ordering of updates to
      the inode it was possible for the goal block settings to not
      be updated on disk. This patch ensures that the goal block will
      always get updated, thus reducing the potential for searching
      the same (already allocated) blocks again when looking for free
      space during block allocation.
      
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Fix bug in block allocation · d8bd504a
      Steven Whitehouse authored

      The new bitfit algorithm was counting from the wrong end of
      64-bit words in the bitfield. This fixes it by using __ffs64
      instead of fls64.
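
      The difference between the two helpers is easiest to see on a
      single word. This is a userspace sketch. sketch_ffs64() and
      sketch_fls64() emulate the kernel's __ffs64() (0-based index of
      the lowest set bit) and fls64() (1-based index of the highest set
      bit) with GCC/Clang builtins. first_candidate() is a hypothetical
      helper, not the bitfit code itself.

```c
#include <stdint.h>

/* 0-based index of the lowest set bit; x must be non-zero. */
unsigned int sketch_ffs64(uint64_t x)
{
    return (unsigned int)__builtin_ctzll(x);
}

/* 1-based index of the highest set bit; 0 if x == 0. */
unsigned int sketch_fls64(uint64_t x)
{
    return x ? 64u - (unsigned int)__builtin_clzll(x) : 0u;
}

/*
 * Given a word in which each set bit marks a candidate position,
 * return the first (lowest-numbered) candidate, as a forward scan
 * from a goal block needs.  Returns -1 if there is none.
 */
int first_candidate(uint64_t word)
{
    if (word == 0)
        return -1;
    return (int)sketch_ffs64(word);
}
```

      Using the fls64() equivalent on the same word would select the
      highest-numbered candidate instead, which is the "wrong end"
      described above.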
      
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>