1. 19 Jun, 2015 1 commit
  2. 18 Jun, 2015 1 commit
    • Bob Peterson's avatar
      GFS2: Don't add all glocks to the lru · e7ccaf5f
      Bob Peterson authored
      
      
      The glocks used for resource groups often come and go hundreds of
      thousands of times per second. Adding them to the lru list just
      adds unnecessary contention for the lru_lock spin_lock, especially
      considering we're almost certainly going to re-use the glock and
      take it back off the lru microseconds later. We never want the
      glock shrinker to cull them anyway. This patch adds a new bit in
      the glops that determines which glock types get put onto the lru
      list and which ones don't.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Acked-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      e7ccaf5f
  3. 09 Jun, 2015 1 commit
  4. 08 Jun, 2015 1 commit
  5. 02 Jun, 2015 2 commits
    • Abhi Das's avatar
      gfs2: limit quota log messages · 9cde2898
      Abhi Das authored
      
      
      This patch makes the quota subsystem only report once that a
      particular user/group has exceeded their allotted quota.
      
      Previously, it was possible for a program to continuously try
      exceeding quota (despite receiving EDQUOT) and in turn trigger
      gfs2 to issue a kernel log message about quota exceed. In theory,
      this could get out of hand and flood the log and the filesystem
      hosting the log files.
      Signed-off-by: default avatarAbhi Das <adas@redhat.com>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      9cde2898
    • Abhi Das's avatar
      gfs2: fix quota updates on block boundaries · 39a72580
      Abhi Das authored
      
      
      For smaller block sizes (512B, 1K, 2K), some quotas straddle block
      boundaries such that the usage value is on one block and the rest
      of the quota is on the previous block. In such cases, the value
      does not get updated correctly. This patch fixes that by addressing
      the boundary conditions correctly.
      
      This patch also adds a (s64) cast that was missing in a call to
      gfs2_quota_change() in inode.c
      Signed-off-by: default avatarAbhi Das <adas@redhat.com>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      39a72580
  6. 18 May, 2015 1 commit
  7. 05 May, 2015 5 commits
    • Fabian Frederick's avatar
      gfs2: kerneldoc warning fixes · 1272574b
      Fabian Frederick authored
      
      
      Fixes the following kernel-doc warnings:
      Warning(fs/gfs2/aops.c:180): No description found for parameter 'wbc'
      Warning(fs/gfs2/aops.c:236): No description found for parameter 'end'
      Warning(fs/gfs2/aops.c:236): No description found for parameter 'done_index'
      Warning(fs/gfs2/aops.c:236): Excess function parameter 'writepage' description in 'gfs2_write_jdata_pagevec'
      Warning(fs/gfs2/aops.c:346): Excess function parameter 'writepage' description in 'gfs2_write_cache_jdata'
      Warning(fs/gfs2/aops.c:346): Excess function parameter 'data' description in 'gfs2_write_cache_jdata'
      Warning(fs/gfs2/aops.c:605): No description found for parameter 'file'
      Warning(fs/gfs2/aops.c:605): No description found for parameter 'mapping'
      Warning(fs/gfs2/aops.c:605): No description found for parameter 'pages'
      Warning(fs/gfs2/aops.c:605): No description found for parameter 'nr_pages'
      Warning(fs/gfs2/aops.c:870): No description found for parameter 'copied'
      Signed-off-by: default avatarFabian Frederick <fabf@skynet.be>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      1272574b
    • Fabian Frederick's avatar
      gfs2: convert simple_str to kstr · e50ead48
      Fabian Frederick authored
      
      
      -Remove obsolete simple_str functions.
      -Return error code when kstr failed.
      -This patch also calls functions corresponding to destination type.
      
      Thanks to Alexey Dobriyan for suggesting improvements in
      block_store() and wdack_store()
      Signed-off-by: default avatarFabian Frederick <fabf@skynet.be>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      e50ead48
    • Benjamin Marzinski's avatar
      GFS2: make sure S_NOSEC flag isn't overwritten · 01e64ee4
      Benjamin Marzinski authored
      
      
      At the end of gfs2_set_inode_flags inode->i_flags is set to flags, so
      we should be modifying flags instead of inode->i_flags, so it isn't
      overwritten.
      
      Signed-off-by: Benjamin Marzinski <bmarzins redhat com>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      01e64ee4
    • Benjamin Marzinski's avatar
      GFS2: add support for rename2 and RENAME_EXCHANGE · a63b7bbc
      Benjamin Marzinski authored
      
      
      gfs2 now uses the rename2 directory iop, and supports the
      RENAME_EXCHANGE flag (as well as RENAME_NOREPLACE, which the vfs
      takes care of).
      
      Signed-off-by: Benjamin Marzinski <bmarzins redhat com>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      a63b7bbc
    • Abhi Das's avatar
      gfs2: handle NULL rgd in set_rgrp_preferences · 959b6717
      Abhi Das authored
      
      
      The function set_rgrp_preferences() does not handle the (rarely
      returned) NULL value from gfs2_rgrpd_get_next() and this patch
      fixes that.
      
      The fs image in question is only 150MB in size which allows for
      only 1 rgrp to be created. The in-memory rb tree has only 1 node
      and when gfs2_rgrpd_get_next() is called on this sole rgrp, it
      returns NULL. (Default behavior is to wrap around the rb tree and
      return the first node to give the illusion of a circular linked
      list. In the case of only 1 rgrp, we can't have
      gfs2_rgrpd_get_next() return the same rgrp (first, last, next all
      point to the same rgrp)... that would cause unintended consequences
      and infinite loops.)
      Signed-off-by: default avatarAbhi Das <adas@redhat.com>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      959b6717
  8. 01 May, 2015 2 commits
  9. 24 Apr, 2015 2 commits
  10. 08 Apr, 2015 1 commit
    • Abhi Das's avatar
      gfs2: fix quota refresh race in do_glock() · 30133177
      Abhi Das authored
      
      
      quotad periodically syncs in-memory quotas to the ondisk quota file
      and sets the QDF_REFRESH flag so that a subsequent read of a synced
      quota is re-read from disk.
      
      gfs2_quota_lock() checks for this flag and sets a 'force' bit to
      force re-read from disk if requested. However, there is a race
      condition here. It is possible for gfs2_quota_lock() to find the
      QDF_REFRESH flag unset (i.e force=0) and quotad comes in immediately
      after and syncs the relevant quota and sets the QDF_REFRESH flag.
      gfs2_quota_lock() resumes with force=0 and uses the stale in-memory
      quota usage values that result in miscalculations.
      
      This patch fixes this race by moving the check for the QDF_REFRESH
      flag check further out into the gfs2_quota_lock() process, i.e, in
      do_glock(), under the protection of the quota glock.
      Signed-off-by: default avatarAbhi Das <adas@redhat.com>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Acked-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      30133177
  11. 30 Mar, 2015 1 commit
  12. 26 Mar, 2015 1 commit
  13. 18 Mar, 2015 6 commits
  14. 22 Feb, 2015 1 commit
    • David Howells's avatar
      VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) · e36cb0b8
      David Howells authored
      
      
      Convert the following where appropriate:
      
       (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
      
       (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
      
       (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).  This is actually more
           complicated than it appears as some calls should be converted to
           d_can_lookup() instead.  The difference is whether the directory in
           question is a real dir with a ->lookup op or whether it's a fake dir with
           a ->d_automount op.
      
      In some circumstances, we can subsume checks for dentry->d_inode not being
      NULL into this, provided we the code isn't in a filesystem that expects
      d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
      use d_inode() rather than d_backing_inode() to get the inode pointer).
      
      Note that the dentry type field may be set to something other than
      DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
      manages the fall-through from a negative dentry to a lower layer.  In such a
      case, the dentry type of the negative union dentry is set to the same as the
      type of the lower dentry.
      
      However, if you know d_inode is not NULL at the call site, then you can use
      the d_is_xxx() functions even in a filesystem.
      
      There is one further complication: a 0,0 chardev dentry may be labelled
      DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE.  Strictly, this was
      intended for special directory entry types that don't have attached inodes.
      
      The following perl+coccinelle script was used:
      
      use strict;
      
      my @callers;
      open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
          die "Can't grep for S_ISDIR and co. callers";
      @callers = <$fd>;
      close($fd);
      unless (@callers) {
          print "No matches\n";
          exit(0);
      }
      
      my @cocci = (
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISLNK(E->d_inode->i_mode)',
          '+ d_is_symlink(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISDIR(E->d_inode->i_mode)',
          '+ d_is_dir(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISREG(E->d_inode->i_mode)',
          '+ d_is_reg(E)' );
      
      my $coccifile = "tmp.sp.cocci";
      open($fd, ">$coccifile") || die $coccifile;
      print($fd "$_\n") || die $coccifile foreach (@cocci);
      close($fd);
      
      foreach my $file (@callers) {
          chomp $file;
          print "Processing ", $file, "\n";
          system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
      	die "spatch failed";
      }
      
      [AV: overlayfs parts skipped]
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      e36cb0b8
  15. 13 Feb, 2015 2 commits
    • Vladimir Davydov's avatar
      list_lru: add helpers to isolate items · 3f97b163
      Vladimir Davydov authored
      
      
      Currently, the isolate callback passed to the list_lru_walk family of
      functions is supposed to just delete an item from the list upon returning
      LRU_REMOVED or LRU_REMOVED_RETRY, while nr_items counter is fixed by
      __list_lru_walk_one after the callback returns.  Since the callback is
      allowed to drop the lock after removing an item (it has to return
      LRU_REMOVED_RETRY then), the nr_items can be less than the actual number
      of elements on the list even if we check them under the lock.  This makes
      it difficult to move items from one list_lru_one to another, which is
      required for per-memcg list_lru reparenting - we can't just splice the
      lists, we have to move entries one by one.
      
      This patch therefore introduces helpers that must be used by callback
      functions to isolate items instead of raw list_del/list_move.  These are
      list_lru_isolate and list_lru_isolate_move.  They not only remove the
      entry from the list, but also fix the nr_items counter, making sure
      nr_items always reflects the actual number of elements on the list if
      checked under the appropriate lock.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3f97b163
    • Vladimir Davydov's avatar
      list_lru: introduce list_lru_shrink_{count,walk} · 503c358c
      Vladimir Davydov authored
      
      
      Kmem accounting of memcg is unusable now, because it lacks slab shrinker
      support.  That means when we hit the limit we will get ENOMEM w/o any
      chance to recover.  What we should do then is to call shrink_slab, which
      would reclaim old inode/dentry caches from this cgroup.  This is what
      this patch set is intended to do.
      
      Basically, it does two things.  First, it introduces the notion of
      per-memcg slab shrinker.  A shrinker that wants to reclaim objects per
      cgroup should mark itself as SHRINKER_MEMCG_AWARE.  Then it will be
      passed the memory cgroup to scan from in shrink_control->memcg.  For
      such shrinkers shrink_slab iterates over the whole cgroup subtree under
      the target cgroup and calls the shrinker for each kmem-active memory
      cgroup.
      
      Secondly, this patch set makes the list_lru structure per-memcg.  It's
      done transparently to list_lru users - everything they have to do is to
      tell list_lru_init that they want memcg-aware list_lru.  Then the
      list_lru will automatically distribute objects among per-memcg lists
      basing on which cgroup the object is accounted to.  This way to make FS
      shrinkers (icache, dcache) memcg-aware we only need to make them use
      memcg-aware list_lru, and this is what this patch set does.
      
      As before, this patch set only enables per-memcg kmem reclaim when the
      pressure goes from memory.limit, not from memory.kmem.limit.  Handling
      memory.kmem.limit is going to be tricky due to GFP_NOFS allocations, and
      it is still unclear whether we will have this knob in the unified
      hierarchy.
      
      This patch (of 9):
      
      NUMA aware slab shrinkers use the list_lru structure to distribute
      objects coming from different NUMA nodes to different lists.  Whenever
      such a shrinker needs to count or scan objects from a particular node,
      it issues commands like this:
      
              count = list_lru_count_node(lru, sc->nid);
              freed = list_lru_walk_node(lru, sc->nid, isolate_func,
                                         isolate_arg, &sc->nr_to_scan);
      
      where sc is an instance of the shrink_control structure passed to it
      from vmscan.
      
      To simplify this, let's add special list_lru functions to be used by
      shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which
      consolidate the nid and nr_to_scan arguments in the shrink_control
      structure.
      
      This will also allow us to avoid patching shrinkers that use list_lru
      when we make shrink_slab() per-memcg - all we will have to do is extend
      the shrink_control structure to include the target memcg and make
      list_lru_shrink_{count,walk} handle this appropriately.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Suggested-by: default avatarDave Chinner <david@fromorbit.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      503c358c
  16. 10 Feb, 2015 2 commits
  17. 05 Feb, 2015 1 commit
    • Theodore Ts'o's avatar
      vfs: add support for a lazytime mount option · 0ae45f63
      Theodore Ts'o authored
      
      
      Add a new mount option which enables a new "lazytime" mode.  This mode
      causes atime, mtime, and ctime updates to only be made to the
      in-memory version of the inode.  The on-disk times will only get
      updated when (a) if the inode needs to be updated for some non-time
      related change, (b) if userspace calls fsync(), syncfs() or sync(), or
      (c) just before an undeleted inode is evicted from memory.
      
      This is OK according to POSIX because there are no guarantees after a
      crash unless userspace explicitly requests via a fsync(2) call.
      
      For workloads which feature a large number of random write to a
      preallocated file, the lazytime mount option significantly reduces
      writes to the inode table.  The repeated 4k writes to a single block
      will result in undesirable stress on flash devices and SMR disk
      drives.  Even on conventional HDD's, the repeated writes to the inode
      table block will trigger Adjacent Track Interference (ATI) remediation
      latencies, which very negatively impact long tail latencies --- which
      is a very big deal for web serving tiers (for example).
      
      Google-Bug-Id: 18297052
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      0ae45f63
  18. 04 Feb, 2015 1 commit
  19. 28 Jan, 2015 1 commit
    • Jan Kara's avatar
      quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units · 14bf61ff
      Jan Kara authored
      
      
      Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
      tracks space limits and usage in 512-byte blocks. However VFS quotas
      track usage in bytes (as some filesystems require that) and we need to
      somehow pass this information. Upto now it wasn't a problem because we
      didn't do any unit conversion (thus VFS quota routines happily stuck
      number of bytes into d_bcount field of struct fd_disk_quota). Only if
      you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA
      / Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone
      tried this but reportedly some Samba users hit the problem in practice.
      So when we want interfaces compatible we need to fix this.
      
      We bite the bullet and define another quota structure used for passing
      information from/to ->get_dqblk()/->set_dqblk. It's somewhat sad we have
      to have more conversion routines in fs/quota/quota.c and another copying
      of quota structure slows down getting of quota information by about 2%
      but it seems cleaner than overloading e.g. units of d_bcount to bytes.
      
      CC: stable@vger.kernel.org
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      14bf61ff
  20. 26 Jan, 2015 1 commit
  21. 20 Jan, 2015 2 commits
  22. 13 Jan, 2015 1 commit
  23. 09 Jan, 2015 1 commit
  24. 20 Nov, 2014 2 commits