1. 25 Oct, 2013 1 commit
  2. 09 Jul, 2013 1 commit
  3. 08 Jul, 2013 1 commit
  4. 03 Jul, 2013 1 commit
  5. 29 Jun, 2013 4 commits
  6. 07 Jun, 2013 1 commit
  7. 04 Mar, 2013 1 commit
    • Eric W. Biederman's avatar
      fs: Limit sys_mount to only request filesystem modules. · 7f78e035
      Eric W. Biederman authored
      
      
      Modify the request_module to prefix the file system type with "fs-"
      and add aliases to all of the filesystems that can be built as modules
      to match.
      
      A common practice is to build all of the kernel code and leave code
      that is not commonly needed as modules, with the result that many
      users are exposed to any bug anywhere in the kernel.
      
      Looking for filesystems with a fs- prefix limits the pool of possible
      modules that can be loaded by mount to just filesystems trivially
      making things safer with no real cost.
      
      Using aliases means user space can control the policy of which
      filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
      with blacklist and alias directives.  Allowing simple, safe,
      well understood work-arounds to known problematic software.
      
      This also addresses a rare but unfortunate problem where the filesystem
      name is not the same as it's module name and module auto-loading
      would not work.  While writing this patch I saw a handful of such
      cases.  The most significant being autofs that lives in the module
      autofs4.
      
      This is relevant to user namespaces because we can reach the request
      module in get_fs_type() without having any special permissions, and
      people get uncomfortable when a user specified string (in this case
      the filesystem type) goes all of the way to request_module.
      
      After having looked at this issue I don't think there is any
      particular reason to perform any filtering or permission checks beyond
      making it clear in the module request that we want a filesystem
      module.  The common pattern in the kernel is to call request_module()
      without regards to the users permissions.  In general all a filesystem
      module does once loaded is call register_filesystem() and go to sleep.
      Which means there is not much attack surface exposed by loading a
      filesytem module unless the filesystem is mounted.  In a user
      namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
      which most filesystems do not set today.
      Acked-by: default avatarSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Reported-by: default avatarKees Cook <keescook@google.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      7f78e035
  8. 26 Feb, 2013 2 commits
  9. 23 Feb, 2013 1 commit
  10. 13 Feb, 2013 1 commit
  11. 20 Dec, 2012 1 commit
  12. 06 Dec, 2012 1 commit
  13. 03 Oct, 2012 1 commit
  14. 20 Aug, 2012 1 commit
    • Tejun Heo's avatar
      workqueue: deprecate flush[_delayed]_work_sync() · 43829731
      Tejun Heo authored
      
      
      flush[_delayed]_work_sync() are now spurious.  Mark them deprecated
      and convert all users to flush[_delayed]_work().
      
      If you're cc'd and wondering what's going on: Now all workqueues are
      non-reentrant and the regular flushes guarantee that the work item is
      not pending or running on any CPU on return, so there's no reason to
      use the sync flushes at all and they're going away.
      
      This patch doesn't make any functional difference.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Mattia Dongili <malattia@linux.it>
      Cc: Kent Yoder <key@linux.vnet.ibm.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Cc: Bryan Wu <bryan.wu@canonical.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-wireless@vger.kernel.org
      Cc: Anton Vorontsov <cbou@mail.ru>
      Cc: Sangbeom Kim <sbkim73@samsung.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Avi Kivity <avi@redhat.com> 
      43829731
  15. 14 Jul, 2012 3 commits
  16. 01 Jun, 2012 1 commit
    • Josef Bacik's avatar
      fs: introduce inode operation ->update_time · c3b2da31
      Josef Bacik authored
      
      
      Btrfs has to make sure we have space to allocate new blocks in order to modify
      the inode, so updating time can fail.  We've gotten around this by having our
      own file_update_time but this is kind of a pain, and Christoph has indicated he
      would like to make xfs do something different with atime updates.  So introduce
      ->update_time, where we will deal with i_version an a/m/c time updates and
      indicate which changes need to be made.  The normal version just does what it
      has always done, updates the time and marks the inode dirty, and then
      filesystems can choose to do something different.
      
      I've gone through all of the users of file_update_time and made them check for
      errors with the exception of the fault code since it's complicated and I wasn't
      quite sure what to do there, also Jan is going to be pushing the file time
      updates into page_mkwrite for those who have it so that should satisfy btrfs and
      make it not a big deal to check the file_update_time() return code in the
      generic fault path. Thanks,
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      c3b2da31
  17. 30 May, 2012 1 commit
  18. 06 May, 2012 1 commit
  19. 28 Mar, 2012 1 commit
  20. 21 Mar, 2012 1 commit
  21. 07 Jan, 2012 1 commit
  22. 04 Jan, 2012 6 commits
  23. 14 Dec, 2011 1 commit
  24. 02 Nov, 2011 1 commit
  25. 21 Jul, 2011 1 commit
    • Josef Bacik's avatar
      fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82
      Josef Bacik authored
      
      
      Btrfs needs to be able to control how filemap_write_and_wait_range() is called
      in fsync to make it less of a painful operation, so push down taking i_mutex and
      the calling of filemap_write_and_wait() down into the ->fsync() handlers.  Some
      file systems can drop taking the i_mutex altogether it seems, like ext3 and
      ocfs2.  For correctness sake I just pushed everything down in all cases to make
      sure that we keep the current behavior the same for everybody, and then each
      individual fs maintainer can make up their mind about what to do from there.
      Thanks,
      Acked-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      02c24a82
  26. 28 May, 2011 2 commits
  27. 27 May, 2011 1 commit
    • Ying Han's avatar
      memcg: add the pagefault count into memcg stats · 456f998e
      Ying Han authored
      
      
      Two new stats in per-memcg memory.stat which tracks the number of page
      faults and number of major page faults.
      
        "pgfault"
        "pgmajfault"
      
      They are different from "pgpgin"/"pgpgout" stat which count number of
      pages charged/discharged to the cgroup and have no meaning of reading/
      writing page to disk.
      
      It is valuable to track the two stats for both measuring application's
      performance as well as the efficiency of the kernel page reclaim path.
      Counting pagefaults per process is useful, but we also need the aggregated
      value since processes are monitored and controlled in cgroup basis in
      memcg.
      
      Functional test: check the total number of pgfault/pgmajfault of all
      memcgs and compare with global vmstat value:
      
        $ cat /proc/vmstat | grep fault
        pgfault 1070751
        pgmajfault 553
      
        $ cat /dev/cgroup/memory.stat | grep fault
        pgfault 1071138
        pgmajfault 553
        total_pgfault 1071142
        total_pgmajfault 553
      
        $ cat /dev/cgroup/A/memory.stat | grep fault
        pgfault 199
        pgmajfault 0
        total_pgfault 199
        total_pgmajfault 0
      
      Performance test: run page fault test(pft) wit 16 thread on faulting in
      15G anon pages in 16G container.  There is no regression noticed on the
      "flt/cpu/s"
      
      Sample output from pft:
      
        TAG pft:anon-sys-default:
          Gb  Thr CLine   User     System     Wall    flt/cpu/s fault/wsec
          15   16   1     0.67s   233.41s    14.76s   16798.546 266356.260
      
        +-------------------------------------------------------------------------+
            N           Min           Max        Median           Avg        Stddev
        x  10     16682.962     17344.027     16913.524     16928.812      166.5362
        +  10     16695.568     16923.896     16820.604     16824.652     84.816568
        No difference proven at 95.0% confidence
      
      [akpm@linux-foundation.org: fix build]
      [hughd@google.com: shmem fix]
      Signed-off-by: default avatarYing Han <yinghan@google.com>
      Acked-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: default avatarMinchan Kim <minchan.kim@gmail.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: default avatarBalbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      456f998e
  28. 26 May, 2011 1 commit