1. 01 Oct, 2013 12 commits
    • fuse: writepage: skip already in flight · ff17be08
      Miklos Szeredi authored
      
      
      If ->writepage() tries to write back a page whose copy is still in flight,
      then just skip by calling redirty_page_for_writepage().
      
      This is OK, since now ->writepage() should never be called for data
      integrity sync.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: writepages: handle same page rewrites · 8b284dc4
      Miklos Szeredi authored
      
      
      As Maxim Patlasov pointed out, it's possible to get a dirty page while its
      copy is still under writeback, despite fuse_page_mkwrite() doing its thing
      (direct IO).
      
      This could result in two concurrent write requests for the same offset, with
      data corruption if they get mixed up.
      
      To prevent this, fuse needs to check and delay such writes.  This
      implementation does this by:
      
       1. check if page is still under writeout, if so create a new, single page
          secondary request for it
      
       2. chain this secondary request onto the in-flight request
      
       2/a. if a secondary request for the same offset was already chained to the
          in-flight request, then just copy the contents of the page and discard
          the new secondary request.  This makes sure that each page will
          have at most two requests associated with it
      
       3. when the in-flight request finishes, send off all secondary requests
          chained onto it
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
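
The chaining scheme described in commit 8b284dc4 can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the fuse implementation: `struct wreq`, `wreq_new()`, `chain_rewrite()` and `finish_inflight()` are hypothetical names, the 16-byte "page" is chosen only for brevity, and all locking is omitted. The caller is assumed to have already established that the page is still under writeout (step 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 16  /* tiny illustrative "page" size, not the kernel's */

/* Hypothetical stand-in for a single-page write request. */
struct wreq {
    unsigned long index;     /* page offset within the file */
    char data[PAGE_SZ];      /* private copy of the page contents */
    struct wreq *secondary;  /* requests chained behind this one */
};

static struct wreq *wreq_new(unsigned long index, const char *page)
{
    struct wreq *r = calloc(1, sizeof(*r));

    assert(r != NULL);
    r->index = index;
    memcpy(r->data, page, PAGE_SZ);
    return r;
}

/*
 * Steps 1, 2 and 2/a: the page at `index` was dirtied again while
 * `inflight` still covers it.  Returns 1 if a new secondary request was
 * chained, 0 if an existing secondary for that index was refreshed.
 */
static int chain_rewrite(struct wreq *inflight, unsigned long index,
                         const char *page)
{
    struct wreq *s;

    for (s = inflight->secondary; s != NULL; s = s->secondary) {
        if (s->index == index) {
            /* 2/a: reuse the chained request, only update its data */
            memcpy(s->data, page, PAGE_SZ);
            return 0;
        }
    }
    s = wreq_new(index, page);
    s->secondary = inflight->secondary;
    inflight->secondary = s;
    return 1;
}

/* Step 3: the in-flight request finished; hand back the chain to send. */
static struct wreq *finish_inflight(struct wreq *inflight)
{
    struct wreq *chain = inflight->secondary;

    inflight->secondary = NULL;
    return chain;
}
```

Rewriting the same page twice while the first copy is in flight leaves exactly one secondary request on the chain, carrying the latest contents, which is the "at most two requests per page" property from the commit message.
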
    • fuse: writepages: fix aggregation · 1e112a48
      Miklos Szeredi authored
      
      
      Checking against tmp-page indexes is not very useful, and results in one
      (or rarely two) page requests.  Which is not much of an improvement...
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: fix race in fuse_writepages() · 2d033eaa
      Maxim Patlasov authored
      
      
      The patch fixes a race between ftruncate(2), mmap-ed write and write(2):
      
      1) A user makes a page dirty via mmap-ed write.
      2) The user performs shrinking truncate(2) intended to purge the page.
      3) Before fuse_do_setattr calls truncate_pagecache, the page goes to
         writeback. fuse_writepages_fill attaches a new page to FUSE_WRITE request,
         then releases the original page by end_page_writeback and unlocks it.
      4) fuse_do_setattr completes and successfully returns. Since now, i_mutex
         is free.
      5) Ordinary write(2) extends i_size back to cover the page. Note that
         fuse_send_write_pages does wait for fuse writeback, but for another
         page->index.
      6) fuse_writepages_fill attaches more pages to the request (if any), then
         fuse_writepages_send is eventually called. It is supposed to crop
         inarg->size of the request, but it doesn't because i_size has already been
         extended back.
      
      Moving end_page_writeback behind fuse_writepages_send guarantees that
      __fuse_release_nowrite (called from fuse_do_setattr) will crop inarg->size
      of the request before write(2) gets the chance to extend i_size.
      Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
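
The "crop inarg->size" step that the race in commit 2d033eaa defeats can be illustrated with a small helper. This is a sketch under assumptions: `crop_write_size()` is a hypothetical name, and it models only the arithmetic of trimming a queued write to the current file size, not the fuse request handling.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes of a queued write
 * [offset, offset + len) that still lie below the current file size.
 */
static uint64_t crop_write_size(uint64_t offset, uint64_t len,
                                uint64_t i_size)
{
    assert(len <= UINT64_MAX - offset);  /* the range must not wrap */

    if (offset >= i_size)
        return 0;                 /* entirely beyond EOF: nothing to write */
    if (len > i_size - offset)
        return i_size - offset;   /* straddles EOF: trim the tail */
    return len;                   /* entirely below EOF: unchanged */
}
```

If the crop runs while `i_size` reflects the truncate, the stale tail is dropped. If write(2) has already extended `i_size` back, the same call returns the full length and the stale page is sent, which is why the ordering of end_page_writeback matters.
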
    • fuse: Implement writepages callback · 26d614df
      Pavel Emelyanov authored
      
      
      The .writepages one is required to make each writeback request carry more than
      one page on it. The patch enables optimized behaviour unconditionally,
      i.e. mmap-ed writes will benefit from the patch even if fc->writeback_cache=0.
      
      [SzM: simplify, add comments]
      Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: don't BUG on no write file · 72523425
      Miklos Szeredi authored
      
      
      Don't BUG if no writable files are found for page writeback.  If ever
      this is triggered, a WARN_ON helps debugging it much better than a BUG_ON.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: lock page in mkwrite · cca24370
      Miklos Szeredi authored
      
      
      Lock the page in fuse_page_mkwrite() to protect against a race with
      fuse_writepage() where the page is redirtied before the actual writeback
      begins.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: Prepare to handle multiple pages in writeback · 385b1268
      Pavel Emelyanov authored
      
      
      The .writepages callback will issue writeback requests with more than one
      page aboard. Make existing end/check code be aware of this.
      Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: Getting file for writeback helper · adcadfa8
      Pavel Emelyanov authored
      
      
      There will be a .writepageS callback implementation which will need to
      get a fuse_file out of a fuse_inode, thus make a helper for this.
      Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: no RCU mode in fuse_access() · 698fa1d1
      Miklos Szeredi authored
      
      
      fuse_access() is never called in RCU walk, only on the final component of
      access(2) and chdir(2)...
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: readdirplus: fix RCU walk · 6314efee
      Miklos Szeredi authored
      
      
      Doing dput(parent) is not valid in RCU walk mode.  In RCU mode it would
      probably be okay to update the parent flags, but it's actually not
      necessary most of the time...
      
      So only set the FUSE_I_ADVISE_RDPLUS flag on the parent when the entry was
      recently initialized by READDIRPLUS.
      
      This is achieved by setting FUSE_I_INIT_RDPLUS on entries added by
      READDIRPLUS and only dropping out of RCU mode if this flag is set.
      FUSE_I_INIT_RDPLUS is cleared once the FUSE_I_ADVISE_RDPLUS flag is set in
      the parent.
      Reported-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: stable@vger.kernel.org
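
The flag handshake described in commit 6314efee can be sketched as follows. This is an illustration under assumptions, not the kernel code: the flag names follow the commit message but their bit values, `struct node`, and `advise_rdplus()` are hypothetical, and `-ECHILD` is used as the conventional "retry outside RCU walk" return value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag names follow the commit message; the bit values are illustrative. */
enum {
    I_ADVISE_RDPLUS = 1 << 0,  /* parent: READDIRPLUS is worthwhile */
    I_INIT_RDPLUS   = 1 << 1,  /* entry: recently added by READDIRPLUS */
};

struct node {
    unsigned int flags;
    struct node *parent;
};

/*
 * Returns 0 when revalidation may simply continue, or -ECHILD when the
 * caller must retry outside RCU walk before the parent can be touched.
 */
static int advise_rdplus(struct node *entry, bool rcu_walk)
{
    assert(entry != NULL);

    if (!(entry->flags & I_INIT_RDPLUS))
        return 0;              /* common case: stay in RCU walk */
    if (rcu_walk)
        return -ECHILD;        /* drop out of RCU mode first */

    assert(entry->parent != NULL);
    entry->parent->flags |= I_ADVISE_RDPLUS;
    entry->flags &= ~I_INIT_RDPLUS;  /* cleared once the parent is advised */
    return 0;
}
```

Only entries freshly initialized by READDIRPLUS ever force a fallback out of RCU walk; after the first non-RCU pass the flag is cleared and later lookups of the same entry stay on the fast path.
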
    • fuse: don't check_submounts_and_drop() in RCU walk · 3c70b8ee
      Miklos Szeredi authored
      
      
      If revalidate finds an invalid dentry in RCU walk mode, let the VFS deal
      with it instead of calling check_submounts_and_drop() which is not prepared
      for being called from RCU walk.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: stable@vger.kernel.org
  2. 18 Sep, 2013 2 commits
    • fuse: fix fallocate vs. ftruncate race · 0ab08f57
      Maxim Patlasov authored
      
      
      An earlier patch introducing the FUSE_I_SIZE_UNSTABLE flag provided a detailed
      description of races between ftruncate and anyone who can extend i_size:
      
      > 1. As in the previous scenario fuse_dentry_revalidate() discovered that i_size
      > changed (due to our own fuse_do_setattr()) and is going to call
      > truncate_pagecache() for some  'new_size' it believes valid right now. But by
      > the time that particular truncate_pagecache() is called ...
      > 2. fuse_do_setattr() returns (either having called truncate_pagecache() or
      > not -- it doesn't matter).
      > 3. The file is extended either by write(2) or ftruncate(2) or fallocate(2).
      > 4. mmap-ed write makes a page in the extended region dirty.
      
      This patch adds necessary bits to fuse_file_fallocate() to protect from that
      race.
      Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: stable@vger.kernel.org
    • fuse: wait for writeback in fuse_file_fallocate() · bde52788
      Maxim Patlasov authored
      
      
      The patch fixes a race between mmap-ed write and fallocate(PUNCH_HOLE):
      
      1) A user makes a page dirty via mmap-ed write.
      2) The user performs fallocate(2) with mode == PUNCH_HOLE|KEEP_SIZE
         and <offset, size> covering the page.
      3) Before truncate_pagecache_range call from fuse_file_fallocate,
         the page goes to write-back. The page is fully processed by fuse_writepage
         (including end_page_writeback on the page), but fuse_flush_writepages did
         nothing because fi->writectr < 0.
      4) truncate_pagecache_range is called and fuse_file_fallocate is finishing
         by calling fuse_release_nowrite. The latter triggers processing queued
         write-back request which will write stale data to the hole soon.
      
      Changed in v2 (thanks to Brian for suggestion):
       - Do not truncate page cache until FUSE_FALLOCATE succeeded. Otherwise,
         we can end up in returning -ENOTSUPP while user data is already punched
         from page cache. Use filemap_write_and_wait_range() instead.
      Changed in v3 (thanks to Miklos for suggestion):
       - fuse_wait_on_writeback() is prone to livelocks; use fuse_set_nowrite()
         instead. So far as we need a dirty-page barrier only, fuse_sync_writes()
         should be enough.
       - rebased to for-linus branch of fuse.git
      Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: stable@vger.kernel.org
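
Steps 3 and 4 of the race in commit bde52788 depend on how a negative write counter holds back queued writeback. The single-threaded model below is a sketch under assumptions: `struct inode_wb` and all function names are hypothetical, the bias value is chosen for illustration, and the real code would sleep where the comment says so.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NOWRITE_BIAS INT_MIN  /* illustrative bias that makes writectr negative */

/* Hypothetical per-inode writeback state. */
struct inode_wb {
    int writectr;  /* in-flight writes; negative while writes are blocked */
    int queued;    /* writeback requests waiting to be sent */
    int sent;      /* requests handed to the server so far */
};

/* Send queued requests, unless a nowrite section is active. */
static void flush_writepages(struct inode_wb *wb)
{
    while (wb->writectr >= 0 && wb->queued > 0) {
        wb->queued--;
        wb->writectr++;
        wb->sent++;
    }
}

/* Enter a nowrite section (real code would wait for in-flight writes). */
static void set_nowrite(struct inode_wb *wb)
{
    assert(wb->writectr >= 0);
    wb->writectr += NOWRITE_BIAS;
}

/* True once no write is in flight inside the nowrite section. */
static bool nowrite_quiesced(const struct inode_wb *wb)
{
    return wb->writectr == NOWRITE_BIAS;
}

/* Leave the nowrite section and send whatever was queued meanwhile. */
static void release_nowrite(struct inode_wb *wb)
{
    assert(wb->writectr == NOWRITE_BIAS);
    wb->writectr = 0;
    flush_writepages(wb);
}
```

A request queued while the counter is negative is not sent by `flush_writepages()`, but it is sent as soon as `release_nowrite()` runs. In the race above that request carries data for the range that was just punched out, hence the need to flush dirty pages before the nowrite section begins.
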
  3. 15 Sep, 2013 1 commit
  4. 14 Sep, 2013 1 commit
    • vfs: fix dentry LRU list handling and nr_dentry_unused accounting · 89dc77bc
      Linus Torvalds authored
      
      
      The LRU list changes interacted badly with our nr_dentry_unused
      accounting, and even worse with the new DCACHE_LRU_LIST bit logic.
      
      This introduces helper functions to make sure everything follows the
      proper dcache d_lru list rules: the dentry cache is complicated by the
      fact that some of the hotpaths don't even want to look at the LRU list
      at all, and the fact that we use the same list entry in the dentry for
      both the LRU list and for our temporary shrinking lists when removing
      things from the LRU.
      
      The helper functions temporarily have some extra sanity checking for the
      flag bits that have to match the current LRU state of the dentry.  We'll
      remove that before the final 3.12 release, but considering how easy it
      is to get wrong, this first cleanup version has some very particular
      sanity checking.
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 13 Sep, 2013 2 commits
  6. 12 Sep, 2013 10 commits
  7. 11 Sep, 2013 12 commits
    • initmpfs: move rootfs code from fs/ramfs/ to init/ · 57f150a5
      Rob Landley authored
      
      
      When the rootfs code was a wrapper around ramfs, having them in the same
      file made sense.  Now that it can wrap another filesystem type, move it in
      with the init code instead.
      
      This also allows a subsequent patch to access rootfstype= command line
      arg.
      Signed-off-by: Rob Landley <rob@landley.net>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Stephen Warren <swarren@nvidia.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jim Cromie <jim.cromie@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • initmpfs: move bdi setup from init_rootfs to init_ramfs · 4bbee76b
      Rob Landley authored
      
      Even though ramfs hasn't got a backing device, commit e0bf68dd ("mm:
      bdi init hooks") added one anyway, and put the initialization in
      init_rootfs() since that's the first user, leaving it out of init_ramfs()
      to avoid duplication.
      
      But initmpfs uses init_tmpfs() instead, so move the init into the
      filesystem's init function, add a "once" guard to prevent duplicate
      initialization, and call the filesystem init from rootfs init.
      
      This goes part of the way to allowing ramfs to be built as a module.
      
      [akpm@linux-foundation.org; using bit 1 was odd]
      Signed-off-by: Rob Landley <rob@landley.net>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Stephen Warren <swarren@nvidia.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jim Cromie <jim.cromie@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
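
The "once" guard mentioned in commit 4bbee76b can be sketched like this. It is a userspace illustration with hypothetical names (`backing_dev_setup()`, `ramfs_like_init()`, `rootfs_like_init()`), and it uses a plain static flag where kernel code would more likely use an atomic bit operation.

```c
#include <assert.h>

static int bdi_init_calls;  /* how often the one-time setup actually ran */

/* Hypothetical one-time setup that must never run twice. */
static int backing_dev_setup(void)
{
    assert(bdi_init_calls == 0);
    bdi_init_calls++;
    return 0;
}

/* Filesystem init: reachable both directly and via the rootfs init. */
static int ramfs_like_init(void)
{
    static int once;  /* becomes 1 on the first call */

    if (once)
        return 0;     /* already initialized: nothing to do */
    once = 1;
    return backing_dev_setup();
}

/* Rootfs init delegates to the filesystem's own init function. */
static int rootfs_like_init(void)
{
    return ramfs_like_init();
}
```

Whichever of the two entry points runs first performs the setup; the other becomes a no-op, so the order of the callers does not matter.
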
    • initmpfs: replace MS_NOUSER in initramfs · 137fdcc1
      Rob Landley authored
      
      
      Mounting MS_NOUSER prevents --bind mounts from rootfs.  Prevent new rootfs
      mounts with a different mechanism that doesn't affect bind mounts.
      Signed-off-by: Rob Landley <rob@landley.net>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Stephen Warren <swarren@nvidia.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jim Cromie <jim.cromie@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/radix-tree.c: make radix_tree_node_alloc() work correctly within interrupt · 5e4c0d97
      Jan Kara authored
      
      
      With users of radix_tree_preload() run from interrupt (block/blk-ioc.c is
      one such possible user), the following race can happen:
      
      radix_tree_preload()
      ...
      radix_tree_insert()
        radix_tree_node_alloc()
          if (rtp->nr) {
            ret = rtp->nodes[rtp->nr - 1];
      <interrupt>
      ...
      radix_tree_preload()
      ...
      radix_tree_insert()
        radix_tree_node_alloc()
          if (rtp->nr) {
            ret = rtp->nodes[rtp->nr - 1];
      
      And we give out one radix tree node twice.  That clearly results in radix
      tree corruption with different results (usually OOPS) depending on which
      two users of radix tree race.
      
      We fix the problem by making radix_tree_node_alloc() always allocate fresh
      radix tree nodes when in interrupt.  Using preloading when in interrupt
      doesn't make sense since all the allocations have to be atomic anyway and
      we cannot steal nodes from process-context users because some users rely
      on radix_tree_insert() succeeding after radix_tree_preload().
      in_interrupt() check is somewhat ugly but we cannot simply key off passed
      gfp_mask as that is acquired from root_gfp_mask() and thus the same for
      all preload users.
      
      Another part of the fix is to avoid node preallocation in
      radix_tree_preload() when passed gfp_mask doesn't allow waiting.  Again,
      preallocation in such case doesn't make sense and when preallocation would
      happen in interrupt we could possibly leak some allocated nodes.  However,
      some users of radix_tree_preload() require following radix_tree_insert()
      to succeed.  To avoid unexpected effects for these users,
      radix_tree_preload() only warns if passed gfp mask doesn't allow waiting
      and we provide a new function radix_tree_maybe_preload() for those users
      which get different gfp mask from different call sites and which are
      prepared to handle radix_tree_insert() failure.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
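
The first part of the fix in commit 5e4c0d97 (never hand out preloaded nodes in interrupt context) can be modeled in userspace. This is a sketch under assumptions: `in_irq` is a plain variable standing in for in_interrupt(), `malloc()` stands in for the slab allocator, and `struct preload`, `preload_fill()` and `node_alloc()` are hypothetical simplifications of the per-CPU preload pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_MAX 4

struct node { int dummy; };

/* Simplified preload pool (per-CPU in the kernel, a single one here). */
struct preload {
    int nr;
    struct node *nodes[POOL_MAX];
};

static bool in_irq;  /* stand-in for in_interrupt() */

/* Fill the pool so that a later insertion cannot fail. */
static int preload_fill(struct preload *rtp)
{
    while (rtp->nr < POOL_MAX) {
        struct node *n = malloc(sizeof(*n));

        if (n == NULL)
            return -1;
        rtp->nodes[rtp->nr++] = n;
    }
    return 0;
}

static struct node *node_alloc(struct preload *rtp)
{
    struct node *ret = NULL;

    assert(rtp != NULL && rtp->nr >= 0 && rtp->nr <= POOL_MAX);

    /* Interrupt context never touches the pool, so it cannot race
     * with a process-context user that is halfway through taking a node. */
    if (!in_irq && rtp->nr > 0) {
        ret = rtp->nodes[rtp->nr - 1];
        rtp->nodes[rtp->nr - 1] = NULL;
        rtp->nr--;
    }
    if (ret == NULL)
        ret = malloc(sizeof(*ret));  /* fresh allocation */
    return ret;
}
```

With the check in place, an allocation made "in interrupt" leaves the pool untouched, so the interrupted process-context caller still finds the node it was about to take.
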
    • affs: use loff_t in affs_truncate() · 63259326
      Dan Carpenter authored
      
      
      It seems pretty unlikely that AFFS supports files over 4GB but we may as
      well use loff_t, just for cleanliness' sake, instead of truncating it to
      32 bits.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Marco Stornelli <marco.stornelli@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vmcore: enable /proc/vmcore mmap for s390 · 11e376a3
      Michael Holzheu authored
      
      
      The patch "s390/vmcore: Implement remap_oldmem_pfn_range for s390" now
      allows mmap to be used on s390 as well.
      
      So enable mmap for s390 again.
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Jan Willeke <willeke@de.ibm.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vmcore: introduce remap_oldmem_pfn_range() · 9cb21813
      Michael Holzheu authored
      
      
      For zfcpdump we can't map the HSA storage because it is only available via
      a read interface.  Therefore, for the new vmcore mmap feature we have to
      introduce a new mechanism to create mappings on demand.
      
      This patch introduces a new architecture function remap_oldmem_pfn_range()
      that should be used to create mappings with remap_pfn_range() for oldmem
      areas that can be directly mapped.  For zfcpdump this is everything
      except the HSA memory.  For the areas that are not mapped by
      remap_oldmem_pfn_range(), a new generic vmcore fault handler
      mmap_vmcore_fault() is called.
      
      This handler works as follows:
      
      * Get already available or new page from page cache (find_or_create_page)
      * Check if /proc/vmcore page is filled with data (PageUptodate)
      * If yes:
        Return that page
      * If no:
        Fill page using __vmcore_read(), set PageUptodate, and return page
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Jan Willeke <willeke@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
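
The fault-handler steps listed in commit 9cb21813 follow a fill-on-first-touch pattern, sketched here in userspace. Everything in this block is a hypothetical model: `struct cpage` stands in for a page-cache page, `read_backing()` for __vmcore_read(), and the tiny sizes are chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PG     8  /* illustrative page size */
#define NPAGES 4

/* Hypothetical stand-in for a page-cache page. */
struct cpage {
    bool present;   /* page exists in the cache */
    bool uptodate;  /* page has been filled with data */
    char data[PG];
};

static struct cpage cache[NPAGES];
static const char backing[NPAGES * PG + 1] = "aaaaaaaabbbbbbbbccccccccdddddddd";
static int reads;   /* counts reads from the slow backing source */

/* Stand-in for the read-only interface to the dump data. */
static void read_backing(char *dst, unsigned long idx)
{
    memcpy(dst, backing + idx * PG, PG);
    reads++;
}

/* Return the cached page for idx, filling it on first access. */
static struct cpage *fault(unsigned long idx)
{
    struct cpage *p;

    assert(idx < NPAGES);
    p = &cache[idx];
    p->present = true;         /* find or create the page */
    if (!p->uptodate) {        /* not yet filled with data? */
        read_backing(p->data, idx);
        p->uptodate = true;
    }
    return p;
}
```

The second fault on the same index returns the already filled page without touching the backing source again, which is the point of checking the up-to-date state before reading.
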
    • vmcore: introduce ELF header in new memory feature · be8a8d06
      Michael Holzheu authored
      
      
      For s390 we want to use /proc/vmcore for our SCSI stand-alone dump
      (zfcpdump).  We have support where the first HSA_SIZE bytes are saved into
      a hypervisor owned memory area (HSA) before the kdump kernel is booted.
      When the kdump kernel starts, it is restricted to use only HSA_SIZE bytes.
      
      The advantages of this mechanism are:
      
       * No crashkernel memory has to be defined in the old kernel.
       * Early boot problems (before kexec_load has been done) can be dumped
       * Non-Linux systems can be dumped.
      
      We modify the s390 copy_oldmem_page() function to read from the HSA memory
      if memory below HSA_SIZE bytes is requested.
      
      Since we cannot use the kexec tool to load the kernel in this scenario,
      we have to build the ELF header in the 2nd (kdump/new) kernel.
      
      So with the following patch set we would like to introduce a new
      feature that allows the ELF header for /proc/vmcore to be created in
      the 2nd kernel's memory.
      
      The following steps are done during zfcpdump execution:
      
      1.  Production system crashes
      2.  User boots a SCSI disk that has been prepared with the zfcpdump tool
      3.  Hypervisor saves CPU state of boot CPU and HSA_SIZE bytes of memory into HSA
      4.  Boot loader loads kernel into low memory area
      5.  Kernel boots and uses only HSA_SIZE bytes of memory
      6.  Kernel saves registers of non-boot CPUs
      7.  Kernel does memory detection for dump memory map
      8.  Kernel creates ELF header for /proc/vmcore
      9.  /proc/vmcore uses this header for initialization
      10. The zfcpdump user space reads /proc/vmcore to write dump to SCSI disk
          - copy_oldmem_page() copies from HSA for memory below HSA_SIZE
          - copy_oldmem_page() copies from real memory for memory above HSA_SIZE
      
      Currently for s390 we create the ELF core header in the 2nd kernel with a
      small trick.  We relocate the addresses in the ELF header in a way that
      for the /proc/vmcore code it seems to be in the 1st kernel (old) memory
      and the read_from_oldmem() returns the correct data.  This allows the
      /proc/vmcore code to use the ELF header in the 2nd kernel.
      
      This patch:
      
      Exchange the old mechanism with the new and much cleaner function call
      override feature that now officially allows creating the ELF core header
      in the 2nd kernel.
      
      To use the new feature the following functions have to be defined
      by the architecture backend code to read from new memory:
      
       * elfcorehdr_alloc: Allocate ELF header
       * elfcorehdr_free: Free the memory of the ELF header
       * elfcorehdr_read: Read from ELF header
       * elfcorehdr_read_notes: Read from ELF notes
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Jan Willeke <willeke@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
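
Step 10 of the zfcpdump sequence in commit be8a8d06 says that reads below HSA_SIZE come from the HSA and reads above it from real memory. The boundary handling can be sketched as below; the buffers, sizes and the name `copy_oldmem()` are assumptions for illustration and do not reflect the s390 implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HSA_SIZE 16  /* illustrative; the real size is platform defined */
#define MEM_SIZE 64

static char hsa[HSA_SIZE];     /* hypervisor-saved copy of low memory */
static char oldmem[MEM_SIZE];  /* memory of the crashed system */

/* Copy count bytes starting at old-system address src into dst. */
static void copy_oldmem(char *dst, size_t src, size_t count)
{
    assert(src <= MEM_SIZE && count <= MEM_SIZE - src);

    if (src < HSA_SIZE) {
        /* The part below HSA_SIZE comes from the HSA copy. */
        size_t n = HSA_SIZE - src;

        if (n > count)
            n = count;
        memcpy(dst, hsa + src, n);
        dst += n;
        src += n;
        count -= n;
    }
    if (count > 0)
        memcpy(dst, oldmem + src, count);  /* the rest from real memory */
}
```

A read that straddles the boundary is split in two, so the caller sees one contiguous address space even though the bytes come from two different sources.
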
    • exec: cleanup the error handling in search_binary_handler() · 6b3c538f
      Oleg Nesterov authored
      
      
      The error handling and ret-from-loop look confusing and inconsistent.
      
      - "retval >= 0" simply returns
      
      - "!bprm->file" returns too but with read_unlock() because
         binfmt_lock was already re-acquired
      
      - "retval != -ENOEXEC || bprm->mm == NULL" does "break" and
        relies on the same check after the main loop
      
      Consolidate these checks into a single if/return statement.
      
      need_retry still checks "retval == -ENOEXEC", but this and -ENOENT before
      the main loop are not needed.  This is only for pathological and
      impossible list_empty(&formats) case.
      
      It is not clear why we check "bprm->mm == NULL", probably this
      should be removed.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: Zach Levis <zml@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
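
The three scattered checks that commit 6b3c538f consolidates can be expressed as a single predicate. This is a sketch of the logic as the commit message describes it, not a copy of the kernel code; `struct bprm_like` and `should_return()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of the binary-loading state to two fields. */
struct bprm_like {
    void *mm;
    void *file;
};

/*
 * True when the handler loop should stop and return retval.
 * Success (retval >= 0) is covered by "retval != -ENOEXEC".
 */
static bool should_return(int retval, const struct bprm_like *b)
{
    assert(b != NULL);
    return retval != -ENOEXEC || b->mm == NULL || b->file == NULL;
}
```

Only the combination "-ENOEXEC with both mm and file still set" lets the loop move on to the next binary format; every other outcome returns immediately.
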
    • exec: don't retry if request_module() fails · 4e0621a0
      Oleg Nesterov authored
      
      
      A separate one-liner for better documentation.
      
      It doesn't make sense to retry if request_module() fails to exec
      /sbin/modprobe, add the additional "request_module() < 0" check.
      
      However, this logic still doesn't look exactly right:
      
      1. It would be better to check "request_module() != 0", the user
         space modprobe process should report the correct exit code.
         But I didn't dare to add the user-visible change.
      
      2. The whole ENOEXEC logic looks suboptimal. Suppose that we try
         to exec a "#!path-to-unsupported-binary" script. In this case
         request_module() + "retry" will be done twice: first by the
         "depth == 1" code, and then again by the "depth == 0" caller
         which doesn't make sense.
      
      3. And note that in the case above bprm->buf was already changed
         by load_script()->prepare_binprm(), so this looks even more
         ugly.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: Zach Levis <zml@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • exec: cleanup the CONFIG_MODULES logic · cb7b6b1c
      Oleg Nesterov authored
      
      
      search_binary_handler() uses "for (try=0; try<2; try++)" to avoid "goto"
      but the code looks too complicated and horrible imho.  We still need to
      check "try == 0" before request_module() and add the additional "break"
      for !CONFIG_MODULES case.
      
      Kill this loop and use a simple "bool need_retry" + "goto retry".  The
      code looks much simpler and we do not even need ifdef's, gcc can optimize
      out the "if (need_retry)" block if !IS_ENABLED().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: Zach Levis <zml@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
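
The "bool need_retry" plus "goto retry" shape described in commit cb7b6b1c can be sketched as follows. The block also folds in the "< 0" check on the module load from commit 4e0621a0. All names are hypothetical: `MODULES_ENABLED` stands in for IS_ENABLED(CONFIG_MODULES), `load_module()` for request_module(), and `try_handlers()` for the walk over the registered formats.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODULES_ENABLED 1  /* stand-in for IS_ENABLED(CONFIG_MODULES) */

static int attempts;        /* how many times the handlers were tried */
static bool module_loaded;  /* becomes true once the "module" is loaded */

/* Stand-in for walking the format list: fails until the module exists. */
static int try_handlers(void)
{
    assert(attempts < 2);   /* there is at most one retry */
    attempts++;
    return module_loaded ? 0 : -ENOEXEC;
}

/* Stand-in for request_module(); always succeeds in this sketch. */
static int load_module(void)
{
    module_loaded = true;
    return 0;
}

static int search_handler(void)
{
    bool need_retry = MODULES_ENABLED;
    int retval;

retry:
    retval = try_handlers();
    if (retval != -ENOEXEC)
        return retval;

    if (need_retry) {
        if (load_module() < 0)
            return retval;  /* loading failed: retrying is pointless */
        need_retry = false; /* retry exactly once */
        goto retry;
    }
    return retval;
}
```

When `MODULES_ENABLED` is 0, `need_retry` is a compile-time false and the whole retry block is dead code, which is how the original avoids preprocessor conditionals.
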
    • exec: kill ->load_binary != NULL check in search_binary_handler() · 92eaa565
      Oleg Nesterov authored
      
      
      search_binary_handler() checks ->load_binary != NULL for no reason, this
      method should be always defined.  Turn this check into WARN_ON() and move
      it into __register_binfmt().
      
      Also, kill the function pointer.  The current code looks confusing, as if
      ->load_binary can go away after read_unlock(&binfmt_lock).  But we rely on
      module_get(fmt->module), this fmt can't be changed or unregistered,
      otherwise this code is buggy anyway.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: Zach Levis <zml@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
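
Moving the ->load_binary check from the hot path to registration time, as commit 92eaa565 describes, amounts to validating once up front. The sketch below is a userspace illustration with hypothetical names (`struct fmt`, `register_fmt()`, `load_ok()`); it prints a warning and returns an error code where the kernel would use WARN_ON().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical binary-format descriptor. */
struct fmt {
    const char *name;
    int (*load_binary)(void);
    struct fmt *next;
};

static struct fmt *formats;  /* head of the registered formats list */

/* Trivial loader used by the usage example. */
static int load_ok(void)
{
    return 0;
}

/* Reject incomplete formats once, so that callers never need to check. */
static int register_fmt(struct fmt *f)
{
    assert(f != NULL);

    if (f->load_binary == NULL) {
        fprintf(stderr, "warning: format %s has no load_binary\n", f->name);
        return -1;
    }
    f->next = formats;
    formats = f;
    return 0;
}
```

Because a format without a loader never reaches the list, the code that walks the list can call `load_binary` unconditionally.
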