1. 14 Jul, 2017 32 commits
  2. 13 Jul, 2017 1 commit
  3. 12 Jul, 2017 7 commits
    • Michal Hocko's avatar
      mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic · dcda9b04
      Michal Hocko authored
      __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
      the page allocator.  This has been true but only for allocations
      requests larger than PAGE_ALLOC_COSTLY_ORDER.  It has been always
      ignored for smaller sizes.  This is a bit unfortunate because there is
      no way to express the same semantic for those requests and they are
      considered too important to fail so they might end up looping in the
      page allocator for ever, similarly to GFP_NOFAIL requests.
      
      Now that the whole tree has been cleaned up and accidental or misled
      usage of __GFP_REPEAT flag has been removed for !costly requests we can
      give the original flag a better name and more importantly a more useful
      semantic.  Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
      that the allocator would try really hard but there is no promise of a
      success.  This will work independent of the order and overrides the
      default allocator behavior.  Page allocator users have several levels of
      guarantee vs.  cost options (take GFP_KERNEL as an example)
      
       - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
         attempt to free memory at all. The most light weight mode which even
         doesn't kick the background reclaim. Should be used carefully because
         it might deplete the memory and the next user might hit the more
         aggressive reclaim
      
       - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic
         allocation without any attempt to free memory from the current
         context but can wake kswapd to reclaim memory if the zone is below
         the low watermark. Can be used from either atomic contexts or when
         the request is a performance optimization and there is another
         fallback for a slow path.
      
       - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
         non sleeping allocation with an expensive fallback so it can access
         some portion of memory reserves. Usually used from interrupt/bh
         context with an expensive slow path fallback.
      
       - GFP_KERNEL - both background and direct reclaim are allowed and the
         _default_ page allocator behavior is used. That means that !costly
         allocation requests are basically nofail but there is no guarantee of
         that behavior so failures have to be checked properly by callers
         (e.g. OOM killer victim is allowed to fail currently).
      
       - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
         and all allocation requests fail early rather than cause disruptive
         reclaim (one round of reclaim in this implementation). The OOM killer
         is not invoked.
      
       - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
         behavior and all allocation requests try really hard. The request
         will fail if the reclaim cannot make any progress. The OOM killer
         won't be triggered.
      
       - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
         and all allocation requests will loop endlessly until they succeed.
         This might be really dangerous especially for larger orders.
      
      Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
      because they already had their semantic.  No new users are added.
      __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
      there is no progress and we have already passed the OOM point.
      
      This means that all the reclaim opportunities have been exhausted except
      the most disruptive one (the OOM killer) and a user defined fallback
      behavior is more sensible than keep retrying in the page allocator.
      
      [akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
      [mhocko@suse.com: semantic fix]
        Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
      [mhocko@kernel.org: address other thing spotted by Vlastimil]
        Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.org
      
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Alex Belits <alex.belits@cavium.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dcda9b04
    • Dmitry Vyukov's avatar
      fault-inject: support systematic fault injection · e41d5818
      Dmitry Vyukov authored
      Add /proc/self/task/<current-tid>/fail-nth file that allows failing
      0-th, 1-st, 2-nd and so on calls systematically.
      Excerpt from the added documentation:
      
       "Write to this file of integer N makes N-th call in the current task
        fail (N is 0-based). Read from this file returns a single char 'Y' or
        'N' that says if the fault setup with a previous write to this file
        was injected or not, and disables the fault if it wasn't yet injected.
        Note that this file enables all types of faults (slab, futex, etc).
        This setting takes precedence over all other generic settings like
        probability, interval, times, etc. But per-capability settings (e.g.
        fail_futex/ignore-private) take precedence over it. This feature is
        intended for systematic testing of faults in a single system call. See
        an example below"
      
      Why add a new setting:
      1. Existing settings are global rather than per-task.
         So parallel testing is not possible.
      2. attr->interval is close but it depends on attr->count
         which is non reset to 0, so interval does not work as expected.
      3. Trying to model this with existing settings requires manipulations
         of all of probability, interval, times, space, task-filter and
         unexposed count and per-task make-it-fail files.
      4. Existing settings are per-failure-type, and the set of failure
         types is potentially expanding.
      5. make-it-fail can't be changed by unprivileged user and aggressive
         stress testing better be done from an unprivileged user.
         Similarly, this would require opening the debugfs files to the
         unprivileged user, as he would need to reopen at least times file
         (not possible to pre-open before dropping privs).
      
      The proposed interface solves all of the above (see the example).
      
      We want to integrate this into syzkaller fuzzer.  A prototype has found
      10 bugs in kernel in first day of usage:
      
        https://groups.google.com/forum/#!searchin/syzkaller/%22FAULT_INJECTION%22%7Csort:relevance
      
      I've made the current interface work with all types of our sandboxes.
      For setuid the secret sauce was prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) to
      make /proc entries non-root owned.  So I am fine with the current
      version of the code.
      
      [akpm@linux-foundation.org: fix build]
      Link: http://lkml.kernel.org/r/20170328130128.101773-1-dvyukov@google.com
      
      Signed-off-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e41d5818
    • Cyrill Gorcunov's avatar
      procfs: fdinfo: extend information about epoll target files · 77493f04
      Cyrill Gorcunov authored
      Since it is possbile to have same number in tfd field (say file added,
      closed, then nother file dup'ed to same number and added back) it is
      imposible to distinguish such target files solely by their numbers.
      
      Strictly speaking regular applications don't need to recognize these
      targets at all but for checkpoint/restore sake we need to collect
      targets to be able to push them back on restore stage in a proper order.
      
      Thus lets add file position, inode and device number where this target
      lays.  This three fields can be used as a primary key for sorting, and
      together with kcmp help CRIU can find out an exact file target (from the
      whole set of processes being checkpointed).
      
      Link: http://lkml.kernel.org/r/20170424154423.436491881@gmail.com
      
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: default avatarAndrei Vagin <avagin@virtuozzo.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      77493f04
    • Bharat Bhushan's avatar
      kexec/kdump: minor Documentation updates for arm64 and Image · a711bdc0
      Bharat Bhushan authored
      Minor updates in Documentation for arm64 as relocatable kernel.  Also
      this patch updates documentation for using uncompressed image "Image"
      which is used for ARM64.
      
      Link: http://lkml.kernel.org/r/1495104793-6563-1-git-send-email-Bharat.Bhushan@nxp.com
      
      Signed-off-by: default avatarBharat Bhushan <Bharat.Bhushan@nxp.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Pratyush Anand <panand@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a711bdc0
    • SeongJae Park's avatar
      kokr/memory-barriers.txt: Fix obsolete link to atomic_ops.txt · 51e988f4
      SeongJae Park authored
      
      
      Obsolete links to atomic_ops.txt exist in ko_KR/memory-barriers.txt
      though the file has moved to core-api/atomic_ops.rst.  This commit fixes
      the obsolete links.
      Signed-off-by: default avatarSeongJae Park <sj38.park@gmail.com>
      Signed-off-by: default avatarJonathan Corbet <corbet@lwn.net>
      51e988f4
    • SeongJae Park's avatar
      memory-barriers.txt: Fix broken link to atomic_ops.txt · 3afadfd9
      SeongJae Park authored
      
      
      Few obsolete links to atomic_ops.txt exist in memory-barriers.txt though
      the file has moved to core-api/atomic_ops.rst.  This commit fixes the
      obsolete links.
      Signed-off-by: default avatarSeongJae Park <sj38.park@gmail.com>
      Signed-off-by: default avatarJonathan Corbet <corbet@lwn.net>
      3afadfd9
    • Jonathan Corbet's avatar
      docs: Turn off section numbering for the input docs · 6d16e914
      Jonathan Corbet authored
      
      
      The input docs enable section numbering at multiple levels, leading to a
      lot of bright-red "nested numbered toctree" warnings in newer Sphinx
      versions.  Just take that directive out for now to help alleviate the
      global red-pixel shortage.
      Signed-off-by: default avatarJonathan Corbet <corbet@lwn.net>
      6d16e914