Skip to content
  • Dmitry Vyukov's avatar
    fault-inject: support systematic fault injection · e41d5818
    Dmitry Vyukov authored
    Add /proc/self/task/<current-tid>/fail-nth file that allows failing
    0-th, 1-st, 2-nd and so on calls systematically.
    Excerpt from the added documentation:
    
     "Write to this file of integer N makes N-th call in the current task
      fail (N is 0-based). Read from this file returns a single char 'Y' or
      'N' that says if the fault setup with a previous write to this file
      was injected or not, and disables the fault if it wasn't yet injected.
      Note that this file enables all types of faults (slab, futex, etc).
      This setting takes precedence over all other generic settings like
      probability, interval, times, etc. But per-capability settings (e.g.
      fail_futex/ignore-private) take precedence over it. This feature is
      intended for systematic testing of faults in a single system call. See
      an example below"
    
    Why add a new setting:
    1. Existing settings are global rather than per-task.
       So parallel testing is not possible.
    2. attr->interval is close but it depends on attr->count
       which is non reset to 0, so interval does not work as expected.
    3. Trying to model this with existing settings requires manipulations
       of all of probability, interval, times, space, task-filter and
       unexposed count and per-task make-it-fail files.
    4. Existing settings are per-failure-type, and the set of failure
       types is potentially expanding.
    5. make-it-fail can't be changed by unprivileged user and aggressive
       stress testing better be done from an unprivileged user.
       Similarly, this would require opening the debugfs files to the
       unprivileged user, as he would need to reopen at least times file
       (not possible to pre-open before dropping privs).
    
    The proposed interface solves all of the above (see the example).
    
    We want to integrate this into syzkaller fuzzer.  A prototype has found
    10 bugs in kernel in first day of usage:
    
      https://groups.google.com/forum/#!searchin/syzkaller/%22FAULT_INJECTION%22%7Csort:relevance
    
    I've made the current interface work with all types of our sandboxes.
    For setuid the secret sauce was prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) to
    make /proc entries non-root owned.  So I am fine with the current
    version of the code.
    
    [akpm@linux-foundation.org: fix build]
    Link: http://lkml.kernel.org/r/20170328130128.101773-1-dvyukov@google.com
    
    
    Signed-off-by: default avatarDmitry Vyukov <dvyukov@google.com>
    Cc: Akinobu Mita <akinobu.mita@gmail.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    e41d5818