1. 05 Jul, 2017 1 commit
  2. 29 Jun, 2017 1 commit
  3. 24 Jun, 2017 1 commit
    • David Miller's avatar
      crypto: Work around deallocated stack frame reference gcc bug on sparc. · b355b899
      David Miller authored
      commit d41519a6
      On sparc, if we have an alloca() like situation, as is the case with
      SHASH_DESC_ON_STACK(), we can end up referencing deallocated stack
      memory.  The result can be that the value is clobbered if a trap
      or interrupt arrives at just the right instruction.
      It only occurs if the function ends returning a value from that
      alloca() area and that value can be placed into the return value
      register using a single instruction.
      For example, in lib/libcrc32c.c:crc32c() we end up with a return
      sequence like:
              return  %i7+8
               lduw   [%o5+16], %o0   ! MEM[(u32 *)__shash_desc.1_10 + 16B],
      %o5 holds the base of the on-stack area allocated for the shash
      descriptor.  But the return released the stack frame and the
      register window.
      So if an intererupt arrives between 'return' and 'lduw', then
      the value read at %o5+16 can be corrupted.
      Add a data compiler barrier to work around this problem.  This is
      exactly what the gcc fix will end up doing as well, and it absolutely
      should not change the code generated for other cpus (unless gcc
      on them has the same bug :-)
      With crucial insight from Eric Sandeen.
      Reported-by: default avatarAnatoly Pugachev <matorola@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 14 Jun, 2017 1 commit
  5. 14 May, 2017 1 commit
    • Daniel Borkmann's avatar
      bpf, arm64: fix jit branch offset related to ldimm64 · 493d0a7b
      Daniel Borkmann authored
      [ Upstream commit ddc665a4 ]
      When the instruction right before the branch destination is
      a 64 bit load immediate, we currently calculate the wrong
      jump offset in the ctx->offset[] array as we only account
      one instruction slot for the 64 bit load immediate although
      it uses two BPF instructions. Fix it up by setting the offset
      into the right slot after we incremented the index.
      Before (ldimm64 test 1):
        00000020:  52800007  mov w7, #0x0 // #0
        00000024:  d2800060  mov x0, #0x3 // #3
        00000028:  d2800041  mov x1, #0x2 // #2
        0000002c:  eb01001f  cmp x0, x1
        00000030:  54ffff82  b.cs 0x00000020
        00000034:  d29fffe7  mov x7, #0xffff // #65535
        00000038:  f2bfffe7  movk x7, #0xffff, lsl #16
        0000003c:  f2dfffe7  movk x7, #0xffff, lsl #32
        00000040:  f2ffffe7  movk x7, #0xffff, lsl #48
        00000044:  d29dddc7  mov x7, #0xeeee // #61166
        00000048:  f2bdddc7  movk x7, #0xeeee, lsl #16
        0000004c:  f2ddddc7  movk x7, #0xeeee, lsl #32
        00000050:  f2fdddc7  movk x7, #0xeeee, lsl #48
      After (ldimm64 test 1):
        00000020:  52800007  mov w7, #0x0 // #0
        00000024:  d2800060  mov x0, #0x3 // #3
        00000028:  d2800041  mov x1, #0x2 // #2
        0000002c:  eb01001f  cmp x0, x1
        00000030:  540000a2  b.cs 0x00000044
        00000034:  d29fffe7  mov x7, #0xffff // #65535
        00000038:  f2bfffe7  movk x7, #0xffff, lsl #16
        0000003c:  f2dfffe7  movk x7, #0xffff, lsl #32
        00000040:  f2ffffe7  movk x7, #0xffff, lsl #48
        00000044:  d29dddc7  mov x7, #0xeeee // #61166
        00000048:  f2bdddc7  movk x7, #0xeeee, lsl #16
        0000004c:  f2ddddc7  movk x7, #0xeeee, lsl #32
        00000050:  f2fdddc7  movk x7, #0xeeee, lsl #48
      Also, add a couple of test cases to make sure JITs pass
      this test. Tested on Cavium ThunderX ARMv8. The added
      test cases all pass after the fix.
      Fixes: 8eee539d
       ("arm64: bpf: fix out-of-bounds read in bpf2a64_offset()")
      Reported-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Cc: Xi Wang <xi.wang@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  6. 21 Apr, 2017 1 commit
  7. 08 Apr, 2017 1 commit
  8. 26 Jan, 2017 2 commits
  9. 19 Jan, 2017 1 commit
  10. 09 Dec, 2016 1 commit
  11. 08 Dec, 2016 1 commit
  12. 01 Dec, 2016 2 commits
  13. 25 Nov, 2016 2 commits
    • Michael Ellerman's avatar
      locking/selftest: Fix output since KERN_CONT changes · 25139409
      Michael Ellerman authored
      Since the KERN_CONT changes the locking-selftest output is messed up, eg:
                                         | spin |wlock |rlock |mutex | wsem | rsem |
                             A-A deadlock:
          ok  |
          ok  |
          ok  |
          ok  |
          ok  |
          ok  |
      Use pr_cont() to get it looking normal again:
                                         | spin |wlock |rlock |mutex | wsem | rsem |
                             A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
      Reported-by: default avatarChristian Kujau <lists@nerdbynature.de>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead...
    • Andrey Ryabinin's avatar
      mpi: Fix NULL ptr dereference in mpi_powm() [ver #3] · f5527fff
      Andrey Ryabinin authored
      This fixes CVE-2016-8650.
      If mpi_powm() is given a zero exponent, it wants to immediately return
      either 1 or 0, depending on the modulus.  However, if the result was
      initalised with zero limb space, no limbs space is allocated and a
      NULL-pointer exception ensues.
      Fix this by allocating a minimal amount of limb space for the result when
      the 0-exponent case when the result is 1 and not touching the limb space
      when the result is 0.
      This affects the use of RSA keys and X.509 certificates that carry them.
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
      PGD 0
      Oops: 0002 [#1] SMP
      Modules linked in:
      CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      task: ffff8804011944c0 task.stack: ffff880401294000
      RIP: 0010:[<ffffffff8138ce5d>]  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
      RSP: 0018:ffff880401297ad8  EFLAGS: 0001...
  14. 18 Nov, 2016 1 commit
  15. 17 Nov, 2016 1 commit
    • Abhi Das's avatar
      fix iov_iter_advance() for ITER_PIPE · 680bb946
      Abhi Das authored
      iov_iter_advance() needs to decrement iter->count by the number of
      bytes we'd moved beyond.  Normal flavours do that, but ITER_PIPE
      doesn't and ITER_PIPE generic_file_read_iter() for O_DIRECT files
      ends up with a bogus fallback to page cache read, resulting in incorrect
      values for file offset and bytes read.
      Signed-off-by: default avatarAbhi Das <adas@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  16. 11 Nov, 2016 1 commit
  17. 28 Oct, 2016 3 commits
    • Daniel Mentz's avatar
      lib/genalloc.c: start search from start of chunk · 62e931fa
      Daniel Mentz authored
      gen_pool_alloc_algo() iterates over the chunks of a pool trying to find
      a contiguous block of memory that satisfies the allocation request.
      The shortcut
      	if (size > atomic_read(&chunk->avail))
      makes the loop skip over chunks that do not have enough bytes left to
      fulfill the request.  There are two situations, though, where an
      allocation might still fail:
      (1) The available memory is not contiguous, i.e.  the request cannot
          be fulfilled due to external fragmentation.
      (2) A race condition.  Another thread runs the same code concurrently
          and is quicker to grab the available memory.
      In those situations, the loop calls pool->algo() to search the entire
      chunk, and pool->algo() returns some value that is >= end_bit to
      indicate that the search failed.  This return value is then assigned to
      start_bit.  The variables start_bit and end_bit describe the range that
      should be searched, and this range should be reset for every chunk that
      is search...
    • Dmitry Vyukov's avatar
      lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB · 02754e0a
      Dmitry Vyukov authored
      KASAN uses stackdepot to memorize stacks for all kmalloc/kfree calls.
      Current stackdepot capacity is 16MB (1024 top level entries x 4 pages on
      second level).  Size of each stack is (num_frames + 3) * sizeof(long).
      Which gives us ~84K stacks.  This capacity was chosen empirically and it
      is enough to run kernel normally.
      However, when lots of configs are enabled and a fuzzer tries to maximize
      code coverage, it easily hits the limit within tens of minutes.  I've
      tested for long a time with number of top level entries bumped 4x
      (4096).  And I think I've seen overflow only once.  But I don't have all
      configs enabled and code coverage has not reached maximum yet.  So bump
      it 8x to 8192.
      Since we have two-level table, memory cost of this is very moderate --
      currently the top-level table is 8KB, with this patch it is 64KB, which
      is negligible under KASAN.
      Here is some approx math.
      128MB allows us to memorize ~670K stacks (assuming stack is ~200b).
      I've grepped kernel for kmalloc|kfree|kmem_cache_alloc|kmem_cache_free|
      kzalloc|kstrdup|kstrndup|kmemdup and it gives ~60K matches.  Most of
      alloc/free call sites are reachable with only one stack.  But some
      utility functions can have large fanout.  Assuming average fanout is 5x,
      total number of alloc/free stacks is ~300K.
      Link: http://lkml.kernel.org/r/1476458416-122131-1-git-send-email-dvyukov@google.com
      Signed-off-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Baozeng Ding <sploving1@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Kees Cook's avatar
      latent_entropy: raise CONFIG_FRAME_WARN by default · 0e07f663
      Kees Cook authored
      When building with the latent_entropy plugin, set the default
      CONFIG_FRAME_WARN to 2048, since some __init functions have many basic
      blocks that, when instrumented by the latent_entropy plugin, grow beyond
      1024 byte stack size on 32-bit builds.
      Link: http://lkml.kernel.org/r/20161018211216.GA39687@beast
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Cc: Emese Revfy <re.emese@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  18. 20 Oct, 2016 1 commit
    • Daniel Borkmann's avatar
      bpf, test: fix ld_abs + vlan push/pop stress test · 0d906b1e
      Daniel Borkmann authored
      After commit 636c2628 ("net: skbuff: Remove errornous length
      validation in skb_vlan_pop()") mentioned test case stopped working,
      throwing a -12 (ENOMEM) return code. The issue however is not due to
      636c2628, but rather due to a buggy test case that got uncovered
      from the change in behaviour in 636c2628.
      The data_size of that test case for the skb was set to 1. In the
      bpf_fill_ld_abs_vlan_push_pop() handler bpf insns are generated that
      loop with: reading skb data, pushing 68 tags, reading skb data,
      popping 68 tags, reading skb data, etc, in order to force a skb
      expansion and thus trigger that JITs recache skb->data. Problem is
      that initial data_size is too small.
      While before 636c2628, the test silently bailed out due to the
      skb->len < VLAN_ETH_HLEN check with returning 0, and now throwing an
      error from failing skb_ensure_writable(). Set at least minimum of
      ETH_HLEN as an initial length so that on first push of data, equivalent
      pop will su...
  19. 15 Oct, 2016 1 commit
  20. 11 Oct, 2016 5 commits
  21. 10 Oct, 2016 2 commits
  22. 08 Oct, 2016 4 commits
    • Chris Metcalf's avatar
      nmi_backtrace: generate one-line reports for idle cpus · 6727ad9e
      Chris Metcalf authored
      When doing an nmi backtrace of many cores, most of which are idle, the
      output is a little overwhelming and very uninformative.  Suppress
      messages for cpus that are idling when they are interrupted and just
      emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
      We do this by grouping all the cpuidle code together into a new
      .cpuidle.text section, and then checking the address of the interrupted
      PC to see if it lies within that section.
      This commit suitably tags x86 and tile idle routines, and only adds in
      the minimal framework for other architectures.
      Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
      Signed-off-by: default avatarChris Metcalf <cmetcalf@mellanox.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
      Tested-by: default avatarPetr Mladek <pmladek@suse.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Chris Metcalf's avatar
      nmi_backtrace: do a local dump_stack() instead of a self-NMI · 67766489
      Chris Metcalf authored
      Currently on arm there is code that checks whether it should call
      dump_stack() explicitly, to avoid trying to raise an NMI when the
      current context is not preemptible by the backtrace IPI.  Similarly, the
      forthcoming arch/tile support uses an IPI mechanism that does not
      support generating an NMI to self.
      Accordingly, move the code that guards this case into the generic
      mechanism, and invoke it unconditionally whenever we want a backtrace of
      the current cpu.  It seems plausible that in all cases, dump_stack()
      will generate better information than generating a stack from the NMI
      handler.  The register state will be missing, but that state is likely
      not particularly helpful in any case.
      Or, if we think it is helpful, we should be capturing and emitting the
      current register state in all cases when regs == NULL is passed to
      Link: http://lkml.kernel.org/r/1472487169-14923-3-git-send-email-cmetcalf@mellanox.com
      Signed-off-by: default avatarChris Metcalf <cmetcalf@mellanox.com>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
      Reviewed-by: default avatarPetr Mladek <pmladek@suse.com>
      Acked-by: default avatarAaron Tomlin <atomlin@redhat.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Chris Metcalf's avatar
      nmi_backtrace: add more trigger_*_cpu_backtrace() methods · 9a01c3ed
      Chris Metcalf authored
      Patch series "improvements to the nmi_backtrace code" v9.
      This patch series modifies the trigger_xxx_backtrace() NMI-based remote
      backtracing code to make it more flexible, and makes a few small
      improvements along the way.
      The motivation comes from the task isolation code, where there are
      scenarios where we want to be able to diagnose a case where some cpu is
      about to interrupt a task-isolated cpu.  It can be helpful to see both
      where the interrupting cpu is, and also an approximation of where the
      cpu that is being interrupted is.  The nmi_backtrace framework allows us
      to discover the stack of the interrupted cpu.
      I've tested that the change works as desired on tile, and build-tested
      x86, arm, mips, and sparc64.  For x86 I confirmed that the generic
      cpuidle stuff as well as the architecture-specific routines are in the
      new cpuidle section.  For arm, mips, and sparc I just build-tested it
      and made sure the generic cpuidle routines were in the new cpuidle
      section, but I didn't attempt to figure out which the platform-specific
      idle routines might be.  That might be more usefully done by someone
      with platform experience in follow-up patches.
      This patch (of 4):
      Currently you can only request a backtrace of either all cpus, or all
      cpus but yourself.  It can also be helpful to request a remote backtrace
      of a single cpu, and since we want that, the logical extension is to
      support a cpumask as the underlying primitive.
      This change modifies the existing lib/nmi_backtrace.c code to take a
      cpumask as its basic primitive, and modifies the linux/nmi.h code to use
      the new "cpumask" method instead.
      The existing clients of nmi_backtrace (arm and x86) are converted to
      using the new cpumask approach in this change.
      The other users of the backtracing API (sparc64 and mips) are converted
      to use the cpumask approach rather than the all/allbutself approach.
      The mips code ignored the "include_self" boolean but with this change it
      will now also dump a local backtrace if requested.
      Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.com
      Signed-off-by: default avatarChris Metcalf <cmetcalf@mellanox.com>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
      Reviewed-by: default avatarAaron Tomlin <atomlin@redhat.com>
      Reviewed-by: default avatarPetr Mladek <pmladek@suse.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Vineet Gupta's avatar
      atomic64: no need for CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE · 51a02124
      Vineet Gupta authored
      This came to light when implementing native 64-bit atomics for ARCv2.
      The atomic64 self-test code uses CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
      to check whether atomic64_dec_if_positive() is available.  It seems it
      was needed when not every arch defined it.  However as of current code
      the Kconfig option seems needless
       - for CONFIG_GENERIC_ATOMIC64 it is auto-enabled in lib/Kconfig and a
         generic definition of API is present lib/atomic64.c
       - arches with native 64-bit atomics select it in arch/*/Kconfig and
         define the API in their headers
      So I see no point in keeping the Kconfig option
      Compile tested for:
       - blackfin (CONFIG_GENERIC_ATOMIC64)
       - x86 (!CONFIG_GENERIC_ATOMIC64)
       - ia64
      Link: http://lkml.kernel.org/r/1473703083-8625-3-git-send-email-vgupta@synopsys.com
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ming Lin <ming.l@ssi.samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  23. 05 Oct, 2016 3 commits
    • Miklos Szeredi's avatar
      pipe: add pipe_buf_release() helper · a779638c
      Miklos Szeredi authored
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
    • Al Viro's avatar
      new iov_iter flavour: pipe-backed · 241699cd
      Al Viro authored
      iov_iter variant for passing data into pipe.  copy_to_iter()
      copies data into page(s) it has allocated and stuffs them into
      the pipe; copy_page_to_iter() stuffs there a reference to the
      page given to it.  Both will try to coalesce if possible.
      iov_iter_zero() is similar to copy_to_iter(); iov_iter_get_pages()
      and friends will do as copy_to_iter() would have and return the
      pages where the data would've been copied.  iov_iter_advance()
      will truncate everything past the spot it has advanced to.
      New primitive: iov_iter_pipe(), used for initializing those.
      pipe should be locked all along.
      Running out of space acts as fault would for iovec-backed ones;
      in other words, giving it to ->read_iter() may result in short
      read if the pipe overflows, or -EFAULT if it happens with nothing
      copied there.
      In other words, ->read_iter() on those acts pretty much like
      ->splice_read().  Moreover, all generic_file_splice_read() users,
      as well as many other ->splice_read() instances can be switched
      to that scheme - that'll happen in the next commit.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
    • Johannes Weiner's avatar
      mm: filemap: don't plant shadow entries without radix tree node · d3798ae8
      Johannes Weiner authored
      When the underflow checks were added to workingset_node_shadow_dec(),
      they triggered immediately:
        kernel BUG at ./include/linux/swap.h:276!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: isofs usb_storage fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6
         soundcore wmi acpi_als pinctrl_sunrisepoint kfifo_buf tpm_tis industrialio acpi_pad pinctrl_intel tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt
        CPU: 0 PID: 20929 Comm: blkid Not tainted 4.8.0-rc8-00087-gbe67d60b #1
        Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016
        task: ffff8faa93ecd940 task.stack: ffff8faa7f478000
        RIP: page_cache_tree_insert+0xf1/0x100
        Call Trace:
        Code: 03 00 48 8b 5d d8 65 48 33 1c 25 28 00 00 00 44 89 e8 75 19 48 83 c4 18 5b 41 5c 41 5d 41 5e 5d c3 0f 0b 41 bd ef ff ff ff eb d7 <0f> 0b e8 88 68 ef ff 0f 1f 84 00
        RIP  page_cache_tree_insert+0xf1/0x100
      This is a long-standing bug in the way shadow entries are accounted in
      the radix tree nodes. The shrinker needs to know when radix tree nodes
      contain only shadow entries, no pages, so node->count is split in half
      to count shadows in the upper bits and pages in the lower bits.
      Unfortunately, the radix tree implementation doesn't know of this and
      assumes all entries are in node->count. When there is a shadow entry
      directly in root->rnode and the tree is later extended, the radix tree
      implementation will copy that entry into the new node and and bump its
      node->count, i.e. increases the page count bits. Once the shadow gets
      removed and we subtract from the upper counter, node->count underflows
      and triggers the warning. Afterwards, without node->count reaching 0
      again, the radix tree node is leaked.
      Limit shadow entries to when we have actual radix tree nodes and can
      count them properly. That means we lose the ability to detect refaults
      from files that had only the first page faulted in at eviction time.
      Fixes: 449dd698
       ("mm: keep page cache radix tree nodes in check")
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reported-and-tested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  24. 30 Sep, 2016 1 commit
  25. 29 Sep, 2016 1 commit