  1. 01 Nov, 2018 6 commits
    • mm: ipipe: disable ondemand memory · 8ade150c
      Philippe Gerum authored
      Co-kernels cannot bear the extra latency caused by memory access
      faults involved in COW or overcommit.
      __ipipe_disable_ondemand_mappings() force-commits all common memory
      mappings to physical RAM.
      In addition, the architecture code is given a chance to pre-load page
      table entries for ioremap and vmalloc memory, for preventing further
      minor faults accessing such memory due to PTE misses (if that ever
      makes sense for them).
      Revisit: Further COW breaking in copy_user_page() and copy_pte_range()
      may be useless once __ipipe_disable_ondemand_mappings() has run for a
      co-kernel task, since all of its mappings have been populated, and
      unCOWed if applicable.
    • dump_stack: ipipe: make dump_stack() domain-aware · 9ac7f531
      Philippe Gerum authored
      When dumping a stack backtrace, we neither need nor want to disable
      root stage IRQs over the head stage, where CPU migration can't happen.
      Conversely, we neither need nor want to disable hard IRQs from the
      head stage, so that latency won't skyrocket either.
    • lib/smp_processor_id: ipipe: exclude head domain from preemption check · e8a89820
      Philippe Gerum authored
      There can be no CPU migration from the head stage, however the
      out-of-band code currently running smp_processor_id() might have
      preempted the regular kernel code from within a preemptible section,
      which might cause a false positive in the end.
      These are the two reasons why we certainly neither need nor want to do
      the preemption check in that case.
    • atomic: ipipe: keep atomic when pipelining IRQs · 76498343
      Philippe Gerum authored
      Because of the virtualization of interrupt masking for the regular
      kernel code when the pipeline is enabled, atomic helpers relying on
      common interrupt disabling helpers such as local_irq_save/restore
      pairs would not be atomic anymore, leading to data corruption.
      This commit restores true atomicity for the atomic helpers that would
      be otherwise affected by interrupt virtualization.
    • locking: ipipe: add hard lock alternative to regular spinlocks · 7c28f350
      Philippe Gerum authored
      Hard spinlocks manipulate the CPU interrupt mask, without affecting
      the kernel preemption state in locking/unlocking operations.
      This type of spinlock is useful for implementing a critical section to
      serialize concurrent accesses from both in-band and out-of-band
      contexts, i.e. from root and head stages.
      Hard spinlocks exclusively depend on the pre-existing arch-specific
      bits which implement regular spinlocks. They can be seen as basic
      spinlocks still affecting the CPU's interrupt state when all other
      spinlock types only deal with the virtual interrupt flag managed by
      the pipeline core - i.e. only disable interrupts for the regular
      in-band kernel activity.
    • genirq: add generic I-pipe core · d9f057db
      Philippe Gerum authored
      This commit provides the arch-independent bits for implementing the
      interrupt pipeline core, a lightweight layer introducing a separate,
      high-priority execution stage for handling all IRQs in pseudo-NMI
      mode, which cannot be delayed by the regular kernel code. See
      Documentation/ipipe.rst for details about interrupt pipelining.
      Architectures which support interrupt pipelining should select
      HAVE_IPIPE_SUPPORT, along with implementing the required arch-specific
      code. In such a case, CONFIG_IPIPE becomes available to the user via
      the Kconfig interface for enabling the feature.
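      The two-stage idea can be illustrated with a toy user-space model.
      This is only a sketch under stated assumptions: struct pipeline,
      pipeline_irq(), root_stall() and root_unstall() are hypothetical names,
      not the I-pipe API, and the model assumes every IRQ is also forwarded
      to the root stage. It shows the head stage seeing each IRQ immediately
      while the root stage only logs it as long as its virtual interrupt
      flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: every name below is hypothetical, not the I-pipe API. */
#define NR_IRQS 32

struct pipeline {
    bool root_stalled;      /* virtual interrupt flag of the root (in-band) stage */
    uint32_t root_pending;  /* IRQs logged while the root stage is stalled */
    int head_handled;       /* IRQs seen by the head (out-of-band) stage */
    int root_handled;       /* IRQs actually delivered to the root stage */
};

/* Play back every IRQ that was logged for the root stage. */
static void root_sync(struct pipeline *p)
{
    for (int irq = 0; irq < NR_IRQS; irq++) {
        uint32_t bit = UINT32_C(1) << irq;

        if (p->root_pending & bit) {
            p->root_pending &= ~bit;
            p->root_handled++;
        }
    }
}

/* An IRQ arrives: the head stage always sees it at once; the root
 * stage receives it only when it is not stalled. */
static void pipeline_irq(struct pipeline *p, int irq)
{
    assert(irq >= 0 && irq < NR_IRQS);
    p->head_handled++;
    p->root_pending |= UINT32_C(1) << irq;
    if (!p->root_stalled)
        root_sync(p);
}

/* Virtual equivalents of disabling and enabling interrupts in-band. */
static void root_stall(struct pipeline *p)
{
    p->root_stalled = true;
}

static void root_unstall(struct pipeline *p)
{
    p->root_stalled = false;
    root_sync(p);
}
```

      In this model, stalling the root stage never delays the head stage;
      the logged IRQs are simply replayed when the root stage is unstalled.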
  2. 04 Oct, 2018 1 commit
    • scsi: klist: Make it safe to use klists in atomic context · 1390c37d
      Bart Van Assche authored
      [ Upstream commit 624fa779 ]
      In the scsi_transport_srp implementation it cannot be avoided to
      iterate over a klist from atomic context when using the legacy block
      layer instead of blk-mq. Hence this patch, which makes it safe to use
      klists in atomic context. It prevents lockdep from reporting the following:
      WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
       Possible interrupt unsafe locking scenario:
             CPU0                    CPU1
             ----                    ----
      stack backtrace:
      Workqueue: kblockd blk_timeout_work
      Call Trace:
       srp_timed_out+0xaf/0x1d0 [scsi_transport_srp]
       scsi_times_out+0xd4/0x410 [scsi_mod]
      See also commit c9ddf734 ("scsi: scsi_transport_srp: Fix shost to
      rport translation").
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: James Bottomley <jejb@linux.vnet.ibm.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. 19 Sep, 2018 1 commit
  4. 15 Sep, 2018 1 commit
  5. 05 Sep, 2018 1 commit
    • printk/nmi: Prevent deadlock when accessing the main log buffer in NMI · cd71265a
      Petr Mladek authored
      commit 03fc7f9c upstream.
      The commit 719f6a70 ("printk: Use the main logbuf in NMI
      when logbuf_lock is available") brought back the possible deadlocks
      in printk() and NMI.
      The check of logbuf_lock is done only in printk_nmi_enter() to prevent
      mixed output. But another CPU might take the lock later, enter NMI, and:
            + Both NMIs might be serialized by yet another lock, for example,
      	the one in nmi_cpu_backtrace().
            + The other CPU might get stopped in NMI, see smp_send_stop()
      	in panic().
      The only safe solution is to use trylock when storing the message
      into the main log buffer. It might cause reordering when some lines
      go to the main log buffer directly and others are delayed via
      the per-CPU buffer. It means that it is not useful in general.
      This patch replaces the problematic NMI deferred context with NMI
      direct context. It can be used to mark code that might produce
      many messages in NMI and the risk of losing them is more critical
      than problems with eventual reordering.
      The context is then used when dumping trace buffers on oops. It was
      the primary motivation for the original fix. Reordering is also an
      even smaller issue there because some traces have their own time stamps.
      Finally, nmi_cpu_backtrace() no longer needs to be serialized because
      it will always use the per-CPU buffers again.
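      The trylock-with-fallback idea described above can be sketched in
      user space. This is a minimal illustration, not the printk code:
      logbuf_trylock(), logbuf_unlock(), nmi_direct_store() and the two
      buffers are hypothetical stand-ins, and a C11 atomic_flag plays the
      role of logbuf_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for logbuf_lock, the main log buffer and one
 * per-CPU buffer; none of these are the kernel's real objects. */
static atomic_flag logbuf_lock = ATOMIC_FLAG_INIT;
static char main_log[256];
static char percpu_log[256];

/* Returns non-zero when the lock was taken; never spins or blocks. */
static int logbuf_trylock(void)
{
    return !atomic_flag_test_and_set_explicit(&logbuf_lock,
                                              memory_order_acquire);
}

static void logbuf_unlock(void)
{
    atomic_flag_clear_explicit(&logbuf_lock, memory_order_release);
}

/* Append msg to a NUL-terminated buffer, truncating if it does not fit. */
static void append(char *buf, size_t size, const char *msg)
{
    size_t used = strlen(buf);

    assert(used < size);
    snprintf(buf + used, size - used, "%s", msg);
}

/* Store directly when the lock is free, otherwise defer to the per-CPU
 * buffer. Returns 1 for a direct store and 0 for a deferred one. */
static int nmi_direct_store(const char *msg)
{
    if (logbuf_trylock()) {
        append(main_log, sizeof(main_log), msg);
        logbuf_unlock();
        return 1;
    }
    append(percpu_log, sizeof(percpu_log), msg);
    return 0;
}
```

      Because the store never waits for the lock, it cannot deadlock against
      a holder that was interrupted or stopped; the price is the possible
      reordering between the two buffers that the message mentions.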
      Fixes: 719f6a70 ("printk: Use the main logbuf in NMI when logbuf_lock is available")
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180627142028.11259-1-pmladek@suse.com
      To: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  6. 17 Aug, 2018 1 commit
    • ioremap: Update pgtable free interfaces with addr · a3480696
      Chintan Pandya authored
      commit 785a19f9 upstream.
      The following kernel panic was observed on ARM64 platform due to a stale
      TLB entry.
       1. ioremap with 4K size, a valid pte page table is set.
       2. iounmap it, its pte entry is set to 0.
       3. ioremap the same address with 2M size, update its pmd entry with
          a new value.
       4. CPU may hit an exception because the old pmd entry is still in TLB,
          which leads to a kernel panic.
      Commit b6bdb751 ("mm/vmalloc: add interfaces to free unmapped page
      table") has addressed this panic by falling back to pte mappings in
      the above case on ARM64.
      To support pmd mappings in all cases, TLB purge needs to be performed
      in this case on ARM64.
      Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page()
      so that TLB purge can be added later in separate patches.
      [toshi.kani@hpe.com: merge changes, rewrite patch description]
      Fixes: 28ee90fe ("x86/mm: implement free pmd/pte page interfaces")
      Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: mhocko@suse.com
      Cc: akpm@linux-foundation.org
      Cc: hpa@zytor.com
      Cc: linux-mm@kvack.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: stable@vger.kernel.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20180627141348.21777-3-toshi.kani@hpe.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  7. 25 Jul, 2018 1 commit
  8. 03 Jul, 2018 1 commit
    • lib/vsprintf: Remove atomic-unsafe support for %pCr · ea0ac01f
      Geert Uytterhoeven authored
      commit 666902e4 upstream.
      "%pCr" formats the current rate of a clock, and calls clk_get_rate().
      The latter obtains a mutex, hence it must not be called from atomic
      context.
      Remove support for this rarely-used format, as vsprintf() (and e.g.
      printk()) must be callable from any context.
      Any remaining out-of-tree users will start seeing the clock's name
      printed instead of its rate.
      Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Fixes: 900cca29 ("lib/vsprintf: add %pC{,n,r} format specifiers for clocks")
      Link: http://lkml.kernel.org/r/1527845302-12159-5-git-send-email-geert+renesas@glider.be
      To: Jia-Ju Bai <baijiaju1990@gmail.com>
      To: Jonathan Corbet <corbet@lwn.net>
      To: Michael Turquette <mturquette@baylibre.com>
      To: Stephen Boyd <sboyd@kernel.org>
      To: Zhang Rui <rui.zhang@intel.com>
      To: Eduardo Valentin <edubezval@gmail.com>
      To: Eric Anholt <eric@anholt.net>
      To: Stefan Wahren <stefan.wahren@i2se.com>
      To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: linux-doc@vger.kernel.org
      Cc: linux-clk@vger.kernel.org
      Cc: linux-pm@vger.kernel.org
      Cc: linux-serial@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-renesas-soc@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: stable@vger.kernel.org # 4.1+
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  9. 30 May, 2018 2 commits
  10. 22 May, 2018 2 commits
    • radix tree: fix multi-order iteration race · 572e2385
      Ross Zwisler authored
      commit 9f418224 upstream.
      Fix a race in the multi-order iteration code which causes the kernel to
      hit a GP fault.  This was first seen with a production v4.15 based
      kernel (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used
      order 9 PMD DAX entries.
      The race has to do with how we tear down multi-order sibling entries
      when we are removing an item from the tree.  Remember for example that
      an order 2 entry looks like this:
        struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
      where 'entry' is in some slot in the struct radix_tree_node, and the
      three slots following 'entry' contain sibling pointers which point back
      to 'entry.'
      When we delete 'entry' from the tree, the deletion path reaches
      replace_slot(), which first removes the siblings in order from the
      first to the last, and only then replaces 'entry' with NULL.  This
      means that for a brief period of time we end up with one or more of
      the siblings removed, so the node looks like this:
        struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
      This causes an issue if you have a reader iterating over the slots in
      the tree via radix_tree_for_each_slot() while only under
      rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
      The issue is that when __radix_tree_next_slot() => skip_siblings() tries
      to skip over the sibling entries in the slots, it currently does so with
      an exact match on the slot directly preceding our current slot.
      Normally this works:
                                            V preceding slot
        struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                                    ^ current slot
      This lets you find the first sibling, and you skip them all in order.
      But in the case where one of the siblings is NULL, that slot is skipped
      and then our sibling detection is interrupted:
                                                   V preceding slot
        struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                                          ^ current slot
      This means that the sibling pointers aren't recognized since they point
      all the way back to 'entry', so we think that they are normal internal
      radix tree pointers.  This causes us to think we need to walk down to a
      struct radix_tree_node starting at the address of 'entry'.
      In a real running kernel this will crash the thread with a GP fault when
      you try and dereference the slots in your broken node starting at
      the address of 'entry'.
      We fix this race by fixing the way that skip_siblings() detects sibling
      nodes.  Instead of testing against the preceding slot we instead look
      for siblings via is_sibling_entry() which compares against the position
      of the struct radix_tree_node.slots[] array.  This ensures that sibling
      entries are properly identified, even if they are no longer contiguous
      with the 'entry' they point to.
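      The difference between the two detection strategies can be sketched
      with a toy node. This is an illustration under simplifying
      assumptions: struct node, MAP_SIZE and both helper names are
      hypothetical, sibling entries are modelled as plain pointers back into
      slots[], and the tag bits the kernel uses to encode entries are left
      out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE 8  /* toy value standing in for the real node fan-out */

/* Toy node: a sibling entry is modelled as a pointer into slots[]. */
struct node {
    void *slots[MAP_SIZE];
};

/* Strategy described as racy: slot 'cur' counts as a sibling only if
 * it is an exact match for the address of the directly preceding slot. */
static bool is_sibling_by_preceding_slot(const struct node *n, size_t cur)
{
    assert(cur > 0 && cur < MAP_SIZE);
    return n->slots[cur] == (const void *)&n->slots[cur - 1];
}

/* Strategy described as the fix: an entry is a sibling if it points
 * anywhere inside this node's slots[] array. */
static bool is_sibling_by_range(const struct node *n, const void *entry)
{
    uintptr_t p = (uintptr_t)entry;
    uintptr_t lo = (uintptr_t)&n->slots[0];
    uintptr_t hi = (uintptr_t)&n->slots[MAP_SIZE];

    return p >= lo && p < hi;
}
```

      With slots[] = [entry][NULL][sibling][sibling], the exact-match test
      at the third slot compares against the address of the NULL slot and
      misses the sibling, while the range test still recognises it.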
      Link: http://lkml.kernel.org/r/20180503192430.7582-6-ross.zwisler@linux.intel.com
      Fixes: 148deab2 ("radix-tree: improve multiorder iterators")
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: CR, Sapthagirish <sapthagirish.cr@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • lib/test_bitmap.c: fix bitmap optimisation tests to report errors correctly · f6c0f020
      Matthew Wilcox authored
      commit 1e3054b9 upstream.
      I had neglected to increment the error counter when the tests failed,
      so failing tests were noisy but did not actually return an error
      code.
      Link: http://lkml.kernel.org/r/20180509114328.9887-1-mpe@ellerman.id.au
      Fixes: 3cc78125 ("lib/test_bitmap.c: add optimisation tests")
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Yury Norov <ynorov@caviumnetworks.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: <stable@vger.kernel.org>	[4.13+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  11. 09 May, 2018 1 commit
    • errseq: Always report a writeback error once · 0799a0ea
      Matthew Wilcox authored
      commit b4678df1 upstream.
      The errseq_t infrastructure assumes that errors which occurred before
      the file descriptor was opened are of no interest to the application.
      This turns out to be a regression for some applications, notably Postgres.
      Before errseq_t, a writeback error would be reported exactly once (as
      long as the inode remained in memory), so Postgres could open a file,
      call fsync() and find out whether there had been a writeback error on
      that file from another process.
      This patch changes the errseq infrastructure to report errors to all
      file descriptors which are opened after the error occurred, but before
      it was reported to any file descriptor.  This restores the user-visible
      behaviour.
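      The reporting rule can be modelled with a small, non-atomic sketch.
      Everything here is an assumption made for illustration: the bit layout
      (error code in the low 12 bits, a "seen" flag above it, a counter
      above that) and the *_model helper names are not taken from this
      commit, and the real code updates the value atomically.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define MAX_ERRNO 4095u
#define SEEN      (1u << 12)  /* set once some reader has observed the error */
#define CTR_INC   (1u << 13)  /* counter lives above the flag */

typedef uint32_t errseq;

/* Record a new error (err is a negative errno value). */
static void errseq_record_model(errseq *eseq, int err)
{
    errseq old = *eseq;
    errseq next;

    assert(err < 0 && (unsigned)-err <= MAX_ERRNO);
    next = (old & ~(MAX_ERRNO | SEEN)) | (unsigned)-err;
    if (old & SEEN)     /* bump the counter only if someone saw the old value */
        next += CTR_INC;
    *eseq = next;
}

/* Sample at open time: an error nobody has seen yet yields 0, so the
 * new descriptor will still have it reported. */
static errseq errseq_sample_model(const errseq *eseq)
{
    errseq old = *eseq;

    return (old & SEEN) ? old : 0;
}

/* Report an error recorded since 'since' was taken, then mark it seen. */
static int errseq_check_and_advance_model(errseq *eseq, errseq *since)
{
    errseq cur = *eseq;

    if (*since == cur)
        return 0;
    cur |= SEEN;
    *eseq = cur;
    *since = cur;
    return -(int)(cur & MAX_ERRNO);
}
```

      In this model a descriptor opened after an unreported error samples 0
      and therefore gets the error on its first check, while a descriptor
      opened after the error was reported samples the current value and
      sees nothing.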
      Cc: stable@vger.kernel.org
      Fixes: 5660e13d ("fs: new infrastructure for writeback error handling and reporting")
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  12. 01 May, 2018 1 commit
    • kobject: don't use WARN for registration failures · a5f42767
      Dmitry Vyukov authored
      commit 3e14c6ab upstream.
      This WARNING proved to be noisy. The function still returns an error
      and callers should handle it. That's how most of kernel code works.
      Downgrade the WARNING to pr_err() and leave WARNINGs for kernel bugs.
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Reported-by: syzbot+209c0f67f99fec8eb14b@syzkaller.appspotmail.com
      Reported-by: syzbot+7fb6d9525a4528104e05@syzkaller.appspotmail.com
      Reported-by: syzbot+2e63711063e2d8f9ea27@syzkaller.appspotmail.com
      Reported-by: syzbot+de73361ee4971b6e6f75@syzkaller.appspotmail.com
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  13. 26 Apr, 2018 1 commit
    • bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y · 3e01c16d
      Yonghong Song authored
      [ Upstream commit 09584b40 ]
      When CONFIG_BPF_JIT_ALWAYS_ON is defined in the config file,
      tools/testing/selftests/bpf/test_kmod.sh failed like below:
        [root@localhost bpf]# ./test_kmod.sh
        sysctl: setting key "net.core.bpf_jit_enable": Invalid argument
        [ JIT enabled:0 hardened:0 ]
        [  132.175681] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
        [  132.458834] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
        [ JIT enabled:1 hardened:0 ]
        [  133.456025] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
        [  133.730935] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
        [ JIT enabled:1 hardened:1 ]
        [  134.769730] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
        [  135.050864] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
        [ JIT enabled:1 hardened:2 ]
        [  136.442882] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
        [  136.821810] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
        [root@localhost bpf]#
      test_kmod.sh loads and removes test_bpf.ko multiple times with different
      settings for sysctl net.core.bpf_jit_{enable,harden}. The failed test #297
      of test_bpf.ko is designed such that JIT always fails.
      Commit 290af866 (bpf: introduce BPF_JIT_ALWAYS_ON config)
      introduced the following tightening logic:
              if (!bpf_prog_is_dev_bound(fp->aux)) {
                      fp = bpf_int_jit_compile(fp);
          #ifdef CONFIG_BPF_JIT_ALWAYS_ON
                      if (!fp->jited) {
                              *err = -ENOTSUPP;
                              return fp;
                      }
          #endif
      With this logic, Test #297 always gets return value -ENOTSUPP
      when CONFIG_BPF_JIT_ALWAYS_ON is defined, causing the test failure.
      This patch fixes the failure by marking Test #297 as an expected failure
      when CONFIG_BPF_JIT_ALWAYS_ON is defined.
      Fixes: 290af866 (bpf: introduce BPF_JIT_ALWAYS_ON config)
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  14. 19 Apr, 2018 1 commit
  15. 31 Mar, 2018 1 commit
  16. 28 Mar, 2018 1 commit
    • mm/vmalloc: add interfaces to free unmapped page table · acdb4981
      Toshi Kani authored
      commit b6bdb751 upstream.
      On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may
      create pud/pmd mappings.  A kernel panic was observed on arm64 systems
      with Cortex-A75 in the following steps as described by Hanjun Guo.
       1. ioremap a 4K size, valid page table will build,
       2. iounmap it, pte0 will set to 0;
       3. ioremap the same address with 2M size, pgd/pmd is unchanged,
          then set a new value for pmd;
       4. pte0 is leaked;
       5. CPU may meet exception because the old pmd is still in TLB,
          which will lead to kernel panic.
      This panic is not reproducible on x86.  INVLPG, called from iounmap,
      purges all levels of entries associated with the purged address on x86.
      x86 still has the memory leak.
      The patch changes the ioremap path to free unmapped page table(s) since
      doing so in the unmap path has the following issues:
       - The iounmap() path is shared with vunmap(). Since vmap() only
         supports pte mappings, making vunmap() free a pte page is an
         overhead for regular vmap users as they do not need a pte page freed.
       - Checking if all entries in a pte page are cleared in the unmap path
         is racy, and serializing this check is expensive.
       - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
         Clearing a pud/pmd entry before the lazy TLB purges needs an extra TLB
         purge.
      Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which
      clear a given pud/pmd entry and free up a page for the lower level
      entries.
      This patch implements their stub functions on x86 and arm64, which work
      as a workaround.
      [akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub]
      Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com
      Fixes: e61ce6ad ("mm: change ioremap to set up huge I/O mappings")
      Reported-by: Lei Li <lious.lilei@hisilicon.com>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Wang Xuefeng <wxf.wang@hisilicon.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  17. 19 Mar, 2018 1 commit
  18. 15 Mar, 2018 1 commit
    • lib/bug.c: exclude non-BUG/WARN exceptions from report_bug() · d50cb5ce
      Kees Cook authored
      commit 1b4cfe3c upstream.
      Commit b8347c21 ("x86/debug: Handle warnings before the notifier
      chain, to fix KGDB crash") changed the ordering of fixups, and did not
      take into account the case of x86 processing non-WARN() and non-BUG()
      exceptions.  This would lead to output of a false BUG line with no other
      information.
      In the case of a refcount exception, it would be immediately followed by
      the refcount WARN(), producing very strange double-"cut here":
        lkdtm: attempting bad refcount_inc() overflow
        ------------[ cut here ]------------
        Kernel BUG at 0000000065f29de5 [verbose debug info unavailable]
        ------------[ cut here ]------------
        refcount_t overflow at lkdtm_REFCOUNT_INC_OVERFLOW+0x6b/0x90 in cat[3065], uid/euid: 0/0
        WARNING: CPU: 0 PID: 3065 at kernel/panic.c:657 refcount_error_report+0x9a/0xa4
      In the prior ordering, exceptions were searched first:
         do_trap_no_signal(struct task_struct *tsk, int trapnr, char *str,
                      if (fixup_exception(regs, trapnr))
                              return 0;
        -               if (fixup_bug(regs, trapnr))
        -                       return 0;
      As a result, fixup_bug()'s is_valid_bugaddr() didn't take into account
      needing to search the exception list first, since that had already
      happened.
      So, instead of searching the exception list twice (once in
      is_valid_bugaddr() and then again in fixup_exception()), just add a
      simple sanity check to report_bug() that will immediately bail out if a
      BUG() (or WARN()) entry is not found.
      Link: http://lkml.kernel.org/r/20180301225934.GA34350@beast
      Fixes: b8347c21 ("x86/debug: Handle warnings before the notifier chain, to fix KGDB crash")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Richard Weinberger <richard.weinberger@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  19. 03 Mar, 2018 1 commit
    • lib/mpi: Fix umul_ppmm() for MIPS64r6 · 22d5e20c
      James Hogan authored
      [ Upstream commit bbc25bee ]
      Current MIPS64r6 toolchains aren't able to generate efficient
      DMULU/DMUHU based code for the C implementation of umul_ppmm(), which
      performs an unsigned 64 x 64 bit multiply and returns the upper and
      lower 64-bit halves of the 128-bit result. Instead it widens the 64-bit
      inputs to 128-bits and emits a __multi3 intrinsic call to perform a 128
      x 128 multiply. This is both inefficient, and it results in a link error
      since we don't include __multi3 in MIPS linux.
      For example commit 90a53e44 ("cfg80211: implement regdb signature
      checking") merged in v4.15-rc1 recently broke the 64r6_defconfig and
      64r6el_defconfig builds by indirectly selecting MPILIB. The same build
      errors can be reproduced on older kernels by enabling e.g. CRYPTO_RSA:
      lib/mpi/generic_mpih-mul1.o: In function `mpihelp_mul_1':
      lib/mpi/generic_mpih-mul1.c:50: undefined reference to `__multi3'
      lib/mpi/generic_mpih-mul2.o: In function `mpihelp_addmul_1':
      lib/mpi/generic_mpih-mul2.c:49: undefined reference to `__multi3'
      lib/mpi/generic_mpih-mul3.o: In function `mpihelp_submul_1':
      lib/mpi/generic_mpih-mul3.c:49: undefined reference to `__multi3'
      lib/mpi/mpih-div.o: In function `mpihelp_divrem':
      lib/mpi/mpih-div.c:205: undefined reference to `__multi3'
      lib/mpi/mpih-div.c:142: undefined reference to `__multi3'
      Therefore add an efficient MIPS64r6 implementation of umul_ppmm() using
      inline assembly and the DMULU/DMUHU instructions, to prevent __multi3
      calls being emitted.
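      For reference, the operation that umul_ppmm() performs can be written
      portably without any 128-bit type. The sketch below is not the inline
      assembly added by this commit; umul_ppmm_sketch() is a hypothetical
      name, and the code only illustrates the 64 x 64 -> 128 bit result that
      the DMULU (lower half) and DMUHU (upper half) pair produces.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 64 x 64 -> 128 bit unsigned multiply built from 32-bit
 * halves. Hypothetical helper for illustration only. */
static void umul_ppmm_sketch(uint64_t *hi, uint64_t *lo,
                             uint64_t u, uint64_t v)
{
    uint64_t ul = u & 0xffffffffu, uh = u >> 32;
    uint64_t vl = v & 0xffffffffu, vh = v >> 32;

    uint64_t x0 = ul * vl;  /* contributes to bits 0..63   */
    uint64_t x1 = ul * vh;  /* contributes to bits 32..95  */
    uint64_t x2 = uh * vl;  /* contributes to bits 32..95  */
    uint64_t x3 = uh * vh;  /* contributes to bits 64..127 */

    assert(hi && lo);

    x1 += x0 >> 32;         /* this addition cannot overflow */
    x1 += x2;               /* this one can wrap around */
    if (x1 < x2)            /* propagate the carry into the upper half */
        x3 += UINT64_C(1) << 32;

    *hi = x3 + (x1 >> 32);
    *lo = (x1 << 32) | (x0 & 0xffffffffu);
}
```

      Squaring UINT64_MAX gives an upper half of 0xfffffffffffffffe and a
      lower half of 1, which is a convenient spot check for any
      architecture-specific replacement.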
      Fixes: 7fd08ca5 ("MIPS: Add build support for the MIPS R6 ISA")
      Signed-off-by: James Hogan <jhogan@kernel.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-mips@linux-mips.org
      Cc: linux-crypto@vger.kernel.org
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  20. 25 Feb, 2018 1 commit
  21. 22 Feb, 2018 2 commits
  22. 16 Feb, 2018 3 commits
    • lib/ubsan: add type mismatch handler for new GCC/Clang · 2617e62c
      Andrey Ryabinin authored
      commit 42440c1f upstream.
      UBSAN=y fails to build with new GCC/clang:
          arch/x86/kernel/head64.o: In function `sanitize_boot_params':
          arch/x86/include/asm/bootparam_utils.h:37: undefined reference to `__ubsan_handle_type_mismatch_v1'
      because Clang and GCC 8 slightly changed ABI for 'type mismatch' errors.
      Compiler now uses new __ubsan_handle_type_mismatch_v1() function with
      slightly modified 'struct type_mismatch_data'.
      Let's add new 'struct type_mismatch_data_common' which is independent from
      compiler's layout of 'struct type_mismatch_data'.  And make
      __ubsan_handle_type_mismatch[_v1]() functions transform compiler-dependent
      type mismatch data to our internal representation.  This way, we can
      support both old and new compilers with minimal amount of change.
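      The conversion approach can be sketched as follows. The field layouts
      are assumptions written from memory for illustration (in particular
      that the newer layout stores the alignment as a log2 value in a single
      byte), and common_from_old() and common_from_v1() are hypothetical
      helper names, not functions from this patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct source_location {
    const char *file_name;
    unsigned int line;
    unsigned int column;
};

struct type_descriptor;  /* opaque in this sketch */

/* Assumed older compiler layout: alignment stored directly. */
struct type_mismatch_data {
    struct source_location location;
    struct type_descriptor *type;
    unsigned long alignment;
    unsigned char type_check_kind;
};

/* Assumed newer compiler layout: alignment stored as log2. */
struct type_mismatch_data_v1 {
    struct source_location location;
    struct type_descriptor *type;
    unsigned char log_alignment;
    unsigned char type_check_kind;
};

/* Compiler-independent representation used by the shared handler. */
struct type_mismatch_data_common {
    struct source_location *location;
    struct type_descriptor *type;
    unsigned long alignment;
    unsigned char type_check_kind;
};

static struct type_mismatch_data_common
common_from_old(struct type_mismatch_data *d)
{
    struct type_mismatch_data_common c = {
        .location = &d->location,
        .type = d->type,
        .alignment = d->alignment,
        .type_check_kind = d->type_check_kind,
    };
    return c;
}

static struct type_mismatch_data_common
common_from_v1(struct type_mismatch_data_v1 *d)
{
    struct type_mismatch_data_common c = {
        .location = &d->location,
        .type = d->type,
        .type_check_kind = d->type_check_kind,
    };

    /* Expand the log2 encoding back into a plain alignment. */
    assert(d->log_alignment < sizeof(unsigned long) * CHAR_BIT);
    c.alignment = 1UL << d->log_alignment;
    return c;
}
```

      Both entry points can then hand the common structure to one shared
      reporting routine, so only the small conversion step depends on the
      compiler's layout.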
      Link: http://lkml.kernel.org/r/20180119152853.16806-1-aryabinin@virtuozzo.com
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: Sodagudi Prasad <psodagud@codeaurora.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • lib/ubsan.c: s/missaligned/misaligned/ · 5a5df777
      Andrew Morton authored
      commit b8fe1120 upstream.
      A visit from the spelling fairy.
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • kasan: rework Kconfig settings · 062cd346
      Arnd Bergmann authored
      commit e7c52b84 upstream.
      We get a lot of very large stack frames using gcc-7.0.1 with the default
      -fsanitize-address-use-after-scope --param asan-stack=1 options, which can
      easily cause an overflow of the kernel stack, e.g.
        drivers/gpu/drm/i915/gvt/handlers.c:2434:1: warning: the frame size of 46176 bytes is larger than 3072 bytes
        drivers/net/wireless/ralink/rt2x00/rt2800lib.c:5650:1: warning: the frame size of 23632 bytes is larger than 3072 bytes
        lib/atomic64_test.c:250:1: warning: the frame size of 11200 bytes is larger than 3072 bytes
        drivers/gpu/drm/i915/gvt/handlers.c:2621:1: warning: the frame size of 9208 bytes is larger than 3072 bytes
        drivers/media/dvb-frontends/stv090x.c:3431:1: warning: the frame size of 6816 bytes is larger than 3072 bytes
        fs/fscache/stats.c:287:1: warning: the frame size of 6536 bytes is larger than 3072 bytes
      To reduce this risk, -fsanitize-address-use-after-scope is now split out
      into a separate CONFIG_KASAN_EXTRA Kconfig option, leading to stack
      frames that are smaller than 2 kilobytes most of the time on x86_64.  An
      earlier version of this patch also prevented combining KASAN_EXTRA with
      KASAN_INLINE, but that is no longer necessary with gcc-7.0.1.
      All patches to get the frame size below 2048 bytes with CONFIG_KASAN=y
      and CONFIG_KASAN_EXTRA=n have been merged by maintainers now, so we can
      bring back that default now.  KASAN_EXTRA=y still causes lots of
      warnings but now defaults to !COMPILE_TEST to disable it in
      allmodconfig, and it remains disabled in all other defconfigs since it
      is a new option.  I arbitrarily raise the warning limit for KASAN_EXTRA
      to 3072 to reduce the noise, but an allmodconfig kernel still has around
      50 warnings on gcc-7.
      I experimented a bit more with smaller stack frames and have another
      follow-up series that reduces the warning limit for 64-bit architectures
      to 1280 bytes (without CONFIG_KASAN).
      With earlier versions of this patch series, I also had patches to address
      the warnings we get with KASAN and/or KASAN_EXTRA, using a
      "noinline_if_stackbloat" annotation.
      That annotation now got replaced with a gcc-8 bugfix (see
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715) and a workaround for
      older compilers, which means that KASAN_EXTRA is now just as bad as
      before and will lead to an instant stack overflow in a few extreme cases.
      This reverts parts of commit 3f181b4d ("lib/Kconfig.debug: disable
      -Wframe-larger-than warnings with KASAN=y").  Two patches in linux-next
      should be merged first to avoid introducing warnings in an allmodconfig build:
        3cd890db ("media: dvb-frontends: fix i2c access helpers for KASAN")
        16c3ada8 ("media: r820t: fix r820t_write_reg for KASAN")
      Do we really need to backport this?
      I think we do: without this patch, enabling KASAN will lead to
      unavoidable kernel stack overflow in certain device drivers when built
      with gcc-7 or higher on linux-4.10+ or any version that contains a
      backport of commit c5caf21a.  Most people are probably still on
      older compilers, but it will get worse over time as they upgrade their distros.
      The warnings we get on kernels older than this should all be for code
      that uses dangerously large stack frames, though most of them do not
      cause an actual stack overflow by themselves.
      The asan-stack option was
      added in linux-4.0, and commit 3f181b4d ("lib/Kconfig.debug:
      disable -Wframe-larger-than warnings with KASAN=y") effectively turned
      off the warning for allmodconfig kernels, so I would like to see this
      fix backported to any kernels later than 4.0.
      I have done dozens of fixes for individual functions with stack frames
      larger than 2048 bytes with asan-stack, and I plan to make sure that
      all those fixes make it into the stable kernels as well (most are
      already there).
      Part of the complication here is that asan-stack (from 4.0) was
      originally assumed to always require much larger stacks, but that
      turned out to be a combination of multiple gcc bugs that we have now
      worked around and fixed, but sanitize-address-use-after-scope (from
      v4.10) has a much higher inherent stack usage and also suffers from at
      least three other problems that we have analyzed but not yet fixed
      upstream, each of them makes the stack usage more severe than it should be.
      Link: http://lkml.kernel.org/r/20171221134744.2295529-1-arnd@arndb.de
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
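      To illustrate the kind of frame-size problem the kasan commit above is
      concerned with, here is a small userspace sketch. It is an assumption
      about the general shape of such a fix, not one of the actual driver
      patches: SCRATCH_SIZE and both function names are hypothetical. Built
      with GCC's -Wframe-larger-than=2048, the first function is the sort of
      code that may trigger the warning, while the second keeps its frame
      small by moving the buffer off the stack.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer size, chosen to exceed a 2048-byte frame limit. */
#define SCRATCH_SIZE 4096

/*
 * Keeps the scratch buffer in the stack frame. When compiled with
 * -Wframe-larger-than=2048, the compiler may warn about this function.
 */
unsigned int sum_with_stack_buffer(const unsigned char *src, size_t len)
{
	unsigned char scratch[SCRATCH_SIZE];
	unsigned int sum = 0;
	size_t i;

	if (len > sizeof(scratch))
		len = sizeof(scratch);
	memcpy(scratch, src, len);
	for (i = 0; i < len; i++)
		sum += scratch[i];
	return sum;
}

/*
 * Computes the same sum, but the scratch buffer lives on the heap, so the
 * stack frame holds only a pointer and a few scalars.
 */
unsigned int sum_with_heap_buffer(const unsigned char *src, size_t len)
{
	unsigned char *scratch;
	unsigned int sum = 0;
	size_t i;

	scratch = malloc(SCRATCH_SIZE);
	if (!scratch)
		return 0; /* sketch only: a real caller would report the error */

	if (len > SCRATCH_SIZE)
		len = SCRATCH_SIZE;
	memcpy(scratch, src, len);
	for (i = 0; i < len; i++)
		sum += scratch[i];

	free(scratch);
	return sum;
}
```

      Stack instrumentation adds padding around each on-stack object, so
      functions like the first one grow further when a sanitizer is enabled.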
  23. 03 Feb, 2018 1 commit
  24. 31 Jan, 2018 1 commit
    • bpf: introduce BPF_JIT_ALWAYS_ON config · 6fde36d5
      Alexei Starovoitov authored
      [ upstream commit 290af866 ]
      The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.
      A quote from the Google Project Zero blog:
      "At this point, it would normally be necessary to locate gadgets in
      the host kernel code that can be used to actually leak data by reading
      from an attacker-controlled location, shifting and masking the result
      appropriately and then using the result of that as offset to an
      attacker-controlled address for a load. But piecing gadgets together
      and figuring out which ones work in a speculation context seems annoying.
      So instead, we decided to use the eBPF interpreter, which is built into
      the host kernel - while there is no legitimate way to invoke it from inside
      a VM, the presence of the code in the host kernel's text section is sufficient
      to make it usable for the attack, just like with ordinary ROP gadgets."
      To make the attacker's job harder, introduce the BPF_JIT_ALWAYS_ON config
      option, which removes the interpreter from the kernel in favor of JIT-only mode.
      So far eBPF JIT is supported by:
      x64, arm64, arm32, sparc64, s390, powerpc64, mips64
      The start of JITed program is randomized and code page is marked as read-only.
      In addition "constant blinding" can be turned on with net.core.bpf_jit_harden
      - move __bpf_prog_ret0 under ifdef (Daniel)
      - fix init order, test_bpf and cBPF (Daniel's feedback)
      - fix offloaded bpf (Jakub's feedback)
      - add 'return 0' dummy in case something can invoke prog->bpf_func
      - retarget bpf tree. For bpf-next the patch would need one extra hunk.
        It will be sent when the trees are merged back to net-next
      Considered doing:
        int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
      but it seems better to land the patch as-is and in bpf-next remove
      bpf_jit_enable global variable from all JITs, consolidate in one place
      and remove this jit_init() function.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
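      The JIT-only selection described in the bpf commit above, including the
      'return 0' dummy, can be modelled roughly as below. This is a userspace
      sketch under stated assumptions: struct prog, try_jit(),
      select_runtime() and the JIT_ALWAYS_ON macro are hypothetical stand-ins
      for the kernel's own structures and config symbol, and -ENOTSUP is used
      only because it is the closest error code available in userspace.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for CONFIG_BPF_JIT_ALWAYS_ON (hypothetical macro). */
#define JIT_ALWAYS_ON 1

/* Hypothetical, heavily simplified stand-in for a loaded program. */
struct prog {
	unsigned int (*run)(const void *ctx);
	bool jited;
};

#if JIT_ALWAYS_ON
/* Dummy entry point: harmless if something invokes a non-JITed program. */
unsigned int prog_ret0(const void *ctx)
{
	(void)ctx;
	return 0;
}
#else
/* Placeholder for the interpreter, compiled only without JIT_ALWAYS_ON. */
unsigned int interpret(const void *ctx)
{
	(void)ctx;
	return 1;
}
#endif

/* Placeholder for natively compiled code produced by a JIT. */
unsigned int jitted_code(const void *ctx)
{
	(void)ctx;
	return 42;
}

/* Hypothetical JIT hook; jit_available simulates architecture support. */
void try_jit(struct prog *p, bool jit_available)
{
	if (jit_available) {
		p->run = jitted_code;
		p->jited = true;
	}
}

/* Pick the entry point; in JIT-only mode a failed JIT is an error. */
int select_runtime(struct prog *p, bool jit_available)
{
#if JIT_ALWAYS_ON
	p->run = prog_ret0;
#else
	p->run = interpret;
#endif
	p->jited = false;

	try_jit(p, jit_available);

#if JIT_ALWAYS_ON
	if (!p->jited)
		return -ENOTSUP;
#endif
	return 0;
}
```

      Because the interpreter body is compiled out in JIT-only mode, its code
      is no longer present in the binary at all, which is the property the
      commit relies on.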
  25. 25 Dec, 2017 1 commit
  26. 14 Dec, 2017 4 commits
  27. 30 Nov, 2017 1 commit
    • lib/mpi: call cond_resched() from mpi_powm() loop · ce922b7b
      Eric Biggers authored
      commit 1d9ddde1 upstream.
      On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
      largest permitted inputs (16384 bits), the kernel spends 10+ seconds
      doing modular exponentiation in mpi_powm() without rescheduling.  If all
      threads do it, it locks up the system.  Moreover, it can cause
      rcu_sched-stall warnings.
      Notwithstanding the insanity of doing this calculation in kernel mode
      rather than in userspace, fix it by calling cond_resched() as each bit
      from the exponent is processed.  It's still noninterruptible, but at
      least it's preemptible now.
      Do the cond_resched() once per bit rather than once per MPI limb because
      each limb might still easily take 100+ milliseconds on slow CPUs.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>