1. 17 Dec, 2018 4 commits
• printk: Hide console waiter logic into helpers · ef433725
      Petr Mladek authored
      [ Upstream commit c162d5b4 ]
      
      The commit ("printk: Add console owner and waiter logic to load balance
      console writes") made vprintk_emit() and console_unlock() even more
      complicated.
      
This patch extracts the new code into 3 helper functions. They should
help keep it reasonably self-contained, making it easier to use and
maintain.
      
      This patch just shuffles the existing code. It does not change
      the functionality.
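
For reference, a sketch of the three helpers (names follow the upstream
commit; the bodies are reduced to comments here):

 /* Mark current as the active console_owner while calling the consoles. */
 static void console_lock_spinning_enable(void);

 /* Clear console_owner; if a waiter is spinning, hand over and return
  * nonzero so the caller exits without releasing console_sem. */
 static int console_lock_spinning_disable_and_check(void);

 /* Either take console_sem via trylock, or spin as a waiter and take
  * the lock over from the current owner. */
 static int console_trylock_spinning(void);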
      
      Link: http://lkml.kernel.org/r/20180112160837.GD24497@linux.suse
      
      
      Cc: akpm@linux-foundation.org
      Cc: linux-mm@kvack.org
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: rostedt@home.goodmis.org
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
• printk: Add console owner and waiter logic to load balance console writes · 59423114
      Steven Rostedt (VMware) authored
      [ Upstream commit dbdda842 ]
      
      This patch implements what I discussed in Kernel Summit. I added
      lockdep annotation (hopefully correctly), and it hasn't had any splats
      (since I fixed some bugs in the first iterations). It did catch
      problems when I had the owner covering too much. But now that the owner
      is only set when actively calling the consoles, lockdep has stayed
      quiet.
      
      Here's the design again:
      
      I added a "console_owner" which is set to a task that is actively
      writing to the consoles. It is *not* the same as the owner of the
      console_lock. It is only set when doing the calls to the console
      functions. It is protected by a console_owner_lock which is a raw spin
      lock.
      
      There is a console_waiter. This is set when there is an active console
      owner that is not current, and waiter is not set. This too is protected
      by console_owner_lock.
      
      In printk() when it tries to write to the consoles, we have:
      
      	if (console_trylock())
      		console_unlock();
      
      Now I added an else, which will check if there is an active owner, and
      no current waiter. If that is the case, then console_waiter is set, and
      the task goes into a spin until it is no longer set.
      
      When the active console owner finishes writing the current message to
      the consoles, it grabs the console_owner_lock and sees if there is a
      waiter, and clears console_owner.
      
      If there is a waiter, then it breaks out of the loop, clears the waiter
      flag (because that will release the waiter from its spin), and exits.
      Note, it does *not* release the console semaphore. Because it is a
      semaphore, there is no owner. Another task may release it. This means
      that the waiter is guaranteed to be the new console owner! Which it
      becomes.
      
      Then the waiter calls console_unlock() and continues to write to the
      consoles.
      
If another task comes along and does a printk(), it too can become the
new waiter, and we wash, rinse, and repeat!
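
In (simplified) code, the printk() side of the handoff looks roughly
like this -- a sketch with the locking and irq details trimmed, not the
exact upstream code:

 	if (console_trylock()) {
 		console_unlock();
 	} else {
 		raw_spin_lock(&console_owner_lock);
 		if (console_owner && console_owner != current && !console_waiter) {
 			console_waiter = true;
 			raw_spin_unlock(&console_owner_lock);
 			/* Spin until the owner hands console_sem over. */
 			while (READ_ONCE(console_waiter))
 				cpu_relax();
 			/* We now own console_sem without having taken it. */
 			console_unlock();
 		} else {
 			raw_spin_unlock(&console_owner_lock);
 		}
 	}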
      
      By Petr Mladek about possible new deadlocks:
      
The thing is that we hand console_sem over only to a printk() call
that normally calls console_unlock() as well. It means that
the transferred ownership should not bring new types of dependencies.
As Steven said somewhere: "If there is a deadlock, it was
there even before."
      
      We could look at it from this side. The possible deadlock would
      look like:
      
      CPU0                            CPU1
      
      console_unlock()
      
        console_owner = current;
      
      				spin_lockA()
      				  printk()
      				    spin = true;
      				    while (...)
      
          call_console_drivers()
            spin_lockA()
      
This would be a deadlock: CPU0 would wait for lock A,
while CPU1 would own lock A and would wait for CPU0
to finish calling the console drivers and pass on the
console_sem ownership.
      
But if the above is true, then the following scenario was
      already possible before:
      
      CPU0
      
      spin_lockA()
        printk()
          console_unlock()
            call_console_drivers()
      	spin_lockA()
      
In other words, this deadlock was there even before. Such
deadlocks are prevented by using printk_deferred() in
the sections guarded by the lock A.
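
For example, inside such a section, printk_deferred() queues the
message and flushes it later via irq_work instead of calling the
console drivers directly (lock name illustrative):

 	spin_lock(&lockA);
 	/* Safe: no console driver is called under lockA. */
 	printk_deferred(KERN_WARNING "state changed under lockA\n");
 	spin_unlock(&lockA);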
      
      By Steven Rostedt:
      
      To demonstrate the issue, this module has been shown to lock up a
      system with 4 CPUs and a slow console (like a serial console). It is
also able to lock up an 8 CPU system with only a fast (VGA) console, by
      passing in "loops=100". The changes in this commit prevent this module
      from locking up the system.
      
       #include <linux/module.h>
       #include <linux/delay.h>
       #include <linux/sched.h>
       #include <linux/mutex.h>
       #include <linux/workqueue.h>
       #include <linux/hrtimer.h>
      
       static bool stop_testing;
       static unsigned int loops = 1;
      
       static void preempt_printk_workfn(struct work_struct *work)
       {
       	int i;
      
       	while (!READ_ONCE(stop_testing)) {
       		for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) {
       			preempt_disable();
       			pr_emerg("%5d%-75s\n", smp_processor_id(),
       				 " XXX NOPREEMPT");
       			preempt_enable();
       		}
       		msleep(1);
       	}
       }
      
       static struct work_struct __percpu *works;
      
       static void finish(void)
       {
       	int cpu;
      
       	WRITE_ONCE(stop_testing, true);
       	for_each_online_cpu(cpu)
       		flush_work(per_cpu_ptr(works, cpu));
       	free_percpu(works);
       }
      
       static int __init test_init(void)
       {
       	int cpu;
      
       	works = alloc_percpu(struct work_struct);
       	if (!works)
       		return -ENOMEM;
      
       	/*
       	 * This is just a test module. This will break if you
       	 * do any CPU hot plugging between loading and
       	 * unloading the module.
       	 */
      
       	for_each_online_cpu(cpu) {
       		struct work_struct *work = per_cpu_ptr(works, cpu);
      
       		INIT_WORK(work, &preempt_printk_workfn);
       		schedule_work_on(cpu, work);
       	}
      
       	return 0;
       }
      
       static void __exit test_exit(void)
       {
       	finish();
       }
      
       module_param(loops, uint, 0);
       module_init(test_init);
       module_exit(test_exit);
       MODULE_LICENSE("GPL");
      
      Link: http://lkml.kernel.org/r/20180110132418.7080-2-pmladek@suse.com
      
      
      Cc: akpm@linux-foundation.org
      Cc: linux-mm@kvack.org
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: linux-kernel@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[pmladek@suse.com: Commit message about possible deadlocks]
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
• Revert "printk: Never set console_may_schedule in console_trylock()" · 62582f67
      Sasha Levin authored
      This reverts commit c9b8d580.
      
This is just a technical revert to make the printk fix apply cleanly;
this patch will be re-picked in about 3 commits.
• bpf: fix check of allowed specifiers in bpf_trace_printk · 9209043b
      Martynas Pumputis authored
      [ Upstream commit 1efb6ee3 ]
      
A format string consisting of "%p" or "%s" followed by an invalid
specifier (e.g. "%p%\n" or "%s%") could pass the check, which
would then cause format_decode() (lib/vsprintf.c) to warn.
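
A minimal illustration of the kind of call that slipped through (a
sketch of a BPF program snippet; 'skb' stands in for any context
pointer):

 	char fmt[] = "%p%";	/* '%' after "%p" is not a valid specifier */
 	bpf_trace_printk(fmt, sizeof(fmt), skb);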
      
Fixes: 9c959c86 ("tracing: Allow BPF programs to call bpf_trace_printk()")
      Reported-by: syzbot+1ec5c5ec949c4adaa0c4@syzkaller.appspotmail.com
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
  2. 08 Dec, 2018 2 commits
  3. 05 Dec, 2018 8 commits
• ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS · dae4d590
      Thomas Gleixner authored
commit 46f7ecb1 upstream
      
      The IBPB control code in x86 removed the usage. Remove the functionality
      which was introduced for this.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.559149393@linutronix.de
      
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
• x86/speculation: Rework SMT state change · 36a4c5fc
      Thomas Gleixner authored
commit a74cfffb upstream
      
      arch_smt_update() is only called when the sysfs SMT control knob is
      changed. This means that when SMT is enabled in the sysfs control knob the
      system is considered to have SMT active even if all siblings are offline.
      
To allow fine-grained control of the speculation mitigations, the actual SMT
      state is more interesting than the fact that siblings could be enabled.
      
      Rework the code, so arch_smt_update() is invoked from each individual CPU
      hotplug function, and simplify the update function while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de
      
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
• sched/smt: Expose sched_smt_present static key · 0e797117
      Thomas Gleixner authored
commit 321a874a upstream
      
Make the scheduler's 'sched_smt_present' static key globally available, so
      it can be used in the x86 speculation control code.
      
      Provide a query function and a stub for the CONFIG_SMP=n case.
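
The interface roughly takes this shape (a sketch; see the upstream
commit for the exact definitions):

 #ifdef CONFIG_SMP
 extern struct static_key_false sched_smt_present;

 static __always_inline bool sched_smt_active(void)
 {
 	return static_branch_likely(&sched_smt_present);
 }
 #else
 static inline bool sched_smt_active(void) { return false; }
 #endif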
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.430168326@linutronix.de
      
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
• sched/smt: Make sched_smt_present track topology · 01659361
      Peter Zijlstra (Intel) authored
commit c5511d03 upstream
      
Currently the 'sched_smt_present' static key is enabled when SMT topology
is observed at CPU bringup, but it is never disabled. However there is demand
      to also disable the key when the topology changes such that there is no SMT
      present anymore.
      
      Implement this by making the key count the number of cores that have SMT
      enabled.
      
In particular, the SMT topology bits are set before interrupts are enabled
      and similarly, are cleared after interrupts are disabled for the last time
      and the CPU dies.
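
A sketch of the counting in the hotplug path (simplified from the
upstream commit):

 	/* sched_cpu_activate(): the core just gained a second online sibling. */
 	if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
 		static_branch_inc_cpuslocked(&sched_smt_present);

 	/* sched_cpu_deactivate(): the core is about to lose SMT. */
 	if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
 		static_branch_dec_cpuslocked(&sched_smt_present);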
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.246110444@linutronix.de
      
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
• x86/speculation: Apply IBPB more strictly to avoid cross-process data leak · 4741e319
      Jiri Kosina authored
      commit dbfe2953 upstream
      
      Currently, IBPB is only issued in cases when switching into a non-dumpable
      process, the rationale being to protect such 'important and security
sensitive' processes (such as GPG) from data leaking into a
      userspace process via spectre v2.
      
This is however completely insufficient to provide proper userspace-to-userspace
      spectrev2 protection, as any process can poison branch buffers before being
      scheduled out, and the newly scheduled process immediately becomes spectrev2
      victim.
      
      In order to minimize the performance impact (for usecases that do require
      spectrev2 protection), issue the barrier only in cases when switching between
processes where the victim can't be ptraced by the potential attacker (as in
      such cases, the attacker doesn't have to bother with branch buffers at all).
      
      [ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and
        PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably
        fine-grained ]
      
Fixes: 18bf3c3e ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch")
Originally-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
      Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pm
      
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
• x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation · 300a6f27
      Jiri Kosina authored
commit 53c613fe upstream
      
      STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
      (once enabled) prevents cross-hyperthread control of decisions made by
      indirect branch predictors.
      
      Enable this feature if
      
      - the CPU is vulnerable to spectre v2
      - the CPU supports SMT and has SMT siblings online
      - spectre_v2 mitigation autoselection is enabled (default)
      
After some previous discussion, this leaves STIBP on all the time, as doing a
wrmsr on every kernel boundary crossing is a no-no. This could perhaps later be a bit
      more optimized (like disabling it in NOHZ, experiment with disabling it in
      idle, etc) if needed.
      
      Note that the synchronization of the mask manipulation via newly added
      spec_ctrl_mutex is currently not strictly needed, as the only updater is
      already being serialized by cpu_add_remove_lock, but let's make this a
      little bit more future-proof.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm
      
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
• sched/core: Fix cpu.max vs. cpuhotplug deadlock · e5d981df
      Peter Zijlstra authored
commit ce48c146 upstream
      
      Tejun reported the following cpu-hotplug lock (percpu-rwsem) read recursion:
      
        tg_set_cfs_bandwidth()
          get_online_cpus()
            cpus_read_lock()
      
          cfs_bandwidth_usage_inc()
            static_key_slow_inc()
              cpus_read_lock()
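
The fix follows the usual pattern for callers that already hold the
hotplug lock: switch the static key update to a _cpuslocked variant
that does not take cpus_read_lock() again. A sketch (function and key
names as in kernel/sched/fair.c, simplified):

 void cfs_bandwidth_usage_inc(void)
 {
 	/* Caller did get_online_cpus(); avoid recursing on the lock. */
 	static_key_slow_inc_cpuslocked(&__cfs_bandwidth_used);
 }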
Reported-by: Tejun Heo <tj@kernel.org>
Tested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180122215328.GP3397@worktop
      
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
• bpf: Prevent memory disambiguation attack · 83b570c0
      Alexei Starovoitov authored
commit af86ca4e upstream.
      
      Detect code patterns where malicious 'speculative store bypass' can be used
      and sanitize such patterns.
      
       39: (bf) r3 = r10
       40: (07) r3 += -216
       41: (79) r8 = *(u64 *)(r7 +0)   // slow read
       42: (7a) *(u64 *)(r10 -72) = 0  // verifier inserts this instruction
       43: (7b) *(u64 *)(r8 +0) = r3   // this store becomes slow due to r8
       44: (79) r1 = *(u64 *)(r6 +0)   // cpu speculatively executes this load
       45: (71) r2 = *(u8 *)(r1 +0)    // speculatively arbitrary 'load byte'
                                       // is now sanitized
      
      Above code after x86 JIT becomes:
       e5: mov    %rbp,%rdx
       e8: add    $0xffffffffffffff28,%rdx
       ef: mov    0x0(%r13),%r14
       f3: movq   $0x0,-0x48(%rbp)
       fb: mov    %rdx,0x0(%r14)
       ff: mov    0x0(%rbx),%rdi
      103: movzbq 0x0(%rdi),%rsi
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      [bwh: Backported to 4.14:
       - Add bpf_verifier_env parameter to check_stack_write()
       - Look up stack slot_types with state->stack_slot_type[] rather than
         state->stack[].slot_type[]
       - Drop bpf_verifier_env argument to verbose()
       - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
  4. 01 Dec, 2018 3 commits
  5. 27 Nov, 2018 1 commit
• sched/core: Take the hotplug lock in sched_init_smp() · ef8d2a5d
      Valentin Schneider authored
      [ Upstream commit 40fa3780 ]
      
      When running on linux-next (8c60c36d0b8c ("Add linux-next specific files
      for 20181019")) + CONFIG_PROVE_LOCKING=y on a big.LITTLE system (e.g.
      Juno or HiKey960), we get the following report:
      
       [    0.748225] Call trace:
       [    0.750685]  lockdep_assert_cpus_held+0x30/0x40
       [    0.755236]  static_key_enable_cpuslocked+0x20/0xc8
       [    0.760137]  build_sched_domains+0x1034/0x1108
       [    0.764601]  sched_init_domains+0x68/0x90
       [    0.768628]  sched_init_smp+0x30/0x80
       [    0.772309]  kernel_init_freeable+0x278/0x51c
       [    0.776685]  kernel_init+0x10/0x108
       [    0.780190]  ret_from_fork+0x10/0x18
      
      The static_key in question is 'sched_asym_cpucapacity' introduced by
      commit:
      
        df054e84 ("sched/topology: Add static_key for asymmetric CPU capacity optimizations")
      
      In this particular case, we enable it because smp_prepare_cpus() will
      end up fetching the capacity-dmips-mhz entry from the devicetree,
      so we already have some asymmetry detected when entering sched_init_smp().
      
      This didn't get detected in tip/sched/core because we were missing:
      
commit cb538267 ("jump_label/lockdep: Assert we hold the hotplug lock for _cpuslocked() operations")
      
      Calls to build_sched_domains() post sched_init_smp() will hold the
hotplug lock; it just so happens that this very first call is a
      special case. As stated by a comment in sched_init_smp(), "There's no
      userspace yet to cause hotplug operations" so this is a harmless
      warning.
      
      However, to both respect the semantics of underlying
      callees and make lockdep happy, take the hotplug lock in
      sched_init_smp(). This also satisfies the comment atop
      sched_init_domains() that says "Callers must hold the hotplug lock".
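
A sketch of the resulting ordering in sched_init_smp():

 	/* There's no userspace yet to cause hotplug operations; the lock
 	 * is taken to respect the callees' documented locking rules. */
 	cpus_read_lock();
 	mutex_lock(&sched_domains_mutex);
 	sched_init_domains(cpu_active_mask);
 	mutex_unlock(&sched_domains_mutex);
 	cpus_read_unlock();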
Reported-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar.Eggemann@arm.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: morten.rasmussen@arm.com
      Cc: quentin.perret@arm.com
      Link: http://lkml.kernel.org/r/1540301851-3048-1-git-send-email-valentin.schneider@arm.com
      
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
  6. 23 Nov, 2018 1 commit
  7. 21 Nov, 2018 3 commits
• printk: Never set console_may_schedule in console_trylock() · c9b8d580
      Sergey Senozhatsky authored
      commit fd5f7cde upstream.
      
      This patch, basically, reverts commit 6b97a20d ("printk:
      set may_schedule for some of console_trylock() callers").
      That commit was a mistake, it introduced a big dependency
      on the scheduler, by enabling preemption under console_sem
      in printk()->console_unlock() path, which is rather too
      critical. The patch did not significantly reduce the
      possibilities of printk() lockups, but made it possible to
      stall printk(), as has been reported by Tetsuo Handa [1].
      
Another issue is that preemption under console_sem also
messes up Steven Rostedt's hand off scheme, by making
      it possible to sleep with console_sem both in console_unlock()
      and in vprintk_emit(), after acquiring the console_sem
      ownership (anywhere between printk_safe_exit_irqrestore() in
      console_trylock_spinning() and printk_safe_enter_irqsave()
      in console_unlock()). This makes hand off less likely and,
      at the same time, may result in a significant amount of
pending logbuf messages. A preempted console_sem owner makes
      it impossible for other CPUs to emit logbuf messages, but
      does not make it impossible for other CPUs to append new
      messages to the logbuf.
      
      Reinstate the old behavior and make printk() non-preemptible.
Should any printk() lockup reports arrive, they must be handled
      in a different way.
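
Conceptually, the change boils down to one assignment in
console_trylock() (a sketch, not the full function):

 	/* Never allow scheduling while holding console_sem taken via
 	 * trylock; previously this depended on preemptible(). */
 	console_may_schedule = 0;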
      
      [1] http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
      Fixes: 6b97a20d ("printk: set may_schedule for some of console_trylock() callers")
      Link: http://lkml.kernel.org/r/20180116044716.GE6607@jagdpanzerIV
      
      
      To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: akpm@linux-foundation.org
      Cc: linux-mm@kvack.org
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
• kdb: print real address of pointers instead of hashed addresses · dedde93b
      Christophe Leroy authored
      commit 568fb6f4 upstream.
      
      Since commit ad67b74d ("printk: hash addresses printed with %p"),
      all pointers printed with %p are printed with hashed addresses
      instead of real addresses in order to avoid leaking addresses in
dmesg and syslog. But this applies to kdb too, which is unfortunate:
      
          Entering kdb (current=0x(ptrval), pid 329) due to Keyboard Entry
          kdb> ps
          15 sleeping system daemon (state M) processes suppressed,
          use 'ps A' to see all.
          Task Addr       Pid   Parent [*] cpu State Thread     Command
          0x(ptrval)      329      328  1    0   R  0x(ptrval) *sh
      
          0x(ptrval)        1        0  0    0   S  0x(ptrval)  init
          0x(ptrval)        3        2  0    0   D  0x(ptrval)  rcu_gp
          0x(ptrval)        4        2  0    0   D  0x(ptrval)  rcu_par_gp
          0x(ptrval)        5        2  0    0   D  0x(ptrval)  kworker/0:0
          0x(ptrval)        6        2  0    0   D  0x(ptrval)  kworker/0:0H
          0x(ptrval)        7        2  0    0   D  0x(ptrval)  kworker/u2:0
          0x(ptrval)        8        2  0    0   D  0x(ptrval)  mm_percpu_wq
          0x(ptrval)       10        2  0    0   D  0x(ptrval)  rcu_preempt
      
      The whole purpose of kdb is to debug, and for debugging real addresses
      need to be known. In addition, data displayed by kdb doesn't go into
      dmesg.
      
      This patch replaces all %p by %px in kdb in order to display real
      addresses.
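
For reference, the difference between the two specifiers (addr is any
kernel pointer):

 	pr_info("%p\n", addr);	/* hashed value, or "(ptrval)" early in boot */
 	pr_info("%px\n", addr);	/* the real address; use only where leaking
 				 * the address is acceptable, as in kdb */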
      
Fixes: ad67b74d ("printk: hash addresses printed with %p")
      Cc: <stable@vger.kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
• kdb: use correct pointer when 'btc' calls 'btt' · ce583650
      Christophe Leroy authored
      commit dded2e15 upstream.
      
      On a powerpc 8xx, 'btc' fails as follows:
      
      Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry
      kdb> btc
      btc: cpu status: Currently on cpu 0
      Available cpus: 0
      kdb_getarea: Bad address 0x0
      
When booting the kernel with 'debug_boot_weak_hash', it fails as well:
      
      Entering kdb (current=0xba99ad80, pid 284) due to Keyboard Entry
      kdb> btc
      btc: cpu status: Currently on cpu 0
      Available cpus: 0
      kdb_getarea: Bad address 0xba99ad80
      
      On other platforms, Oopses have been observed too, see
      https://github.com/linuxppc/linux/issues/139
      
This is due to btc calling 'btt' with a pointer printed via %p as its
argument.

This patch replaces %p by %px to get the real pointer value as
expected by 'btt'.
      
Fixes: ad67b74d ("printk: hash addresses printed with %p")
      Cc: <stable@vger.kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  8. 13 Nov, 2018 11 commits
  9. 10 Nov, 2018 2 commits
• sched/fair: Fix throttle_list starvation with low CFS quota · 1220de22
      Phil Auld authored
      commit baa9be4f upstream.
      
      With a very low cpu.cfs_quota_us setting, such as the minimum of 1000,
      distribute_cfs_runtime may not empty the throttled_list before it runs
      out of runtime to distribute. In that case, due to the change from
c06f04c7 to put throttled entries at the head of the list, later entries
on the list will starve.  Essentially, the same X processes will get pulled
off the list, given CPU time, and then, when expired, get put back on the
head of the list, where distribute_cfs_runtime will give runtime to the same
set of processes and leave the rest starved.
      
      Fix the issue by setting a bit in struct cfs_bandwidth when
      distribute_cfs_runtime is running, so that the code in throttle_cfs_rq can
      decide to put the throttled entry on the tail or the head of the list.  The
      bit is set/cleared by the callers of distribute_cfs_runtime while they hold
      cfs_bandwidth->lock.
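
A sketch of the mechanism (field name as in the upstream commit):

 	/* throttle_cfs_rq(), under cfs_b->lock: */
 	if (cfs_b->distribute_running)
 		/* Head: an already-running distribution must not see us. */
 		list_add_rcu(&cfs_rq->throttled_list, &cfs_b->throttled_cfs_rq);
 	else
 		/* Tail: preserve FIFO order so later entries don't starve. */
 		list_add_tail_rcu(&cfs_rq->throttled_list, &cfs_b->throttled_cfs_rq);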
      
      This is easy to reproduce with a handful of CPU consumers. I use 'crash' on
      the live system. In some cases you can simply look at the throttled list and
      see the later entries are not changing:
      
        crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1"  "$4}' | pr -t -n3
          1     ffff90b56cb2d200  -976050
          2     ffff90b56cb2cc00  -484925
          3     ffff90b56cb2bc00  -658814
          4     ffff90b56cb2ba00  -275365
          5     ffff90b166a45600  -135138
          6     ffff90b56cb2da00  -282505
          7     ffff90b56cb2e000  -148065
          8     ffff90b56cb2fa00  -872591
          9     ffff90b56cb2c000  -84687
         10     ffff90b56cb2f000  -87237
         11     ffff90b166a40a00  -164582
      
        crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1"  "$4}' | pr -t -n3
          1     ffff90b56cb2d200  -994147
          2     ffff90b56cb2cc00  -306051
          3     ffff90b56cb2bc00  -961321
          4     ffff90b56cb2ba00  -24490
          5     ffff90b166a45600  -135138
          6     ffff90b56cb2da00  -282505
          7     ffff90b56cb2e000  -148065
          8     ffff90b56cb2fa00  -872591
          9     ffff90b56cb2c000  -84687
         10     ffff90b56cb2f000  -87237
         11     ffff90b166a40a00  -164582
      
      Sometimes it is easier to see by finding a process getting starved and looking
      at the sched_info:
      
        crash> task ffff8eb765994500 sched_info
        PID: 7800   TASK: ffff8eb765994500  CPU: 16  COMMAND: "cputest"
          sched_info = {
            pcount = 8,
            run_delay = 697094208,
            last_arrival = 240260125039,
            last_queued = 240260327513
          },
        crash> task ffff8eb765994500 sched_info
        PID: 7800   TASK: ffff8eb765994500  CPU: 16  COMMAND: "cputest"
          sched_info = {
            pcount = 8,
            run_delay = 697094208,
            last_arrival = 240260125039,
            last_queued = 240260327513
          },
Signed-off-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: c06f04c7 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop")
      Link: http://lkml.kernel.org/r/20181008143639.GA4019@pauld.bos.csb
      
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
• bpf: fix partial copy of map_ptr when dst is scalar · eb9b195c
      Daniel Borkmann authored
      commit 0962590e upstream.
      
      ALU operations on pointers such as scalar_reg += map_value_ptr are
handled in adjust_ptr_min_max_vals(). The problem, however, is that map_ptr
      and range in the register state share a union, so transferring state
      through dst_reg->range = ptr_reg->range is just buggy as any new
      map_ptr in the dst_reg is then truncated (or null) for subsequent
      checks. Fix this by adding a raw member and use it for copying state
      over to dst_reg.
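
Reduced to its essence, the bug looks like this (illustrative types,
not the verifier's real ones):

 	union reg_aux {
 		void *map_ptr;		/* pointer view */
 		unsigned short range;	/* narrower scalar view */
 		unsigned long raw;	/* widest view, covers the others */
 	};

 	static void copy_aux(union reg_aux *dst, const union reg_aux *src)
 	{
 		/* dst->range = src->range would copy only two bytes and
 		 * truncate any map_ptr stored in the union. */
 		dst->raw = src->raw;
 	}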
      
Fixes: f1174f77 ("bpf/verifier: rework value tracking")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
  10. 04 Nov, 2018 4 commits
• bpf: sockmap, map_release does not hold refcnt for pinned maps · 3c0cff34
      John Fastabend authored
      [ Upstream commit ba6b8de4 ]
      
      Relying on map_release hook to decrement the reference counts when a
      map is removed only works if the map is not being pinned. In the
pinned case the ref is decremented immediately and the BPF programs
are released. After this the BPF programs may no longer be in use,
which is not what the user would expect.
      
      This patch moves the release logic into bpf_map_put_uref() and brings
      sockmap in-line with how a similar case is handled in prog array maps.
      
Fixes: 3d9e9526 ("bpf: sockmap, fix leaking maps with attached but not detached progs")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
• locking/ww_mutex: Fix runtime warning in the WW mutex selftest · 45894023
      Guenter Roeck authored
[ Upstream commit e4a02ed2 ]
      
      If CONFIG_WW_MUTEX_SELFTEST=y is enabled, booting an image
      in an arm64 virtual machine results in the following
      traceback if 8 CPUs are enabled:
      
        DEBUG_LOCKS_WARN_ON(__owner_task(owner) != current)
        WARNING: CPU: 2 PID: 537 at kernel/locking/mutex.c:1033 __mutex_unlock_slowpath+0x1a8/0x2e0
        ...
        Call trace:
         __mutex_unlock_slowpath()
         ww_mutex_unlock()
         test_cycle_work()
         process_one_work()
         worker_thread()
         kthread()
         ret_from_fork()
      
      If requesting b_mutex fails with -EDEADLK, the error variable
      is reassigned to the return value from calling ww_mutex_lock
      on a_mutex again. If this call fails, a_mutex is not locked.
      It is, however, unconditionally unlocked subsequently, causing
      the reported warning. Fix the problem by using two error variables.
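
A sketch of the fixed retry path in test_cycle_work() (simplified;
variable names illustrative):

 	int err, erra = 0;

 	err = ww_mutex_lock(b_mutex, &ctx);
 	if (err == -EDEADLK) {
 		err = 0;
 		ww_mutex_unlock(a_mutex);
 		ww_mutex_lock_slow(b_mutex, &ctx);
 		erra = ww_mutex_lock(a_mutex, &ctx);
 	}
 	if (!err)
 		ww_mutex_unlock(b_mutex);
 	if (!erra)
 		ww_mutex_unlock(a_mutex);	/* only if actually held */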
      
      With this change, the selftest still fails as follows:
      
        cyclic deadlock not resolved, ret[7/8] = -35
      
      However, the traceback is gone.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Fixes: d1b42b80 ("locking/ww_mutex: Add kselftests for resolving ww_mutex cyclic deadlocks")
      Link: http://lkml.kernel.org/r/1538516929-9734-1-git-send-email-linux@roeck-us.net
      
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
• perf/ring_buffer: Prevent concurent ring buffer access · a18e2159
      Jiri Olsa authored
[ Upstream commit cd6fb677 ]
      
Some of the scheduling tracepoints allow the perf_tp_event
code to write to the ring buffer on a CPU different from the
one the code is running on.
      
This results in corrupted ring buffer data, demonstrated by the
following perf commands:
      
        # perf record -e 'sched:sched_switch,sched:sched_wakeup' perf bench sched messaging
        # Running 'sched/messaging' benchmark:
        # 20 sender and receiver processes per group
        # 10 groups == 400 processes run
      
             Total time: 0.383 [sec]
        [ perf record: Woken up 8 times to write data ]
        0x42b890 [0]: failed to process type: -1765585640
        [ perf record: Captured and wrote 4.825 MB perf.data (29669 samples) ]
      
        # perf report --stdio
        0x42b890 [0]: failed to process type: -1765585640
      
The reason for the corruption is the set of scheduling tracepoints
that have __perf_task defined and thus allow storing data in
another CPU's ring buffer:
      
        sched_waking
        sched_wakeup
        sched_wakeup_new
        sched_stat_wait
        sched_stat_sleep
        sched_stat_iowait
        sched_stat_blocked
      
The perf_tp_event() function first stores samples for the current
CPU's events defined for the tracepoint:
      
          hlist_for_each_entry_rcu(event, head, hlist_entry)
            perf_swevent_event(event, count, &data, regs);
      
And then it iterates over the 'task' events and stores the sample
for any of the task's events that pass the tracepoint checks:
      
        ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]);
      
        list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
          if (event->attr.type != PERF_TYPE_TRACEPOINT)
            continue;
          if (event->attr.config != entry->type)
            continue;
      
          perf_swevent_event(event, count, &data, regs);
        }
      
The above code can race with the same code running on another CPU,
ending up with two CPUs trying to store into the same ring
buffer, which is specifically not allowed.
      
This patch prevents the problem by allowing only events bound to the
current CPU to receive the event.
      
      NOTE: this requires the use of (per-task-)per-cpu buffers for this
      feature to work; perf-record does this.
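
A sketch of the added check in the task-iteration loop quoted above
(simplified from the upstream fix):

 	list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
 		/* Only deliver if this event's buffer belongs to this CPU. */
 		if (event->cpu != smp_processor_id())
 			continue;
 		if (event->attr.type != PERF_TYPE_TRACEPOINT)
 			continue;
 		if (event->attr.config != entry->type)
 			continue;

 		perf_swevent_event(event, count, &data, regs);
 	}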
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
[peterz: small edits to Changelog]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrew Vagin <avagin@openvz.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: e6dab5ff ("perf/trace: Add ability to set a target task for events")
      Link: http://lkml.kernel.org/r/20180923161343.GB15054@krava
      
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
• perf/core: Fix perf_pmu_unregister() locking · ffc3cb56
      Peter Zijlstra authored
[ Upstream commit a9f97721 ]
      
      When we unregister a PMU, we fail to serialize the @pmu_idr properly.
      Fix that by doing the entire thing under pmu_lock.
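
A sketch of the reordering in perf_pmu_unregister() (simplified; the
idr_remove() and resource teardown move inside the mutex):

 	mutex_lock(&pmus_lock);
 	list_del_rcu(&pmu->entry);
 	/* ... synchronize_srcu()/synchronize_rcu(), free per-cpu data ... */
 	if (pmu->type >= PERF_TYPE_MAX)
 		idr_remove(&pmu_idr, pmu->type);	/* now serialized */
 	mutex_unlock(&pmus_lock);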
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 2e80a82a ("perf: Dynamic pmu types")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
  11. 20 Oct, 2018 1 commit