1. 14 Sep, 2016 2 commits
  2. 06 Sep, 2016 1 commit
    • Kees Cook's avatar
      x86/uaccess: force copy_*_user() to be inlined · e6971009
      Kees Cook authored
      As already done with __copy_*_user(), mark copy_*_user() as __always_inline.
      Without this, the checks for things like __builtin_const_p() won't work
      consistently in either hardened usercopy nor the recent adjustments for
      detecting usercopy overflows at compile time.
      The change in kernel text size is detectable, but very small:
       text      data     bss     dec      hex     filename
      12118735  5768608 14229504 32116847 1ea106f vmlinux.before
      12120207  5768608 14229504 32118319 1ea162f vmlinux.after
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
  3. 30 Aug, 2016 1 commit
    • Josh Poimboeuf's avatar
      mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS · 0d025d27
      Josh Poimboeuf authored
      There are three usercopy warnings which are currently being silenced for
      gcc 4.6 and newer:
      1) "copy_from_user() buffer size is too small" compile warning/error
         This is a static warning which happens when object size and copy size
         are both const, and copy size > object size.  I didn't see any false
         positives for this one.  So the function warning attribute seems to
         be working fine here.
         Note this scenario is always a bug and so I think it should be
         changed to *always* be an error, regardless of
      2) "copy_from_user() buffer size is not provably correct" compile warning
         This is another static warning which happens when I enable
         __compiletime_object_size() for new compilers (and
         CONFIG_DEBUG_STRICT_USER_COPY_CHECKS).  It happens when object size
         is const, but copy size is *not*.  In this case there's no way to
         compare the two at build time, so it gives the warning.  (Note the
         warning is a byproduct of the fact that gcc has no way of knowing
         whether the overflow function will be called, so the call isn't dead
         code and the warning attribute is activated.)
         So this warning seems to only indicate "this is an unusual pattern,
         maybe you should check it out" rather than "this is a bug".
         I get 102(!) of these warnings with allyesconfig and the
         __compiletime_object_size() gcc check removed.  I don't know if there
         are any real bugs hiding in there, but from looking at a small
         sample, I didn't see any.  According to Kees, it does sometimes find
         real bugs.  But the false positive rate seems high.
      3) "Buffer overflow detected" runtime warning
         This is a runtime warning where object size is const, and copy size >
         object size.
      All three warnings (both static and runtime) were completely disabled
      for gcc 4.6 with the following commit:
       ("gcc4: disable __compiletime_object_size for GCC 4.6+")
      That commit mistakenly assumed that the false positives were caused by a
      gcc bug in __compiletime_object_size().  But in fact,
      __compiletime_object_size() seems to be working fine.  The false
      positives were instead triggered by #2 above.  (Though I don't have an
      explanation for why the warnings supposedly only started showing up in
      gcc 4.6.)
      So remove warning #2 to get rid of all the false positives, and re-enable
      warnings #1 and #3 by reverting the above commit.
      Furthermore, since #1 is a real bug which is detected at compile time,
      upgrade it to always be an error.
      Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  4. 11 Aug, 2016 4 commits
    • Andy Lutomirski's avatar
      x86/boot: Rework reserve_real_mode() to allow multiple tries · 5ff3e2c3
      Andy Lutomirski authored
      If reserve_real_mode() fails, panicing immediately means we're
      doomed.  Make it safe to try more than once to allocate the
       - Degrade a failure from panic() to pr_info().  (If we make it to
         setup_real_mode() without reserving the trampoline, we'll panic
       - Factor out helpers so that platform code can supply a specific
         address to try.
       - Warn if reserve_real_mode() is called after we're done with the
         memblock allocator.  If that were to happen, we would behave
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matt Fleming <mfleming@suse.de>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/876e383038f3e9971aa72fd20a4f5da05f9d193d.1470821230.git.luto@kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
    • Andy Lutomirski's avatar
      x86/boot: Defer setup_real_mode() to early_initcall time · d0de0f68
      Andy Lutomirski authored
      There's no need to run setup_real_mode() as early as we run it.
      Defer it to the same early_initcall that sets up the page
      permissions for the real mode code.
      This should be a code size reduction.  More importantly, it give us
      a longer window in which we can allocate the real mode trampoline.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matt Fleming <mfleming@suse.de>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/fd62f0da4f79357695e9bf3e365623736b05f119.1470821230.git.luto@kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
    • Aaron Lu's avatar
      x86/irq: Do not substract irq_tlb_count from irq_call_count · 82ba4fac
      Aaron Lu authored
      Since commit:
        52aec330 ("x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR")
      the TLB remote shootdown is done through call function vector. That
      commit didn't take care of irq_tlb_count, which a later commit:
        fd0f5869 ("x86: Distinguish TLB shootdown interrupts from other functions call interrupts")
      ... tried to fix.
      The fix assumes every increase of irq_tlb_count has a corresponding
      increase of irq_call_count. So the irq_call_count is always bigger than
      irq_tlb_count and we could substract irq_tlb_count from irq_call_count.
      Unfortunately this is not true for the smp_call_function_single() case.
      The IPI is only sent if the target CPU's call_single_queue is empty when
      adding a csd into it in generic_exec_single. That means if two threads
      are both adding flush tlb csds to the same CPU's call_single_queue, only
      one IPI is sent. In other words, the irq_call_count is incremented by 1
      but irq_tlb_count is incremented by 2. Over time, irq_tlb_count will be
      bigger than irq_call_count and the substract will produce a very large
      irq_call_count value due to overflow.
      Considering that:
        1) it's not worth to send more IPIs for the sake of accurate counting of
           irq_call_count in generic_exec_single();
        2) it's not easy to tell if the call function interrupt is for TLB
           shootdown in __smp_call_function_single_interrupt().
      Not to exclude TLB shootdown from call function count seems to be the
      simplest fix and this patch just does that.
      This bug was found by LKP's cyclic performance regression tracking recently
      with the vm-scalability test suite. I have bisected to commit:
       ("mm/rmap: share the i_mmap_rwsem")
      This commit didn't do anything wrong but revealed the irq_call_count
      problem. IIUC, the commit makes rwc->remap_one in rmap_walk_file
      concurrent with multiple threads.  When remap_one is try_to_unmap_one(),
      then multiple threads could queue flush TLB to the same CPU but only
      one IPI will be sent.
      Since the commit was added in Linux v3.19, the counting problem only
      shows up from v3.19 onwards.
      Signed-off-by: default avatarAaron Lu <aaron.lu@intel.com>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
      Link: http://lkml.kernel.org/r/20160811074430.GA18163@aaronlu.sh.intel.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
    • Dave Hansen's avatar
      x86/mm: Fix swap entry comment and macro · ace7fab7
      Dave Hansen authored
      A recent patch changed the format of a swap PTE.
      The comment explaining the format of the swap PTE is wrong about
      the bits used for the swap type field.  Amusingly, the ASCII art
      and the patch description are correct, but the comment itself
      is wrong.
      As I was looking at this, I also noticed that the
      SWP_OFFSET_FIRST_BIT has an off-by-one error.  This does not
      really hurt anything.  It just wasted a bit of space in the PTE,
      giving us 2^59 bytes of addressable space in our swapfiles
      instead of 2^60.  But, it doesn't match with the comments, and it
      wastes a bit of space, so fix it.
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Fixes: 00839ee3 ("x86/mm: Move swap offset/type up in PTE to work around erratum")
      Link: http://lkml.kernel.org/r/20160810172325.E56AD7DA@viggo.jf.intel.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
  5. 10 Aug, 2016 3 commits
    • Mike Travis's avatar
      x86/platform/UV: Fix problem with UV4 BIOS providing incorrect PXM values · 22ac2bca
      Mike Travis authored
      There are some circumstances where the UV4 BIOS cannot provide the
      correct Proximity Node values to associate with specific Sockets and
      Physical Nodes.  The decision was made to remove these values from BIOS
      and for the kernel to get these values from the standard ACPI tables.
      Tested-by: default avatarFrank Ramsay <framsay@sgi.com>
      Tested-by: default avatarJohn Estabrook <estabrook@sgi.com>
      Signed-off-by: default avatarMike Travis <travis@sgi.com>
      Reviewed-by: default avatarDimitri Sivanich <sivanich@sgi.com>
      Reviewed-by: default avatarNathan Zimmer <nzimmer@sgi.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andrew Banman <abanman@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160801184050.414210079@asylum.americas.sgi.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
    • Sebastian Andrzej Siewior's avatar
      x86/mm: Disable preemption during CR3 read+write · 5cf0791d
      Sebastian Andrzej Siewior authored
      There's a subtle preemption race on UP kernels:
      Usually current->mm (and therefore mm->pgd) stays the same during the
      lifetime of a task so it does not matter if a task gets preempted during
      the read and write of the CR3.
      But then, there is this scenario on x86-UP:
      TaskA is in do_exit() and exit_mm() sets current->mm = NULL followed by:
       -> mmput()
       -> exit_mmap()
       -> tlb_finish_mmu()
       -> tlb_flush_mmu()
       -> tlb_flush_mmu_tlbonly()
       -> tlb_flush()
       -> flush_tlb_mm_range()
       -> __flush_tlb_up()
       -> __flush_tlb()
       ->  __native_flush_tlb()
      At this point current->mm is NULL but current->active_mm still points to
      the "old" mm.
      Let's preempt taskA _after_ native_read_cr3() by taskB. TaskB has its
      own mm so CR3 has changed.
      Now preempt back to taskA. TaskA has no ->mm set so it borrows taskB's
      mm and so CR3 remains unchanged. Once taskA gets active it continues
      where it was interrupted and that means it writes its old CR3 value
      back. Everything is fine because userland won't need its memory
      Now the fun part:
      Let's preempt taskA one more time and get back to taskB. This
      time switch_mm() won't do a thing because oldmm (->active_mm)
      is the same as mm (as per context_switch()). So we remain
      with a bad CR3 / PGD and return to userland.
      The next thing that happens is handle_mm_fault() with an address for
      the execution of its code in userland. handle_mm_fault() realizes that
      it has a PTE with proper rights so it returns doing nothing. But the
      CPU looks at the wrong PGD and insists that something is wrong and
      faults again. And again. And one more time…
      This pagefault circle continues until the scheduler gets tired of it and
      puts another task on the CPU. It gets little difficult if the task is a
      RT task with a high priority. The system will either freeze or it gets
      fixed by the software watchdog thread which usually runs at RT-max prio.
      But waiting for the watchdog will increase the latency of the RT task
      which is no good.
      Fix this by disabling preemption across the critical code section.
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1470404259-26290-1-git-send-email-bigeasy@linutronix.de
      [ Prettified the changelog. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
    • Nicolai Stange's avatar
      x86/timers/apic: Inform TSC deadline clockevent device about recalibration · 6731b0d6
      Nicolai Stange authored
      This patch eliminates a source of imprecise APIC timer interrupts,
      which imprecision may result in double interrupts or even late
      The TSC deadline clockevent devices' configuration and registration
      happens before the TSC frequency calibration is refined in
      This results in the TSC clocksource and the TSC deadline clockevent
      devices being configured with slightly different frequencies: the former
      gets the refined one and the latter are configured with the inaccurate
      frequency detected earlier by means of the "Fast TSC calibration using PIT".
      Within the APIC code, introduce the notifier function
      lapic_update_tsc_freq() which reconfigures all per-CPU TSC deadline
      clockevent devices with the current tsc_khz.
      Call it from the TSC code after TSC calibration refinement has happened.
      Signed-off-by: default avatarNicolai Stange <nicstange@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Christopher S. Hall <christopher.s.hall@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20160714152255.18295-3-nicstange@gmail.com
      [ Pushed #ifdef CONFIG_X86_LOCAL_APIC into header, improved changelog. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
  6. 08 Aug, 2016 2 commits
    • Rafael J. Wysocki's avatar
      x86/power/64: Always create temporary identity mapping correctly · e4630fdd
      Rafael J. Wysocki authored
      The low-level resume-from-hibernation code on x86-64 uses
      kernel_ident_mapping_init() to create the temoprary identity mapping,
      but that function assumes that the offset between kernel virtual
      addresses and physical addresses is aligned on the PGD level.
      However, with a randomized identity mapping base, it may be aligned
      on the PUD level and if that happens, the temporary identity mapping
      created by set_up_temporary_mappings() will not reflect the actual
      kernel identity mapping and the image restoration will fail as a
      result (leading to a kernel panic most of the time).
      To fix this problem, rework kernel_ident_mapping_init() to support
      unaligned offsets between KVA and PA up to the PMD level and make
      set_up_temporary_mappings() use it as approprtiate.
      Reported-and-tested-by: default avatarThomas Garnier <thgarnie@google.com>
      Reported-by: default avatarBorislav Petkov <bp@suse.de>
      Suggested-by: default avatarYinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarYinghai Lu <yinghai@kernel.org>
    • Linus Torvalds's avatar
      unsafe_[get|put]_user: change interface to use a error target label · 1bd4403d
      Linus Torvalds authored
      When I initially added the unsafe_[get|put]_user() helpers in commit
       ("Add 'unsafe' user access functions for batched
      accesses"), I made the mistake of modeling the interface on our
      traditional __[get|put]_user() functions, which return zero on success,
      or -EFAULT on failure.
      That interface is fairly easy to use, but it's actually fairly nasty for
      good code generation, since it essentially forces the caller to check
      the error value for each access.
      In particular, since the error handling is already internally
      implemented with an exception handler, and we already use "asm goto" for
      various other things, we could fairly easily make the error cases just
      jump directly to an error label instead, and avoid the need for explicit
      checking after each operation.
      So switch the interface to pass in an error label, rather than checking
      the error value in the caller.  Best do it now before we start growing
      more users (the signal handling code in particular would be a good place
      to use the new interface).
      So rather than
      	if (unsafe_get_user(x, ptr))
      		... handle error ..
      the interface is now
      	unsafe_get_user(x, ptr, label);
      where an error during the user mode fetch will now just cause a jump to
      'label' in the caller.
      Right now the actual _implementation_ of this all still ends up being a
      "if (err) goto label", and does not take advantage of any exception
      label tricks, but for "unsafe_put_user()" in particular it should be
      fairly straightforward to convert to using the exception table model.
      Note that "unsafe_get_user()" is much harder to convert to a clever
      exception table model, because current versions of gcc do not allow the
      use of "asm goto" (for the exception) with output values (for the actual
      value to be fetched).  But that is hopefully not a limitation in the
      long term.
      [ Also note that it might be a good idea to switch unsafe_get_user() to
        actually _return_ the value it fetches from user space, but this
        commit only changes the error handling semantics ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  7. 04 Aug, 2016 3 commits
    • Krzysztof Kozlowski's avatar
      dma-mapping: use unsigned long for dma_attrs · 00085f1e
      Krzysztof Kozlowski authored
      The dma-mapping core and the implementations do not change the DMA
      attributes passed by pointer.  Thus the pointer can point to const data.
      However the attributes do not have to be a bitfield.  Instead unsigned
      long will do fine:
      1. This is just simpler.  Both in terms of reading the code and setting
         attributes.  Instead of initializing local attributes on the stack
         and passing pointer to it to dma_set_attr(), just set the bits.
      2. It brings safeness and checking for const correctness because the
         attributes are passed by value.
      Semantic patches for this change (at least most of them):
          virtual patch
          virtual context
          identifier f, attrs;
          - struct dma_attrs *attrs
          + unsigned long attrs
          , ...)
          identifier r.f;
          - NULL
          + 0
          // Options: --all-includes
          virtual patch
          virtual context
          identifier f, attrs;
          type t;
          t f(..., struct dma_attrs *attrs);
          identifier r.f;
          - NULL
          + 0
      Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
      Signed-off-by: default avatarKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Acked-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Acked-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Acked-by: default avatarHans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Acked-by: Mark Salter <msalter@redhat.com> [c6x]
      Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
      Reviewed-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
      Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
      Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
      Acked-by: default avatarBjorn Andersson <bjorn.andersson@linaro.org>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
      Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
      Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Masahiro Yamada's avatar
      tree-wide: replace config_enabled() with IS_ENABLED() · 97f2645f
      Masahiro Yamada authored
      The use of config_enabled() against config options is ambiguous.  In
      practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
      author might have used it for the meaning of IS_ENABLED().  Using
      IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
      This commit replaces config_enabled() with IS_ENABLED() where possible.
      This commit is only touching bool config options.
      I noticed two cases where config_enabled() is used against a tristate
       - config_enabled(CONFIG_HWMON)
        [ drivers/net/wireless/ath/ath10k/thermal.c ]
       - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
        [ drivers/gpu/drm/gma500/opregion.c ]
      I did not touch them because they should be converted to IS_BUILTIN()
      in order to keep the logic, but I was not sure it was the authors'
      Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada's avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Stas Sergeev <stsp@list.ru>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Joshua Kinard <kumba@gentoo.org>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: "Dmitry V. Levin" <ldv@altlinux.org>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Nikolay Martynov <mar.kolya@gmail.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
      Cc: Rafal Milecki <zajec5@gmail.com>
      Cc: James Cowgill <James.Cowgill@imgtec.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Alex Smith <alex.smith@imgtec.com>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Qais Yousef <qais.yousef@imgtec.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Norris <computersforpeace@gmail.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Tony Wu <tung7970@gmail.com>
      Cc: Huaitong Han <huaitong.han@intel.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Gelmini <andrea.gelmini@gelma.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Rabin Vincent <rabin@rab.in>
      Cc: "Maciej W. Rozycki" <macro@imgtec.com>
      Cc: David Daney <david.daney@cavium.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Paolo Bonzini's avatar
      pvclock: introduce seqcount-like API · 3aed64f6
      Paolo Bonzini authored
      The version field in struct pvclock_vcpu_time_info basically implements
      a seqcount.  Wrap it with the usual read_begin and read_retry functions,
      and use these APIs instead of peppering the code with smp_rmb()s.
      While at it, change it to the more pedantically correct virt_rmb().
      With this change, __pvclock_read_cycles can be simplified noticeably.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
  8. 02 Aug, 2016 1 commit
    • Andy Lutomirski's avatar
      signal: consolidate {TS,TLF}_RESTORE_SIGMASK code · 7e781418
      Andy Lutomirski authored
      In general, there's no need for the "restore sigmask" flag to live in
      ti->flags.  alpha, ia64, microblaze, powerpc, sh, sparc (64-bit only),
      tile, and x86 use essentially identical alternative implementations,
      placing the flag in ti->status.
      Replace those optimized implementations with an equally good common
      implementation that stores it in a bitfield in struct task_struct and
      drop the custom implementations.
      Additional architectures can opt in by removing their
      TIF_RESTORE_SIGMASK defines.
      Link: http://lkml.kernel.org/r/8a14321d64a28e40adfddc90e18a96c086a6d6f9.1468522723.git.luto@kernel.org
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Tested-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  9. 27 Jul, 2016 2 commits
  10. 26 Jul, 2016 3 commits
  11. 25 Jul, 2016 2 commits
  12. 23 Jul, 2016 2 commits
    • Dan Williams's avatar
      x86/insn: remove pcommit · fd1d961d
      Dan Williams authored
      The pcommit instruction is being deprecated in favor of either ADR
      (asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or
      posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM
      Firmware Interface Table).
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: default avatarIngo Molnar <mingo@redhat.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
    • Dan Williams's avatar
      Revert "KVM: x86: add pcommit support" · dfa169bb
      Dan Williams authored
      This reverts commit 8b3e34e4
      Given the deprecation of the pcommit instruction, the relevant VMX
      features and CPUID bits are not going to be rolled into the SDM.  Remove
      their usage from KVM.
      Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
  13. 22 Jul, 2016 1 commit
  14. 21 Jul, 2016 2 commits
    • Adrian Hunter's avatar
      x86/insn: Add AVX-512 support to the instruction decoder · 25af37f4
      Adrian Hunter authored
      Add support for Intel's AVX-512 instructions to the instruction decoder.
      AVX-512 instructions are documented in Intel Architecture Instruction
      Set Extensions Programming Reference (February 2016).
      AVX-512 instructions are identified by a EVEX prefix which, for the
      purpose of instruction decoding, can be treated as though it were a
      4-byte VEX prefix.
      Existing instructions which can now accept an EVEX prefix need not be
      further annotated in the op code map (x86-opcode-map.txt). In the case
      of new instructions, the op code map is updated accordingly.
      Also add associated Mask Instructions that are used to manipulate mask
      registers used in AVX-512 instructions.
      The 'perf tools' instruction decoder is updated in a subsequent patch.
      And a representative set of instructions is added to the perf tools new
      instructions test in a subsequent patch.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: X86 ML <x86@kernel.org>
      Link: http://lkml.kernel.org/r/1469003437-32706-3-git-send-email-adrian.hunter@intel.com
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
    • Ingo Molnar's avatar
      x86/boot: Reorganize and clean up the BIOS area reservation code · edce2121
      Ingo Molnar authored
      So the reserve_ebda_region() code has accumulated a number of
      problems over the years that make it really difficult to read
      and understand:
      - The calculation of 'lowmem' and 'ebda_addr' is an unnecessarily
        interleaved mess of first lowmem, then ebda_addr, then lowmem tweaks...
      - 'lowmem' here means 'super low mem' - i.e. 16-bit addressable memory. In other
        parts of the x86 code 'lowmem' means 32-bit addressable memory... This makes it
        super confusing to read.
      - It does not help at all that we have various memory range markers, half of which
        are 'start of range', half of which are 'end of range' - but this crucial
        property is not obvious in the naming at all ... gave me a headache trying to
        understand all this.
      - Also, the 'ebda_addr' name sucks: it highlights that it's an address (which is
        obvious, all values here are addresses!), while it does not highlight that it's
        the _start_ of the EBDA region ...
      - 'BIOS_LOWMEM_KILOBYTES' says a lot of things, except that this is the only value
        that is a pointer to a value, not a memory range address!
      - The function name itself is a misnomer: it says 'reserve_ebda_region()' while
        its main purpose is to reserve all the firmware ROM typically between 640K and
        1MB, while the 'EBDA' part is only a small part of that ...
      - Likewise, the paravirt quirk flag name 'ebda_search' is misleading as well: this
        too should be about whether to reserve firmware areas in the paravirt case.
      - In fact thinking about this as 'end of RAM' is confusing: what this function
        *really* wants to reserve is firmware data and code areas! Once the thinking is
        inverted from a mixed 'ram' and 'reserved firmware area' notion to a pure
        'reserved area' notion everything becomes a lot clearer.
      To improve all this rewrite the whole code (without changing the logic):
      - Firstly invert the naming from 'lowmem end' to 'BIOS reserved area start'
        and propagate this concept through all the variable names and constants.
      	ebda_start			// was: ebda_addr
      	bios_start			// was: lowmem
      	BIOS_START_MAX			// was: LOWMEM_CAP
      - Then clean up the name of the function itself by renaming it
        to reserve_bios_regions() and renaming the ::ebda_search paravirt
        flag to ::reserve_bios_regions.
      - Fix up all the comments (fix typos), harmonize and simplify their
        formulation and remove comments that become unnecessary due to
        the much better naming all around.
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
  15. 20 Jul, 2016 1 commit
  16. 15 Jul, 2016 9 commits
  17. 14 Jul, 2016 1 commit
    • Paul Gortmaker's avatar
      x86: Audit and remove any remaining unnecessary uses of module.h · eb008eb6
      Paul Gortmaker authored
      Historically a lot of these existed because we did not have
      a distinction between what was modular code and what was providing
      support to modules via EXPORT_SYMBOL and friends.  That changed
      when we forked out support for the latter into the export.h file.
      This means we should be able to reduce the usage of module.h
      in code that is obj-y Makefile or bool Kconfig.  In the case of
      some of these which are modular, we can extend that to also include
      files that are building basic support functionality but not related
      to loading or registering the final module; such files also have
      no need whatsoever for module.h
      The advantage in removing such instances is that module.h itself
      sources about 15 other headers; adding significantly to what we feed
      cpp, and it can obscure what headers we are effectively using.
      Since module.h was the source for init.h (for __init) and for
      export.h (for EXPORT_SYMBOL) we consider each instance for the
      presence of either and replace as needed.
      In the case of crypto/glue_helper.c we delete a redundant instance
      of MODULE_LICENSE in order to delete module.h -- the license info
      is already present at the top of the file.
      The uncore change warrants a mention too; it is uncore.c that uses
      module.h and not uncore.h; hence the relocation done there.
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160714001901.31603-9-paul.gortmaker@windriver.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>