1. 14 Sep, 2016 4 commits
  2. 10 Sep, 2016 1 commit
    • Dan Williams's avatar
      mm: fix cache mode of dax pmd mappings · 9049771f
      Dan Williams authored
      
      
      track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as
      uncacheable rendering them impractical for application usage.  DAX-pte
      mappings are cached and the goal of establishing DAX-pmd mappings is to
      attain more performance, not dramatically less (3 orders of magnitude).
      
      track_pfn_insert() relies on a previous call to reserve_memtype() to
      establish the expected page_cache_mode for the range.  While memremap()
      arranges for reserve_memtype() to be called, devm_memremap_pages() does
      not.  So, teach track_pfn_insert() and untrack_pfn() how to handle
      tracking without a vma, and arrange for devm_memremap_pages() to
      establish the write-back-cache reservation in the memtype tree.
      
      Cc: <stable@vger.kernel.org>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Reported-by: default avatarKai Zhang <kai.ka.zhang@oracle.com>
      Acked-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      9049771f
  3. 07 Sep, 2016 1 commit
    • Mickaël Salaün's avatar
      um/ptrace: Fix the syscall number update after a ptrace · ce29856a
      Mickaël Salaün authored
      Update the syscall number after each PTRACE_SETREGS on ORIG_*AX.
      
      This is needed to get the potentially altered syscall number in the
      seccomp filters after RET_TRACE.
      
      This fix four seccomp_bpf tests:
      > [ RUN      ] TRACE_syscall.skip_after_RET_TRACE
      > seccomp_bpf.c:1560:TRACE_syscall.skip_after_RET_TRACE:Expected -1 (18446744073709551615) == syscall(39) (26)
      > seccomp_bpf.c:1561:TRACE_syscall.skip_after_RET_TRACE:Expected 1 (1) == (*__errno_location ()) (22)
      > [     FAIL ] TRACE_syscall.skip_after_RET_TRACE
      > [ RUN      ] TRACE_syscall.kill_after_RET_TRACE
      > TRACE_syscall.kill_after_RET_TRACE: Test exited normally instead of by signal (code: 1)
      > [     FAIL ] TRACE_syscall.kill_after_RET_TRACE
      > [ RUN      ] TRACE_syscall.skip_after_ptrace
      > seccomp_bpf.c:1622:TRACE_syscall.skip_after_ptrace:Expected -1 (18446744073709551615) == syscall(39) (26)
      > seccomp_bpf.c:1623:TRACE_syscall.skip_after_ptrace:Expected 1 (1) == (*__errno_location ()) (22)
      > [     FAIL ] TRACE_syscall.skip_after_ptrace
      > [ RUN      ] TRACE_syscall.kill_after_ptrace
      > TRACE_syscall.kill_after_ptrace: Test exited normally instead of by signal (code: 1)
      > [     FAIL ] TRACE_syscall.kill_after_ptrace
      
      Fixes: 26703c63
      
       ("um/ptrace: run seccomp after ptrace")
      Signed-off-by: default avatarMickaël Salaün <mic@digikod.net>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: James Morris <jmorris@namei.org>
      Cc: user-mode-linux-devel@lists.sourceforge.net
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      ce29856a
  4. 06 Sep, 2016 1 commit
    • Kees Cook's avatar
      x86/uaccess: force copy_*_user() to be inlined · e6971009
      Kees Cook authored
      
      
      As already done with __copy_*_user(), mark copy_*_user() as __always_inline.
      Without this, the checks for things like __builtin_const_p() won't work
      consistently in either hardened usercopy nor the recent adjustments for
      detecting usercopy overflows at compile time.
      
      The change in kernel text size is detectable, but very small:
      
       text      data     bss     dec      hex     filename
      12118735  5768608 14229504 32116847 1ea106f vmlinux.before
      12120207  5768608 14229504 32118319 1ea162f vmlinux.after
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      e6971009
  5. 02 Sep, 2016 3 commits
    • Emanuel Czirai's avatar
      x86/AMD: Apply erratum 665 on machines without a BIOS fix · d1992996
      Emanuel Czirai authored
      
      
      AMD F12h machines have an erratum which can cause DIV/IDIV to behave
      unpredictably. The workaround is to set MSRC001_1029[31] but sometimes
      there is no BIOS update containing that workaround so let's do it
      ourselves unconditionally. It is simple enough.
      
      [ Borislav: Wrote commit message. ]
      Signed-off-by: default avatarEmanuel Czirai <icanrealizeum@gmail.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Yaowu Xu <yaowu@google.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160902053550.18097-1-bp@alien8.de
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      d1992996
    • Steven Rostedt's avatar
      x86/paravirt: Do not trace _paravirt_ident_*() functions · 15301a57
      Steven Rostedt authored
      
      
      Łukasz Daniluk reported that on a RHEL kernel that his machine would lock up
      after enabling function tracer. I asked him to bisect the functions within
      available_filter_functions, which he did and it came down to three:
      
        _paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64()
      
      It was found that this is only an issue when noreplace-paravirt is added
      to the kernel command line.
      
      This means that those functions are most likely called within critical
      sections of the funtion tracer, and must not be traced.
      
      In newer kenels _paravirt_nop() is defined within gcc asm(), and is no
      longer an issue.  But both _paravirt_ident_{32,64}() causes the
      following splat when they are traced:
      
       mm/pgtable-generic.c:33: bad pmd ffff8800d2435150(0000000001d00054)
       mm/pgtable-generic.c:33: bad pmd ffff8800d3624190(0000000001d00070)
       mm/pgtable-generic.c:33: bad pmd ffff8800d36a5110(0000000001d00054)
       mm/pgtable-generic.c:33: bad pmd ffff880118eb1450(0000000001d00054)
       NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469]
       Modules linked in: e1000e
       CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
       task: ffff880118f740c0 ti: ffff8800d4aec000 task.ti: ffff8800d4aec000
       RIP: 0010:[<ffffffff81134148>]  [<ffffffff81134148>] queued_spin_lock_slowpath+0x118/0x1a0
       RSP: 0018:ffff8800d4aefb90  EFLAGS: 00000246
       RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011eb16d40
       RDX: ffffffff82485760 RSI: 000000001f288820 RDI: ffffea0000008030
       RBP: ffff8800d4aefb90 R08: 00000000000c0000 R09: 0000000000000000
       R10: ffffffff821c8e0e R11: 0000000000000000 R12: ffff880000200fb8
       R13: 00007f7a4e3f7000 R14: ffffea000303f600 R15: ffff8800d4b562e0
       FS:  00007f7a4e3d7840(0000) GS:ffff88011eb00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f7a4e3f7000 CR3: 00000000d3e71000 CR4: 00000000001406e0
       Call Trace:
         _raw_spin_lock+0x27/0x30
         handle_pte_fault+0x13db/0x16b0
         handle_mm_fault+0x312/0x670
         __do_page_fault+0x1b1/0x4e0
         do_page_fault+0x22/0x30
         page_fault+0x28/0x30
         __vfs_read+0x28/0xe0
         vfs_read+0x86/0x130
         SyS_read+0x46/0xa0
         entry_SYSCALL_64_fastpath+0x1e/0xa8
       Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b
      Reported-by: default avatarŁukasz Daniluk <lukasz.daniluk@intel.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      15301a57
    • Arnd Bergmann's avatar
      kconfig: tinyconfig: provide whole choice blocks to avoid warnings · 236dec05
      Arnd Bergmann authored
      Using "make tinyconfig" produces a couple of annoying warnings that show
      up for build test machines all the time:
      
          .config:966:warning: override: NOHIGHMEM changes choice state
          .config:965:warning: override: SLOB changes choice state
          .config:963:warning: override: KERNEL_XZ changes choice state
          .config:962:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
          .config:933:warning: override: SLOB changes choice state
          .config:930:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
          .config:870:warning: override: SLOB changes choice state
          .config:868:warning: override: KERNEL_XZ changes choice state
          .config:867:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
      
      I've made a previous attempt at fixing them and we discussed a number of
      alternatives.
      
      I tried changing the Makefile to use "merge_config.sh -n
      $(fragment-list)" but couldn't get that to work properly.
      
      This is yet another approach, based on the observation that we do want
      to see a warning for conflicting 'choice' options, and that we can
      simply make them non-conflicting by listing all other options as
      disabled.  This is a trivial patch that we can apply independent of
      plans for other changes.
      
      Link: http://lkml.kernel.org/r/20160829214952.1334674-2-arnd@arndb.de
      Link: https://storage.kernelci.org/mainline/v4.7-rc6/x86-tinyconfig/build.log
      https://patchwork.kernel.org/patch/9212749/
      
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Reviewed-by: Masahiro Yamada's avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      236dec05
  6. 30 Aug, 2016 1 commit
    • Josh Poimboeuf's avatar
      mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS · 0d025d27
      Josh Poimboeuf authored
      There are three usercopy warnings which are currently being silenced for
      gcc 4.6 and newer:
      
      1) "copy_from_user() buffer size is too small" compile warning/error
      
         This is a static warning which happens when object size and copy size
         are both const, and copy size > object size.  I didn't see any false
         positives for this one.  So the function warning attribute seems to
         be working fine here.
      
         Note this scenario is always a bug and so I think it should be
         changed to *always* be an error, regardless of
         CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.
      
      2) "copy_from_user() buffer size is not provably correct" compile warning
      
         This is another static warning which happens when I enable
         __compiletime_object_size() for new compilers (and
         CONFIG_DEBUG_STRICT_USER_COPY_CHECKS).  It happens when object size
         is const, but copy size is *not*.  In this case there's no way to
         compare the two at build time, so it gives the warning.  (Note the
         warning is a byproduct of the fact that gcc has no way of knowing
         whether the overflow function will be called, so the call isn't dead
         code and the warning attribute is activated.)
      
         So this warning seems to only indicate "this is an unusual pattern,
         maybe you should check it out" rather than "this is a bug".
      
         I get 102(!) of these warnings with allyesconfig and the
         __compiletime_object_size() gcc check removed.  I don't know if there
         are any real bugs hiding in there, but from looking at a small
         sample, I didn't see any.  According to Kees, it does sometimes find
         real bugs.  But the false positive rate seems high.
      
      3) "Buffer overflow detected" runtime warning
      
         This is a runtime warning where object size is const, and copy size >
         object size.
      
      All three warnings (both static and runtime) were completely disabled
      for gcc 4.6 with the following commit:
      
        2fb0815c
      
       ("gcc4: disable __compiletime_object_size for GCC 4.6+")
      
      That commit mistakenly assumed that the false positives were caused by a
      gcc bug in __compiletime_object_size().  But in fact,
      __compiletime_object_size() seems to be working fine.  The false
      positives were instead triggered by #2 above.  (Though I don't have an
      explanation for why the warnings supposedly only started showing up in
      gcc 4.6.)
      
      So remove warning #2 to get rid of all the false positives, and re-enable
      warnings #1 and #3 by reverting the above commit.
      
      Furthermore, since #1 is a real bug which is detected at compile time,
      upgrade it to always be an error.
      
      Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
      needed.
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0d025d27
  7. 27 Aug, 2016 1 commit
  8. 24 Aug, 2016 2 commits
  9. 23 Aug, 2016 1 commit
  10. 18 Aug, 2016 5 commits
    • Peter Feiner's avatar
      kvm: nVMX: fix nested tsc scaling · c95ba92a
      Peter Feiner authored
      
      
      When the host supported TSC scaling, L2 would use a TSC multiplier of
      0, which causes a VM entry failure. Now L2's TSC uses the same
      multiplier as L1.
      Signed-off-by: default avatarPeter Feiner <pfeiner@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c95ba92a
    • Radim Krčmář's avatar
      KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write · dccbfcf5
      Radim Krčmář authored
      If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the
      write with vmcs02 as the current VMCS.
      This will incorrectly apply modifications intended for vmcs01 to vmcs02
      and L2 can use it to gain access to L0's x2APIC registers by disabling
      virtualized x2APIC while using msr bitmap that assumes enabled.
      
      Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the
      current VMCS.  An alternative solution would temporarily make vmcs01 the
      current VMCS, but it requires more care.
      
      Fixes: 8d14695f
      
       ("x86, apicv: add virtual x2apic support")
      Reported-by: default avatarJim Mattson <jmattson@google.com>
      Reviewed-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      dccbfcf5
    • Radim Krčmář's avatar
      KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC · d048c098
      Radim Krčmář authored
      msr bitmap can be used to avoid a VM exit (interception) on guest MSR
      accesses.  In some configurations of VMX controls, the guest can even
      directly access host's x2APIC MSRs.  See SDM 29.5 VIRTUALIZING MSR-BASED
      APIC ACCESSES.
      
      L2 could read all L0's x2APIC MSRs and write TPR, EOI, and SELF_IPI.
      To do so, L1 would first trick KVM to disable all possible interceptions
      by enabling APICv features and then would turn those features off;
      nested_vmx_merge_msr_bitmap() only disabled interceptions, so VMX would
      not intercept previously enabled MSRs even though they were not safe
      with the new configuration.
      
      Correctly re-enabling interceptions is not enough as a second bug would
      still allow L1+L2 to access host's MSRs: msr bitmap was shared for all
      VMCSs, so L1 could trigger a race to get the desired combination of msr
      bitmap and VMX controls.
      
      This fix allocates a msr bitmap for every L1 VCPU, allows only safe
      x2APIC MSRs from L1's msr bitmap, and disables msr bitmaps if they would
      have to intercept everything anyway.
      
      Fixes: 3af18d9c
      
       ("KVM: nVMX: Prepare for using hardware MSR bitmap")
      Reported-by: default avatarJim Mattson <jmattson@google.com>
      Suggested-by: default avatarWincy Van <fanwenyi0529@gmail.com>
      Reviewed-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      d048c098
    • Jiri Olsa's avatar
      x86/smp: Fix __max_logical_packages value setup · 7b0501b1
      Jiri Olsa authored
      
      
      Frank reported kernel panic when he disabled several cores in BIOS
      via following option:
      
        Core Disable Bitmap(Hex)   [0]
      
      with number 0xFFE, which leaves 16 CPUs in system (out of 48).
      
      The kernel panic below goes along with following messages:
      
       smpboot: Max logical packages: 2^M
       smpboot: APIC(0) Converting physical 0 to logical package 0^M
       smpboot: APIC(20) Converting physical 1 to logical package 1^M
       smpboot: APIC(40) Package 2 exceeds logical package map^M
       smpboot: CPU 8 APICId 40 disabled^M
       smpboot: APIC(60) Package 3 exceeds logical package map^M
       smpboot: CPU 12 APICId 60 disabled^M
       ...
       general protection fault: 0000 [#1] SMP^M
       Modules linked in:^M
       CPU: 15 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc5+ #1^M
       Hardware name: SGI UV300/UV300, BIOS SGI UV 300 series BIOS 05/25/2016^M
       task: ffff8801673e0000 ti: ffff8801673ac000 task.ti: ffff8801673ac000^M
       RIP: 0010:[<ffffffff81014d54>]  [<ffffffff81014d54>] uncore_change_context+0xd4/0x180^M
       ...
        [<ffffffff810158ac>] uncore_event_init_cpu+0x6c/0x70^M
        [<ffffffff81d8c91c>] intel_uncore_init+0x1c2/0x2dd^M
        [<ffffffff81d8c75a>] ? uncore_cpu_setup+0x17/0x17^M
        [<ffffffff81002190>] do_one_initcall+0x50/0x190^M
        [<ffffffff810ab193>] ? parse_args+0x293/0x480^M
        [<ffffffff81d87365>] kernel_init_freeable+0x1a5/0x249^M
        [<ffffffff81d86a35>] ? set_debug_rodata+0x12/0x12^M
        [<ffffffff816dc19e>] kernel_init+0xe/0x110^M
        [<ffffffff816e93bf>] ret_from_fork+0x1f/0x40^M
        [<ffffffff816dc190>] ? rest_init+0x80/0x80^M
      
      The reason for the panic is wrong value of __max_logical_packages,
      which lets logical_package_map uninitialized and the uncore code
      relying on this map being properly initialized (maybe we should
      add some safety checks there as well).
      
      The __max_logical_packages is computed as:
      
        DIV_ROUND_UP(total_cpus, ncpus);
        - ncpus being number of cores
      
      With above BIOS setup we get total_cpus == 16 which set
      __max_logical_packages to 2 (ncpus is 12).
      
      Once topology_update_package_map processes CPU with logical
      pkg over 2 we display above messages and fail to initialize
      the physical_to_logical_pkg map, which makes the uncore code
      crash.
      
      The fix is to remove logical_package_map bitmap completely
      and keep and update the logical_packages number instead.
      
      After we enumerate all the present CPUs, we check if the
      enumerated logical packages count is within its computed
      maximum from BIOS data.
      
      If it's not the case, we set this maximum to the new enumerated
      value and freeze any new addition of logical packages.
      
      The freeze is because lot of init code like uncore/rapl/cqm
      depends on having maximum logical package value set to allocate
      their data, so we can't change it later on.
      
      Prarit Bhargava tested the patch and confirms that it solves
      the problem:
      
        From dmidecode:
                Core Count: 24
                Core Enabled: 24
                Thread Count: 48
      
      Orig kernel boot log:
      
       [    0.464981] smpboot: Max logical packages: 19
       [    0.469861] smpboot: APIC(0) Converting physical 0 to logical package 0
       [    0.477261] smpboot: APIC(40) Converting physical 1 to logical package 1
       [    0.484760] smpboot: APIC(80) Converting physical 2 to logical package 2
       [    0.492258] smpboot: APIC(c0) Converting physical 3 to logical package 3
      
      1.  nr_cpus=8, should stop enumerating in package 0:
      
       [    0.533664] smpboot: APIC(0) Converting physical 0 to logical package 0
       [    0.539596] smpboot: Max logical packages: 19
      
      2.  max_cpus=8, should still enumerate all packages:
      
       [    0.526494] smpboot: APIC(0) Converting physical 0 to logical package 0
       [    0.532428] smpboot: APIC(40) Converting physical 1 to logical package 1
       [    0.538456] smpboot: APIC(80) Converting physical 2 to logical package 2
       [    0.544486] smpboot: APIC(c0) Converting physical 3 to logical package 3
       [    0.550524] smpboot: Max logical packages: 19
      
      3.  nr_cpus=49 ( 2 socket + 1 core on 3rd socket), should stop enumerating in
          package 2:
      
       [    0.521378] smpboot: APIC(0) Converting physical 0 to logical package 0
       [    0.527314] smpboot: APIC(40) Converting physical 1 to logical package 1
       [    0.533345] smpboot: APIC(80) Converting physical 2 to logical package 2
       [    0.539368] smpboot: Max logical packages: 19
      
      4.  maxcpus=49, should still enumerate all packages:
      
       [    0.525591] smpboot: APIC(0) Converting physical 0 to logical package 0
       [    0.531525] smpboot: APIC(40) Converting physical 1 to logical package 1
       [    0.537547] smpboot: APIC(80) Converting physical 2 to logical package 2
       [    0.543579] smpboot: APIC(c0) Converting physical 3 to logical package 3
       [    0.549624] smpboot: Max logical packages: 19
      
      5.  kdump (nr_cpus=1) works as well.
      Reported-by: default avatarFrank Ramsay <framsay@redhat.com>
      Tested-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Reviewed-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160815101700.GA30090@krava
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      7b0501b1
    • Borislav Petkov's avatar
      x86/microcode/AMD: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y · 88b2f634
      Borislav Petkov authored
      Similar to:
      
        efaad554
      
       ("x86/microcode/intel: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y")
      
      ... fix microcode loading from the initrd on AMD by adding the
      randomization offset to the microcode patch container within the initrd.
      Reported-and-tested-by: default avatarBrian Gerst <brgerst@gmail.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-tip-commits@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160817113314.GA19221@nazgul.tnic
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      88b2f634
  11. 16 Aug, 2016 2 commits
  12. 15 Aug, 2016 1 commit
  13. 12 Aug, 2016 3 commits
    • Kan Liang's avatar
      perf/x86/intel/uncore: Add enable_box for client MSR uncore · 95f3be79
      Kan Liang authored
      
      
      There are bug reports about miscounting uncore counters on some
      client machines like Sandybridge, Broadwell and Skylake. It is
      very likely to be observed on idle systems.
      
      This issue is caused by a hardware issue. PERF_GLOBAL_CTL could be
      cleared after Package C7, and nothing will be count.
      The related errata (HSD 158) could be found in:
      
        www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf
      
      This patch tries to work around this issue by re-enabling PERF_GLOBAL_CTL
      in ->enable_box(). The workaround does not cover all cases. It helps for new
      events after returning from C7. But it cannot prevent C7, it will still
      miscount if a counter is already active.
      
      There is no drawback in leaving it enabled, so it does not need
      disable_box() here.
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1470925874-59943-1-git-send-email-kan.liang@intel.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      95f3be79
    • Kan Liang's avatar
      perf/x86/intel/uncore: Fix uncore num_counters · 10e9e7bd
      Kan Liang authored
      
      
      Some uncore boxes' num_counters value for Haswell server and
      Broadwell server are not correct (too large, off by one).
      
      This issue was found by comparing the code with the document. Although
      there is no bug report from users yet, accessing non-existent counters
      is dangerous and the behavior is undefined: it may cause miscounting or
      even crashes.
      
      This patch makes them consistent with the uncore document.
      Reported-by: default avatarLukasz Odzioba <lukasz.odzioba@intel.com>
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1470925820-59847-1-git-send-email-kan.liang@intel.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      10e9e7bd
    • Denys Vlasenko's avatar
      uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions · 68187872
      Denys Vlasenko authored
      
      
      Since instruction decoder now supports EVEX-encoded instructions, two fixes
      are needed to correctly handle them in uprobes.
      
      Extended bits for MODRM.rm field need to be sanitized just like we do it
      for VEX3, to avoid encoding wrong register for register-relative access.
      
      EVEX has _two_ extended bits: b and x. Theoretically, EVEX.x should be
      ignored by the CPU (since GPRs go only up to 15, not 31), but let's be
      paranoid here: proper encoding for register-relative access
      should have EVEX.x = 1.
      
      Secondly, we should fetch vex.vvvv for EVEX too.
      This is now super easy because instruction decoder populates
      vex_prefix.bytes[2] for all flavors of (e)vex encodings, even for VEX2.
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Acked-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org> # v4.1+
      Fixes: 8a764a87 ("x86/asm/decoder: Create artificial 3rd byte for 2-byte VEX")
      Link: http://lkml.kernel.org/r/20160811154521.20469-1-dvlasenk@redhat.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      68187872
  14. 11 Aug, 2016 10 commits
    • Sebastian Andrzej Siewior's avatar
      x86/apic/x2apic, smp/hotplug: Don't use before alloc in x2apic_cluster_probe() · d52c0569
      Sebastian Andrzej Siewior authored
      
      
      I made a mistake while converting the driver to the hotplug state
      machine and as a result x2apic_cluster_probe() was accessing
      cpus_in_cluster before allocating it.
      
      This patch fixes it by setting the cpumask after the allocation the
      memory succeeded.
      
      While at it, I marked two functions static which are only used within
      this file.
      Reported-by: default avatarLaura Abbott <labbott@redhat.com>
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 6b2c2847 ("x86/x2apic: Convert to CPU hotplug state machine")
      Link: http://lkml.kernel.org/r/1470924515-9444-1-git-send-email-bigeasy@linutronix.de
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d52c0569
    • Alex Thorlton's avatar
      x86/platform/uv: Skip UV runtime services mapping in the efi_runtime_disabled case · f72075c9
      Alex Thorlton authored
      
      
      This problem has actually been in the UV code for a while, but we didn't
      catch it until recently, because we had been relying on EFI_OLD_MEMMAP
      to allow our systems to boot for a period of time.  We noticed the issue
      when trying to kexec a recent community kernel, where we hit this NULL
      pointer dereference in efi_sync_low_kernel_mappings():
      
       [    0.337515] BUG: unable to handle kernel NULL pointer dereference at 0000000000000880
       [    0.346276] IP: [<ffffffff8105df8d>] efi_sync_low_kernel_mappings+0x5d/0x1b0
      
      The problem doesn't show up with EFI_OLD_MEMMAP because we skip the
      chunk of setup_efi_state() that sets the efi_loader_signature for the
      kexec'd kernel.  When the kexec'd kernel boots, it won't set EFI_BOOT in
      setup_arch, so we completely avoid the bug.
      
      We always kexec with noefi on the command line, so this shouldn't be an
      issue, but since we're not actually checking for efi_runtime_disabled in
      uv_bios_init(), we end up trying to do EFI runtime callbacks when we
      shouldn't be. This patch just adds a check for efi_runtime_disabled in
      uv_bios_init() so that we don't map in uv_systab when runtime_disabled ==
      true.
      Signed-off-by: default avatarAlex Thorlton <athorlton@sgi.com>
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Cc: <stable@vger.kernel.org> # v4.7
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1470912120-22831-2-git-send-email-matt@codeblueprint.co.uk
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f72075c9
    • Andy Lutomirski's avatar
      x86/efi: Allocate a trampoline if needed in efi_free_boot_services() · 5bc653b7
      Andy Lutomirski authored
      
      
      On my Dell XPS 13 9350 with firmware 1.4.4 and SGX on, if I boot
      Fedora 24's grub2-efi off a hard disk, my first 1MB of RAM looks
      like:
      
       efi: mem00: [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000000000-0x0000000000000fff] (0MB)
       efi: mem01: [Boot Data          |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000001000-0x0000000000027fff] (0MB)
       efi: mem02: [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000028000-0x0000000000029fff] (0MB)
       efi: mem03: [Reserved           |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000000002a000-0x000000000002bfff] (0MB)
       efi: mem04: [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000000002c000-0x000000000002cfff] (0MB)
       efi: mem05: [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000000002d000-0x000000000002dfff] (0MB)
       efi: mem06: [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000000002e000-0x0000000000057fff] (0MB)
       efi: mem07: [Reserved           |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000058000-0x0000000000058fff] (0MB)
       efi: mem08: [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000059000-0x000000000009ffff] (0MB)
      
      My EBDA is at 0x2c000, which blocks off everything from 0x2c000 and
      up, and my trampoline is 0x6000 bytes (6 pages), so it doesn't fit
      in the loader data range at 0x28000.
      
      Without this patch, it panics due to a failure to allocate the
      trampoline.  With this patch, it works:
      
       [  +0.001744] Base memory trampoline at [ffff880000001000] 1000 size 24576
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Reviewed-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matt Fleming <mfleming@suse.de>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/998c77b3bf709f3dfed85cb30701ed1a5d8a438b.1470821230.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5bc653b7
    • Andy Lutomirski's avatar
      x86/boot: Rework reserve_real_mode() to allow multiple tries · 5ff3e2c3
      Andy Lutomirski authored
      
      
      If reserve_real_mode() fails, panicing immediately means we're
      doomed.  Make it safe to try more than once to allocate the
      trampoline:
      
       - Degrade a failure from panic() to pr_info().  (If we make it to
         setup_real_mode() without reserving the trampoline, we'll panic
         them.)
      
       - Factor out helpers so that platform code can supply a specific
         address to try.
      
       - Warn if reserve_real_mode() is called after we're done with the
         memblock allocator.  If that were to happen, we would behave
         unpredictably.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matt Fleming <mfleming@suse.de>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/876e383038f3e9971aa72fd20a4f5da05f9d193d.1470821230.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5ff3e2c3
    • Andy Lutomirski's avatar
      x86/boot: Defer setup_real_mode() to early_initcall time · d0de0f68
      Andy Lutomirski authored
      
      
      There's no need to run setup_real_mode() as early as we run it.
      Defer it to the same early_initcall that sets up the page
      permissions for the real mode code.
      
      This should be a code size reduction.  More importantly, it give us
      a longer window in which we can allocate the real mode trampoline.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matt Fleming <mfleming@suse.de>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/fd62f0da4f79357695e9bf3e365623736b05f119.1470821230.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d0de0f68
    • Andy Lutomirski's avatar
      x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly · 18bc7bd5
      Andy Lutomirski authored
      
      
      The initialization process for trampoline_cr4_features and
      mmu_cr4_features was confusing.  The intent is for mmu_cr4_features
      and *trampoline_cr4_features to stay in sync, but
      trampoline_cr4_features is NULL until setup_real_mode() runs.  The
      old code synchronized *trampoline_cr4_features *twice*, once in
      setup_real_mode() and once in setup_arch().  It also initialized
      mmu_cr4_features in setup_real_mode(), which causes the actual value
      of mmu_cr4_features to potentially depend on when setup_real_mode()
      is called.
      
      With this patch, mmu_cr4_features is initialized directly in
      setup_arch(), and *trampoline_cr4_features is synchronized to
      mmu_cr4_features when the trampoline is set up.
      
      After this patch, it should be safe to defer setup_real_mode().
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matt Fleming <mfleming@suse.de>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/d48a263f9912389b957dd495a7127b009259ffe0.1470821230.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      18bc7bd5
    • Andy Lutomirski's avatar
      x86/boot: Run reserve_bios_regions() after we initialize the memory map · 007b7560
      Andy Lutomirski authored
      
      
      reserve_bios_regions() is a quirk that reserves memory that we might
      otherwise think is available.  There's no need to run it so early,
      and running it before we have the memory map initialized with its
      non-quirky inputs makes it hard to make reserve_bios_regions() more
      intelligent.
      
      Move it right after we populate the memblock state.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matt Fleming <mfleming@suse.de>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/59f58618911005c799c6c9979ce6ae4881d907c2.1470821230.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      007b7560
    • Aaron Lu's avatar
      x86/irq: Do not substract irq_tlb_count from irq_call_count · 82ba4fac
      Aaron Lu authored
      Since commit:
      
        52aec330 ("x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR")
      
      the TLB remote shootdown is done through call function vector. That
      commit didn't take care of irq_tlb_count, which a later commit:
      
        fd0f5869 ("x86: Distinguish TLB shootdown interrupts from other functions call interrupts")
      
      ... tried to fix.
      
      The fix assumes every increase of irq_tlb_count has a corresponding
      increase of irq_call_count. So the irq_call_count is always bigger than
      irq_tlb_count and we could substract irq_tlb_count from irq_call_count.
      
      Unfortunately this is not true for the smp_call_function_single() case.
      The IPI is only sent if the target CPU's call_single_queue is empty when
      adding a csd into it in generic_exec_single. That means if two threads
      are both adding flush tlb csds to the same CPU's call_single_queue, only
      one IPI is sent. In other words, the irq_call_count is incremented by 1
      but irq_tlb_count is incremented by 2. Over time, irq_tlb_count will be
      bigger than irq_call_count and the substract will produce a very large
      irq_call_count value due to overflow.
      
      Considering that:
      
        1) it's not worth to send more IPIs for the sake of accurate counting of
           irq_call_count in generic_exec_single();
      
        2) it's not easy to tell if the call function interrupt is for TLB
           shootdown in __smp_call_function_single_interrupt().
      
      Not to exclude TLB shootdown from call function count seems to be the
      simplest fix and this patch just does that.
      
      This bug was found by LKP's cyclic performance regression tracking recently
      with the vm-scalability test suite. I have bisected to commit:
      
        3dec0ba0
      
       ("mm/rmap: share the i_mmap_rwsem")
      
      This commit didn't do anything wrong but revealed the irq_call_count
      problem. IIUC, the commit makes rwc->remap_one in rmap_walk_file
      concurrent with multiple threads.  When remap_one is try_to_unmap_one(),
      then multiple threads could queue flush TLB to the same CPU but only
      one IPI will be sent.
      
      Since the commit was added in Linux v3.19, the counting problem only
      shows up from v3.19 onwards.
      Signed-off-by: default avatarAaron Lu <aaron.lu@intel.com>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
      Link: http://lkml.kernel.org/r/20160811074430.GA18163@aaronlu.sh.intel.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      82ba4fac
    • Dave Hansen's avatar
      x86/mm: Fix swap entry comment and macro · ace7fab7
      Dave Hansen authored
      
      
      A recent patch changed the format of a swap PTE.
      
      The comment explaining the format of the swap PTE is wrong about
      the bits used for the swap type field.  Amusingly, the ASCII art
      and the patch description are correct, but the comment itself
      is wrong.
      
      As I was looking at this, I also noticed that the
      SWP_OFFSET_FIRST_BIT has an off-by-one error.  This does not
      really hurt anything.  It just wasted a bit of space in the PTE,
      giving us 2^59 bytes of addressable space in our swapfiles
      instead of 2^60.  But, it doesn't match with the comments, and it
      wastes a bit of space, so fix it.
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Fixes: 00839ee3 ("x86/mm: Move swap offset/type up in PTE to work around erratum")
      Link: http://lkml.kernel.org/r/20160810172325.E56AD7DA@viggo.jf.intel.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ace7fab7
    • Nicolas Iooss's avatar
      x86/mm/kaslr: Fix -Wformat-security warning · 62d16b5a
      Nicolas Iooss authored
      
      
      debug_putstr() is used to output strings without using printf-like
      formatting but debug_putstr(v) is defined as early_printk(v) in
      arch/x86/lib/kaslr.c.
      
      This makes clang reports the following warning when building
      with -Wformat-security:
      
          arch/x86/lib/kaslr.c:57:15: warning: format string is not a string
          literal (potentially insecure) [-Wformat-security]
                  debug_putstr(purpose);
                               ^~~~~~~
      
      Fix this by using "%s" in early_printk().
      Signed-off-by: default avatarNicolas Iooss <nicolas.iooss_linux@m4x.org>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160806102039.27221-1-nicolas.iooss_linux@m4x.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      62d16b5a
  15. 10 Aug, 2016 4 commits
    • Dave Hansen's avatar
      x86/mm/pkeys: Fix compact mode by removing protection keys' XSAVE buffer manipulation · b79daf85
      Dave Hansen authored
      The Memory Protection Keys "rights register" (PKRU) is
      XSAVE-managed, and is saved/restored along with the FPU state.
      
      When kernel code accesses FPU regsisters, it does a delicate
      dance with preempt.  Otherwise, the context switching code can
      get confused as to whether the most up-to-date state is in the
      registers themselves or in the XSAVE buffer.
      
      But, PKRU is not a normal FPU register.  Using it does not
      generate the normal device-not-available (#NM) exceptions which
      means we can not manage it lazily, and the kernel completley
      disallows using lazy mode when it is enabled.
      
      The dance with preempt *only* occurs when managing the FPU
      lazily.  Since we never manage PKRU lazily, we do not have to do
      the dance with preempt; we can access it directly.  Doing it
      this way saves a ton of complicated code (and is faster too).
      
      Further, the XSAVES reenabling failed to patch a bit of code
      in fpu__xfeature_set_state() the checked for compacted buffers.
      That check caused fpu__xfeature_set_state() to silently refuse to
      work when the kernel is using compacted XSAVE buffers.  This
      broke execute-only and future pkey_mprotect() support when using
      compact XSAVE buffers.
      
      But, removing fpu__xfeature_set_state() gets rid of this issue,
      in addition to the nice cleanup and speedup.
      
      This fixes the same thing as a fix that Sai posted:
      
        https://lkml.org/lkml/2016/7/25/637
      
      
      
      The fix that he posted is a much more obviously correct, but I
      think we should just do this instead.
      Reported-by: default avatarSai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-Cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20160727232040.7D060DAD@viggo.jf.intel.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b79daf85
    • Valdis Kletnieks's avatar
      x86/build: Reduce the W=1 warnings noise when compiling x86 syscall tables · 5e44258d
      Valdis Kletnieks authored
      
      
      Building an X86_64 kernel with W=1 throws a total of 9,948 lines of warnings of
      this form for both 32-bit and 64-bit syscall tables. Given that the entire rest
      of the build for my config only generates 8,375 lines of output, this is a big
      reduction in the warnings generated.
      
      The warnings follow this pattern:
      
        ./arch/x86/include/generated/asm/syscalls_32.h:885:21: warning: initialized field overwritten [-Woverride-init]
         __SYSCALL_I386(379, compat_sys_pwritev2, )
                           ^
        arch/x86/entry/syscall_32.c:13:46: note: in definition of macro '__SYSCALL_I386'
         #define __SYSCALL_I386(nr, sym, qual) [nr] = sym,
                                                    ^~~
        ./arch/x86/include/generated/asm/syscalls_32.h:885:21: note: (near initialization for 'ia32_sys_call_table[379]')
         __SYSCALL_I386(379, compat_sys_pwritev2, )
                           ^
        arch/x86/entry/syscall_32.c:13:46: note: in definition of macro '__SYSCALL_I386'
         #define __SYSCALL_I386(nr, sym, qual) [nr] = sym,
      
      Since we intentionally build the syscall tables this way, ignore that one
      warning in the two files.
      Signed-off-by: default avatarValdis Kletnieks <valdis.kletnieks@vt.edu>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/7464.1470021890@turing-police.cc.vt.edu
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5e44258d
    • Mike Travis's avatar
      x86/platform/UV: Fix kernel panic running RHEL kdump kernel on UV systems · 5a52e8f8
      Mike Travis authored
      
      
      The latest UV kernel support panics when RHEL7 kexec's the kdump kernel
      to make a dumpfile.  This patch fixes the problem by turning off all UV
      support if NUMA is off.
      Tested-by: default avatarFrank Ramsay <framsay@sgi.com>
      Tested-by: default avatarJohn Estabrook <estabrook@sgi.com>
      Signed-off-by: default avatarMike Travis <travis@sgi.com>
      Reviewed-by: default avatarDimitri Sivanich <sivanich@sgi.com>
      Reviewed-by: default avatarNathan Zimmer <nzimmer@sgi.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andrew Banman <abanman@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160801184050.577755634@asylum.americas.sgi.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5a52e8f8
    • Mike Travis's avatar
      x86/platform/UV: Fix problem with UV4 BIOS providing incorrect PXM values · 22ac2bca
      Mike Travis authored
      
      
      There are some circumstances where the UV4 BIOS cannot provide the
      correct Proximity Node values to associate with specific Sockets and
      Physical Nodes.  The decision was made to remove these values from BIOS
      and for the kernel to get these values from the standard ACPI tables.
      Tested-by: default avatarFrank Ramsay <framsay@sgi.com>
      Tested-by: default avatarJohn Estabrook <estabrook@sgi.com>
      Signed-off-by: default avatarMike Travis <travis@sgi.com>
      Reviewed-by: default avatarDimitri Sivanich <sivanich@sgi.com>
      Reviewed-by: default avatarNathan Zimmer <nzimmer@sgi.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andrew Banman <abanman@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160801184050.414210079@asylum.americas.sgi.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      22ac2bca